
hadoop - Amazon Elastic MapReduce Cluster Fails to Launch on VPC Subnet

Problem description:

I'm trying to launch an EMR cluster in our own VPC. I can use the command to launch it within AWS just fine, but if I specify our own VPC/subnet, the cluster itself fails to launch (so we're not talking about the jobs that would run on it; we're talking about launching a default cluster itself).

Obviously, this must have something to do with the subnet and AWS's Hadoop setup (though it is not the usual "Cannot find route to InternetGateway in main RouteTable" error).

I am unable to determine the cause from the logs. This happens both on the command line and in the AWS Web Console.

We are not performing any customization of bootstrap actions or the environment on the cluster.

Here's the subnet's route table:

Destination    Target
10.0.0.0/16    local
0.0.0.0/0      igw-2235d249
10.3.0.0/16    eni-b989b091
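
One way to double-check that this route table is the one actually associated with the subnet (and not just the VPC's main table) is the AWS CLI. A minimal diagnostic sketch, assuming the aws CLI is configured for the right account and region:

# Show the route table explicitly associated with the subnet.
# An empty result means the subnet falls back to the VPC's main route table.
aws ec2 describe-route-tables \
  --filters Name=association.subnet-id,Values=subnet-0bf3bb23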

Here's the command line used to launch the cluster (removing --subnet allows the command to succeed, but we need the cluster on this VPC so it has access to some particular resources):

elastic-mapreduce --create \
  --alive \
  --name "BMVE on Subnet 0BF3BB23" \
  --instance-type m1.medium \
  --num-instances 3 \
  --key-pair hadoop \
  --subnet subnet-0bf3bb23 \
  --visible-to-all-users true

master.log:

2014-03-31 18:24:48,848 INFO i-3e4ce71d: new instance started

2014-03-31 18:24:49,920 INFO i-3e4ce71d: bootstrap action 1 completed

2014-03-31 18:35:40,352 ERROR i-3e4ce71d: failed to start. hadoop JobTracker/NameNode process failed to launch.

1/controller.log:

2014-03-31T18:24:48.849Z INFO Fetching file 's3://elasticmapreduce/bootstrap-actions/configure-hadoop'

2014-03-31T18:24:49.408Z INFO Working dir /mnt/var/lib/bootstrap-actions/1

2014-03-31T18:24:49.408Z INFO Executing /mnt/var/lib/bootstrap-actions/1/configure-hadoop --site-key-value io.file.buffer.size=65536

2014-03-31T18:24:49.917Z INFO Execution ended with ret val 0

2014-03-31T18:24:49.918Z INFO Execution succeeded

1/stderr.log:

1/syslog:

Processing default file /home/hadoop/conf/hadoop-site.xml with overwrite io.file.buffer.size=65536

/home/hadoop/conf/hadoop-site.xml does not exist, assuming empty configuration

'io.file.buffer.size': default does not have key, appending value '65536'

Saved /home/hadoop/conf/hadoop-site.xml with overwrites. Original saved to /home/hadoop/conf/hadoop-site.xml.old

daemons-jobtracker-log (filtered for WARN|ERROR):

2014-03-31 18:25:00,906 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl (main): Source name ugi already exists!

. . .

2014-03-31 18:25:08,059 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1569)

. . .

2014-03-31 18:25:08,059 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): Error Recovery for block null bad datanode[0] nodes == null

2014-03-31 18:25:08,060 WARN org.apache.hadoop.hdfs.DFSClient (Thread-18): Could not get block locations. Source file "/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info" - Aborting...

2014-03-31 18:25:08,060 WARN org.apache.hadoop.mapred.JobTracker (main): Writing to file hdfs://10.0.7.65:9000/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info failed!

2014-03-31 18:25:08,060 WARN org.apache.hadoop.mapred.JobTracker (main): FileSystem is not ready yet!

2014-03-31 18:25:08,084 WARN org.apache.hadoop.mapred.JobTracker (main): Failed to initialize recovery manager.

. . .

2014-03-31 18:35:32,239 WARN org.apache.hadoop.hdfs.DFSClient (Thread-125): Error Recovery for block null bad datanode[0] nodes == null

2014-03-31 18:35:32,239 WARN org.apache.hadoop.hdfs.DFSClient (Thread-125): Could not get block locations. Source file "/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info" - Aborting...

2014-03-31 18:35:32,239 WARN org.apache.hadoop.mapred.JobTracker (main): Writing to file hdfs://10.0.7.65:9000/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info failed!

2014-03-31 18:35:32,239 WARN org.apache.hadoop.mapred.JobTracker (main): FileSystem is not ready yet!

2014-03-31 18:35:32,244 WARN org.apache.hadoop.mapred.JobTracker (main): Failed to initialize recovery manager.

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1569)

daemons-namenode-log (filtered again):

2014-03-31 18:25:07,693 INFO org.apache.hadoop.security.ShellBasedUnixGroupsMapping (IPC Server handler 1 on 9000): add hadoop to shell userGroupsCache

2014-03-31 18:25:08,042 ERROR org.apache.hadoop.security.UserGroupInformation (IPC Server handler 11 on 9000): PriviledgedActionException as:hadoop cause:java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

2014-03-31 18:25:08,043 INFO org.apache.hadoop.ipc.Server (IPC Server handler 11 on 9000): IPC Server handler 11 on 9000, call addBlock(/mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info, DFSClient_678715989, null) from 10.0.7.65:36607: error: java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

java.io.IOException: File /mnt/var/lib/hadoop/tmp/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1

Any assistance would be greatly appreciated.

Answer:

It works fine for me. You can try explicitly associating your route table with the subnet in your VPC.
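
A minimal sketch of that association step with the AWS CLI; the rtb- ID below is a placeholder for the route table shown in the question:

# Explicitly associate the route table with the EMR subnet so instances
# pick up the local/igw/eni routes rather than the VPC's main table.
# rtb-12345678 is a placeholder; substitute your route table's ID.
aws ec2 associate-route-table \
  --route-table-id rtb-12345678 \
  --subnet-id subnet-0bf3bb23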

Answer:

It turned out to have to do with the nature of DNS on our corporate VPC; we wound up having to create an additional VPC and then clone the DB resources into it (I'm not sure why; my access to the VPC admin console is restricted, so I'm trusting what the admin said).

The errors above are fairly obtuse, so hopefully knowing that they boil down to a DNS issue will help others.

Some references:

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-troubleshoot-error-vpc.html#emr-troubleshoot-error-dhcp

http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-dns.html

Hadoop on a VPC requires that the VPC's DHCP options be configured with the default EC2 settings, i.e. "Use Amazon DNS servers" and "Register hosts in DNS". Without the Amazon DNS servers, the Hadoop nodes cannot resolve one another, and launching a cluster fails. This is incompatible with our VPC settings, which push custom DNS server information via DHCP options.
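
For readers who are allowed to change their VPC (the asker wasn't, hence the second VPC), a hedged sketch of pointing a VPC back at the Amazon-provided DNS; the dopt-/vpc- IDs are placeholders, and AmazonProvidedDNS is the documented value for the default resolver:

# Create a DHCP options set that uses the Amazon-provided DNS server.
aws ec2 create-dhcp-options \
  --dhcp-configurations "Key=domain-name-servers,Values=AmazonProvidedDNS"

# Attach it to the VPC. Both IDs below are placeholders: use the dopt- ID
# returned by the previous command and your own VPC's ID.
aws ec2 associate-dhcp-options \
  --dhcp-options-id dopt-12345678 \
  --vpc-id vpc-12345678

Note that this was not an option here, since the corporate VPC deliberately pushes custom DNS servers; that is why a separate VPC was created instead.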
