Cloudera's Distribution for Hadoop EC2 with EBS
When launching a cluster with the beta EBS scripts I'm running into the following issues:
1) Launch script is unable to mount the EBS volume due to the drive not being formatted.
- The snapshot was generated using the described method on the guide (create-formatted-snapshot).
- I manually formatted the EBS volumes and was able to get past this point.
2) Scripts use hardcoded devices in the launch script that attempt to mount 2 volumes per node instead of using the JSON configuration.
3) After the nodes are up, the master cannot connect to HDFS. I opened all the correct ports via security groups, but I'm still having connection issues.
4) This is more a question than an issue. When tailing /var/log/messages on one of the slave nodes, I noticed it was trying to connect to the master via public DNS. Doesn't that method end up costing more for bandwidth than using the private DNS / IP?
-
Hi Ryan,
1. That sounds like a bug. I'll see if I can reproduce it.
2. Can you see where the devices are hardcoded in the launch scripts? They should be read from the JSON file, as you say.
3. Are you creating new security groups? Can you post the command you are trying to run and the error you get please?
4. Public DNS resolves to the local IP addresses within EC2 so it should stay within Amazon and hence not cost anything.
Cheers,
Tom -
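One way to check point 4 from inside the cluster is to resolve the master's public DNS name on a slave and see whether the answer is a private address. The following is only a minimal sketch in Python 3: the function names are made up for illustration, and the resolver is passed in as a parameter so the check can be tried without network access.

```python
import ipaddress
import socket

# RFC 1918 private IPv4 ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_address(ip):
    """Return True if the dotted-quad string is in an RFC 1918 range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_NETWORKS)


def resolves_privately(hostname, resolver=socket.gethostbyname):
    """Resolve hostname and report whether the result is a private address.

    The resolver argument is a hypothetical hook for testing; by default
    the system resolver is used.
    """
    return is_private_address(resolver(hostname))
```

Run on a slave node, `resolves_privately(master_public_dns_name)` should come back true if the public name is being answered with the internal address, which is the behaviour described above.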
Thanks for following up, Tom.
2) It appears it's hardcoded in hadoop-ec2-init-remote-cloudera-ebs.sh on lines 79-80.
3) The script created the base security groups, and I manually added full group access between them. It could just be that the HDFS daemon isn't launching correctly. I can supply a dump of /var/log/messages if that'd help.
If it'd be helpful, I can also send you all the config files I'm using.
-
2) Yes, you're right that it's hardcoded in the remote script. I'm updating the scripts so that they dynamically pass the EBS volumes to the remote instance. In the meantime, you can edit the remote script if you have a different number of EBS volumes to attach.
3) It would be useful to see the error that you are getting, so please send the logs from /var/log/messages and /var/log/hadoop.
Cheers,
Tom -
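To illustrate what "read the devices from the JSON file" could look like in place of hardcoded device lines, here is a rough Python 3 sketch. The JSON layout (a role name mapping to a list of entries with "device" and "mount_point" keys) and the function name are assumptions made for this example, not the actual format or code used by the scripts.

```python
import json
import shlex


def mount_commands(spec_text, role):
    """Build mkdir/mount shell commands for every volume listed for a role.

    The spec layout is hypothetical: {"role": [{"device": ..., "mount_point": ...}]}.
    """
    spec = json.loads(spec_text)
    commands = []
    for volume in spec.get(role, []):
        device = shlex.quote(volume["device"])
        mount_point = shlex.quote(volume["mount_point"])
        commands.append("mkdir -p %s" % mount_point)
        commands.append("mount %s %s" % (device, mount_point))
    return commands


# Hypothetical spec: one volume on the master, two on each slave.
EXAMPLE_SPEC = """
{
  "master": [{"device": "/dev/sdj", "mount_point": "/ebs1"}],
  "slave":  [{"device": "/dev/sdj", "mount_point": "/ebs1"},
             {"device": "/dev/sdk", "mount_point": "/ebs2"}]
}
"""
```

With something along these lines, the number of mounts follows whatever the spec lists for each role, so a node with one volume never tries to mount a second device.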
Same issue with #1. It looks like, for some reason, the create-formatted-snapshot script is unable to mount or format the EBS volume:
Creating volume of size 100 in us-east-1c
Created volume Volume:vol-92fe10fb
Attaching volume to i-01a78468
Running ssh -i ~/ec2/keys/KeyPair2.pem -o StrictHostKeyChecking=no root@ec2-67-202-23-27.compute-1.amazo... 'mkfs.ext3 -F -m 0.5 /dev/sdj'
Warning: Permanently added 'ec2-67-202-23-27.compute-1.amazonaws.com,67.202.23.27' (RSA) to the list of known hosts.
mke2fs 1.41.3 (12-Oct-2008)
mkfs.ext3: No such file or directory while trying to determine filesystem size
-
Guy,
It looks like the EBS volume hasn't attached to /dev/sdj by the time mkfs is run. I think this is a timing-related problem: sometimes it works, but sometimes it doesn't. The script should really wait until the volume is attached. Until this is patched, you can manually make a snapshot with the command line tools by starting an instance, attaching a newly created volume, formatting it, then creating a snapshot.
Cheers,
Tom -
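The "wait until the volume is attached" idea can be sketched as a small polling helper with a timeout, so the script gives up cleanly instead of hanging if the device never shows up. This is a generic Python 3 sketch, not code from the scripts: `wait_until` and its parameters are hypothetical names, and the readiness check is supplied by the caller.

```python
import time


def wait_until(predicate, timeout=120.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or the timeout expires.

    Returns True if the condition was met and False on timeout. The clock
    and sleep arguments are hooks so the loop can be exercised in tests
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In this setting the predicate might be a function that checks over ssh whether /dev/sdj exists on the instance, with mkfs run only once `wait_until` returns True.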
I've managed to get through all the paces of bringing up the cluster and here is a quick hack I've used to get past this formatting problem:
[root@newconversions1 hadoop-ec2]# git show 4d02fa143d70e51309e1d923ab482113c67fc368
commit 4d02fa143d70e51309e1d923ab482113c67fc368
Author: root <root@newconversions1.scribd.com>
Date: Wed Jul 1 14:29:15 2009 -0500
Wait for storage before formatting it
diff --git a/hadoop/ec2/storage.py b/hadoop/ec2/storage.py
index 7dbf5a6..b03d9aa 100644
--- a/hadoop/ec2/storage.py
+++ b/hadoop/ec2/storage.py
@@ -64,7 +64,14 @@ def create_formatted_snapshot(cluster, size, availability_zone, image_id, key_na
print "Attaching volume to %s" % instance.id
volume.attach(instance.id, '/dev/sdj')
- run_command_on_instance(instance, ssh_options, 'mkfs.ext3 -F -m 0.5 /dev/sdj')
+ run_command_on_instance(instance, ssh_options, """
+ while :; do
+ echo 'Waiting for /dev/sdj...';
+ if test -e /dev/sdj; then break; fi;
+ sleep 1;
+ done;
+ mkfs.ext3 -F -m 0.5 /dev/sdj
+ """)
print "Detaching volume"
conn.detach_volume(volume.id, instance.id)
-
Thanks for the fix, Alexey!
I will patch it into the latest version.
Cheers,
Tom