Waiting for Jobtracker to Stop Causes Script to Hang...
I'm having trouble using the EC2 scripts... the only modification I made was to tell the main hadoop-ec2 script to use python 2.6 instead of 2.5... the script successfully launches the instances (master+slaves) but hangs when it gets to "Waiting for jobtracker to stop" .... any ideas?
1
person has this problem
I have this problem, too!
Tell me when someone solves it.
The more people who report this problem, the more it gets noticed.
The more people who report this problem, the more it gets noticed.
-
Inappropriate?Hi Zaki,
We haven't tested the scripts using Python 2.6, but they should work ok in theory. If the JobTracker is failing to start the first place to look is /var/log/messages on the cluster master. This log contains a history of the things the cluster master did while starting up, including launching the JobTracker. If the messages log indicates that the JobTracker was started, the next place to look is the JobTracker log files located in /var/log/hadoop/
Let us know what you find. Hope this helps. -
Inappropriate?Hey Carl,
Thanks for the reply. So I found some other problems (tried this on 3 different machines py2.5 and 2.6). After turning the debugging on, I saw that the options from ec2-clusters.cfg are not being passed for some reason (I also know this bc it always fails to launch a cluster unless an ami is specified on the command line). Clearly, something is wrong with ConfigParser or somewhere else along the line. Your help here would reallly be appreciated.
I’m confused
-
Inappropriate?Hi Zaki,
I think the first step is to double check that the ec2-cluster.cfg file is in the right location.
What happens when you execute the following command?
% cat $HOME/.hadoop-ec2/ec2-clusters.cfg
If you see the contents of your config file, then everything should be good.
Sounds like you already enabled debug messages by setting HADOOP_EC2_LOGGING_LEVEL=DEBUG. What log messages are produced when you run the script with logging enabled?
Here's what I see when launching a cluster with the log level set to DEBUG:
% hadoop-ec2 launch-cluster carl-cluster 4
hadoop-ec2 launch-cluster carl-cluster 4
DEBUG:root:Read 1 configuration files: /Users/carl/.hadoop-ec2/ec2-clusters.cfg
....
1 person says
this solves the problem
-
Inappropriate?Yes, ec2-clusters.cfg is in the right location (.hadoop-ec2 in the home dir) and I just copied the basic config from the cloudera site (with the path to my EC2 key)
The entire debug output is obviously wayyy too long to cover here, but I think I've identified the main problem. For whatever reason, the options specified in the ec2-clusters.cfg file are not getting read by the hadoop-ec2 script (in the debug output theres an options={} line). I confirmed this as the script always fails to run unless an ami flag is specified on the command line (--ami = ami-id).
I thought this might be a Py 2.5/2.6 issue, but I got the same output using Py2.5. I was able to duplicate this behavior on a remote unix box as well as my local machine running os x 10.6 and on another mac as well. Not sure if it has to do with the hadoop-ec2 impl of the config parsing in Python (I'm not familiar with the Options/Config Parser modules) but maybe I'm doing something wrong as it doesn't look like anyone else has reported this problem.
I’m frustrated
-
Inappropriate?Hi Zaki,
You mentioned that with HADOOP_EC2_LOGGING_LEVEL=DEBUG you see an "options={}" line in the log output. Do you see a line immediately before it saying something like "Read x configuration files: /xxxxxx"? If not, I think you are using an older version of the hadoop-ec2 scripts and should download a new copy from here:
http://cloudera-packages.s3.amazonaws...
The ConfigParser class that we use to parse the configuration file is part of the Python standard library. We call it from hadoop-ec2 in the function parse_options. If you know how to use the python debugger you might want to try stepping through this function to see where it's failing, or you could trying adding some print statements.
Another thing worth trying is to run the hadoop-ec2 script from the directory that contains your ec2-clusters.cfg file. The ConfigParser is run so that it first looks in the current directory for ec2-clusters.cfg before looking in $HOME/.hadoop-ec2/ -
Inappropriate?Yes, I'm using the correct version of the hadoop-ec2 scripts... and I do see the line before saying it's reading the config files... however the options remains empty...
heres the ec2-clusters.cfg file (i have a different one that i was using before with my own custom settings, this is just the one from the cloudera site)
[my-hadoop-cluster]
ami=ami-6159bf08
instance_type=c1.medium
key_name=kikin
availability_zone=us-east-1c
private_key=/Users/wenjiexu/Downloads/kikin.pem
ssh_options=-i %(private_key)s -o StrictHostKeyChecking=no
I'm launching it with the following command:
% hadoop-ec2 launch-cluster [my-hadoop-cluster] 3
Ive pasted the debug output here....
http://pastebin.com/m36a205b0
Not sure what else to do other than try and debug/refactor the config options reader or just hardcode my settings (which seems really inefficient) -
Inappropriate?Hey Carl/everyone else...
Any ideas? Still stuck... -
Inappropriate?I fixed the problem with the launch-cluster not reading the config (stupidly left on the brackets in the cluster-name) ... now its back to just hanging when it gets to the waiting for jobtracker to start.... from what i can tell the jobtracker is running on the master so i guess theres something else going on here...?
-
Inappropriate?Hi Zaki,
Try checking the jobtracker daemon logs located in /var/log/hadoop/ on the cluster master. If the jobtracker is failing to start you will likely be able to determine the cause by looking at these logs. -
Inappropriate?Yea, the jobtracker seems to be starting fine and I can access it by ssh-ing into the master and going to localhost:50030. however when I try to access it form http://publicDNS:50030 on my local machine, the server doesn't respond and I suspect this is where the launch-cluster command is hanging... (i see the waiting for jobtracker to start ...... keep going forever) ... i checked the security group settings and 50030 is accessible
-
Inappropriate?resolved finally i think.... still getting some strange behavior at times
http://getsatisfaction.com/cloudera/t...
basically just added client_cidr option and so the urllib2 request is finally able to connect to the jobtracker once its running...
still having some weird issue on my own laptop as it is not reading in the config file.... (ive set it to have all permissions its in the right $HOME/.hadoop-ec2 dir and doesnt work even if i run hadoop-ec2 from this dir)
Loading Profile...


