Recent activity
Carl Steinbach replied on November 11, 2009 18:12 to the problem "Waiting for Jobtracker to Stop Causes Script to Hang..." in Cloudera:
Carl Steinbach replied on November 05, 2009 21:48 to the question "How to read hdfs outside ec2 ?" in Cloudera:
The hadoop-site.xml file in $HOME/.hadoop-ec2/your_cluster/ has a property called hadoop.rpc.socket.factory.class.default, which should be automatically set to org.apache.hadoop.net.SocksSocketFactory, and another property called hadoop.socks.server, which is automatically set to localhost:6666. In combination, these two properties tell Hadoop to use the SOCKS proxy on your local machine when connecting to the EC2 cluster. The hadoop-ec2 script auto-generates this configuration file when you start a cluster. You do not need to modify it, but you do need to tell Hadoop where to find it by setting the HADOOP_CONF_DIR environment variable to $HOME/.hadoop-ec2/<your_cluster_name>.
You should be able to access your EC2 cluster remotely using the Java API as long as the following points are satisfied.
- HADOOP_HOME points to a local installation of Hadoop with a version matching Hadoop on the EC2 cluster. If you're using Cloudera's AMIs you need to make sure 'hadoop version' on your local machine returns "0.18.3".
- HADOOP_CONF_DIR is set to $HOME/.hadoop-ec2/your_cluster/
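One way to confirm that HADOOP_CONF_DIR points at the generated file is to read the two proxy properties back out of hadoop-site.xml. The Python sketch below is only an illustration and is not part of hadoop-ec2. The function name read_proxy_settings is hypothetical, and the code assumes the usual Hadoop layout of property elements with name and value children.

```python
import os
import xml.etree.ElementTree as ET

# Property names taken from the reply above.
PROXY_PROPERTIES = (
    "hadoop.rpc.socket.factory.class.default",
    "hadoop.socks.server",
)


def read_proxy_settings(conf_dir):
    """Return {property name: value or None} for the SOCKS proxy properties.

    Hypothetical helper: parses <conf_dir>/hadoop-site.xml, assuming each
    setting is a <property> element with <name> and <value> children.
    """
    site_file = os.path.join(os.path.expanduser(conf_dir), "hadoop-site.xml")
    root = ET.parse(site_file).getroot()
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in PROXY_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    # Report every expected property, using None for ones that are absent.
    return {name: found.get(name) for name in PROXY_PROPERTIES}
```

Calling read_proxy_settings(os.environ["HADOOP_CONF_DIR"]) should return the socket factory class and localhost:6666. A None value means the property is missing from the file, which suggests that HADOOP_CONF_DIR points at the wrong directory.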
Hope this helps.
Carl
Carl Steinbach replied on November 05, 2009 21:03 to the problem "NullPointerException when starting Hadoop service deamons" in Cloudera:
Hi Theo,
The set of files required for configuration changed between Hadoop 0.19 and 0.20. My initial response referred to the pre-0.20 configuration files.
Given that you're working with 0.20, the configuration files you mentioned and the steps you outlined all sound good.
Two things:
- /etc/hadoop-0.20/conf should point to /etc/hadoop-0.20/conf.pseudo. Can you please verify that this is the case?
- Can you post the contents of the files in /etc/hadoop-0.20/conf.pseudo to pastebin and add the link to this ticket? I'll take a look and let you know if I find anything strange.
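The first check can be scripted. The Python sketch below compares where the two paths resolve to. The function name conf_points_to is hypothetical, and the sketch assumes that "point to" means a symlink, possibly through an intermediate link.

```python
import os


def conf_points_to(link_path, expected_target):
    """Return True if link_path resolves to the same location as expected_target.

    os.path.realpath follows every symlink in the chain, so the comparison
    still works when the link goes through an intermediate link.
    """
    return os.path.realpath(link_path) == os.path.realpath(expected_target)
```

For example, conf_points_to("/etc/hadoop-0.20/conf", "/etc/hadoop-0.20/conf.pseudo") returning False would mean the active configuration directory is not the pseudo-distributed one.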
Thanks.
Carl
Carl Steinbach replied on November 05, 2009 19:22 to the problem "NullPointerException when starting Hadoop service deamons" in Cloudera:
Carl Steinbach replied on November 02, 2009 23:22 to the problem "Waiting for Jobtracker to Stop Causes Script to Hang..." in Cloudera:
Hi Zaki,
You mentioned that with HADOOP_EC2_LOGGING_LEVEL=DEBUG you see an "options={}" line in the log output. Do you see a line immediately before it saying something like "Read x configuration files: /xxxxxx"? If not, I think you are using an older version of the hadoop-ec2 scripts and should download a new copy from here:
http://cloudera-packages.s3.amazonaws...
The ConfigParser class that we use to parse the configuration file is part of the Python standard library. We call it from hadoop-ec2 in the function parse_options. If you know how to use the Python debugger, you might want to try stepping through this function to see where it's failing, or you could try adding some print statements.
Another thing worth trying is to run the hadoop-ec2 script from the directory that contains your ec2-clusters.cfg file. The ConfigParser is run so that it first looks in the current directory for ec2-clusters.cfg before looking in $HOME/.hadoop-ec2/.
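To see which files ConfigParser actually picks up, it can help to reproduce the lookup outside the script. The sketch below is a hypothetical stand-in written for Python 3, where the module is named configparser (it is ConfigParser on Python 2). It is not the real parse_options function, and load_cluster_config is an invented name. It relies on the read() method returning the list of files it managed to parse.

```python
import configparser  # named ConfigParser on Python 2
import os

CONFIG_NAME = "ec2-clusters.cfg"


def load_cluster_config(search_dirs=None):
    """Read ec2-clusters.cfg from each directory in order.

    Returns (parser, files_read). files_read lists only the files that were
    actually found and parsed, which makes it easy to print for debugging.
    Hypothetical stand-in, not the real hadoop-ec2 code.
    """
    if search_dirs is None:
        # Search order described above: current directory, then ~/.hadoop-ec2.
        search_dirs = [os.curdir, os.path.expanduser("~/.hadoop-ec2")]
    candidates = [os.path.join(d, CONFIG_NAME) for d in search_dirs]
    parser = configparser.ConfigParser()
    files_read = parser.read(candidates)  # missing files are silently skipped
    return parser, files_read
```

If files_read comes back empty, no configuration file was found in either location and parser.sections() will be empty as well. Printing both values is a cheap alternative to stepping through the debugger.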
Carl Steinbach replied on November 01, 2009 19:24 to the question "How to read hdfs outside ec2 ?" in Cloudera:
Carl Steinbach replied on October 31, 2009 22:24 to the question "How to read hdfs outside ec2 ?" in Cloudera:
Hi Jeff,
You will need to use the methods provided by the org.apache.hadoop.fs.FileSystem class. Chapter 3 of Tom White's Hadoop book provides many good examples of reading HDFS files using the Java API. The source code for all of the examples in the book is available here: http://www.hadoopbook.com/htdg-exampl...
Carl Steinbach replied on October 30, 2009 23:45 to the problem "Waiting for Jobtracker to Stop Causes Script to Hang..." in Cloudera:
Hi Zaki,
I think the first step is to double-check that the ec2-clusters.cfg file is in the right location.
What happens when you execute the following command?
% cat $HOME/.hadoop-ec2/ec2-clusters.cfg
If you see the contents of your config file, then everything should be good.
Sounds like you already enabled debug messages by setting HADOOP_EC2_LOGGING_LEVEL=DEBUG. What log messages are produced when you run the script with logging enabled?
Here's what I see when launching a cluster with the log level set to DEBUG:
% hadoop-ec2 launch-cluster carl-cluster 4
hadoop-ec2 launch-cluster carl-cluster 4
DEBUG:root:Read 1 configuration files: /Users/carl/.hadoop-ec2/ec2-clusters.cfg
....
Carl Steinbach replied on October 30, 2009 23:02 to the problem "Waiting for Jobtracker to Stop Causes Script to Hang..." in Cloudera:
Hi Zaki,
We haven't tested the scripts using Python 2.6, but they should work OK in theory. If the JobTracker is failing to start, the first place to look is /var/log/messages on the cluster master. This log contains a history of the things the cluster master did while starting up, including launching the JobTracker. If the messages log indicates that the JobTracker was started, the next place to look is the JobTracker log files located in /var/log/hadoop/.
Let us know what you find. Hope this helps.
Carl Steinbach replied on October 30, 2009 22:24 to the question "How can I run pig script when my client is outside of ec2 ?" in Cloudera:
Hi Jeff,
If you use the SOCKS proxy, all requests will be proxied through the cluster master, which has access to the EC2 internal DNS table. When you launch a cluster using the hadoop-ec2 script, it automatically creates an appropriate hadoop-site.xml at ~/.hadoop-ec2/<cluster_name>/hadoop-site.xml.
If you set HADOOP_CONF_DIR to point to this directory, and also add this directory to your PIG_CLASSPATH, everything should just work.
% hadoop-ec2 launch-cluster mycluster 4
% hadoop-ec2 proxy mycluster
% export HADOOP_CONF_DIR=~/.hadoop-ec2/mycluster
% export PIG_CLASSPATH=$HADOOP_CONF_DIR:$PIG_CLASSPATH
% pig -x mapreduce
Carl Steinbach replied on October 30, 2009 19:35 to the question "The web UI of hdfs is not accessible on ec2 ?" in Cloudera:
Hi Jeff,
The first step is to verify that the namenode is actually running. The easiest way to do this is to log in to your master node and try accessing the namenode UI from there:
[root@ip-10-250-126-178 ~]# curl localhost:50070
<meta />
<html>
<head>
<title>Hadoop Administration</title>
...
If the namenode servlet does not respond, you should next check the namenode logs in /var/log/hadoop for error messages.
Carl Steinbach replied on October 30, 2009 18:19 to the question "How to read hdfs outside ec2 ?" in Cloudera:
You need to run the SOCKS proxy on your client machine:
% hadoop-ec2 launch-cluster mycluster 4
% hadoop-ec2 proxy mycluster
% export HADOOP_CONF_DIR=~/.hadoop-ec2/mycluster
% hadoop fs -ls /
% hadoop fs -cat /output.txt
Opening more ports on your EC2 cluster will not fix this problem, since the namenode uses internal IP addresses when it refers to the datanodes, e.g. things that look like ip-10-242-205-155.ec2.internal instead of ec2-174-129-75-73.compute-1.amazonaws.com. You can't access the former from outside of EC2's perimeter, which is why you need to tunnel in using the SOCKS proxy.
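If the hadoop fs commands hang or fail, it is worth checking that the proxy is actually listening before looking elsewhere. The Python sketch below attempts a plain TCP connection. The function name proxy_is_listening is hypothetical, and the default of localhost:6666 is an assumption taken from the hadoop.socks.server value mentioned elsewhere in this feed.

```python
import socket


def proxy_is_listening(host="localhost", port=6666, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only shows that something accepts connections on the port; it does
    not verify that the listener actually speaks SOCKS.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

A False result after running "hadoop-ec2 proxy mycluster" suggests the proxy process exited or is bound to a different port than the one in hadoop-site.xml.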
