Recent activity
Dan Milstein started following the problem "Problem with large number of input files from S3" in Cloudera.
Dan Milstein asked a question in Cloudera on June 30, 2009 18:06:
Why are logs for EC2 defaulting to the small root partition?
In the continuing theme of "Using the EC2 scripts for long-running clusters".
With the default setup, all the Hadoop logs go to /var/log/hadoop, which is on the root partition, and that partition is quite small (unlike /mnt, which is plenty big). As a result, my cluster fills up the root partition within a week or two, bringing the system to a halt.
So:
1- Should /var/log/hadoop be on /mnt (aka /dev/sda2)?
2- If so, can the scripts/images be configured that way by default?
3- Also, what's the best way for me to adjust this?
Also, just to make sure I understand -- the default config is for those job files to be kept forever, correct? I can adjust that in conf/log4j.properties, but by default they are never removed?
Thanks,
-Dan
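One way to keep a small partition from filling while the questions above are open is to prune old log files on a schedule. The following is a minimal sketch, not part of the Cloudera scripts: the function names, the 14-day cutoff, and the choice to judge staleness by modification time are all assumptions made for illustration.

```python
import shutil
import time
from pathlib import Path


def partition_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the partition holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def find_stale_logs(log_dir, max_age_days, now=None):
    """Return files under log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for path in Path(log_dir).rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            stale.append(path)
    return sorted(stale)


def prune_logs(log_dir, max_age_days, dry_run=True):
    """Delete stale files; with dry_run=True, only report what would go."""
    stale = find_stale_logs(log_dir, max_age_days)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale
```

Calling `prune_logs("/var/log/hadoop", 14)` only lists candidates; passing `dry_run=False` actually deletes them. This treats the symptom only. It does not move the logs to /mnt or change the log4j retention settings.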
Dan Milstein marked one of Tom's replies in Cloudera as useful. Tom replied to the question "Can the Cloudera EC2 distro be used for long-running clusters?".
Dan Milstein asked a question in Cloudera on June 26, 2009 20:17:
Can the Cloudera EC2 distro be used for long-running clusters?
I launched a cluster on EC2 about 6-8 weeks ago, using the Cloudera EC2 scripts. Lately, I've been wondering whether Cloudera really means those scripts to be used for long-running clusters.
Here are the issues:
- The daemons are started via the service mechanism, but pids are all written to /tmp
When I went to use service to restart the various daemons, the pids had disappeared from /tmp (which is, y'know, temporary).
- Old versions of the RPMs don't seem to be kept around, which makes adding a node impossible
Or, should I say: are the old versions of the RPMs somewhere? All I could find were the most recent or 2nd most recent revs, and, if I deployed a new node, it reported a different 'hadoop version' and thus was unable to join the cluster.
I had to upgrade my entire cluster, which was... okay, but I don't want to be forced to do that any time I have to take out a node and add a new one.
Side, somewhat disturbing note: for some reason (I have no idea why), calling 'hadoop version' on the master and existing slaves returned all "Unknown" for the version (this was for the -7 release, and I don't think it was always doing so).
In an ideal world, the scripts would store the precise version info with the cluster name on the machine I trigger the launch from. That way, I could launch new nodes, and they could just join the cluster.
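The idea of recording the exact version next to the cluster name on the launching machine could look roughly like the sketch below. This is not how the Cloudera EC2 scripts behave; the state directory, the JSON layout, and both function names are hypothetical.

```python
import json
from pathlib import Path


def record_cluster_version(state_dir, cluster_name, version):
    """Save the version a cluster was launched with, keyed by cluster name."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{cluster_name}.json"
    record = {"cluster": cluster_name, "hadoop_version": version}
    path.write_text(json.dumps(record, indent=2))
    return path


def lookup_cluster_version(state_dir, cluster_name):
    """Return the recorded version, or None if the cluster is unknown."""
    path = Path(state_dir) / f"{cluster_name}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text())["hadoop_version"]
```

A launch script would call `record_cluster_version` once when the cluster is created, then `lookup_cluster_version` before starting a new node so it can request the same package version. That only helps if packages for that version are still available to install.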
Dan Milstein marked one of Tom's replies in Cloudera as useful. Tom replied to the question "Where are slaves specified for the EC2 version of the distribution?".
Dan Milstein asked a question in Cloudera on May 13, 2009 17:08:
Where are slaves specified for the EC2 version of the distribution?
I'm running the EC2 version of the distribution (installed with the Cloudera scripts, which I like very much). However, many of the basic Hadoop cluster scripts (to run on the master) depend on the slaves file (in {HADOOP_CONF_DIR}/slaves, in general).
But the slaves file there (/etc/hadoop/conf/slaves) just has one line: 'localhost'.
Should I not be using those scripts (e.g. start-all/stop-all)? Do I need to manually set up the slaves file myself?
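If the slaves file does have to be maintained by hand, generating it from a list of hostnames is simple. The sketch below assumes the conventional format of one hostname per line, with `#` starting a comment. The helper names are hypothetical, and whether the EC2 scripts expect this file to be populated is exactly what the question leaves open.

```python
from pathlib import Path


def write_slaves_file(path, hostnames):
    """Write one hostname per line, dropping blanks and duplicates."""
    unique = []
    for name in hostnames:
        name = name.strip()
        if name and name not in unique:
            unique.append(name)
    Path(path).write_text("".join(f"{name}\n" for name in unique))
    return unique


def read_slaves_file(path):
    """Return the hostnames in a slaves file, ignoring comments and blanks."""
    hosts = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()  # assumed comment syntax
        if line:
            hosts.append(line)
    return hosts
```

Reading the current /etc/hadoop/conf/slaves with `read_slaves_file` would return `['localhost']` for the single-line file described above.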
A comment on the question "Is it possible to build the Cloudera distribution on/for Mac OS X?" in Cloudera:
Thanks. Tarballs would be fine -- I haven't played with source RPMs, so didn't know to look there. I can do that in the short term, and it's good to know that the links will be up soon. – Dan Milstein, on May 05, 2009 03:16
Dan Milstein marked one of Alex Loddengaard's replies in Cloudera as useful. Alex Loddengaard replied to the question "Is it possible to build the Cloudera distribution on/for Mac OS X?".
Dan Milstein asked a question in Cloudera on May 04, 2009 18:23:
Is it possible to build the Cloudera distribution on/for Mac OS X?
I'll be deploying on Linux/EC2, but develop on OS X, and would like to use the identical distribution in both environments.
