Jobs restarted?
Did Hadoop die and restart all the processes? I went to bed around 2am with mapping 100% done and reducing 72% done, and woke up to find mapping at 99% and reducing at only 25%...
Also, trying to view the job's details for any running job is broken...
-
I can't even access the palantir site right now.
-
I think the 30k output should be sufficient for an A.
-
We've been trying to run our program all morning, but it doesn't get past 0% and our job doesn't show up on the admin page.
I’m sad
-
Re-running again... now with more reduces!
-
I have a question: how did you change the number of reduces, and what is the effect of doing that?
-
You can change it in your main class. I think it's how many tasks the reduce phase will be broken into. If you change it, you might be disrupting the load balancing. One dude put 250, which I think will negatively impact all of us.
-
It is a setting in your jar file: JobConf.setNumReduceTasks() (check the Hadoop API).
If you make it more than 26, then get rid of the custom Partitioner, which is designed to use only 26.
It shouldn't mess up the load balancing that badly, but you get one output file per reducer. I used 100 and was happy with it during testing. Then I wrote the instructions for 26 and didn't see a drastic change in performance. Mitch claims that more reducers improve performance. We haven't had a controlled experiment yet to validate that.
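To make the Partitioner point concrete, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes the custom Partitioner assigns one partition per first letter, which is only a guess, since its code isn't shown in this thread; `letterPartition`, `sampleTerms`, and the class name are hypothetical. `hashPartition` uses the same formula as Hadoop's default HashPartitioner, applied here to `String.hashCode()` rather than a Hadoop key type.

```java
import java.util.HashSet;
import java.util.Set;

class PartitionSketch {
    // Hypothetical stand-in for the custom Partitioner: one bucket per
    // first letter a-z, so it can never produce more than 26 distinct values.
    static int letterPartition(String term, int numReduceTasks) {
        char c = Character.toLowerCase(term.charAt(0));
        int bucket = (c >= 'a' && c <= 'z') ? c - 'a' : 0;
        return bucket % numReduceTasks;
    }

    // Same formula as Hadoop's default HashPartitioner, on a String hash.
    static int hashPartition(String term, int numReduceTasks) {
        return (term.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many reducers would actually receive at least one term.
    static int distinctPartitions(String[] terms, int numReduceTasks, boolean byLetter) {
        Set<Integer> used = new HashSet<>();
        for (String t : terms) {
            used.add(byLetter ? letterPartition(t, numReduceTasks)
                              : hashPartition(t, numReduceTasks));
        }
        return used.size();
    }

    // Made-up terms whose first letters cycle through a-z.
    static String[] sampleTerms(int count) {
        String[] terms = new String[count];
        for (int i = 0; i < count; i++) {
            terms[i] = (char) ('a' + i % 26) + "term" + i;
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] terms = sampleTerms(2000);
        int reducers = 100;
        System.out.println("letter partitioner uses "
                + distinctPartitions(terms, reducers, true) + " of " + reducers);
        System.out.println("hash partitioner uses "
                + distinctPartitions(terms, reducers, false) + " of " + reducers);
    }
}
```

Under that assumption, asking for 100 reducers with the letter-based partitioner still sends data to only 26 of them; the other 74 just write empty output files.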
-
I have already experienced 3 major job failures due to server crashes over the last four days. Every time my job is about to finish (reducing more than 85% done), the server restarts.
I have the 30,000 one ready to be turned in, yet since I started this assignment early, I thought I was going to turn in the 500,000 one without a problem.
Oh well, I hope the one I am currently running finishes without any problem.
-
Ki, sorry that it's been so stressful. I agree that making a robust system is hard enough without having Hadoop be flaky as well. Unfortunately we had ICS's systems go down this weekend and have to work with the network infrastructure that we've got. Do your best. At least the point about "robust" being a desired characteristic is clear now....
-
I really think the collection makes a huge difference. It worked for me when I switched to a more efficient collection in the reduce class to store the docids. Before I switched, the reducing tasks took hours after the maps were completed and ended up failing for unknown reasons. After switching, the reducing tasks finished within 10 minutes of the map tasks in my luckiest run (for 500,000 URLs). Of course, some other factors also changed between the two jobs, but I believe the collection plays an important role when you need to store and sort around 500,000 or more records for some of the terms in the reducing process.
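As a rough illustration of how much the collection choice can matter, here is a minimal sketch in plain Java. The post doesn't say which collections were involved, so both variants are assumptions: `viaList` deduplicates with a linear `contains()` scan (quadratic overall), while `viaTreeSet` keeps docids unique and sorted as they are inserted. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class DocIdCollections {
    // Slow variant: every contains() call scans the whole list, so
    // n docids cost O(n^2) comparisons before the final sort.
    static List<Integer> viaList(int[] docIds) {
        List<Integer> out = new ArrayList<>();
        for (int id : docIds) {
            if (!out.contains(id)) {
                out.add(id);
            }
        }
        Collections.sort(out);
        return out;
    }

    // Faster variant: TreeSet rejects duplicates and keeps elements
    // sorted, at O(log n) per insert.
    static List<Integer> viaTreeSet(int[] docIds) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int id : docIds) {
            set.add(id);
        }
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        int[] docIds = {42, 7, 42, 19, 7, 3};
        // Both variants produce the same sorted, duplicate-free list.
        System.out.println(viaList(docIds).equals(viaTreeSet(docIds)));
    }
}
```

With a few hundred thousand docids for a single term, the gap between the two approaches grows quickly, which would be consistent with the hours-versus-minutes difference described above, though other factors changed between those runs too.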
-
Is there any way we can put more resources into reducing? It seems like these tasks take the longest and are the hang-ups...
-
I am already running 2 tasktrackers on each of the openlab nodes, so I've just about maxed out the task capability.