Server down?
Since 3 or 4 am today, the server keeps giving a connection refused error, so none of the jobs can run normally. Here is one example of the errors:
09/03/01 09:30:45 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 0 time(s).
09/03/01 09:30:46 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 1 time(s).
09/03/01 09:30:47 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 2 time(s).
09/03/01 09:30:48 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 3 time(s).
09/03/01 09:30:49 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 4 time(s).
09/03/01 09:30:50 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 5 time(s).
09/03/01 09:30:51 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 6 time(s).
09/03/01 09:30:52 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 7 time(s).
09/03/01 09:30:53 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 8 time(s).
09/03/01 09:30:54 INFO ipc.Client: Retrying connect to server: palantir.ics.uci.edu/128.195.58.105:1750. Already tried 9 time(s).
java.io.IOException: Call to palantir.ics.uci.edu/128.195.58.105:1750 failed on local exception: Connection refused
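A "Connection refused" generally means the machine answered but nothing was accepting connections on that port, which would point to a stopped daemon rather than a network outage. A minimal sketch for checking this from a client machine, assuming Python is available there; `probe` is a hypothetical helper, not part of Hadoop:

```python
import socket


def probe(host, port, timeout=3.0):
    """Try one TCP connect and describe the result as a short string."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (host down or traffic dropped).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. a DNS lookup failure or an unreachable network.
        return f"error: {exc}"


if __name__ == "__main__":
    # Host and port taken from the log above.
    print(probe("palantir.ics.uci.edu", 1750))
```

If this reports "refused" from several client machines, the problem is likely on the server side and not with any one client.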
-
The server is up and running for me. What server are you trying this from?
-
I tried many servers. Here are two of the most recent ones I tried:
stewie-griffin
barbara-pewterschmidt
I’m confused
-
Now the error has changed:
09/03/01 10:08:41 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=hongfuh, access=WRITE, inode="":mdempsey:supergroup:rwxr-xr-x
-
I know. I am bringing the whole system back up.
-
The same happens to me: I cannot connect to the server... I have been very unlucky, because in the last few days, whenever my job was about to finish, there was some problem with the server and it crashed... I hope to be able to complete this; I have been trying for a week!
I’m frustrated
-
Same here. I began the assignment at the beginning of the week, but I am still not able to finish the job with 500000 urls. I added counters and status updates in every place that might take some time to run, but my jobs still got killed when they were very close to the end. Anyway, I got my 30000 input finished and processed, so at least I have something to turn in tomorrow.
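For anyone reading along: a task that reports no progress for a long stretch can be killed by the framework, which is why counters and status updates help on slow steps. A minimal sketch of the idea, assuming a Hadoop Streaming job with a Python mapper (which may well not be what this assignment uses); the `crawl` group, `urls_seen` counter, and `run_mapper` name are made up for illustration. Streaming reads `reporter:status:` and `reporter:counter:` lines from the task's stderr:

```python
import sys


def run_mapper(lines, out=sys.stdout, err=sys.stderr, every=1000):
    """Emit one (url, 1) pair per non-empty line, reporting progress on stderr."""
    seen = 0      # total urls handled so far
    pending = 0   # urls handled since the last report

    def report():
        # Hadoop Streaming interprets these stderr lines as a status
        # message and a counter increment for the running task.
        err.write(f"reporter:status:processed {seen} urls\n")
        err.write(f"reporter:counter:crawl,urls_seen,{pending}\n")
        err.flush()

    for line in lines:
        url = line.strip()
        if not url:
            continue
        out.write(f"{url}\t1\n")
        seen += 1
        pending += 1
        if pending >= every:
            report()
            pending = 0
    if pending:
        report()  # account for the final partial batch
    return seen


if __name__ == "__main__":
    run_mapper(sys.stdin)
```

Reporting every N records rather than on every record keeps the overhead low while still signalling that the task is alive. This would not help if the tasks are being lost to server crashes rather than timeouts.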
I’m frustrated
-
Now it doesn't find the libraries:
java.io.FileNotFoundException: File does not exist: /lib/commons-codec-1.3.jar
-
Ok, the libraries should be up now.
-
I don't think all of the dfs is up.
When I do bin/hadoop dfs -ls, I get:
ls: Cannot access .: No such file or directory.
Also, when I do bin/hadoop dfs -ls /user, nothing shows up.
However, /assignment06 does show up.
-
It looks like you are able to connect now.
-
It looks like the jar errors are still happening, and my job hangs on reduce... are they related?
Also, I'm getting worried that I won't be able to finish the project on time, with all of the server errors we are having :-(
09/03/01 12:39:19 INFO mapred.JobClient: Task Id : attempt_200903011033_0025_r_000002_0, Status : FAILED
Error initializing attempt_200903011033_0025_r_000002_0:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200903011033_0025/jars
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:753)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1636)
at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:102)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1602)
I’m worried
-
That looks more like a local error. Try deleting your temp folder:
rm -rf /tmp/hadoop-USERNAME
Those errors also don't mean the job is failing. Tasks that fail will just be run again on another node. If every single task is failing, that could be a problem, but otherwise it should be ok.
-
I deleted that directory, and I still keep getting that error. Any more ideas?
I'm pretty sure this is what's stopping my reduce from running...
-
Focus on getting your 30,000 input.
I've been able to run two 500,000 jobs this week without problem, so there is probably something wrong with your understanding of hadoop (at least prior to this weekend)
-
I finally got the 500,000 job to finish yesterday. Now I only have to post-process the results ^_^
I’m happy