Few problems
I finally completed my first map/reduce of 10... phew. I have a few problems that I need to correct before moving on:
I get this error at the beginning of MAP:
09/03/01 22:31:43 INFO mapred.JobClient: Task Id : attempt_200903011033_0082_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: edu/uci/ics/crawler4j/crawler/HTMLParser
at ir.assignment05.Downloader.<init>(Downloader.java:11)
at ir.assignment05.PostingList$Map.map(PostingList.java:80)
at ir.assignment05.PostingList$Map.map(PostingList.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.lang.ClassNotFoundException: edu.uci.ics.crawler4j.crawler.HTMLParser
The funny thing is that the map phase finishes, and so does reduce. Everything looks fine, but that exception is scary.
Second, I output some of my statistics (URLs containing Member 1's name, etc) to a file like this:
String pathString = "./" + args[1];
String saveFile = pathString + "/report.txt";
PrintStream writer;
writer = new PrintStream(saveFile);
This works in pseudo-distributed mode, but when I run it on the cluster, nothing is written to the output directory. How do I fix this?
-
Oh, and also: are we supposed to use nohup for this, or is it OK to just use
bin/hadoop jar hadooptest.jar ir.assignment05.PostingList /assignment06/input10 output
And close the terminal?
P.S. I have a static class keeping track of the URLs found that contain our name, and also the 10 most common terms. However, when I run the thing on the DFS, it outputs to my /extra/ugrad_space folder with nothing registered. How do we keep track of this stuff?
-
It is OK to use bin/hadoop jar hadooptest.jar ir.assignment05.PostingList /assignment06/input10 output, since the job itself runs on the cluster, not in your local terminal.
-
This is not the right way of doing this. Your job runs on the cluster against the distributed file system, so you cannot dump your output to the local file system. Use this process instead:
1) Crawl the pages.
2) Create the posting lists.
3) In the generated posting lists, find the line associated with the term that is your name, and report the list of docids on that line.
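Step 3 could be sketched roughly as follows. This is only an illustration, not the assignment's reference solution: the class name PostingLookup, the method findDocIds, and above all the line format (term, a tab, then comma-separated docids) are assumptions, since the thread never shows what the posting list output actually looks like. Adjust the parsing to whatever your reducer emits.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: looks up one term in a posting list file.
class PostingLookup {

    // ASSUMED line format (not from the thread): term<TAB>docid1,docid2,...
    // Returns the docids on the first line whose term matches exactly,
    // or an empty list if the term does not appear.
    static List<String> findDocIds(Iterable<String> lines, String term) {
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the assumed format
            }
            if (line.substring(0, tab).equals(term)) {
                String rest = line.substring(tab + 1).trim();
                if (rest.isEmpty()) {
                    return new ArrayList<>();
                }
                return new ArrayList<>(Arrays.asList(rest.split(",")));
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: a local copy of one reducer output file
        // args[1]: the term to look up (for example, your name)
        if (args.length != 2) {
            System.err.println("usage: java PostingLookup <posting-list-file> <term>");
            System.exit(1);
        }
        List<String> lines =
                Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(args[1] + " -> " + findDocIds(lines, args[1]));
    }
}
```

To run this against real output, first copy a reducer output file out of the DFS, for example with bin/hadoop fs -get output/part-00000 . (the file name part-00000 is the usual default, but check your own output directory). The match is exact and case-sensitive, so if your map step lowercases terms, pass the lowercased name.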


