infection0 replied on March 02, 2009 07:22 to the question "Question for URLs" in LUCI:
infection0 replied on March 02, 2009 07:13 to the question "Few problems" in LUCI:
Oh and also are we supposed to use nohup for this, or is it OK to just use
bin/hadoop jar hadooptest.jar ir.assignment05.PostingList /assignment06/input10 output
And close the terminal?
P.S. I have a static class keeping track of the URLs found that contain our name, and also the 10 most common terms. However, when I run the job on the DFS, it outputs to my /extra/ugrad_space folder with nothing registered. How do we keep track of this stuff?
infection0 replied on March 02, 2009 07:13 to the question "Question for URLs" in LUCI:
infection0 replied on March 02, 2009 07:03 to the question "Question for URLs" in LUCI:
infection0 asked a question in LUCI on March 02, 2009 06:56:
Few problems
I finally completed my first map/reduce of 10... phew. I have a few problems that I need to correct before moving on:
I get this error at the beginning of MAP:
09/03/01 22:31:43 INFO mapred.JobClient: Task Id : attempt_200903011033_0082_m_000000_0, Status : FAILED
java.lang.NoClassDefFoundError: edu/uci/ics/crawler4j/crawler/HTMLParser
at ir.assignment05.Downloader.<init>(Downloader.java:11)
at ir.assignment05.PostingList$Map.map(PostingList.java:80)
at ir.assignment05.PostingList$Map.map(PostingList.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.lang.ClassNotFoundException: edu.uci.ics.crawler4j.crawler.HTMLParser
but, the funny thing is that MAP finishes and so does reduce. Everything looks fine but that exception is scary.
Second, I output some of my statistics (URLs containing Member 1's name, etc) to a file like this:
String pathString = "./" + args[1];
String saveFile = pathString + "/report.txt";
PrintStream writer;
writer = new PrintStream(saveFile);
This works in pseudo-distributed mode. But when I do it on the cluster, nothing is written to the output directory. How do I fix this?
infection0 replied on March 02, 2009 05:00 to the question "Keeping count..." in LUCI:
infection0 replied on March 01, 2009 21:13 to the question "Extension possible?" in LUCI:
infection0 replied on February 27, 2009 23:19 to the question "Keeping count..." in LUCI:
Map class - Downloads the text, parses it into tokens,
output.collect(token, new Text(docID + ""));
I think I have Map down and I am just messing with Reduce.
I used to filter out duplicate words in the map function so that only the first occurrence of a term in a document is added to the output collector.
We are considering removing that and counting the number of duplicate docIDs in the Reduce function... but for some reason I am having an ungodly amount of trouble with that. You want us to count the number of occurrences of a term for each docID, right? I am trying to tack that number onto the end of the docID before collection, but I think that is screwing everything up when it calls reduce again.
I have no idea why but my output looks like this:
Term (0) (0) [ ]
when without the changes (before I tried to add this stuff) it looks like:
Term [24, 48, ...]
This is how I'm trying to format the final output:
Term (3 occurrences) [24 (2), 48(3), ...]
I am doing this by adding the CF to the Term text, and adding the DF into each docID collected. I think this is a weird way of doing it, but I don't know what else to do.
Here's a small code snippet from Reduce that illustrates the current state of my program:
ArrayList<String> docIDOutputs = new ArrayList<String>();
String docIDPlusCount = docID + " (" + documentTermCount + ")";
docIDOutputs.add(docIDPlusCount);
String outputString = (iterating through the list of docIDs and adding them to the string one by one)
String termPlusCorpusCount = term.toString() + corpusTermCount;
output.collect(new Text(termPlusCorpusCount), new Text(outputString));
I know I'm doing something wrong, I just don't know how to do it right...
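For reference, here is a minimal plain-Java sketch of the counting and formatting described in this post. It is not the assignment's required solution. It assumes Map emits one (term, docID) pair per occurrence, that docIDs are integers, and that reduce sees the raw docIDs only once (no combiner re-running the same code). The class and method names (PostingLineSketch, formatPostings) are made up for illustration, and the Hadoop types are left out so it runs on its own. The counts go into the value rather than onto the term, so the key stays the plain term.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PostingLineSketch {
    // Hypothetical helper: turns the docIDs collected for ONE term into
    // "(cf occurrences) [docID (count), ...]".
    static String formatPostings(Iterator<String> docIDs) {
        // TreeMap keeps the docIDs sorted numerically.
        Map<Integer, Integer> countsPerDoc = new TreeMap<Integer, Integer>();
        int corpusTermCount = 0;
        while (docIDs.hasNext()) {
            int docID = Integer.parseInt(docIDs.next().trim());
            Integer seen = countsPerDoc.get(docID);
            countsPerDoc.put(docID, seen == null ? 1 : seen + 1);
            corpusTermCount++;
        }
        StringBuilder sb = new StringBuilder();
        sb.append("(").append(corpusTermCount).append(" occurrences) [");
        boolean first = true;
        for (Map.Entry<Integer, Integer> e : countsPerDoc.entrySet()) {
            if (!first) {
                sb.append(", ");
            }
            sb.append(e.getKey()).append(" (").append(e.getValue()).append(")");
            first = false;
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // Stand-in for the values iterator that reduce receives for one term.
        List<String> values = Arrays.asList("24", "48", "24", "48", "48");
        System.out.println("Term " + formatPostings(values.iterator()));
        // prints: Term (5 occurrences) [24 (2), 48 (3)]
    }
}
```

In a real reducer the returned string would be the value passed to output.collect, with the unchanged term as the key.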
infection0 replied on February 27, 2009 23:00 to the question "Keeping count..." in LUCI:
infection0 asked a question in LUCI on February 27, 2009 21:11:
Keeping count...
How do I keep track of the CF, DF, and number of times each term occurs in a document? I'm not sure where to put it, or if I need a separate class. I am thinking of having a separate class with a hashmap containing keys of terms and counts, but that seems like a huge bottleneck.
infection0 replied on February 21, 2009 01:33 to the question "Barrack Obama?" in LUCI:
Strange, now my crawler fails to fetch half of all the pages... did Wikipedia just go down? I can't load any of its pages from my computer either.
Holy crud, it DID JUST GO DOWN. My phone, my computer, my friends can't load it either! What's next? Google?
Wikimedia Foundation
Error
العربية Bahasa Indonesia Česky Dansk Deutsch Eesti Ελληνικά English Español فارسی Français עברית Italiano 日本語 한국어
Nederlands Norsk (Bokmål) Polski Português Română Русский Српски Suomi Svenska ไทย Tiếng Việt Türkçe 繁體中文 简体中文
English
Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please try again in a few minutes.
You may be able to get further information in the #wikipedia channel on the Freenode IRC network.
infection0 replied on February 21, 2009 01:23 to the question "2 Problems" in LUCI:
infection0 asked a question in LUCI on February 21, 2009 01:21:
Barrack Obama?
Document ID 2 is misspelled (sp?). It is "Barrack Obama" instead of "Barack Obama" like it should be. My crawler failed to fetch the page. I swallowed the error and kept crawling, however. I think others have this problem as well. Can anyone confirm?
infection0 asked a question in LUCI on February 20, 2009 23:41:
2 Problems
1) Here is a warning Hadoop gives me:
WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
It continues to work correctly despite this warning, but I can't help but wonder if I will be having problems later because of it.
2) How do we correctly format our output? Is REDUCE run twice? If I try to put brackets around my set of numbers, it always puts it like,
term1 [[1][2]]
term2 [[2]]
term3 [[1]]
when I've only specified ONE bracket.
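One possible explanation, offered as a guess rather than a diagnosis: if the job was set up like the Hadoop WordCount example, the same Reduce class may also be registered as the combiner (conf.setCombinerClass(Reduce.class)). In that case the bracket-adding code runs once on each map task's local output and again in the real reduce. The plain-Java sketch below only simulates that two-pass behavior. The class name DoubleBracketSketch and the stand-in reduce method are hypothetical, not the poster's code.

```java
import java.util.Arrays;
import java.util.List;

public class DoubleBracketSketch {
    // Stand-in for a reduce function that concatenates its values
    // and wraps the result in ONE pair of brackets.
    static String reduce(List<String> values) {
        StringBuilder sb = new StringBuilder("[");
        for (String v : values) {
            sb.append(v);
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        // Pass 1: the combiner runs the same code on each map task's output.
        String fromMapTask1 = reduce(Arrays.asList("1")); // "[1]"
        String fromMapTask2 = reduce(Arrays.asList("2")); // "[2]"

        // Pass 2: the real reduce receives the already-bracketed strings.
        System.out.println("term1 " + reduce(Arrays.asList(fromMapTask1, fromMapTask2)));
        // prints: term1 [[1][2]]
    }
}
```

If that assumption holds, removing the setCombinerClass call, or using a separate combiner that does not add formatting, should leave a single pair of brackets.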
infection0 replied on February 20, 2009 22:50 to the question "error running jar on openlab" in LUCI:
Anyone solve this one? I'm getting it also.
EDIT:
lol stupid error, I just had the JAR file name wrong. For me, it was:
bin/hadoop jar hadooptest.jar PostingList input output
INSTEAD OF
bin/hadoop jar hadoop-0.19.0-hadooptest.jar PostingList input output
My error was that I thought the hadoop-0.19.0- prefix was part of the JAR name, when it is really part of the directory structure (sort of: it's the name of the folder Hadoop is in).
infection0 replied on February 16, 2009 23:49 to the question "For some reason I can't change Java version on Putty." in LUCI:
I have just emailed helpdesk@ics.uci.edu to wipe my account. Hopefully they can return everything to normal before the assignment deadline. I edited the files per your instructions and it did not help. Thanks anyway, at least I know what the problem is.
infection0 replied on February 16, 2009 21:48 to the question "Can't get the example to run..." in LUCI:
infection0 replied on February 16, 2009 21:45 to the question "Can't get the example to run..." in LUCI:
infection0 replied on February 16, 2009 21:43 to the question "For some reason I can't change Java version on Putty." in LUCI:
This happens across the board:
$ module load nano
ModuleCmd_Load.c(199):ERROR:105: Unable to locate a modulefile for 'nano'
$ nedit changes.txt
NEdit: Can't open display
I suspect there's a problem with my ICS account in general... I'll ask them on Tuesday if they can "reset" my account or anything like that. Right now it's like typing to a vegetable... there are so many things that don't work.
infection0 replied on February 16, 2009 21:08 to the question "For some reason I can't change Java version on Putty." in LUCI:

