Recent activity
Camdawg asked a question in LUCI on March 08, 2008 01:22:
Multi or single word queries for Assignment 7?
For Assignment 7, are we going to be asked multi-word queries, or is each query going to be a single term? If queries can be multi-word, loading each serialized file is going to take forever.
Camdawg replied on March 01, 2008 18:37 to the problem "Problems with the provided Full Data Set." in LUCI:
Camdawg reported a problem in LUCI on March 01, 2008 00:54:
Problems with the provided Full Data Set.
I'm using the full data set provided by Prof. Patterson to test my parser, but there seem to be a lot of malformed lines. For example, the first text file for terms (term_doc_pairs_0001.txt) contains this line:
glouceste##NEWURL:http://en.wikipedia.org/wiki/Hawaii_%28island%29
Here you can see that the crawler was trying to write gloucester:*count* followed by another page URL, but the term got cut off. Should I just fix the errors as they pop up and continue, or what would be the best course of action?
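One option, instead of fixing errors by hand, is to make the parser tolerant: skip and record anything it cannot read, and keep going. Below is a minimal sketch in Python. It assumes a file layout that is only guessed from the sample line above: a `##NEWURL:<url>` marker starts each page and is followed by `term:count` lines. The function name `parse_term_doc_lines` and the regular expression are hypothetical, not part of the provided data set or any course code.

```python
import re

URL_MARKER = "##NEWURL:"
# Assumed shape of a well-formed pair line: a term, a colon, an integer count.
PAIR_RE = re.compile(r"^(?P<term>[^\s:]+):(?P<count>\d+)$")


def parse_term_doc_lines(lines):
    """Return ({url: {term: count}}, [(line_number, skipped_text)])."""
    docs = {}
    skipped = []
    current_url = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        marker_at = line.find(URL_MARKER)
        if marker_at >= 0:
            # Anything before the marker is a truncated pair: record it, drop it.
            if marker_at > 0:
                skipped.append((lineno, line[:marker_at]))
            # Recover the URL so the following pairs attach to the right page.
            current_url = line[marker_at + len(URL_MARKER):]
            docs.setdefault(current_url, {})
            continue
        match = PAIR_RE.match(line)
        if match and current_url is not None:
            term = match["term"]
            counts = docs[current_url]
            counts[term] = counts.get(term, 0) + int(match["count"])
        else:
            # Not a pair we can trust: record it instead of crashing.
            skipped.append((lineno, line))
    return docs, skipped
```

Reviewing the `skipped` list after a run shows how much of the data set is damaged, which helps decide whether dropping those lines is acceptable for the assignment.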
