Recent activity

  • question

    Camdawg asked a question in LUCI on March 08, 2008 01:22:

    Multi or single word queries for Assignment 7?
    For Assignment 7, are we going to be asked multi-word queries, or is each query going to be a single term? If the queries are multi-word, it is going to take forever to load each serialized file.
  • problem

    Camdawg replied on March 01, 2008 18:37 to the problem "Problems with the provided Full Data Set." in LUCI:

    It's not totally unreasonable, but there are enough errors in each file for it to be a pain. I've already started my own crawl, so I won't need the sample anymore, but I don't know whether anyone else using the sample data set will need this fixed. Apparently not.
  • problem

    Camdawg reported a problem in LUCI on March 01, 2008 00:54:

    Problems with the provided Full Data Set.
    I'm using the full data set provided by Prof. Patterson to test my parser, but there seem to be a lot of malformed lines. For example, in the first text file for terms (term_doc_pairs_0001.txt), there is this line:

    glouceste##NEWURL:http://en.wikipedia.org/wiki/Hawaii_%28island%29

    Here you can see that the crawler was trying to write gloucester:*count* followed by the marker for the next page URL, but the term got cut off. Should I just fix the errors as they pop up and continue, or what would be the best course of action?
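
One alternative to fixing each line by hand is to make the parser tolerant: keep whatever can be recovered and record the rest. Below is a minimal sketch in Python. It assumes the file layout is a `##NEWURL:<url>` line followed by one `term:count` line per term; that layout is only inferred from the example line above, not from any specification of the data set. The function name `parse_term_doc_pairs` and the regular expression are hypothetical.

```python
import re

URL_MARKER = "##NEWURL:"
# Assumed shape of a well-formed term line, e.g. "gloucester:3".
TERM_LINE = re.compile(r"^([^:\s#]+):(\d+)$")


def parse_term_doc_pairs(lines):
    """Return (postings, bad_lines) without stopping at malformed input.

    postings  -- list of (url, term, count) tuples
    bad_lines -- list of (line_number, discarded_text) tuples
    """
    postings = []
    bad_lines = []
    current_url = None

    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue

        marker_at = line.find(URL_MARKER)
        if marker_at != -1:
            # Text before the marker is a truncated term such as
            # "glouceste": record it, but still salvage the URL.
            if marker_at > 0:
                bad_lines.append((lineno, line[:marker_at]))
            current_url = line[marker_at + len(URL_MARKER):]
            continue

        match = TERM_LINE.match(line)
        if match and current_url is not None:
            postings.append((current_url, match.group(1), int(match.group(2))))
        else:
            bad_lines.append((lineno, line))

    return postings, bad_lines
```

Calling it as `postings, bad = parse_term_doc_pairs(open("term_doc_pairs_0001.txt", encoding="utf-8"))` and then checking `len(bad)` would show how much of the file is affected before deciding whether a re-crawl is worth it. With this approach the truncated term and its count are lost, but the postings for the following page are kept.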