Questions about assignment 3

Hi, I have some questions regarding the third assignment. Maybe they will be covered next Monday during discussion, but I'd like to spend the weekend doing this assignment. My questions are the following:

- We should crawl the pages that match the regular expression, but should we also check that they are inside Wikipedia, i.e. that the URL starts with http://en.wikipedia.org?
- Is there any method in the libraries to get the size of the content downloaded from a site? And the size of the text? My idea is just to count one byte for each character in the text (as well as in the HTML).
- How can we resume the process if the crawl fails?
- Do we have to save all the doc IDs? I know we only need the one for the article about Obama, but you may want more for the fourth assignment.
- Do we have to change anything in the way we measure the length of palindromes, lipograms and rhopalics?
- You said we will receive a grade according to the longest known sequence. Found by whom, by you or by our peers? If it's your sequence, could you give us those lengths so that we know when we can stop crawling?
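
To make the first two questions concrete, here is a rough Python sketch of what I have in mind. The helper names are my own and not from the course libraries, and I am assuming the pages are encoded as UTF-8. It also shows why I am unsure about my counting idea: one byte per character only holds for plain ASCII text.

```python
from typing import Tuple
from urllib.parse import urlparse


def is_english_wikipedia(url: str) -> bool:
    """Hypothetical check: True if the URL's host is en.wikipedia.org."""
    return urlparse(url).netloc.lower() == "en.wikipedia.org"


def sizes(text: str, encoding: str = "utf-8") -> Tuple[int, int]:
    """Return (character count, byte count) of text.

    The encoding is an assumption; the two numbers differ as soon as
    the text contains non-ASCII characters.
    """
    return len(text), len(text.encode(encoding))
```

For example, `sizes("Obama")` gives the same number twice, while `sizes("café")` gives 4 characters but 5 bytes, so the two ways of measuring would give different results on many articles.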

So far, I think those are all my questions. Thanks in advance.
 