
How do I make my crawl faster?

My crawl tops out at 10 pages per second, and the average is about 7 or 8, even at the very start of the crawl.

Currently, each crawler maintains its own counts until I ask them to save and aggregate. Parsing is done in each individual crawler thread, and each crawler keeps its 10 largest lipograms, palindromes, and rhopalics until I ask for them.
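To make the setup concrete, here is a simplified Python sketch of that bookkeeping. It is illustrative only and not the actual crawler code: `CrawlerStats`, `record`, and `aggregate` are made-up names, and "largest" is assumed to mean longest by character count.

```python
import heapq
from collections import Counter, defaultdict


class CrawlerStats:
    """Private state for one crawler: counts plus the longest matches per kind."""

    def __init__(self, keep=10):
        self.keep = keep
        self.counts = Counter()
        # kind -> min-heap of (length, text); the smallest kept entry sits at index 0
        self.longest = defaultdict(list)

    def record(self, kind, text):
        """Count one match and keep it only if it is among the `keep` longest."""
        self.counts[kind] += 1
        heap = self.longest[kind]
        item = (len(text), text)
        if len(heap) < self.keep:
            heapq.heappush(heap, item)
        else:
            # Push the new item, then drop whichever entry is now the smallest.
            heapq.heappushpop(heap, item)


def aggregate(stats_list, keep=10):
    """Merge every crawler's counts and pick the overall longest matches."""
    total = Counter()
    candidates = defaultdict(list)
    for stats in stats_list:
        total.update(stats.counts)
        for kind, heap in stats.longest.items():
            candidates[kind].extend(heap)
    top = {
        kind: [text for _, text in heapq.nlargest(keep, items)]
        for kind, items in candidates.items()
    }
    return total, top
```

Each crawler only touches its own `CrawlerStats`, so nothing is shared during the crawl. A real version would still need a lock or a snapshot around `aggregate` if the crawlers keep writing while it runs.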

This aggregation runs every 2 minutes. Since our parsing is done inside the Crawler class, would it be faster to move it to another thread somehow? What other tips do you have for making the crawl faster (besides checking over the algorithms we use to parse)?
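By "another thread" I mean something like the hand-off below: fetcher threads only download and push pages onto a queue, and separate parser threads pull from it. This is a rough Python sketch under my own assumptions, not our real code; `run_pipeline`, `fetch`, and `parse` are hypothetical names, and the thread counts and queue size are arbitrary.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a parser thread to stop


def run_pipeline(urls, fetch, parse, n_fetchers=8, n_parsers=2):
    """Fetch in one set of threads and parse in another, joined by a queue."""
    todo = queue.Queue()
    pages = queue.Queue(maxsize=100)  # bounded, so fetchers block if parsers fall behind
    results = []
    results_lock = threading.Lock()

    for url in urls:
        todo.put(url)

    def fetcher():
        while True:
            try:
                url = todo.get_nowait()
            except queue.Empty:
                return  # no URLs left
            pages.put((url, fetch(url)))

    def parser():
        while True:
            item = pages.get()
            if item is _DONE:
                return
            url, body = item
            parsed = parse(body)
            with results_lock:
                results.append((url, parsed))

    fetchers = [threading.Thread(target=fetcher) for _ in range(n_fetchers)]
    parsers = [threading.Thread(target=parser) for _ in range(n_parsers)]
    for t in fetchers + parsers:
        t.start()
    for t in fetchers:
        t.join()
    # All pages are queued by now; send one stop signal per parser.
    for _ in parsers:
        pages.put(_DONE)
    for t in parsers:
        t.join()
    return results
```

Something like `run_pipeline(seed_urls, fetch=download, parse=find_palindromes)` would drive it, where both callables are placeholders. Whether this helps depends on where the time goes: if the crawl is waiting on the network, the split may change little, and in standard CPython the GIL keeps CPU-bound parsing threads from running in parallel anyway.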

EDIT:
Oh god, it's gotten much worse after just 30 minutes of crawling. Now the pages-fetched-per-second figure reads like binary: nothing but 0s and 1s.
 