Getting WebSPHINX to work on Wikipedia

WebSPHINX _does_ work on Wikipedia. However, some pages exceed WebSPHINX's default maximum page size of 100 KB; the Irvine, California and Bubonic Plague articles both do. In the WebSPHINX UI, click the "advanced" button in the top right, go to the "limits" tab, and increase the limit. In your own code, call the setDownloadParameters method to raise the maximum page size:

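import websphinx.Crawler;
import websphinx.DownloadParameters;

public class WikipediaCrawler extends Crawler {
    public WikipediaCrawler() {
        // Raise the maximum page size from the 100 KB default to 1000 KB
        this.setDownloadParameters(DownloadParameters.DEFAULT.changeMaxPageSize(1000));
    }
}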