Making the Binary File
About how long should it take to make the binary file for the 2.8GB posting list? Maybe we're just being inefficient or something, but our program is taking forever.
-
Always try to make your programs responsive by printing a status message once in a while. For example, in a loop that handles 1 million items, it is a good idea to print the number of processed items after every 10K items.
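As a rough illustration of that advice, here is a minimal sketch of periodic progress reporting. The class name `ProgressDemo`, the method `processAll`, and the 10,000-item interval are assumptions made for the example, and the per-item work is left as a placeholder.

```java
import java.io.PrintStream;

class ProgressDemo {
    // Hypothetical reporting interval: one status line per 10,000 items.
    static final int REPORT_EVERY = 10_000;

    /**
     * Loops over totalItems placeholder items and prints a status line
     * every REPORT_EVERY items. Returns the number of status lines printed.
     */
    static int processAll(int totalItems, PrintStream status) {
        int reports = 0;
        long start = System.nanoTime();
        for (int i = 1; i <= totalItems; i++) {
            // ... the real per-item work would go here ...
            if (i % REPORT_EVERY == 0) {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                status.println("processed " + i + " items in " + elapsedMs + " ms");
                reports++;
            }
        }
        return reports;
    }

    public static void main(String[] args) {
        // Status goes to stderr so it does not mix with regular output.
        processAll(1_000_000, System.err);
    }
}
```

Printing the elapsed time next to the count makes it easier to tell whether the program is slowing down over time or was just slow from the start.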
-
It took about 2-3 hours on my computer to convert the posting list with all the terms (I didn't exclude the common terms from this process), so I think it is just normal :)
-
Well, we're printing out a status line every 1k terms. At the time of writing, it had processed 1,154,000 terms; the latest term was "cctexas". I've been running it since 2:00pm yesterday (March 12).
There has to be something I'm doing wrong...
-
Originally I was using Scanner to parse the file line by line, but switching to BufferedReader significantly increased performance, so perhaps this is the problem...
I still use Scanner to parse individual lines, just not the file as a whole...
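A minimal sketch of that approach, for anyone who wants to try it: BufferedReader pulls lines from the input, and a Scanner is only created for one line at a time. The line format ("term docId docId ...") and the names `LineParsingDemo` and `countPostings` are assumptions for the example, not the actual assignment format.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

class LineParsingDemo {
    /**
     * Reads the input line by line and returns the total number of doc IDs seen.
     * Assumed (hypothetical) line format: "term docId docId ...".
     */
    static long countPostings(Reader in) {
        long postings = 0;
        // 64 KB buffer; the default size also works.
        try (BufferedReader reader = new BufferedReader(in, 1 << 16)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Scanner is used only on this single line, not on the whole file.
                Scanner fields = new Scanner(line);
                if (!fields.hasNext()) {
                    continue; // skip blank lines
                }
                fields.next(); // the term itself
                while (fields.hasNextInt()) {
                    fields.nextInt();
                    postings++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return postings;
    }

    public static void main(String[] args) {
        String sample = "apple 1 4 9\nbanana 2\n";
        System.out.println(countPostings(new StringReader(sample)));
    }
}
```

For a real file, the `StringReader` could be swapped for something like `Files.newBufferedReader(Paths.get("postings.txt"))`, where the file name is again only a placeholder.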
-
I used RandomAccessFile and it took about 2 hours. I then used Scanner for the individual lines.
-
Question: How many terms did you guys parse and how large was the output? I got ~3,900,000 terms, 1.3 GB of data, and a 60 MB offset file in 1.5 hours. Just wanted to verify that I didn't terminate too early.
-
I also got ~3.93 million terms. My postings.data file was a little over 2 GB, which doesn't make sense to me... We wrote everything as ints. But even then, with all the terms out of there, it probably should be smaller.
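One way to sanity-check a file size like this: if every value is written as a 4-byte int, the output is 4 bytes times the number of values written, so the expected size can be computed while writing. The sketch below assumes a hypothetical layout of [count][docId ...] per term, which may not match anyone's actual format; `PostingWriterDemo` and `writePostings` are made-up names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PostingWriterDemo {
    /**
     * Writes each posting list as [count][docId ...] using 4-byte ints
     * (assumed layout) and returns the byte offset where each list starts.
     */
    static List<Long> writePostings(List<int[]> postingLists, OutputStream sink) {
        List<Long> offsets = new ArrayList<>();
        long offset = 0; // a long, because a multi-GB file does not fit in an int
        try {
            // Buffering avoids one small write to the sink per int.
            DataOutputStream out =
                    new DataOutputStream(new BufferedOutputStream(sink, 1 << 16));
            for (int[] docIds : postingLists) {
                offsets.add(offset);
                out.writeInt(docIds.length);
                for (int docId : docIds) {
                    out.writeInt(docId);
                }
                // 4 bytes for the count plus 4 bytes per doc ID.
                offset += Integer.BYTES * (1L + docIds.length);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        List<int[]> lists = Arrays.asList(new int[] {1, 4, 9}, new int[] {2});
        List<Long> offsets = writePostings(lists, sink);
        System.out.println("offsets=" + offsets + ", bytes=" + sink.size());
    }
}
```

Comparing the final offset against the actual file length is a quick check that nothing extra (or nothing too little) is being written.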
-
My posting list was 2.8 GB, and I was able to build it in 49 minutes and 30 seconds.
(I used a 4-core Xeon at 3.0 GHz.) I had to set my heap size to 4 GB.
-
Mine's around 674 MB... I used shorts and eliminated terms with more than 50000, and it took around 2-3 hours... I am starting to think it's a bit small!