Java out of memory error
I'm trying to create postings.data from the 2.5 GB results, and after about two hours of running I get an OutOfMemoryError: Java heap space.
I changed the Eclipse config file to use 3.25 GB of RAM with these options:
-Xms40m (minimum)
-Xmx3300m (maximum)
Am I doing something wrong?
-
I'm not sure why you'd hit this even after setting it up to use 3.25 GB of RAM, but I don't think the conversion should need that much memory anyway: you can write out the result right after you process each term, so you don't need any data structure to hold the processed data.
Yasser's example on the Discussion 7 slides really helps a lot:
http://www.ics.uci.edu/~djp3/classes/...
-
Art is right: don't try to store the file in memory.
As you read in a chunk, write out a chunk.
Time should be the bottleneck, not memory.
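A minimal sketch of that read-a-line, write-a-line pattern, assuming a plain-text input with one term per line. The file names and the process() step are hypothetical placeholders, and output is written as text for simplicity (the thread writes a binary file through a RandomAccessFile). Only the current line is ever held in memory, so heap use stays flat no matter how large the input is.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class StreamConvert {
    // Hypothetical per-line transformation; stands in for the real clean/split logic.
    static String process(String line) {
        return line.trim().toLowerCase(Locale.ROOT);
    }

    // Reads one line at a time and writes its result immediately; nothing accumulates.
    static long convert(BufferedReader in, Writer out) throws IOException {
        long count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            out.write(process(line));
            out.write('\n');
            count++;
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file names.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("results.txt"), StandardCharsets.UTF_8));
             Writer out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("postings.txt"), StandardCharsets.UTF_8))) {
            System.out.println("Lines converted: " + convert(in, out));
        }
    }
}
```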
-
But as for your question, make sure you are giving those resources to your program, not to Eclipse. Giving resources to Eclipse doesn't give them to a program that you run *in* Eclipse. To give resources to a program, put the JVM options (such as -Xmx) in your program's run configuration (Run > Run Configurations... > Arguments > VM arguments). Eclipse then passes them to the Java Virtual Machine when it runs your program.
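To check that the setting actually reached your program's JVM rather than Eclipse's, you can print the heap ceiling from inside the program. A small sketch; the class name is arbitrary. With -Xmx1250m in the run configuration's VM arguments, the printed value should be close to (often slightly below) 1250 MB.

```java
public class HeapCheck {
    // Returns the maximum heap this JVM will attempt to use, in megabytes.
    // It reflects the -Xmx passed to this program's JVM, not the JVM running Eclipse.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap for this program: " + maxHeapMb() + " MB");
    }
}
```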
-
Alright, I've reserved 1.25 GB of RAM for the JVM; will report back.
-
Still not working. I'm not keeping it in memory (I think); I'm basically doing this:
BufferedReader reads in the file
while (buffered_reader_file.readLine() != null)
{
    String newstring = buffered_reader_file.readLine();
    // clean/split, then add to the RandomAccessFile
}
Help, please?
-
Try this one:
String newstring;
int totalTerms = 0;
// readLine() is called once per iteration, and its result is both tested and kept
while ((newstring = buffered_reader_file.readLine()) != null)
{
    // do whatever you want with newstring here
    totalTerms++;
    if (totalTerms % 10000 == 0)
    {
        System.out.println("Processing count: " + totalTerms + "...");
    }
}
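The key change from the earlier loop is that readLine() runs once per iteration. A small sketch (class and method names are made up for illustration) of what the two-call version does: the line read in the while condition is thrown away, so only every other line reaches the body. With an odd number of lines, the body's readLine() also returns null on the last pass, which a clean/split step would hit as a NullPointerException.

```java
import java.io.*;

public class ReadLineCount {
    // Mirrors the original loop: the condition reads one line, the body reads the next,
    // so the line consumed by the condition never reaches the body.
    static int countDoubleRead(BufferedReader in) throws IOException {
        int seen = 0;
        while (in.readLine() != null) {
            String line = in.readLine(); // may be null on the final pass
            if (line != null) seen++;
        }
        return seen;
    }

    // The corrected loop: one readLine() per iteration, assigned and tested together.
    static int countSingleRead(BufferedReader in) throws IOException {
        int seen = 0;
        String line;
        while ((line = in.readLine()) != null) {
            seen++;
        }
        return seen;
    }
}
```

For a five-line input, countDoubleRead sees 2 lines while countSingleRead sees all 5.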
-
Trying that now.
Thanks a lot, Art.
-
You might not want to read the whole line. If a term appears in more than 50,000 pages, you just want to skip it, so instead of holding those gigantic lines in memory you can skip over them.
So first read up to the document frequency, and if it is below 50,000, save the line in a String.
If you find one with more than 50,000 pages, use
buffered_reader_file.skip(20*df) (the 20 is roughly the minimum number of characters, i.e. bytes for ASCII text, per docId=termfrequency pair)
so you skip a lot of characters at once and then read the rest of the line. Saves tons of time on those long lines!
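A sketch of that skip, assuming a hypothetical tab-separated line format `term<TAB>df<TAB>doc=tf doc=tf ...` (the real results format may differ). BufferedReader.skip(long) skips characters, and the skip only stays inside the current line if the per-pair figure (the 20 above, passed here as minCharsPerPair) is a true lower bound on the length of one pair. The 50,000 cutoff is passed in as maxDf; the class and method names are hypothetical.

```java
import java.io.*;
import java.util.*;

public class SkipFrequentTerms {
    // Reads one tab-terminated field character by character; returns null at end of input.
    static String readField(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\t') {
            sb.append((char) c);
        }
        return (c == -1 && sb.length() == 0) ? null : sb.toString();
    }

    // Keeps lines whose df is <= maxDf. For larger df, skips minCharsPerPair * df characters
    // and then discards the remainder of the line, so the long postings are never built as a String.
    // minCharsPerPair must be a true lower bound on one "doc=tf" pair, or the skip can
    // run past the end of the line into the next term.
    static List<String> keepRareTerms(BufferedReader in, int maxDf, long minCharsPerPair)
            throws IOException {
        List<String> kept = new ArrayList<>();
        String term;
        while ((term = readField(in)) != null) {
            int df = Integer.parseInt(readField(in).trim());
            if (df > maxDf) {
                in.skip(minCharsPerPair * df); // jump over most of the postings at once
                in.readLine();                 // discard the short remainder and the newline
            } else {
                kept.add(term + "\t" + df + "\t" + in.readLine());
            }
        }
        return kept;
    }
}
```

With maxDf = 2 and minCharsPerPair = 4, the input "the\t3\td1=1 d2=4 d3=2\ncat\t1\td7=1\n" keeps only the "cat" line.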
-
Hmm, I thought we were only skipping in the cosine scoring part.
So basically, if the term appears in more than 50k pages, I shouldn't even include it in postings.data?
-
You can exclude those either while generating postings.data or in the cosine scoring part.
-
Great!
I think it's finally running correctly now.
Did you guys get any NullPointerExceptions?
Thanks for the help, guys!
-
No need to include those terms in the binary file, because you won't use them.
That will reduce the size of the binary file.
When you look up a query term in the postings list, you'll find that the term doesn't exist and skip to the next query term.
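A sketch of how that plays out, assuming a hypothetical binary layout (an int df followed by df pairs of int docId and int tf) and an in-memory map from term to file offset; neither is confirmed as the assignment's actual format. Terms over the cutoff are never written, so they get no offset, and the query loop simply moves on when a lookup comes back empty.

```java
import java.io.*;
import java.util.*;

public class BinaryPostings {
    // Writes each kept term's postings and records where it starts; df > maxDf is skipped.
    static Map<String, Long> write(RandomAccessFile out, Map<String, int[][]> postings, int maxDf)
            throws IOException {
        Map<String, Long> offsets = new HashMap<>();
        for (Map.Entry<String, int[][]> e : postings.entrySet()) {
            int[][] pairs = e.getValue();
            if (pairs.length > maxDf) continue; // too common: never written to the file
            offsets.put(e.getKey(), out.getFilePointer());
            out.writeInt(pairs.length);
            for (int[] p : pairs) {
                out.writeInt(p[0]); // docId
                out.writeInt(p[1]); // term frequency
            }
        }
        return offsets;
    }

    // Returns a term's postings, or null if it was excluded or never indexed.
    static int[][] lookup(RandomAccessFile in, Map<String, Long> offsets, String term)
            throws IOException {
        Long offset = offsets.get(term);
        if (offset == null) return null;
        in.seek(offset);
        int df = in.readInt();
        int[][] pairs = new int[df][2];
        for (int i = 0; i < df; i++) {
            pairs[i][0] = in.readInt();
            pairs[i][1] = in.readInt();
        }
        return pairs;
    }

    // Counts query terms that have postings, skipping missing ones.
    static int termsFound(RandomAccessFile in, Map<String, Long> offsets, List<String> query)
            throws IOException {
        int found = 0;
        for (String term : query) {
            if (lookup(in, offsets, term) == null) continue; // not in the file: next query term
            found++; // real code would accumulate cosine scores here
        }
        return found;
    }
}
```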