Confused about how to write binary file
I'm working on it alone and I just started on it tonight... I am confused about how to write the binary file. I thought the binary file was supposed to be a representation of the posting list, but Yasser's code on the slides refers to writing URLs. What do I need that code for?
Also, am I correct in assuming that I have to write my own code, modeled on Yasser's, to write the posting list? In other words, Yasser's code won't work for the posting list as-is; it's just an example of how to write a binary file?
The document table is the list of URLs and DocIDs provided on the assignment page, right? Am I required to encode that into a random access file as well, for a total of 4 files on disk?
Ooh one more thing: Should I parse the posting list
(000000000 , 12 , 5 : {203645=1, 231518=4, 270201=4, 476437=2, 592873=1})
and write it into individual elements (the CF, DF, docID, tf, docID, tf, docID, tf....)
or should I write the entire string and parse it later?
-
The binary file is for the binary representation of the posting list.
In addition to that you need an offset table which maps terms to offsets into that file, a mapping from docID to url,
You shouldn't be writing any "strings" to your binary file; everything should be written as "bytes".
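To make that concrete, here is a minimal sketch of one way to do it, not necessarily the layout the assignment expects. It assumes each record is stored as CF, DF, then DF pairs of (docID, tf), each as a 4-byte big-endian int, and that the 12 and 5 in the sample line are the CF and DF (the five tf values sum to 12). The class name `PostingListSketch` and the methods `parsePostings`, `encode`, and `append` are hypothetical names made up for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PostingListSketch {

    // Parses text such as "{203645=1, 231518=4}" into docID -> tf,
    // keeping the order in which the pairs appear.
    static Map<Integer, Integer> parsePostings(String text) {
        Map<Integer, Integer> postings = new LinkedHashMap<>();
        String body = text.trim();
        body = body.substring(1, body.length() - 1).trim(); // strip { and }
        if (body.isEmpty()) {
            return postings;
        }
        for (String pair : body.split(",")) {
            String[] kv = pair.trim().split("=");
            postings.put(Integer.parseInt(kv[0].trim()), Integer.parseInt(kv[1].trim()));
        }
        return postings;
    }

    // Assumed record layout: cf, df, then df pairs of (docID, tf), all 4-byte ints.
    static byte[] encode(int cf, Map<Integer, Integer> postings) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 * postings.size());
        buf.putInt(cf);
        buf.putInt(postings.size()); // df = number of documents in the list
        for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
            buf.putInt(e.getKey());   // docID
            buf.putInt(e.getValue()); // tf
        }
        return buf.array();
    }

    // Appends a record to the end of the file and returns the offset where it
    // starts; that offset is what goes into the term -> offset table.
    static long append(RandomAccessFile file, byte[] record) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(record);
        return offset;
    }

    public static void main(String[] args) throws IOException {
        String raw = "{203645=1, 231518=4, 270201=4, 476437=2, 592873=1}";
        File tmp = File.createTempFile("postings", ".bin");
        tmp.deleteOnExit();

        try (RandomAccessFile file = new RandomAccessFile(tmp, "rw")) {
            // Write the record as bytes and remember where it starts.
            Map<String, Long> offsets = new HashMap<>();
            offsets.put("000000000", append(file, encode(12, parsePostings(raw))));

            // Later: jump straight to the term's record and decode it.
            file.seek(offsets.get("000000000"));
            int cf = file.readInt();
            int df = file.readInt();
            System.out.println("cf=" + cf + " df=" + df);
            for (int i = 0; i < df; i++) {
                int docId = file.readInt();
                int tf = file.readInt();
                System.out.println(docId + " -> " + tf);
            }
        }
    }
}
```

Under these assumptions the sample record takes 48 bytes (two ints of header plus five pairs of ints), and nothing has to be re-parsed as text at query time: you seek to the stored offset and read ints. Storing DF before the pairs tells the reader how many pairs follow.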