Recent activity
infection0 shared an idea in LUCI on March 16, 2009 07:31:
This class was awesome.
It felt soooo good when I executed my first (working) query to my search engine. I'll be RAR'ing my files up so that I can look back on this. It's the coolest program I've ever written...
infection0 replied on March 16, 2009 04:46 to the idea "I made a diagram." in LUCI:
infection0 replied on March 16, 2009 03:31 to the idea "I made a diagram." in LUCI:
mmk here is how I thought things were going to go:
For each term in the query...
    CALCULATE: IDF of the term.
    For each <docid>=<tf> in the postings list...
        CALCULATE: TF-IDF of the term/document.
        SUM: Score(docid) += TF-IDF
So Score(docid) would get added to once for each term in the query, and at the end the docid with the highest score would be at the top?
Is that right? Or am I missing this whole big thing with vectors?
Oh and how do I calculate a query vector and document vector?
"The score of a document d is the sum, over all query terms, of the number of times each of the query terms occurs in d. We can refine this idea so that we add up not the number of occurrences of each query term t in d, but instead the tf–idf weight of each term in d."
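For anyone following along, the loop described in this post can be sketched in Java roughly as below. This is only a sketch under assumptions: the postings are taken to be already in memory as a map from term to (docid -> tf), IDF is taken as log10(N / df), and the class name `TfIdfScorer`, the method `score`, and the `totalDocs` parameter are made up for illustration, not part of the course code.

```java
import java.util.HashMap;
import java.util.Map;

class TfIdfScorer {
    // Hypothetical in-memory index: term -> (docid -> term frequency).
    // totalDocs is N, the number of documents in the collection (assumed known).
    static Map<Integer, Double> score(String[] queryTerms,
                                      Map<String, Map<Integer, Integer>> postings,
                                      int totalDocs) {
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : queryTerms) {
            Map<Integer, Integer> list = postings.get(term);
            if (list == null || list.isEmpty()) {
                continue; // term does not occur in the index
            }
            // IDF of the term: log10(N / df), where df is the postings list length.
            double idf = Math.log10((double) totalDocs / list.size());
            for (Map.Entry<Integer, Integer> entry : list.entrySet()) {
                // TF-IDF of this term in this document, added to the running score.
                double tfIdf = entry.getValue() * idf;
                scores.merge(entry.getKey(), tfIdf, Double::sum);
            }
        }
        return scores;
    }
}
```

Sorting the returned entries by value, highest first, gives the ranking. This follows the quoted passage directly; cosine similarity or length normalization, if the assignment asks for it, would be an extra step not shown here.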
infection0 replied on March 16, 2009 03:14 to the idea "I made a diagram." in LUCI:
infection0 shared an idea in LUCI on March 16, 2009 01:47:
infection0 asked a question in LUCI on March 15, 2009 22:07:
RandomAccessFile: readUTF() and writeUTF()
Anybody notice the "readUTF" and "writeUTF" methods in the RandomAccessFile?
I used them and they were a godsend. I was able to reconstruct my posting list from them so they look to be fine. Just something for people who were about to do the whole byte conversion thing.
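A minimal sketch of that round trip, assuming one string record per call. The class name `UtfRoundTrip` and the helper names `append` and `readAt` are hypothetical; only `RandomAccessFile`, `seek`, `length`, `writeUTF` and `readUTF` are the real library calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class UtfRoundTrip {
    // Appends one string to the end of the file and returns the byte offset
    // where the record starts, so it can be looked up again later.
    static long append(File file, String record) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.writeUTF(record); // 2-byte length prefix, then modified UTF-8 bytes
            return offset;
        }
    }

    // Reads back the string that starts at the given byte offset.
    static String readAt(File file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(offset);
            return raf.readUTF();
        }
    }
}
```

One thing to watch: writeUTF stores the length in two bytes, so a single string whose encoded form is longer than 65535 bytes throws a UTFDataFormatException. A very long posting list written as one string could hit that limit.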
infection0 replied on March 15, 2009 11:46 to the question "Making the Binary File" in LUCI:
infection0 replied on March 15, 2009 09:11 to the question "Making the Binary File" in LUCI:
infection0 asked a question in LUCI on March 15, 2009 04:23:
Confused about how to write binary file
I'm working on it alone and I just started on it tonight... I am confused about how to write the binary file. I thought the binary file was supposed to be a representation of the posting list, but Yasser's code on the slides refers to writing URLs. What do I need that code for?
Also, am I correct in assuming that I have to write my own code, modeled on Yasser's, to write the posting list? That is, Yasser's code won't work for the posting list as-is; it's just an example of how to write a file?
The document table is the list of URLs and DocIDs provided on the assignment page, right? Am I required to encode that into a random access file as well, for a total of 4 files on disk?
Ooh one more thing: Should I parse the posting list
(000000000 , 12 , 5 : {203645=1, 231518=4, 270201=4, 476437=2, 592873=1})
and write it into individual elements (the CF, DF, docID, tf, docID, tf, docID, tf....)
or should I write the entire string and parse it later?
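For comparison, here is a rough sketch of the first option: parsing the line and writing the individual elements. It is not the assignment's required format, just one possible layout: the term via writeUTF, then CF and DF as ints, then one (docID, tf) pair of ints per posting. The class name `PostingEncoder` is made up, and the parsing assumes the exact line shape shown above, with a term that contains no ',' or ':' and at least one posting.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PostingEncoder {
    // Encodes one line of the form
    //   (term , CF , DF : {docID=tf, docID=tf, ...})
    // as: term (writeUTF), CF (int), DF (int), then docID (int), tf (int) per posting.
    static byte[] encode(String line) throws IOException {
        String body = line.trim();
        body = body.substring(1, body.length() - 1);              // drop the outer ( )
        int colon = body.indexOf(':');                            // assumes no ':' in the term
        String[] head = body.substring(0, colon).split(",");      // term, CF, DF
        String postings = body.substring(colon + 1).trim();
        postings = postings.substring(1, postings.length() - 1);  // drop the { }

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(head[0].trim());
        out.writeInt(Integer.parseInt(head[1].trim()));
        out.writeInt(Integer.parseInt(head[2].trim()));
        for (String pair : postings.split(",")) {
            String[] kv = pair.trim().split("=");
            out.writeInt(Integer.parseInt(kv[0]));  // docID
            out.writeInt(Integer.parseInt(kv[1]));  // tf
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

RandomAccessFile has the same writeUTF and writeInt methods, so the same sequence of calls could go straight to the file. The tradeoff is that reading back needs no string parsing, while writing the whole string keeps the writer trivial and pushes the parsing to query time.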
infection0 replied on March 04, 2009 08:25 to the question "Space full???" in LUCI:
infection0 asked a question in LUCI on March 04, 2009 07:44:
Tail: Did you guys get crap too?
I get non-ASCII characters. Should I submit only the last 1000 lines of the ASCII results or do I just submit the crap?
------
ð¦ , 1 , 1 : 20826=1
ð§ , 1 , 1 : 20826=1
ð ̈ , 1 , 1 : 20826=1
ð© , 1 , 1 : 20826=1
ða , 1 , 1 : 20826=1
ð« , 1 , 1 : 20826=1
ð¬ , 1 , 1 : 20826=1
ð , 1 , 1 : 20826=1
ð® , 1 , 1 : 20826=1
ð ̄ , 1 , 1 : 20826=1
ð° , 1 , 1 : 20826=1
ð± , 1 , 1 : 20826=1
ð€ , 1 , 1 : 22374=1
ð , 1 , 1 : 22374=1
ð‚ , 1 , 1 : 22374=1
ðƒ , 1 , 1 : 22374=1
infection0 replied on March 04, 2009 07:30 to the question "Space full???" in LUCI:
infection0 replied on March 04, 2009 07:12 to the question "Space Full:Cant write to disk. grrr...." in LUCI:
I'm sorry, I can't resist...
http://lmgtfy.com/?q=space+full+luci
All in good fun! It's amazing that Google indexed my question already! It's only been 3 hours!
But seriously, log onto a simpsons.ics.uci.edu server, that fixed it for me.
infection0 replied on March 04, 2009 06:31 to the question "Space full???" in LUCI:
infection0 replied on March 04, 2009 05:51 to the question "Space full???" in LUCI:
infection0 replied on March 04, 2009 05:47 to the question "Space full???" in LUCI:
infection0 replied on March 04, 2009 05:42 to the question "Space full???" in LUCI:
infection0 replied on March 04, 2009 04:59 to the question "Space full???" in LUCI:
Ah I logged onto a simpsons.ics.uci.edu server and it merged it correctly.
HINTS: I had to delete the _Logs folder and I had to execute the sort from one folder up from the folders my files were stored in.
So my output folder would be like:
>/extra/ugrad_space/leecf/hadoop/output
I delete the _Logs directory, then I type "up" into the console:
>/extra/ugrad_space/leecf/hadoop/output
>up
>/extra/ugrad_space/leecf/hadoop/
>sort -m output/* > unity.txt
infection0 asked a question in LUCI on March 04, 2009 04:39:
Space full???
$ sort -m output500000/* > unity.txt
sort: write failed: /tmp/sortqDVsRW: No space left on device
The files are in /extra/ugrad_space/. I ran the command above and that was the error I got.
What am I doing wrong?? Or is it really out of space?
infection0 replied on March 03, 2009 21:31 to the question "Jobs restarted?" in LUCI:


