Cleaning up the text and words
Assignment 5
Do we need to clean up the text/words in any way?
Punctuation kind of gets in the way. Eliminating punctuation seems out of scope for this assignment, but I thought I'd ask anyway.
Examples:
" is "
" is. "
" is, "
" "is "
There are many instances of a word containing punctuation where I would imagine you would want to keep the punctuation in, and creating functionality to sift through only "legitimate" punctuation seems like an entirely different project.
Thanks
-
You should only keep the terms. Otherwise you would have terms like "Jordan," and its posting list would not appear in the results of the query "Jordan". You can use the regular expression that Don used for this: https://eee.uci.edu/wiki/index.php/IN...
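For reference, here is a minimal sketch of one way to keep only the terms in Java. Don's actual regular expression is behind the link above and isn't reproduced here, so the `\W+` split pattern, the class name `TermCleaner`, and the method `tokenize` are assumptions for illustration, not necessarily the course's method.

```java
import java.util.ArrayList;
import java.util.List;

class TermCleaner {
    // Assumption: split on runs of non-word characters (\W+), so punctuation
    // and whitespace both act as separators. In Java source the backslash
    // must be doubled, hence "\\W+".
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.split("\\W+")) {
            // split() yields a leading empty string when the text starts
            // with a separator, so skip empty pieces.
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // The examples from the question, plus the "Jordan," case.
        System.out.println(tokenize("\"is \" is. is, Jordan,"));
        // prints [is, is, is, Jordan]
    }
}
```

One thing to keep in mind with this pattern: apostrophes also act as separators, so "don't" comes out as "don" and "t", while underscores count as word characters and stay in the term.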
-
Ah, I misinterpreted the \\w, thinking it was whitespace. Thanks.
for reference:
\w A word character: [a-zA-Z_0-9]
\W A non-word character: [^\w]
-
Tutorial on regular expressions:
http://java.sun.com/docs/books/tutori...
-
What about numbers? Are they considered "words"? Many of my top results are numbers, so if they count, I'm going to be reporting a lot of them.
-
Yes, numbers are also considered words.
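This is consistent with the \w class quoted earlier in the thread, which includes 0-9. A quick sketch to check it yourself, assuming terms are matched with `\w+` (the class name `WordCharDemo` and the method `words` are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCharDemo {
    // The regex \w+ is written "\\w+" in Java source.
    // \w is [a-zA-Z_0-9], so runs of digits match just like runs of letters.
    private static final Pattern WORD = Pattern.compile("\\w+");

    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(words("In 2009, 42 teams played."));
        // prints [In, 2009, 42, teams, played]
    }
}
```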