Palindromes
The followings are questions of one of your classmates and my answers. Please post all of your questions here. I will not answer non-personal questions in email.
==============================
> I am having some difficulty with Assignment #2 dealing with finding the longest palindrome.
>
> First of all, in Java, how can we detect if a text file consists of a non-ASCII
> character? Upon opening the text file, I was visually able to find the
> non-ASCII characters because they are square-like symbols.
> However, in Java, when using the Scanner class, it does not read in
> any of the non-ASCII characters.
The following snippet shows how you can check the characters in a string for being ASCII or not.
String str = "ویکی Is پدیا";
char[] chars = str.toCharArray();
for (int i = 0; i < chars.length; i++) {
if (chars[i] > 127) {
System.out.println(chars[i] + " is non-ascii");
} else {
System.out.println(chars[i] + " is ascii");
}
}
If you don't know how to read the content of a unicode file in Java you can download my IO.java class from http://www.ics.uci.edu/~yganjisa/TA/I... and use this line:
String content = IO.getFileContent("in.txt", "UTF-8");
> What do you mean by "split the text whenever you see a non-ASCII
> character"? Does this simply mean to remove all non-ASCII characters
> or does this mean that we need to literally split the text. For example:
>
> ABC[NON-ASCII]CBAC[NON-ASCII]DDD
>
> If we split the text when finding non-ascii characters here, then we will be left with 3 different strings: ABC, CBAC, and DDD. When you use the term "split" does this mean that we can only find the longest palindrome from each set of strings? Would this imply that "DDD" is the longest palindrome, or would "ABCCBA" still qualify to be the longest palindrome?
It means that your palindromes should not contain non-ascii characters. In your example DDD would be the longest palindrome. As I suggested in the discussion class, instead of splitting the text, try to find candidate palindromes and then remove from the list those containing non-ascii characters. Then you can report the longest one in your list.
>
> Also, one of the problems that I am having is that it seems as if this problem has a very high level of complexity. For example, in a very simple case:
>
> ZZZZABCDDCBAXXYYZZ
>
> If a text file that has been cleaned and removed of all spaces, punctuation, etc., this means that in order to find the longest palindrome of the text file, we would have to go through all possible words.
>
> This means we would have to check the following words if it is a palindrome:
>
> ZZZZABCDDCBAXXYYZZ
> ZZZZABCDDCBAXXYYZ
> ZZZZABCDDCBAXXYY
> ZZZZABCDDCBAXXY
> ZZZZABCDDCBAXX
> ZZZZABCDDCBAX
> ZZZZABCDDCBA
> ZZZZABCDDCB
> ZZZZABCDD
> ZZZZABCD
> ZZZZABC
> ZZZZAB
> ZZZZA
> ZZZ
> ZZ
> Z
> Then, continue onto the next character:
> ZZZABCDDCBAXXYYZZ
> ZZZABCDDCBAXXYYZ
> ZZZABCDDCBAXXYY
> ZZZABCDDCBAXXY
> ZZZABCDDCBAXX
> ZZZABCDDCBAX
> ZZZABCDDCBA
> ZZZABCDDCB
> ZZZABCDDC
> ZZZABCDD
> ZZZABCD
> ZZZABC
> ZZZAB
> ZZZA
> ZZ
> Z
> Then, continue again---
>
> There seems to be a couple ways to optimize this by: when the current longest palindrome is found, keep track of the character length. This way, we will not have to check strings that are smaller than the current longest palindrome length.
>
> However, in a text file that consists of 10,000 characters, this seems to still take a long time to run. Right now, I have implemented the code, however my machine takes a long time to run everything (over a couple hours). Do you have any suggestions? Thanks in advance.
Yes! As you mentioned the straightforward implementation requires a long time to execute. You should think about it and find a fast solution. If it takes more than a second to find the results it is not a good solution.
==============================
> I am having some difficulty with Assignment #2 dealing with finding the longest palindrome.
>
> First of all, in Java, how can we detect if a text file consists of a non-ASCII
> character? Upon opening the text file, I was visually able to find the
> non-ASCII characters because they are square-like symbols.
> However, in Java, when using the Scanner class, it does not read in
> any of the non-ASCII characters.
The following snippet shows how you can check the characters in a string for being ASCII or not.
String str = "ویکی Is پدیا";
char[] chars = str.toCharArray();
for (int i = 0; i < chars.length; i++) {
if (chars[i] > 127) {
System.out.println(chars[i] + " is non-ascii");
} else {
System.out.println(chars[i] + " is ascii");
}
}
If you don't know how to read the content of a unicode file in Java you can download my IO.java class from http://www.ics.uci.edu/~yganjisa/TA/I... and use this line:
String content = IO.getFileContent("in.txt", "UTF-8");
> What do you mean by "split the text whenever you see a non-ASCII
> character"? Does this simply mean to remove all non-ASCII characters
> or does this mean that we need to literally split the text. For example:
>
> ABC[NON-ASCII]CBAC[NON-ASCII]DDD
>
> If we split the text when finding non-ascii characters here, then we will be left with 3 different strings: ABC, CBAC, and DDD. When you use the term "split" does this mean that we can only find the longest palindrome from each set of strings? Would this imply that "DDD" is the longest palindrome, or would "ABCCBA" still qualify to be the longest palindrome?
It means that your palindromes should not contain non-ascii characters. In your example DDD would be the longest palindrome. As I suggested in the discussion class, instead of splitting the text, try to find candidate palindromes and then remove from the list those containing non-ascii characters. Then you can report the longest one in your list.
>
> Also, one of the problems that I am having is that it seems as if this problem has a very high level of complexity. For example, in a very simple case:
>
> ZZZZABCDDCBAXXYYZZ
>
> If a text file that has been cleaned and removed of all spaces, punctuation, etc., this means that in order to find the longest palindrome of the text file, we would have to go through all possible words.
>
> This means we would have to check the following words if it is a palindrome:
>
> ZZZZABCDDCBAXXYYZZ
> ZZZZABCDDCBAXXYYZ
> ZZZZABCDDCBAXXYY
> ZZZZABCDDCBAXXY
> ZZZZABCDDCBAXX
> ZZZZABCDDCBAX
> ZZZZABCDDCBA
> ZZZZABCDDCB
> ZZZZABCDD
> ZZZZABCD
> ZZZZABC
> ZZZZAB
> ZZZZA
> ZZZ
> ZZ
> Z
> Then, continue onto the next character:
> ZZZABCDDCBAXXYYZZ
> ZZZABCDDCBAXXYYZ
> ZZZABCDDCBAXXYY
> ZZZABCDDCBAXXY
> ZZZABCDDCBAXX
> ZZZABCDDCBAX
> ZZZABCDDCBA
> ZZZABCDDCB
> ZZZABCDDC
> ZZZABCDD
> ZZZABCD
> ZZZABC
> ZZZAB
> ZZZA
> ZZ
> Z
> Then, continue again---
>
> There seems to be a couple ways to optimize this by: when the current longest palindrome is found, keep track of the character length. This way, we will not have to check strings that are smaller than the current longest palindrome length.
>
> However, in a text file that consists of 10,000 characters, this seems to still take a long time to run. Right now, I have implemented the code, however my machine takes a long time to run everything (over a couple hours). Do you have any suggestions? Thanks in advance.
Yes! As you mentioned the straightforward implementation requires a long time to execute. You should think about it and find a fast solution. If it takes more than a second to find the results it is not a good solution.
1
person likes this idea
I like this idea!
Tell me when this idea gets some attention.
The more people who like this idea, the more it gets noticed.
The more people who like this idea, the more it gets noticed.
Create a customer community for your own organization
Plans starting at $19/month
-
Inappropriate?In terms of optimization, remember that this operation is going to be run on every page in wikipedia in the next assignment. So "how fast does it have to be?", well it depends on how long you have to crawl wikipedia.
I’m fast
-
Inappropriate?"Yes! As you mentioned the straightforward implementation requires a long time to execute. You should think about it and find a fast solution. If it takes more than a second to find the results it is not a good solution. "
Hmmm.. if this is the case, I think we will need to rewrite some of the code to make it even faster. The fastest one we have right now takes 4 minutes to get the result on my laptop.
I’m frustrated
-
Inappropriate?Just as a hint/suggestion:
Each palindrome of size K >= 4, contains another palindrome of size K-2. You can use this fact to slide a window of limited size over the text and just after finding a reasonable size palindrome in that window, try to expand it. -
Inappropriate?Thank you so much!! Now I am finally able to reduce the time to a second!
I’m thankful
-
Inappropriate?Are spaces (blanks) considered punctuation in the 30% limit for palindromes? This would kind of affect my method of counting.
-
Inappropriate?Everything not in [A-Za-z] would be part of the 30% limit.
Loading Profile...


EMPLOYEE
EMPLOYEE