Ignoring Hebrew Vowels When Comparing Strings

Question

Good evening, I hope you can help me with this problem, as I struggle to find solutions.

I have a provider that supplies me with words, for example these Hebrew words:

With vowels: בַּיִת, without vowels: בית
With vowels: הַבַּיְתָה, without vowels: הביתה

Unlike my provider, my users cannot normally type Hebrew vowels (and I do not want them to). The user story is a user searching for a word among the provided words. The problem is comparing vowelled words with unvowelled ones: since each is represented by a different byte array in memory, the equals method returns false.

I tried to understand how UTF-8 handles Hebrew vowels, and it seems like these are just normal characters.

I want to present vowels to the user, so I want to keep the string as it is in memory, but when comparing, I want to ignore them. Is there an easy way to solve this problem?

+4

java encoding hebrew

user1708860 Oct 6 '12 at 20:17


2 answers

AFAIK no. The vowels are characters in their own right, and even some combinations of a letter and its dots are encoded as single characters. See the Wikipedia page:

http://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet

You can store a search key for each word that contains only the letters, i.e. characters in the range U+05D0 to U+05EA (05Dx-05Ex), and add a separate field for the word with vowels.

Of course, you should expect the following:

You will need to consider words that have different meanings according to nikkud.
You should also consider spelling variations involving י and ו, which are common.
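The search-key idea above could be sketched roughly as follows. The class and method names are illustrative only and do not come from any library. The sketch assumes it is acceptable to drop every non-spacing mark in the Hebrew block (vowel points, dagesh and cantillation marks alike), and it first decomposes the text with java.text.Normalizer so that precomposed letter-plus-dot characters are reduced to plain letters as well.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper: builds a vowel-free search key for a Hebrew word.
final class NikkudKeys {

    // Non-spacing marks (category Mn) that live in the Hebrew Unicode block.
    private static final Pattern HEBREW_MARKS =
            Pattern.compile("[\\p{InHebrew}&&\\p{Mn}]");

    static String searchKey(String word) {
        // NFD splits precomposed forms (e.g. bet with dagesh) into letter + mark.
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        // Remove the marks, leaving only the base letters.
        return HEBREW_MARKS.matcher(decomposed).replaceAll("");
    }

    public static void main(String[] args) {
        // "bayit" with nikkud, written with escapes to avoid source-encoding issues.
        String withVowels = "\u05D1\u05B7\u05BC\u05D9\u05B4\u05EA";
        String withoutVowels = "\u05D1\u05D9\u05EA";
        System.out.println(searchKey(withVowels).equals(withoutVowels)); // prints true
    }
}
```

The vowelled word stays untouched in its own field for display; only the key is used when matching the user's input, which can be passed through the same function.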

0

user1658078 Oct 6 '12 at 20:39


chooban · Accepted Answer · 2012-10-06T20:37:52+0000

You can use Collator. I can't tell you exactly how it works, since it is new to me, but this seems to do the trick:

    import java.text.Collator;
    import java.util.Locale;

    public class HebrewCompare {
        public static void main( String[] args ) {
            String withVowels = "בַּיִת";
            String withoutVowels = "בית";
            String withVowelsTwo = "הַבַּיְתָה";
            String withoutVowelsTwo = "הביתה";

            // Plain String.equals compares code units, so the vowel marks make the strings differ.
            System.out.println( "These two strings are " + (withVowels.equals( withoutVowels ) ? "" : "not ") + "equal" );
            System.out.println( "The second two strings are " + (withVowelsTwo.equals( withoutVowelsTwo ) ? "" : "not ") + "equal" );

            // A Hebrew collator at PRIMARY strength compares base letters only.
            Collator collator = Collator.getInstance( new Locale( "he" ) );
            collator.setStrength( Collator.PRIMARY );
            System.out.println( collator.equals( withVowels, withoutVowels ) );
            System.out.println( collator.equals( withVowelsTwo, withoutVowelsTwo ) );
        }
    }

From this, I get the following output:

    These two strings are not equal
    The second two strings are not equal
    true
    true
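For the search user story in the question, the same Collator could be used to filter the provider's list while keeping the vowelled strings for display. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes that PRIMARY strength is acceptable, which also means case and accent differences in other scripts are ignored.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: finds provider words matching unvowelled user input.
final class VowelInsensitiveSearch {

    static List<String> findMatches(List<String> providerWords, String userInput) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("he"));
        // PRIMARY strength compares base letters only, as in the answer above.
        collator.setStrength(Collator.PRIMARY);

        List<String> matches = new ArrayList<>();
        for (String word : providerWords) {
            if (collator.equals(word, userInput)) {
                matches.add(word); // keep the original, vowelled form for display
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The two vowelled words from the question, written with escapes.
        List<String> provided = Arrays.asList(
                "\u05D1\u05B7\u05BC\u05D9\u05B4\u05EA",
                "\u05D4\u05B7\u05D1\u05B7\u05BC\u05D9\u05B0\u05EA\u05B8\u05D4");
        // The user types the first word without vowels.
        List<String> hits = findMatches(provided, "\u05D1\u05D9\u05EA");
        System.out.println(hits.size() + " match(es) found");
    }
}
```

For long word lists, comparing precomputed CollationKey objects (from collator.getCollationKey) may be cheaper than calling equals on every pair.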
