I am trying to match a regular expression against dictionary definitions that I fetch from a website. Each entry always consists of a word, then a newline, then the definition. For instance:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get only the word (in this case "Zither"), I keep getting the newline character along with it.
I tried both ^(\w+)\s and ^(\S+)\s without much luck. I thought that maybe ^(\S+)$ would work, but that does not seem to match the word properly either. I tested the patterns on Rubular (http://rubular.com/r/LPEHCnS0ri), which matches all of my attempts the way I want, even though Java does not.
Here is my code fragment:
    String str = ...; // Here the string is assigned a word and definition taken from the internet, as in the example above.
    Pattern rgx = Pattern.compile("^(\\S+)$");
    Matcher mtch = rgx.matcher(str);
    if (mtch.find()) {
        String result = mtch.group();
        terms.add(new SearchTerm(result, System.nanoTime()));
    }
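For reference, here is a minimal, self-contained sketch of how these patterns behave in Java. It assumes the input is the word, a single "\n", and then the definition; the class name FirstWordDemo and the helpers wholeMatch and firstGroup are made up for illustration and are not part of the fragment above. It shows two things that may explain the difference from Rubular: Matcher.group() with no argument returns the whole match (group 0), not the capture group, and in Java ^ and $ only anchor at line boundaries when Pattern.MULTILINE is set, whereas in Ruby they always do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class FirstWordDemo {
    // Whole match (group 0) of the first hit, or null if nothing matches.
    static String wholeMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Contents of capture group 1 of the first hit, or null if nothing matches.
    static String firstGroup(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Makes the newline visible when printing.
    static String show(String s) {
        return s == null ? "null" : "\"" + s.replace("\n", "\\n") + "\"";
    }

    public static void main(String[] args) {
        // Assumed input shape: word, newline, definition.
        String str = "Zither\nDefinition: An instrument of music";

        // group() includes the \s that follows the captured word.
        System.out.println(show(wholeMatch("^(\\w+)\\s", 0, str)));   // "Zither\n"
        // group(1) is only the captured word.
        System.out.println(show(firstGroup("^(\\w+)\\s", 0, str)));   // "Zither"

        // Without MULTILINE, $ cannot match before the newline in the middle of the input.
        System.out.println(show(wholeMatch("^(\\S+)$", 0, str)));     // null
        // With MULTILINE, $ matches at the end of the first line.
        System.out.println(show(wholeMatch("^(\\S+)$", Pattern.MULTILINE, str))); // "Zither"
    }
}
```

In this sketch, either reading group(1) or compiling with Pattern.MULTILINE yields the bare word without any trimming.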
This is easy to solve by trimming the resulting string, but it seems like this should be unnecessary if I already use regex.
All help is much appreciated. Thanks in advance!