I have a piece of code that translates English into text speak.
I am currently using the String.split() method with "\\W" as the delimiter, which removes all non-word characters.
In its current form, this is what I get:
input: I hate text speak!:)
output: I h8 txt spk
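To show why the punctuation disappears, here is a small sketch that runs the same split("\\W") call on the sample input. The class SplitDemo and method tokens are hypothetical names used only for illustration. Every single non-word character is treated as a separator and discarded, and Java's split also drops trailing empty strings, so "!:)" leaves no trace in the result.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: lowercases and splits on every single non-word
    // character, mirroring the call used in the question.
    static String[] tokens(String text) {
        return text.toLowerCase().split("\\W");
    }

    public static void main(String[] args) {
        // The delimiters " ", "!", ":" and ")" are consumed by split;
        // the empty strings produced after "speak" are trailing, so split drops them.
        System.out.println(Arrays.toString(tokens("I hate text speak!:)")));
        // prints [i, hate, text, speak]
    }
}
```

Note that runs of delimiters in the middle of a string (for example "a!!b") still produce empty tokens, which is where doubled spaces in the translated output come from.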
Is there any way to do this without losing the delimiters?
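One common way to keep the delimiters, sketched here under the assumption that each non-word character should become its own token, is to split on zero-width lookaround positions instead of on the characters themselves. (?<=\\W) matches just after a non-word character and (?=\\W) just before one, so split consumes nothing and every character survives in some token. KeepDelimiters and tokenize are illustrative names, not part of any library.

```java
import java.util.Arrays;

public class KeepDelimiters {
    // Split at the boundaries around each non-word character without
    // consuming it: after a \W (lookbehind) or before a \W (lookahead).
    static String[] tokenize(String text) {
        return text.split("(?<=\\W)|(?=\\W)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("I hate text speak!:)")));
        // prints [I,  , hate,  , text,  , speak, !, :, )]
    }
}
```

Words and delimiters then come back as separate array elements, so a translation loop can look each token up in the map and append non-word tokens unchanged, with no separator of its own.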
EDIT: Here is the method that does the parsing. As it stands, it replaces each delimiter with a space, so at least the output is still readable.
    public static String engToText(String text) {
        text = text.toLowerCase();
        String translated = " ";
        // breaks string into tokens
        String[] tokens = text.split("\\W");
        for (int x = 0; x < tokens.length; x++) {
            if (wordMapEng.containsKey(tokens[x])) {
                translated += " " + wordMapEng.get(tokens[x]);
            } else {
                translated += " " + tokens[x];
            }
        }
        return translated.trim();
    }
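An alternative to splitting, shown here as a sketch rather than a drop-in replacement, is to match only the words with java.util.regex.Matcher and substitute each one in place using appendReplacement and appendTail. Everything between the matches (spaces, "!:)") is copied through untouched, so no delimiter is lost. The small wordMapEng contents below are hypothetical stand-ins for the question's real map, and the class name TextSpeak is illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextSpeak {
    // Hypothetical dictionary standing in for the question's wordMapEng.
    static final Map<String, String> wordMapEng = new HashMap<>();
    static {
        wordMapEng.put("hate", "h8");
        wordMapEng.put("text", "txt");
        wordMapEng.put("speak", "spk");
    }

    // Matches runs of word characters; everything else is left alone.
    private static final Pattern WORD = Pattern.compile("\\w+");

    public static String engToText(String text) {
        Matcher m = WORD.matcher(text.toLowerCase());
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            // Use the mapped form if there is one, otherwise keep the word.
            String replacement = wordMapEng.getOrDefault(word, word);
            // appendReplacement copies the text since the previous match
            // (the delimiters), then the replacement; quoteReplacement stops
            // "$" or "\" in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        // Copy whatever follows the last word, e.g. "!:)".
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(engToText("I hate text speak!:)"));
        // prints i h8 txt spk!:)
    }
}
```

Because the input is lowercased first (as in the original method), "I" comes out as "i"; drop the toLowerCase call and lowercase only the map lookup key if the original casing should be preserved.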