Splitting a String at line breaks? (Paragraphs)

I am extracting some information from a text file (.txt) that contains several paragraphs. Once I have read the file into a String, I want to split it so that each paragraph ends up in its own String object.

Here is the text I get from the text file: http://www.carlowweather.com/plaintext.txt

I tried splitting the String on line feeds and on carriage returns, but neither works. See my code below:

    int pCount = 0;

    public void parseData(String data) {
        String regex = "(\\n)";
        String split[] = data.split(regex);
        for (int i = 0; i < split.length; i++) {
            Log.e("e", pCount + " " + split[i]);
            pCount++;
        }
    }

I also tried "\r" and various combinations that I found while searching the net, but none of them work on Android with this text file. I assume the file does not contain line feeds or carriage returns, just empty lines?

What is the best way to split paragraphs into String objects?

+3
4 answers

The code below detects where a paragraph break occurs; what you do with each paragraph after that is up to you. It simply looks for lines that contain only a single space (" "), which is a characteristic of the file you linked. I included the code that reads the file in the sample, since you did not show that part in your question. My guess is that you read the file line by line and then applied the regex to each line. The earlier suggestions should work if you read the entire text file into a single String.

Alternatively, you can move the code below into a separate function.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    try {
        BufferedReader in = new BufferedReader(new FileReader("plaintext.txt"));
        String inputDataLine;
        while ((inputDataLine = in.readLine()) != null) {
            // In this file a paragraph separator is a line holding a single space.
            if (!(inputDataLine.contentEquals(" "))) {
                System.out.println("What you want to do with a paragraph line");
            } else {
                System.out.println("What you want to do with a paragraph separator");
            }
        }
        in.close();
    } catch (IOException e) {
        e.printStackTrace(); // do not silently swallow read errors
    }
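As a rough sketch of the "after that" step, the line-by-line loop can collect the lines between separators into one String per paragraph. The class and method names here (ParagraphCollector, collectParagraphs) are hypothetical, and the sketch assumes that any empty or whitespace-only line counts as a separator, which is slightly looser than the exact " " check above. The sample input is made up to mimic the file's layout.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class ParagraphCollector {

    // Hypothetical helper: groups lines into paragraphs, treating any line
    // that is empty or whitespace-only as a paragraph separator (assumption).
    public static List<String> collectParagraphs(BufferedReader in) throws IOException {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                // Separator line: close the paragraph collected so far, if any.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Lines of the same paragraph are re-joined with a single \n.
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        // The last paragraph is not followed by a separator line.
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample: leading space on every line, \r\n line endings.
        String sample = " First paragraph, line one\r\n line two\r\n \r\n Second paragraph\r\n";
        List<String> result = collectParagraphs(new BufferedReader(new StringReader(sample)));
        for (int i = 0; i < result.size(); i++) {
            System.out.println(i + ": " + result.get(i));
        }
    }
}
```

Because readLine() already strips \n, \r and \r\n, this approach does not need to know which line-ending style the file uses.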
+2

I think the easiest way to do this is with a Scanner.

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Scanner;

    Scanner sc = new Scanner(new File("donal.txt"), "UTF-8");
    // A paragraph break: newline, optional spaces/tabs, newline.
    sc.useDelimiter("\n[ \t]*\n");
    List<String> result = new ArrayList<String>();
    int lineCount = 0;
    while (sc.hasNext()) {
        String line = sc.next();
        System.out.printf("%n%d:%n%s%n", ++lineCount, line);
        result.add(line);
    }
    System.out.printf("%n%d paragraphs found.%n", lineCount);

The first and last paragraphs will actually be the header and footer; I don't know what you want to do with them.

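For readability, I assumed that the line separator is always the Unix-style \n, but to be safe you should also allow for the Windows-style \r\n and the older Mac-style \r. The line-break alternatives should be atomic groups, (?>...), so that a single \r\n is never backtracked into a separate \r and \n and mistaken for a blank line. That makes the regex:

 "(?>\r\n|[\r\n])[ \t]*(?>\r\n|[\r\n])"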
+4
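If the text is already in a String (as in the question's parseData), a cross-platform blank-line regex can also be passed to String.split instead of a Scanner. The sketch below is only an illustration: the names ParagraphSplitter and splitParagraphs are hypothetical, and the line-break groups are written as atomic groups, (?>...), on the assumption that a lone \r\n must not be read as two separate line breaks.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphSplitter {

    // A line break in Windows (\r\n), Unix (\n) or old Mac (\r) style,
    // optional spaces/tabs, then a second line break: i.e. a blank line.
    // The atomic groups stop the engine from splitting \r\n into \r + \n.
    private static final String PARAGRAPH_BREAK =
            "(?>\r\n|[\r\n])[ \t]*(?>\r\n|[\r\n])";

    // Hypothetical helper: returns the non-empty paragraphs, trimmed.
    public static List<String> splitParagraphs(String data) {
        List<String> paragraphs = new ArrayList<>();
        for (String part : data.split(PARAGRAPH_BREAK)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        // Made-up sample mixing all three line-ending styles.
        String sample = "One\r\ntwo\r\n \r\nThree\n\nFour\r \rFive";
        for (String paragraph : splitParagraphs(sample)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```

Single line breaks inside a paragraph are left in place; only a line break followed by an empty or space-only line starts a new paragraph.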

I think the problem is that several different characters (spaces, newlines and carriage returns) can appear between paragraphs. Try the following:

    int pCount = 0;

    public void parseData(String data) {
        String regex = "([ \\t\\r]*\\n[ \\t\\r]*)+"; // Only this line is changed.
        String split[] = data.split(regex);
        for (int i = 0; i < split.length; i++) {
            Log.e("e", pCount + " " + split[i]);
            pCount++;
        }
    }
+2

I can't try it in Java right now, but it seems that every line in the source file (including the empty ones) starts with a space and ends with a <cr><lf> pair. A standard regex that matches such a blank line, and that stays on the safe side regarding the number of spaces, is the following (the quotation marks make it a Java string literal):

"^ *$"

+1
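For the ^ and $ anchors to match at every line rather than only at the start and end of the whole text, the pattern has to be compiled with Pattern.MULTILINE in java.util.regex. A minimal sketch under that assumption follows; the names BlankLineSplitter and splitParagraphs are hypothetical and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BlankLineSplitter {

    // MULTILINE makes ^ and $ match at the start and end of each line,
    // so "^ *$" matches a line that holds nothing but spaces.
    private static final Pattern BLANK_LINE =
            Pattern.compile("^ *$", Pattern.MULTILINE);

    // Hypothetical helper: splits at blank lines, trims each piece and
    // drops pieces that are empty (e.g. between consecutive blank lines).
    public static List<String> splitParagraphs(String data) {
        List<String> paragraphs = new ArrayList<>();
        for (String part : BLANK_LINE.split(data)) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String sample = " First\r\n still first\r\n \r\n Second\r\n \r\n \r\n Third";
        for (String paragraph : splitParagraphs(sample)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```

The pattern only consumes the spaces on the blank line, so the surrounding line terminators stay attached to the pieces; trim() removes them along with the leading space.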

Source: https://habr.com/ru/post/955751/

