Reading text file fragments using Java 8 Stream

Java 8 can create a stream from the lines of a file; forEach then iterates over those lines one by one. I have a text file in the following format:

bunch of lines with text
$$$$
bunch of lines with text
$$$$

I need each group of lines that precedes a $$$$ to become one element of the stream.

In other words, I need a stream of strings in which each element holds the content that precedes a $$$$ line.

What is the best way to do this with minimal overhead?

+5
5 answers

I could not come up with a solution that processes the lines lazily. I am not sure whether that is possible.

My solution creates an ArrayList. If you need a Stream, just call stream() on it.

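    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.BiConsumer;
    import java.util.stream.Stream;

    public class DelimitedFile {

        public static void main(String[] args) throws IOException {
            List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
            for (int i = 0; i < lines.size(); i++) {
                System.out.printf("%d:%n%s%n", i, lines.get(i));
            }
        }

        public static List<String> lines(Path path, String delimiter) throws IOException {
            // Close the file when done; Files.lines keeps it open otherwise.
            try (Stream<String> stream = Files.lines(path)) {
                return stream.collect(ArrayList::new,
                    new BiConsumer<ArrayList<String>, String>() {
                        // True when the next line starts a new record.
                        // This state is only safe on a sequential stream.
                        boolean add = true;

                        @Override
                        public void accept(ArrayList<String> lines, String line) {
                            if (delimiter.equals(line)) {
                                add = true;
                            } else if (add) {
                                lines.add(line);
                                add = false;
                            } else {
                                // Append to the record currently being built.
                                int i = lines.size() - 1;
                                lines.set(i, lines.get(i) + '\n' + line);
                            }
                        }
                    },
                    ArrayList::addAll);
            }
        }
    }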

File contents:

  bunch of lines with text
 bunch of lines with text2
 bunch of lines with text3
 $$$$
 2bunch of lines with text
 2bunch of lines with text2
 $$$$
 3bunch of lines with text
 3bunch of lines with text2
 3bunch of lines with text3
 3bunch of lines with text4
 $$$$ 

Output:

  0:
 bunch of lines with text
 bunch of lines with text2
 bunch of lines with text3
 1:
 2bunch of lines with text
 2bunch of lines with text2
 2:
 3bunch of lines with text
 3bunch of lines with text2
 3bunch of lines with text3
 3bunch of lines with text4

Edit:

I finally came up with a solution that generates the Stream lazily:

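    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Iterator;
    import java.util.NoSuchElementException;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public static Stream<String> lines(Path path, String delimiter) throws IOException {
        Stream<String> lines = Files.lines(path);
        Iterator<String> iterator = lines.iterator();
        return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
            // First line of the next record, buffered by hasNext().
            String nextLine;

            @Override
            public boolean hasNext() {
                if (nextLine != null) {
                    return true;
                }
                // Skip delimiter lines until a record starts.
                while (iterator.hasNext()) {
                    String line = iterator.next();
                    if (!delimiter.equals(line)) {
                        nextLine = line;
                        return true;
                    }
                }
                lines.close(); // end of file: release the underlying file
                return false;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                StringBuilder sb = new StringBuilder(nextLine);
                nextLine = null;
                // Collect lines until the next delimiter or the end of the file.
                while (iterator.hasNext()) {
                    String line = iterator.next();
                    if (delimiter.equals(line)) {
                        break;
                    }
                    sb.append('\n').append(line);
                }
                return sb.toString();
            }
        }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
    }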

Coincidentally, this is very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally). It may incur less overhead to skip both of these methods and use Files.newBufferedReader(Path) and BufferedReader.readLine() directly.
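As a rough illustration of that last suggestion, the sketch below reads one record at a time with BufferedReader.readLine() and no Stream at all. The class name RecordReader and the method nextRecord are hypothetical, and a StringReader stands in for Files.newBufferedReader(Path) so the example runs without a file. Like the iterator above, it skips empty records.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class RecordReader {

    /**
     * Reads lines up to the next delimiter line and joins them with newlines.
     * Returns null once the input is exhausted. Empty records are skipped.
     */
    static String nextRecord(BufferedReader reader, String delimiter) {
        try {
            StringBuilder sb = null;
            String line;
            while ((line = reader.readLine()) != null) {
                if (delimiter.equals(line)) {
                    if (sb != null) {
                        return sb.toString(); // record complete
                    }
                    continue; // delimiter with nothing before it
                }
                if (sb == null) {
                    sb = new StringBuilder(line);
                } else {
                    sb.append('\n').append(line);
                }
            }
            // End of input: return a trailing record that has no closing delimiter.
            return sb == null ? null : sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for Files.newBufferedReader(path).
        String data = "a\nb\n$$$$\nc\n$$$$\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String record;
            while ((record = nextRecord(reader, "$$$$")) != null) {
                System.out.println("[" + record + "]");
            }
        }
    }
}
```

Each call pulls only as many lines as one record needs, so the file is still read lazily; whether this is measurably faster than the Stream-based version would have to be benchmarked.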

+2

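You can try the following. Note that it only filters out the $$$$ lines; it does not join the lines in between into one element: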

    List<String> list = new ArrayList<>();
    try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        list = stream
            .filter(line -> !line.equals("$$$$"))
            .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
    }

0

A similar, shorter answer already exists, but the following is type-safe and needs no extra state:

    Path path = Paths.get("... .txt");
    // try-with-resources closes the file that Files.lines opens.
    try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
        List<StringBuilder> glist = lines.collect(
            () -> new ArrayList<StringBuilder>(),
            (list, line) -> {
                // Start a new chunk at the beginning or after a delimiter line.
                // Note: each chunk keeps its trailing "$$$$" line.
                if (list.isEmpty()
                        || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                    list.add(new StringBuilder());
                }
                list.get(list.size() - 1).append(line).append('\n');
            },
            (list1, list2) -> {
                if (!list1.isEmpty()
                        && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                        && !list2.isEmpty()) {
                    // Merge last of list1 and first of list2:
                    list1.get(list1.size() - 1).append(list2.remove(0).toString());
                }
                list1.addAll(list2);
            });
        glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
    } catch (IOException ex) {
        Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
    }

Instead of .endsWith("$$$$\n"), which also accepts a line that merely ends in $$$$, it would be better to match the whole last line:

    .matches("(?s)(.*\n)?\\$\\$\\$\\$\n")
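One caveat when applying this check: String.matches tests the entire string, not just a suffix, so the pattern has to allow for any text before the final line. A minimal sketch of such a check follows; the class name DelimiterCheck, the constant, and the helper method are hypothetical names used only for illustration.

```java
class DelimiterCheck {

    // (?s) lets '.' match newlines; "(.*\n)?" allows any number of earlier lines.
    static final String LAST_LINE_IS_DELIMITER = "(?s)(.*\n)?\\$\\$\\$\\$\n";

    /** True if the final line of the chunk is exactly "$$$$". */
    static boolean endsWithDelimiterLine(CharSequence chunk) {
        return chunk.toString().matches(LAST_LINE_IS_DELIMITER);
    }

    public static void main(String[] args) {
        System.out.println(endsWithDelimiterLine("text\n$$$$\n"));    // true
        System.out.println(endsWithDelimiterLine("price in $$$$\n")); // false
        System.out.println("price in $$$$\n".endsWith("$$$$\n"));     // true
    }
}
```

The last two lines show the difference: endsWith accepts a line that only happens to end in $$$$, while the whole-line pattern rejects it.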
0

Here's a solution based on this previous work:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Spliterator;
    import java.util.Spliterators;
    import java.util.function.Consumer;
    import java.util.function.Predicate;
    import java.util.stream.Stream;
    import java.util.stream.StreamSupport;

    public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
        private final Spliterator<String> source;
        private final Predicate<String> delimiter;
        private final Consumer<String> getChunk;
        private List<String> current;

        ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
            super(lineSpliterator.estimateSize(), ORDERED | NONNULL);
            source = lineSpliterator;
            delimiter = mark;
            getChunk = s -> {
                if (current == null) current = new ArrayList<>();
                current.add(s);
            };
        }

        public boolean tryAdvance(Consumer<? super List<String>> action) {
            while (current == null || !delimiter.test(current.get(current.size() - 1)))
                if (!source.tryAdvance(getChunk)) return lastChunk(action);
            current.remove(current.size() - 1);
            action.accept(current);
            current = null;
            return true;
        }

        private boolean lastChunk(Consumer<? super List<String>> action) {
            if (current == null) return false;
            action.accept(current);
            current = null;
            return true;
        }

        public static Stream<List<String>> toChunks(
                Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
            return StreamSupport.stream(
                new ChunkSpliterator(lines.spliterator(), splitAt), parallel);
        }
    }

which you can use as

    try (Stream<String> lines = Files.lines(pathToYourFile)) {
        ChunkSpliterator.toChunks(
                lines, Pattern.compile("^\\Q$$$$\\E$").asPredicate(), false)
            /* chain your stream operations, e.g.
            .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
            */;
    }

0

You can use Scanner as an iterator and create a stream from it:

    private static Stream<String> recordStreamOf(Readable source) {
        Scanner scanner = new Scanner(source);
        // useDelimiter takes a regex, so the literal "$$$$" must be quoted.
        scanner.useDelimiter(Pattern.quote("$$$$"));
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner,
                    Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
    }

This keeps the newlines inside each chunk, so the chunks can be filtered or split further.
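The further filtering mentioned above might look like the following sketch, which trims the newlines around each chunk and drops chunks that end up empty. The class ScannerRecords and the helper trimmedRecords are hypothetical names, the delimiter is passed through Pattern.quote on the assumption that it should be treated literally, and a StringReader replaces a real file so the example is self-contained.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ScannerRecords {

    /** Streams the raw chunks between delimiters, newlines included. */
    static Stream<String> records(Readable source, String delimiter) {
        Scanner scanner = new Scanner(source);
        // Quote the delimiter so it is matched literally, not as a regex.
        scanner.useDelimiter(Pattern.quote(delimiter));
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner,
                    Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
    }

    /** Trims surrounding whitespace and drops chunks that are empty afterwards. */
    static List<String> trimmedRecords(Readable source, String delimiter) {
        try (Stream<String> chunks = records(source, delimiter)) {
            return chunks
                .map(String::trim)
                .filter(chunk -> !chunk.isEmpty())
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        String data = "a\nb\n$$$$\nc\nd\n$$$$\n";
        trimmedRecords(new StringReader(data), "$$$$")
            .forEach(record -> System.out.println("[" + record + "]"));
    }
}
```

Note that String.trim also removes leading and trailing spaces on the first and last line of a chunk; if that whitespace matters, strip only the newline characters instead.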

0

Source: https://habr.com/ru/post/1257950/