Rebol 3: reading STDIN efficiently line by line (to create an awk tool)

I am trying to create an awk-like tool that uses Rebol 3 to process large text files with bash pipes and tools. I am having trouble reading STDIN line by line in Rebol 3.

For example, this shell command creates 3 lines:

    $ (echo "first line" ; echo "second line" ; echo "third line" )
    first line
    second line
    third line

But the Rebol word input reads all 3 lines at once. I would expect it to stop at a newline, as it does when you use input interactively:

    r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print "***" ]'
    abcdef
    abcdef
    ***
    blabla
    blabla
    ***

But when I pipe the lines in, it reads the entire input at once. I could read everything and then split it into lines, but I want it to work in a "streaming" way, since I usually pipe in thousands of lines with cat.

    $ (echo "first line" ; echo "second line" ; echo "third line" ) \
      | r3 --do 'while [ x: input ] [ if empty? x [ break ] print x print "***" ]'
    first linesecond linethird line
    ***

I also looked at the source of input in order to write a similar function. I could read character by character in a while loop and check for newlines, but that does not seem efficient.

2 answers

I figured this out, and it seems to work well even on larger files with tens of thousands of lines. It could be written more elegantly and improved, though.

The r3awk function reads STDIN and takes a block of code, which it executes once per line with the word line bound to the current line as a string:

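    r3awk: func [
        code
        /local a lines line partial
    ][
        partial: copy ""
        lines: read/lines/string system/ports/input
        while [not empty? lines] [
            ; the first line of this batch completes the partial line carried over
            lines/1: rejoin [partial lines/1]
            ; the last line of the batch may be incomplete, so hold it back
            ; (the post calls a helper named pull here; take/last is assumed
            ; to be the built-in equivalent)
            partial: take/last lines
            foreach line lines [
                do bind code 'line
            ]
            ; a failed read (e.g. an encoding error) ends the loop
            if error? try [lines: read/lines/string system/ports/input] [
                lines: copy []
            ]
        ]
        ; process the final line, which is now complete
        line: partial
        do bind code 'line
    ]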

It works like this. read/lines reads a batch of characters from the stream and returns a block of lines. Each call reads the next batch, so the whole thing is wrapped in a while loop. The code block is run on the lines as the loop iterates, not after all the input has been read.

A batch of characters generally does not end on a newline, so the last line of each batch is partial. The first line of the next batch is its continuation, so the function joins the two. At the end it has to process the last line, which by then is no longer partial. The try is there because some lines caused UTF encoding errors.

It can be used on the command line:

    (echo "first line" ; echo "second line" ; echo "third line" ) | \
    r3 --import utils.r --do 'r3awk [ parse line [ copy x to space (print x) ] ]'
    first
    second
    third

Things to improve: tidy up the function and deduplicate some of the code. Also check what happens if a read/lines batch ends exactly on a newline.
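
The partial-line handling described in this answer can be tried out without any port I/O. The following is only a minimal sketch under stated assumptions: split-chunk is a hypothetical helper (it is not part of the post or of Rebol), and the string literals stand in for the batches that a real read would return. It also covers the case where a batch ends exactly on a newline, in which case the leftover is simply empty.

```rebol
Rebol []

; Hypothetical helper, not from the original post.
; Joins the leftover from the previous chunk with the new chunk and
; returns a block of two items: the complete lines and the new leftover.
split-chunk: func [
    partial [string!] "Incomplete line left over from the previous chunk"
    chunk   [string!] "Newly read text; may stop in the middle of a line"
    /local text lines pos
][
    text: rejoin [partial chunk]
    lines: copy []
    ; Everything up to each newline is a complete line.
    while [pos: find text newline] [
        append lines copy/part text pos
        text: next pos          ; continue after the newline
    ]
    ; Whatever follows the last newline is incomplete (possibly empty).
    reduce [lines copy text]
]

; Simulated batches; a real tool would get these from system/ports/input.
partial: copy ""
foreach chunk ["first li" "ne^/second line^/thi" "rd line^/"] [
    set [lines partial] split-chunk partial chunk
    foreach line lines [print line]
]
; A non-empty leftover means the input did not end with a newline.
unless empty? partial [print partial]
```

Keeping the splitting separate from the reading makes the edge cases easy to check by hand. Windows CRLF line endings are not handled in this sketch.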


I ran into the same issue with input a couple of years ago. I do not think this behavior is an intended change, but rather an incomplete implementation (touch wood!).

Here is a workaround that I wrote at the time (it worked fine for me on macOS and Linux).

    input-line: function [
        {Return next line (string!) from STDIN. Returns NONE when nothing left}
        /part size [integer!] "Internal read/part (buffer) size"
    ][
        buffer: {}  ;; static
        if none? part [size: 1024]

        forever [
            if f: find buffer newline [
                remove f    ;; chomp newline (NB. doesn't cover Windows CRLF?)
                break
            ]

            if empty? data: read/part system/ports/input size [
                f: length? buffer
                break
            ]

            append buffer to-string data
        ]

        unless all [empty? data empty? buffer] [take/part buffer f]
    ]

Usage example:

    while [not none? line: input-line] [
        ;; do something with LINE of data from STDIN
    ]

Source: https://habr.com/ru/post/1263428/

