Scalaz-stream: how to handle the "header" (first chunks) differently?

Question


Context: I'm trying to write a Process1[ByteVector, spray.http.HttpResponsePart] whose output is ChunkedResponseStart(bytes), MessageChunk(bytes), MessageChunk(bytes), ..., ChunkedResponseEnd. I have not yet completely wrapped my head around scalaz-stream and its vocabulary.

How do I write a process that handles the first n chunks differently?

I came up with this (using lines as an example):

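    import java.nio.file.{Files, Paths}
    import scalaz.stream._

    // Concatenate the first 5 lines into a single "header" string...
    val headerChunk = process1.chunk[String](5).map(_.reduce(_ + _))
    // ...emit it once, then pass all remaining lines through unchanged
    val headerChunkAndRest: Process1[String, String] = headerChunk.take(1) ++ process1.id

    io.linesR(Files.newInputStream(Paths.get("testdata/fahrenheit.txt")))
      .pipe(headerChunkAndRest)
      .to(io.stdOutLines)
      .run.run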

What is an idiomatic, and possibly common, way to write headerChunkAndRest?


scala scalaz-stream

Vasiliy Levykin Mar 03 '15 at 11:20


1 answer

stefan.schwetschke · Accepted Answer · 2015-03-04T11:15:16+0000

General considerations

There are several ways to do this, and the right one depends very much on the details of your needs. You can use the following helper methods that are part of scalaz-stream:

zipWithIndex: This gives you the index of the current element as a number. You can distinguish the header from the body based on this index.
zipWithState: You can carry state from one element to the next and use this state to track whether you are still parsing the header or have already reached the body. In the next step, you can use this state to process the header and the body differently.
repartition: Use this to group all header elements together and all body elements together. Then you can process them in the next step.
zipWithNext: This function presents you with the current element paired with the next one. You can use this to detect when you switch from the header to the body and react accordingly.
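Apart from these helpers, the approach from the question, concatenating two Process1 values with ++, is also a reasonable idiom: once the first process halts, the second one keeps consuming the remaining input. Below is a minimal sketch, assuming scalaz-stream 0.7.x on the classpath; headerThenRest and HeaderThenRestDemo are hypothetical names, not part of the library.

```scala
import scalaz.stream._

object HeaderThenRestDemo {
  // Hypothetical helper: apply `header` to the first n elements and `rest`
  // to every element after that. `take(n)` halts after n elements, and `++`
  // then hands the remaining input to the second Process1.
  def headerThenRest[I, O](n: Int)(header: I => O, rest: I => O): Process1[I, O] =
    process1.take[I](n).map(header) ++ process1.lift(rest)

  // Usage: mark the first two lines as header lines, the others as body lines.
  val tagged: List[String] =
    Process("a", "b", "c", "d")
      .pipe(headerThenRest[String, String](2)(h => "H:" + h, b => "B:" + b))
      .toList
}
```

This keeps the header and body logic in two separate processes, which is convenient when the header has a fixed length, as in the question.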

Perhaps you should think about what you really need. For your question, that would be zipWithIndex followed by map. But if you rethink the problem, you will probably end up with repartition or zipWithState.
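For the question as asked ("handle the first n chunks differently"), zipWithIndex followed by map could look like the sketch below. It assumes scalaz-stream 0.7.x; tagHeader and IndexTagging are hypothetical names, and tagging elements with Left/Right is just one possible way to mark header versus body.

```scala
import scalaz.stream._

object IndexTagging {
  // Hypothetical helper: tag each element as Left (one of the first n,
  // i.e. the "header") or Right (the rest), using zipWithIndex and map.
  def tagHeader[A](n: Int): Process1[A, Either[A, A]] =
    process1.zipWithIndex[A].map { case (a, idx) =>
      if (idx < n) Left(a) else Right(a)
    }

  val tagged: List[Either[String, String]] =
    Process("h1", "h2", "b1", "b2").pipe(tagHeader[String](2)).toList
}
```

Downstream, a single map with `case Left(h) => ...` and `case Right(b) => ...` can then treat header and body elements differently.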

Code example

Let's take a simple example: an HTTP client that separates the elements of the HTTP header from the body (HTTP, not HTML). The header contains things like cookies, while the body holds the real “content”, such as an image or HTML source.

A simple HTTP client might look like this:

    import scalaz.stream._
    import scalaz.concurrent.Task
    import java.net.InetSocketAddress
    import java.nio.channels.AsynchronousChannelGroup

    implicit val AG = nio.DefaultAsynchronousChannelGroup

    // HTTP lines end with CRLF ("\r\n"); the request header ends with an empty line.
    def httpGetRequest(hostname : String, path : String = "/"): Process[Nothing, String] =
      Process(
        s"GET $path HTTP/1.1",
        s"Host: $hostname",
        "Accept: */*",
        "User-Agent: scalaz-stream"
      ).intersperse("\r\n").append(Process("\r\n\r\n"))

    def simpleHttpClient(hostname : String, port : Int = 80, path : String = "/")
                        (implicit AG: AsynchronousChannelGroup): Process[Task, String] =
      nio.connect(new InetSocketAddress(hostname, port))
        .flatMap(_.run(httpGetRequest(hostname, path).pipe(text.utf8Encode)))
        .pipe(text.utf8Decode)
        .pipe(text.lines())

Now we can use this code to separate the header lines from the rest. In HTTP, the header consists of lines and is separated from the body by an empty line. So first, let's count the lines in the header:

    val demoHostName = "scala-lang.org" // Hope they won't mind...
    simpleHttpClient(demoHostName).zipWithIndex.takeWhile(!_._1.isEmpty).runLast.run
    // res3: Option[(String, Int)] = Some((Content-Type: text/html,8))
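Because the counting step does not depend on the network source, it can also be tried offline on a small in-memory response. The sketch below assumes scalaz-stream 0.7.x; the response lines are made up for illustration, and since the source is a pure Process0, it uses toList.lastOption instead of runLast.run.

```scala
import scalaz.stream._

object CountHeaderLines {
  // Made-up response, already split into lines (as text.lines() would do).
  val sampleResponse: Process0[String] = Process(
    "HTTP/1.1 200 OK",
    "Content-Length: 5",
    "Content-Type: text/html",
    "",
    "hello")

  // Index the lines, stop at the empty separator line and keep the last
  // header line together with its zero-based index.
  val lastHeaderLine: Option[(String, Int)] =
    sampleResponse.zipWithIndex.takeWhile(!_._1.isEmpty).toList.lastOption
}
```

Here the last header line has index 2, so this sample has 3 header lines and the separator sits at index 3.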

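When I ran this, the last header line had index 8 (indices start at 0), so the header consisted of 9 lines and the empty separator line was at index 9. Next, we define an enumeration so that we can classify the parts of the response: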

    object HttpResponsePart {
      sealed trait EnumVal
      case object HeaderLine extends EnumVal
      case object HeaderBodySeparator extends EnumVal
      case object Body extends EnumVal
      val httpResponseParts = Seq(HeaderLine, HeaderBodySeparator, Body)
    }
    import HttpResponsePart._

Then use zipWithIndex plus map to classify the parts of the response:

    simpleHttpClient(demoHostName).zipWithIndex.map {
      case (line, idx) if idx < 9  => (line, HeaderLine)          // header lines, indices 0 to 8
      case (line, idx) if idx == 9 => (line, HeaderBodySeparator) // the empty line
      case (line, _)               => (line, Body)
    }.take(15).runLog.run

This works great for me. But, of course, the number of header lines can change at any time without notice. It is much more reliable to use a very simple parser that follows the structure of the response. For this I use zipWithState:

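    // zipWithState pairs each line with the state *before* that line, so the
    // transition is applied once more to get the label of the line itself.
    def nextPart(line: String, state: EnumVal): EnumVal = (line, state) match {
      case (l, HeaderLine) if l.isEmpty => HeaderBodySeparator
      case (_, HeaderLine)              => HeaderLine
      case (_, HeaderBodySeparator)     => Body
      case (_, Body)                    => Body
    }

    simpleHttpClient(demoHostName)
      .zipWithState(HeaderLine: EnumVal)(nextPart)
      .map { case (line, before) => (line, nextPart(line, before)) }
      .take(15).runLog.run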
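One detail of zipWithState is easy to miss: it pairs each element with the state accumulated from the elements before it, not including the element itself, so the first element is paired with the initial value. A tiny check of that behaviour, assuming scalaz-stream 0.7.x (ZipWithStateSemantics is a hypothetical name):

```scala
import scalaz.stream._

object ZipWithStateSemantics {
  // Running sum of string lengths. Each element is paired with the sum of
  // the lengths of the elements BEFORE it; "a" gets the initial value 0.
  val before: List[(String, Int)] =
    Process("a", "bb", "ccc").zipWithState(0)((s, acc) => acc + s.length).toList
}
```

So to label a line by the transition it triggers itself (for example, the empty separator line), the transition function has to be applied once more to the pair (line, state).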

Both approaches use a similar structure and should produce the same result. Both can also be reused easily: you can simply replace the source, for example with a file, and nothing else needs to change. The same goes for the processing after the classification: .take(15).runLog.run is identical in both approaches.
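The claim that both classifications agree can be checked offline by swapping the network source for an in-memory one. The sketch below assumes scalaz-stream 0.7.x; the response lines and the object name CompareClassifiers are made up, and the index-based variant hard-codes 3 header lines because that is what this sample contains.

```scala
import scalaz.stream._

object CompareClassifiers {
  sealed trait EnumVal
  case object HeaderLine extends EnumVal
  case object HeaderBodySeparator extends EnumVal
  case object Body extends EnumVal

  // Made-up response: 3 header lines, the empty separator, 2 body lines.
  val response: Process0[String] = Process(
    "HTTP/1.1 200 OK", "Content-Length: 11", "Content-Type: text/plain",
    "", "hello", "world")

  // Index-based: only correct because this sample has exactly 3 header lines.
  val byIndex: List[(String, EnumVal)] = response.zipWithIndex.map {
    case (line, idx) if idx < 3  => (line, HeaderLine: EnumVal)
    case (line, idx) if idx == 3 => (line, HeaderBodySeparator: EnumVal)
    case (line, _)               => (line, Body: EnumVal)
  }.toList

  // State-based: derives each label from the structure of the response.
  def nextPart(line: String, state: EnumVal): EnumVal = (line, state) match {
    case (l, HeaderLine) if l.isEmpty => HeaderBodySeparator
    case (_, HeaderLine)              => HeaderLine
    case (_, HeaderBodySeparator)     => Body
    case (_, Body)                    => Body
  }

  // zipWithState yields the state before each line; applying nextPart once
  // more gives the label of the line itself.
  val byState: List[(String, EnumVal)] = response
    .zipWithState(HeaderLine: EnumVal)(nextPart)
    .map { case (line, before) => (line, nextPart(line, before)) }
    .toList
}
```

Only the source (response) would change when reading from a file or a socket instead; both classifiers stay the same.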
