Using Haskell pipes-bytestring to iterate over a file line by line

I am using the pipes library and need to convert a ByteString stream to a stream of lines (i.e. String ), using ASCII encoding. I am aware that there are other libraries (Pipes.Text and Pipes.Prelude) that might let me yield lines from a text file more easily, but because of some other code I need to get the lines as String from a Producer of ByteString .

More formally, I need to convert a Producer ByteString IO () to a Producer String IO () that yields one line at a time.

I'm sure this should be a one-liner for an experienced pipes programmer, but so far I have not managed to hack my way through all the FreeT and lens trickery in pipes-bytestring.

Any help is much appreciated!

Stephan

2 answers

If you need this type signature, I would suggest the following:

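    import Control.Foldl (mconcat, purely)
    import Data.ByteString (ByteString)
    import Data.Text (unpack)
    import Lens.Family (view)
    import Pipes (Producer, (>->))
    import Pipes.Group (folds)
    import qualified Pipes.Prelude as Pipes
    import Pipes.Text (lines)
    import Pipes.Text.Encoding (utf8)
    -- mconcat is hidden as well because newer versions of base export it
    -- from Prelude, which would clash with the Control.Foldl import
    import Prelude hiding (lines, mconcat)

    -- Decode as UTF-8, split into lines, merge each line's Text chunks,
    -- then convert every line to a String
    getLines
        :: Producer ByteString IO r -> Producer String IO (Producer ByteString IO r)
    getLines p = purely folds mconcat (view (utf8 . lines) p) >-> Pipes.map unpack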

This works because of the type of purely folds mconcat :

    purely folds mconcat
        :: (Monad m, Monoid t) => FreeT (Producer t m) m r -> Producer t m r

... where t in this case will be Text :

    purely folds mconcat
        :: Monad m => FreeT (Producer Text m) m r -> Producer Text m r

Any time you want to reduce each Producer sub-group of a FreeT -delimited stream, you probably want to use purely folds . Then it's just a matter of choosing the right Fold to reduce the sub-group with. In this case you just want to concatenate all the Text chunks within a group, so you pass in mconcat . I generally don't recommend doing this, since it loads each line into memory in full and will break on extremely long lines, but you specified that you need this behavior.
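To illustrate swapping in a different Fold, here is a minimal sketch that reports the byte length of each line instead of concatenating it. It assumes the lines lens from Pipes.ByteString (rather than the Pipes.Text one used above) and the foldl package; the names lineLengths and example are made up for this illustration.

```haskell
import qualified Control.Foldl as L
import qualified Data.ByteString.Char8 as B
import Lens.Family (view)
import Pipes (Producer, each)
import qualified Pipes.ByteString as PB
import Pipes.Group (folds)
import qualified Pipes.Prelude as P

-- Hypothetical helper: emit the length in bytes of every line.
-- Each chunk is mapped to its length and the lengths are summed,
-- so a line is never assembled in memory as a whole.
lineLengths :: Monad m => Producer B.ByteString m r -> Producer Int m r
lineLengths p = L.purely folds (L.premap B.length L.sum) (view PB.lines p)

-- Chunk boundaries deliberately do not line up with the line breaks.
example :: [Int]
example = P.toList (lineLengths (each (map B.pack ["ab", "c\nde", "\nf"])))
```

The chunks above spell out the lines abc, de and f, so example should evaluate to [3,2,1]. Because the Fold only keeps a running sum, this variant does not share the long-line problem of mconcat.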

The reason this is verbose is that the pipes ecosystem promotes Text over String , and also tries to encourage handling arbitrarily long lines. If you were not constrained by your other code, then the more idiomatic approach would just be:

 view (utf8 . lines) 

After a bit of hacking and some hints from this blog, I came up with a solution, but it is surprisingly clumsy, and I'm afraid a bit inefficient, since it uses ByteString.append:

    import Pipes
    import qualified Pipes.ByteString as PB
    import qualified Pipes.Prelude as PP
    import qualified Pipes.Group as PG
    import qualified Data.ByteString.Char8 as B
    import Lens.Family (view)
    import Control.Monad (liftM)

    -- Split the byte stream into lines, then collapse each line into one String
    getLines :: Producer PB.ByteString IO r -> Producer String IO r
    getLines = PG.concats . PG.maps toStringProducer . view PB.lines

    -- Drain the producer of a single line, accumulating its chunks,
    -- and yield the result as one String
    toStringProducer :: Producer PB.ByteString IO r -> Producer String IO r
    toStringProducer producer = go producer B.empty
      where
        go producer bs = do
            x <- lift $ next producer
            case x of
                Left r -> do
                    yield $ B.unpack bs
                    return r
                -- append the new chunk after the bytes accumulated so far
                Right (bs', producer') -> go producer' (B.append bs bs')
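If the repeated B.append is a concern, one possible alternative is to let purely folds collect the chunks of each line in a list and concatenate them once. This is only a sketch: it assumes the foldl package is available, and getLinesFolded, lineFold, printLines and example are hypothetical names. B.unpack maps each byte to one Char, which is adequate for ASCII input.

```haskell
import qualified Control.Foldl as L
import qualified Data.ByteString.Char8 as B
import Lens.Family (view)
import Pipes (Producer, each, runEffect, (>->))
import qualified Pipes.ByteString as PB
import Pipes.Group (folds)
import qualified Pipes.Prelude as P
import System.IO (IOMode (ReadMode), withFile)

-- Gather the chunks of one line in a list, concatenate them in one go,
-- then turn each byte into a Char (Char8 semantics).
lineFold :: L.Fold B.ByteString String
lineFold = fmap (B.unpack . B.concat) L.list

-- Same shape as the requested signature, generalised over the base monad.
getLinesFolded :: Monad m => Producer B.ByteString m r -> Producer String m r
getLinesFolded = L.purely folds lineFold . view PB.lines

-- Hypothetical usage: print every line of a file to stdout.
printLines :: FilePath -> IO ()
printLines path = withFile path ReadMode $ \h ->
    runEffect $ getLinesFolded (PB.fromHandle h) >-> P.stdoutLn

-- Pure check with chunk boundaries that do not match the line breaks.
example :: [String]
example = P.toList (getLinesFolded (each (map B.pack ["ab", "c\nde", "\nf"])))
```

Here example should evaluate to ["abc","de","f"]. Each line is still held in memory in full, so the caveat about extremely long lines applies to this version too.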

Source: https://habr.com/ru/post/1203166/

