Retrieving values from a subtype

Question

I am parsing an XML file using HXT and I am trying to split the node extraction into modular pieces (I used this as a guide). Unfortunately, I cannot figure out how to apply selectors once I have done the first level of parsing.

  import Text.XML.HXT.Core
  let node tag = multi (hasName tag)
  xml <- readFile "test.xml"
  let doc = readString [withValidate yes, withParseHTML no, withWarnings no] xml
  books <- runX $ doc >>> node "book"

I see that books has type [XmlTree]:

  :t books
  books :: [XmlTree]

Now I would like to get the first element of books and then extract some values from inside that subtree.

  let b = head(books)
  runX $ b >>> node "cost"

  Couldn't match type 'Data.Tree.NTree.TypeDefs.NTree' with 'IOSLA (XIOState ()) XmlTree'
  Expected type: IOSLA (XIOState ()) XmlTree XNode
    Actual type: XmlTree
  In the first argument of '(>>>)', namely 'b'
  In the second argument of '($)', namely 'b >>> node "cost"'

I cannot find selectors that work on a plain XmlTree; the misuse above is only there to illustrate what I would like to do. I know I can do this:

  runX $ doc >>> node "book" >>> node "cost" /> getText
  ["55.9","95.0"]

But I am interested not only in cost but also in many other elements inside book. The XML file is pretty deep, so I don't want to nest everything with <+>; I would much rather extract the piece I want and then extract its subelements in a separate function.

Example (made-up) XML file:

  <?xml version="1.0" encoding="UTF-8"?>
  <start xmlns="http://www.example.com/namespace" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <books>
      <book>
        <author>
          <name>
            <first>Joe</first>
            <last>Smith</last>
          </name>
          <city>New York City</city>
        </author>
        <released>1990-11-15</released>
        <isbn>1234567890</isbn>
        <publisher>X Publisher</publisher>
        <cost>55.9</cost>
      </book>
      <book>
        <author>
          <name>
            <first>Jane</first>
            <last>Jones</last>
          </name>
          <city>San Francisco</city>
        </author>
        <released>1999-01-19</released>
        <isbn>0987654321</isbn>
        <publisher>Y Publisher</publisher>
        <cost>95.0</cost>
      </book>
    </books>
  </start>

Can someone help me understand how to extract the subelements of book? Ideally with something nice like >>> and node, so that I can define my own functions such as getCost, getName, etc., each of which will roughly have the signature XmlTree -> [String].

+5

haskell hxt

Ecognium Jan 25 '16 at 6:51

1 answer

zakyggaps · Accepted Answer · 2016-01-25T12:26:57+0000

doc is not what you think it is. It has type IOStateArrow s b XmlTree. You should really read your guide again; everything you want to know is covered under the heading “Avoid I/O”.

Arrows are basically functions. SomeArrow a b can be thought of as a generalized or specialized function of type a -> b. >>> and the related operators compose arrows, much like function composition. Your books has type [XmlTree], so it is not an arrow and cannot be composed with arrows. What meets your needs is runLA, which turns an arrow such as node "tag" into an ordinary function:

  module Main where

  import Text.XML.HXT.Core

  main = do
    html <- readFile "test.xml"
    let doc = readString [withValidate yes, withParseHTML no, withWarnings no] html
    books <- runX $ doc >>> node "book"
    -- runLA (node "cost" /> getText) :: XmlTree -> [String]
    let costs = books >>= runLA (node "cost" /> getText)
    print costs

  node tag = multi (hasName tag)
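The same idea extends to the modular getCost / getName functions asked for in the question. The following is only a sketch under some assumptions: textOf, getIsbn and the inline sample string are hypothetical names introduced here for illustration, and the document is parsed with xread from an in-memory string rather than read from test.xml, so that the example runs on its own.

```haskell
module Main where

import Text.XML.HXT.Core

-- Same helper as in the answer: match elements named `tag` at any depth.
node :: ArrowXml a => String -> a XmlTree XmlTree
node tag = multi (hasName tag)

-- Hypothetical helper: text of every `tag` element inside a subtree.
textOf :: String -> XmlTree -> [String]
textOf tag = runLA (node tag /> getText)

getCost, getIsbn :: XmlTree -> [String]
getCost = textOf "cost"
getIsbn = textOf "isbn"

-- Hypothetical: "first last" for each <name> element in the subtree.
getName :: XmlTree -> [String]
getName = runLA $
  node "name"
    >>> (node "first" /> getText) &&& (node "last" /> getText)
    >>> arr (\(f, l) -> f ++ " " ++ l)

-- Assumed inline sample, a cut-down version of the XML in the question.
sample :: String
sample =
  "<books><book>"
    ++ "<author><name><first>Joe</first><last>Smith</last></name></author>"
    ++ "<isbn>1234567890</isbn><cost>55.9</cost>"
    ++ "</book></books>"

main :: IO ()
main = do
  -- Pure parsing: xread turns a String into trees without any IO arrow.
  let books = runLA (xread >>> node "book") sample
  mapM_ (\b -> print (getName b, getIsbn b, getCost b)) books
  -- Alternative: lift the plain trees back into an arrow with constL
  -- and keep using runX, as in the question's attempted expression.
  viaArrow <- runX (constL books >>> node "cost" /> getText)
  print viaArrow
```

Each getter is an ordinary function on XmlTree, so it can be applied to head books, mapped over the whole list, or combined with the list monad as in the answer. The constL variant is useful when the rest of the pipeline has to stay inside runX.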
