I am trying to parse the a links from the main body (<article>) of a blog post. I adapted what I found on FPComplete, but nothing is printed. (As far as I can see, the original code does not work either: when I run it in the online IDE with Bing as the target, it also produces no links.)
In GHCi, I can mimic the first line of parseAF, and this gives me a large value that I think is correct. But cursor $// findNodes &| extractData returns [].
I tried a regex, but I was not happy with it for searching such a long piece of text.
Can anyone help?
{-# LANGUAGE OverloadedStrings #-}
module HtmlParser where
import Network.HTTP.Conduit (simpleHttp)
import Prelude hiding (concat, putStrLn)
import Data.Text (concat)
import Data.Text.IO (putStrLn)
import Text.HTML.DOM (parseLBS)
import Text.XML.Cursor (Cursor, attribute, element, fromDocument, ($//), (&//), (&/), (&|))
-- The URL we're going to search
url = "http://www.amsterdamfoodie.nl/2015/wine-beer-food-restaurants-troost/"
-- The data we're going to search for
findNodes :: Cursor -> [Cursor]
findNodes = element "article" &/ element "a"
-- Extract the data from each node in turn
extractData = concat . attribute "href"
cursorFor :: String -> IO Cursor
cursorFor u = do
    page <- simpleHttp u
    return $ fromDocument $ parseLBS page
-- Process the list of data elements
processData = mapM_ putStrLn
-- main = do
parseAF :: IO ()
parseAF = do
    cursor <- cursorFor url
    processData $ cursor $// findNodes &| extractData
. , element "article". element "p", , p article, . ....!!
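To take the network out of the picture, here is a minimal offline sketch of the same query. It assumes (I have not verified this against the real page) that the links sit inside <p> elements rather than directly under <article>. The sampleHtml string and the names sampleCursor, childOnly and anyDepth are made up for illustration. It compares the child axis (&/), which findNodes uses, with the descendant axis (&//).

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString.Lazy (ByteString)
import Data.Text (Text)
import Text.HTML.DOM (parseLBS)
import Text.XML.Cursor (Cursor, attribute, element, fromDocument, ($//), (&/), (&//), (&|))

-- Hypothetical stand-in for the blog page: each <a> is wrapped in a <p>,
-- so no <a> is a direct child of <article>.
sampleHtml :: ByteString
sampleHtml =
  "<html><body><article>\
  \<p><a href=\"/one\">1</a></p>\
  \<p><a href=\"/two\">2</a></p>\
  \</article></body></html>"

sampleCursor :: Cursor
sampleCursor = fromDocument (parseLBS sampleHtml)

-- (&/) only inspects the direct children of each <article>.
childOnly :: [Text]
childOnly =
  concat (sampleCursor $// element "article" &/ element "a" &| attribute "href")

-- (&//) inspects every descendant of each <article>, at any depth.
anyDepth :: [Text]
anyDepth =
  concat (sampleCursor $// element "article" &// element "a" &| attribute "href")

main :: IO ()
main = do
  print childOnly  -- empty for this sample
  print anyDepth   -- both hrefs, in document order
```

If the real page is nested the same way, that would explain the empty list from findNodes, and switching it to element "article" &// element "a" might be worth trying.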