A specific use case for algebraic data types

I wrote a generic enumerator for scraping sites as an exercise. It is finished and works fine, but I have a question. The code is at https://github.com/mindreader/scrape-enumerator if you want to look at it.

The basic idea is that I want an enumerator that yields entries from sites that present them in pages, such as search engines and blogs: you fetch a page, it contains 25 entries, and you want them one at a time. At the same time, I did not want to write the plumbing for each site, so I needed a common interface. I came up with this (it uses type families):

    class SiteEnum a where
      type Result a :: *
      urlSource :: a -> InputUrls (Int,Int)
      enumResults :: a -> L.ByteString -> Maybe [Result a]

    data InputUrls state =
        UrlSet [URL]
      | UrlFunc state (state -> (state,URL))
      | UrlPageDependent URL (L.ByteString -> Maybe URL)

To work for every type of site, this needs a URL source of some kind. It can be a (possibly infinite) list of pregenerated URLs. It can be an initial state plus a function that generates URLs from it (for example, when the URLs contain &page=1, &page=2, and so on). Or, for really awkward sites like Google, it can be a starting URL plus a function that searches the body of each page for the next link. For your site, you make a data type an instance of SiteEnum and give it a site-specific Result type; the enumerator then handles all the I/O and you don't have to think about it. This works fine, and I have implemented one site with it.
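To make the three kinds of URL source concrete, here is a minimal sketch, not taken from the repository. It assumes URL is a plain String, uses made-up example.com addresses, and invents a "next:<url>" first-line convention purely for illustration; previewUrls is a hypothetical helper as well.

```haskell
import Data.List (stripPrefix)
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: URL is a plain String; the real project may define it differently.
type URL = String

data InputUrls state
  = UrlSet [URL]
  | UrlFunc state (state -> (state, URL))
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- A fixed list of URLs known up front. The state type is unused,
-- so it is pinned to () by the signature.
fixedPages :: InputUrls ()
fixedPages = UrlSet ["http://example.com/a", "http://example.com/b"]

-- Numbered pages: the state is the next page number.
numberedPages :: InputUrls Int
numberedPages = UrlFunc 1 step
  where
    step n = (n + 1, "http://example.com/search?page=" ++ show n)

-- The next link has to be dug out of the page that was just fetched.
-- Hypothetical convention: the first line of the body is "next:<url>".
linkedPages :: InputUrls ()
linkedPages = UrlPageDependent "http://example.com/start" findNext
  where
    findNext body = case lines (L.unpack body) of
      (firstLine : _) -> stripPrefix "next:" firstLine
      []              -> Nothing

-- Hypothetical helper: the first n URLs that can be produced
-- without fetching any page.
previewUrls :: Int -> InputUrls state -> [URL]
previewUrls n (UrlSet urls) = take n urls
previewUrls n (UrlFunc s0 step) = take n (go s0)
  where
    go s = let (s', url) = step s in url : go s'
previewUrls n (UrlPageDependent start _) = take n [start]
```

Only numberedPages actually uses the state parameter; the other two values have to pick some type for it even though it plays no role.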

What annoys me about this implementation is the InputUrls data type. When I use UrlFunc, everything is golden. When I use UrlSet or UrlPageDependent, it is not all fun and games, because the state type is left undetermined and I have to annotate the value as :: InputUrls () to get it to compile. This seems completely unnecessary, because the way the program is written, the type variable is never used for most sites, but I don't know how to get around it. I find myself wanting to use types like this in many different contexts, and I always end up with type variables that only some constructors of the data type need, which doesn't feel like the right way to use them. Is there a better way to do this?

1 answer

Why do you need the UrlFunc case at all? From what I understand, the only thing you do with the state function is use it to build a list like the one in UrlSet, so instead of storing the state function, just store the resulting list. That way you can drop the state type variable from your data type, which should eliminate the ambiguity problem.
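A minimal sketch of that suggestion, assuming URL is a plain String and using a made-up example.com address. The helper urlsFromState is a hypothetical name, not something from the repository; it turns an initial state and a step function into an ordinary UrlSet using unfoldr from Data.List.

```haskell
import Data.List (unfoldr)
import qualified Data.ByteString.Lazy.Char8 as L

-- Assumption: URL is a plain String; the real project may define it differently.
type URL = String

-- No type parameter any more: the state-function case is gone.
data InputUrls
  = UrlSet [URL]
  | UrlPageDependent URL (L.ByteString -> Maybe URL)

-- Hypothetical helper: run the state function up front and keep only
-- the (lazy, possibly infinite) list of URLs it produces.
urlsFromState :: state -> (state -> (state, URL)) -> InputUrls
urlsFromState s0 step = UrlSet (unfoldr next s0)
  where
    next s = let (s', url) = step s in Just (url, s')

-- The numbered-pages case, now with the same type as every other source.
numberedPages :: InputUrls
numberedPages =
  urlsFromState (1 :: Int)
    (\n -> (n + 1, "http://example.com/search?page=" ++ show n))
```

Because Haskell lists are lazy, the infinite list costs nothing until the enumerator asks for the next URL, so the state function is still effectively run one step at a time.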


Source: https://habr.com/ru/post/1382358/

