Haskell hash table

I am trying to create a small haskell application that translates several key phrases from English into French.

Firstly, I have a list of ordered pairs of strings that represent both the English word / phrase, followed by French translations:

icards = [("the", "le"),("savage", "violent"),("work", "travail"), ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")...] 

next I have new data:

 data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool} deriving Show 

then I use icards to populate the Entrys list:

 entries :: [Entry] entries = map (\(x, y) -> Entry xy (length x) (' ' `elem` x)) icards 

for simplicity, I create a new type that will be [Entry] called Run.

Now I want to create a hash table based on the number of characters in the English word. This will be used later to speed up the search. So I want to create a function called run:

 runs :: [Run] runs = --This will run through the entries and return a new [Entry] that has all of the words of the same length grouped together. 

I also have:

 maxl = maximum [len e | e <- entries] 
+4
source share
2 answers

groupBy and sortBy are in a Data.List .

 import Data.List import Data.Function -- for `on` runs :: [Run] runs = f 0 $ groupBy ((==) `on` len) $ sortBy (compare `on` len) entries where f _ [] = [] fi (r @ (Entry {len = l} : _) : rs) | i == l = r : f (i + 1) rs fi rs = [] : f (i + 1) rs 

Personally, I used a map instead

 import qualified Data.Map as M runs :: M.Map String Entry runs = M.fromList $ map (\entry -> (wrd entry, entry)) entries 

and searching directly with the English word instead of the two-stage English word, and then the process of the English word.

+4
source

It so happened that Hackage has a hashmap package! I am going to create a small data type based on this HashMap, which I will call MultiMap. This is a typical trick: it's just a hash map of linked lists. I'm not sure what the correct name for MultiMap really is.

 import qualified Data.HashMap as HM import Data.Hashable import Prelude hiding (lookup) type MultiMap kv = HM.Map k [v] insert :: (Hashable k, Ord k) => k -> a -> MultiMap ka -> MultiMap ka insert kv = HM.insertWith (++) k [v] lookup :: (Hashable k, Ord k) => k -> MultiMap ka -> [a] lookup km = case HM.lookup km of Nothing -> [] Just xs -> xs empty :: MultiMap ka empty = HM.empty fromList :: (Hashable k, Ord k) => [(k,v)] -> MultiMap kv fromList = foldr (uncurry insert) empty 

I imitated only the fundamentals of the Map: insert, search, empty and from the list. Now it's pretty easy to turn entries into a MutliMap :

 data Entry = Entry {wrd, def :: String, len :: Int, phr :: Bool} deriving (Show) icards = [("the", "le"),("savage", "violent"),("work", "travail"), ("wild", "sauvage"),("chance", "occasion"),("than a", "qu'un")] entries :: [Entry] entries = map (\(x, y) -> Entry xy (length x) (' ' `elem` x)) icards fromEntryList :: [Entry] -> MutiMap Int Entry fromEntryList es = fromList $ map (\e -> (len e, e)) es 

By loading this into ghci, we can now find a list of records with a given length:

 ghci> let m = fromEntryList entries ghci> lookup 3 m [Entry {wrd = "the", def = "le", len = 3, phr = False}] ghci> lookup 4 m [Entry {wrd = "work", def = "travail", len = 4, phr = False}, Entry {wrd = "wild", def = "sauvage", len = 4, phr = False}] 

(Note that this lookup not the one defined in Prelude.) Similarly, you can use the English word as a key.

 -- import Data.List (find) -- up with other imports fromEntryList' :: [Entry] -> MultiMap String Entry fromEntryList' es = fromList $ map (\e -> (wrd e, e)) es eLookup :: String -> MultiMap String Entry -> Maybe Entry eLookup str m = case lookup str m of [] -> Nothing xs -> find (\e -> wrd e == str) xs 

Testing...

 ghci> let m = fromEntryList' entries ghci> eLookup "the" m Just (Entry {wrd = "the", def = "le", len = 3, phr = False}) ghci> eLookup "foo" m Nothing 

Please note that in eLookup we first search the map to determine if anything was placed in this slot. Since we use a hash set, we need to remember that two different lines can have the same hash code. Therefore, if the slot is not empty, we run find in the linked list to see if any of the entries really matches the correct English word. If you are interested in performance, you should use Data.Text instead of String .

+8
source

Source: https://habr.com/ru/post/1379179/


All Articles