How to create a dictionary of dictionaries of dictionaries in Python

I'm taking a natural language processing class, and I need to create a trigram language model that generates random text which looks somewhat “realistic”, based on some sample data.

This requires creating a “trigram” structure to hold the various three-word combinations. My professor hinted that this can be done with a dictionary of dictionaries of dictionaries, which I tried to create using:

trigram = defaultdict( defaultdict(defaultdict(int))) 

However, I get the error message:

 trigram = defaultdict( dict(dict(int)))
 TypeError: 'type' object is not iterable

How would I go about creating a 3-layer nested dictionary, that is, a dictionary of dictionaries of dictionaries of int values?

I think people downvote a question on Stack Overflow if they don't know how to answer it. I'll add some background to better explain the question for those willing to help.

This trigram is used to keep track of three-word patterns. Such models are used in text-processing software and almost everywhere in natural language processing (think Siri or Google Now).

If we call the 3 levels of dictionaries dict1, dict2 and dict3, then after parsing a text file and reading the sentence "The boy runs" we would have the following:

A dict1 that has the key "the". Accessing that key returns dict2, which contains the key "boy". Accessing that key returns the final dict3, which contains the key "runs"; accessing that key returns the value 1.

This means that "the boy runs" has appeared 1 time in the text. If we come across it again, we follow the same process and increment the 1 to 2. If we come across "the girl walks", then dict2 (the dictionary under the key "the") will gain another key, "girl", which holds a dict3 with the key "walks" and a value of 1, and so on. In the end, after parsing a ton of text (and keeping the word counts), you have a trigram that can estimate how likely a given starting word is to lead to a particular three-word combination, based on how often it occurred in the text analyzed so far.
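Written out by hand with plain dict literals, the structure described above would look roughly like the sketch below. It only illustrates the target shape for the two example sentences; it does not show how to build the nesting automatically, which is what the question asks.

```python
# Target shape after seeing "the boy runs" once and "the girl walks" once.
trigram = {
    "the": {                     # dict1: keyed by the first word
        "boy": {"runs": 1},      # dict2 -> dict3: second word -> third word -> count
        "girl": {"walks": 1},
    },
}

# Seeing "the boy runs" a second time bumps the innermost count from 1 to 2.
trigram["the"]["boy"]["runs"] += 1

print(trigram["the"]["boy"]["runs"])   # 2
print(sorted(trigram["the"]))          # ['boy', 'girl']
```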

This can help you create grammar rules for identifying languages or, in my case, generate random text that closely resembles grammatical English. I need a three-layer dictionary because at any position in a three-word combination there may be another word that starts a whole new set of combinations. I have TRIED my best to explain trigrams and the purpose behind them ... I only started the class a couple of weeks ago.

Now ... with all that said, how do I go about creating a dictionary of dictionaries of dictionaries whose innermost dictionary holds int values in Python?

trigram = defaultdict(defaultdict(defaultdict(int)))

gives me an error.

+6
4 answers

I've tried nested defaultdict before, and the solution seems to be a lambda call:

 from collections import defaultdict
 trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
 trigram['a']['b']['c'] += 1

It's not very pretty, but I suspect the nested dictionary was suggested for efficient lookup.
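As a minimal sketch of how this nested defaultdict could be filled from text and then used to pick a next word, something like the following might work. The helper names `count_trigrams` and `next_word` and the sample sentence are made up for illustration, and whitespace splitting is assumed as the tokenizer.

```python
import random
from collections import defaultdict

def count_trigrams(tokens):
    """Count every run of three consecutive tokens in a nested defaultdict."""
    trigram = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        trigram[w1][w2][w3] += 1
    return trigram

def next_word(trigram, w1, w2, rng=random):
    """Pick a third word after (w1, w2), weighted by how often it was seen."""
    # .get avoids creating empty entries when an unseen pair is looked up.
    followers = trigram.get(w1, {}).get(w2, {})
    if not followers:
        return None
    words = list(followers)
    weights = [followers[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

tokens = "the boy runs and the boy runs and the girl walks".split()
counts = count_trigrams(tokens)
print(counts["the"]["boy"]["runs"])     # 2
print(counts["the"]["girl"]["walks"])   # 1
print(next_word(counts, "the", "boy"))  # runs (the only follower seen)
```

Note that reading a missing key with plain indexing, such as `counts["x"]["y"]["z"]`, silently creates the intermediate dictionaries, which is why the lookup helper uses `.get` instead.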

+11

Generally, the solutions already posted are enough to create a nested trigram dictionary. If you want to extend the idea to a more general solution, you can do either of the following: one is adapted from Perl's autovivification, and the other uses collections.defaultdict.

Solution 1:

 class ngram(dict):
     """Based on perl autovivification feature."""
     def __getitem__(self, item):
         try:
             return super(ngram, self).__getitem__(item)
         except KeyError:
             value = self[item] = type(self)()
             return value

Solution 2:

 from collections import defaultdict

 class ngram(defaultdict):
     def __init__(self):
         super(ngram, self).__init__(ngram)

Demo using solution 1

 >>> trigram = ngram()
 >>> trigram['two']['three']['four'] = 4
 >>> trigram
 {'two': {'three': {'four': 4}}}
 >>> trigram['two']
 {'three': {'four': 4}}
 >>> trigram['two']['three']
 {'four': 4}
 >>> trigram['two']['three']['four']
 4

Demo using solution 2

 >>> a = ngram()
 >>> a['two']['three']['four'] = 4
 >>> a
 defaultdict(<class '__main__.ngram'>, {'two': defaultdict(<class '__main__.ngram'>, {'three': defaultdict(<class '__main__.ngram'>, {'four': 4})})})
+4

The defaultdict __init__ method takes an argument that must be callable. The callable passed to defaultdict must be callable with no arguments and must return an instance of the default value.

The problem with nesting defaultdict the way you did is that defaultdict's __init__ takes an argument. Giving the outer defaultdict that argument means that, instead of receiving a callable as its __init__ argument, it receives a defaultdict instance, which is not callable.

@Pcoving's lambda solution works because it creates an anonymous function that returns a defaultdict initialized with a function that returns the right kind of defaultdict for each layer of the nesting.
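The difference between passing an instance and passing a callable can be checked directly. The sketch below also shows functools.partial as an alternative to the lambdas; that variant is not from the answers above, just another way to build zero-argument factories.

```python
from collections import defaultdict
from functools import partial

# Passing a defaultdict *instance* instead of a callable is rejected at construction.
try:
    broken = defaultdict(defaultdict(int))
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A lambda defers construction: each missing key gets a fresh inner defaultdict.
with_lambda = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
with_lambda["a"]["b"]["c"] += 1

# partial(defaultdict, int) is a zero-argument callable returning defaultdict(int),
# so the same three layers can be built without lambdas.
inner = partial(defaultdict, int)
middle = partial(defaultdict, inner)
with_partial = defaultdict(middle)
with_partial["a"]["b"]["c"] += 1

print(with_lambda["a"]["b"]["c"])   # 1
print(with_partial["a"]["b"]["c"])  # 1
```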

+1

If it's just about extracting and retrieving trigrams, you should try NLTK:

 >>> import nltk
 >>> sent = "this is a foo bar crazycoder"
 >>> # nltk.ngrams may return a lazy iterator in recent NLTK versions, so make it a list
 >>> trigrams = list(nltk.ngrams(sent.split(), 3))
 >>> trigrams
 [('this', 'is', 'a'), ('is', 'a', 'foo'), ('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
 >>> # token "a" in first element of trigram
 >>> [i for i in trigrams if i[0] == "a"]
 [('a', 'foo', 'bar')]
 >>> # token "a" in 2nd element of trigram
 >>> [i for i in trigrams if i[1] == "a"]
 [('is', 'a', 'foo')]
 >>> # token "a" in third element of trigram
 >>> [i for i in trigrams if i[2] == "a"]
 [('this', 'is', 'a')]
 >>> # look for 2gram in trigrams
 >>> [i for i in trigrams if "foo" in i and "bar" in i]
 [('a', 'foo', 'bar'), ('foo', 'bar', 'crazycoder')]
 >>> # look for a perfect 3gram (compare as a tuple, since the trigrams are tuples)
 >>> [i for i in trigrams if tuple("foo bar crazycoder".split()) == i]
 [('foo', 'bar', 'crazycoder')]
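If installing NLTK is not an option, the same trigram tuples can be produced with the standard library alone. This is a rough sketch that assumes whitespace tokenization and uses collections.Counter for the counts; it is a substitute for the NLTK call, not how NLTK itself works.

```python
from collections import Counter

sent = "this is a foo bar crazycoder"
tokens = sent.split()

# Zipping the token list against itself shifted by one and by two yields the trigrams.
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))

# Counter maps each trigram tuple to the number of times it occurred.
counts = Counter(trigrams)

print(trigrams[0])                   # ('this', 'is', 'a')
print(counts[("a", "foo", "bar")])   # 1
```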
0

Source: https://habr.com/ru/post/954827/

