Append to a dict of lists with a dict comprehension

Suppose I have a large list of words. For instance:

    >>> with open('/usr/share/dict/words') as f:
    ...     words=[word for word in f.read().split('\n') if word]

If I wanted to create an index on the first letter of this list of words, this is easy:

    d={}
    for word in words:
        if word[0].lower() in 'aeiou':
            d.setdefault(word[0].lower(),[]).append(word)
            # You could use defaultdict here too...

The result is something like this:

 {'a':[list of 'a' words], 'e':[list of 'e' words], 'i': etc...} 

Is there a way to do this with a dict comprehension in Python 2.7 or 3+? In other words, is it possible with dict comprehension syntax to append to the list stored under a key while the dict is being built?

That is:

  index={k[0].lower():XXX for k in words if k[0].lower() in 'aeiou'} 

Where XXX either appends to the existing list for the key or creates a new one while the index is being built.

Edit

Taking the suggestions and benchmarking them:

    import functools
    from collections import defaultdict

    def f1():
        d={}
        for word in words:
            c=word[0].lower()
            if c in 'aeiou':
                d.setdefault(c,[]).append(word)

    def f2():
        d={}
        {d.setdefault(word[0].lower(),[]).append(word) for word in words if word[0].lower() in 'aeiou'}

    def f3():
        d=defaultdict(list)
        {d[word[0].lower()].append(word) for word in words if word[0].lower() in 'aeiou'}

    def f4():
        d=functools.reduce(lambda d, w: d.setdefault(w[0], []).append(w[1]) or d,
            ((w[0].lower(), w) for w in words if w[0].lower() in 'aeiou'), {})

    def f5():
        d=defaultdict(list)
        for word in words:
            c=word[0].lower()
            if c in 'aeiou':
                d[c].append(word)

This produces the following benchmark results:

       rate/sec     f4      f2      f1      f3      f5
    f4       11     --  -21.8%  -31.1%  -31.2%  -41.2%
    f2       14  27.8%      --  -11.9%  -12.1%  -24.8%
    f1       16  45.1%   13.5%      --   -0.2%  -14.7%
    f3       16  45.4%   13.8%    0.2%      --  -14.5%
    f5       18  70.0%   33.0%   17.2%   16.9%      --

The straight loop with a defaultdict (f5) is the fastest, followed by the set comprehension with a defaultdict (f3) and the loop using setdefault (f1), which are essentially tied.
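The harness that produced the table above is not shown, so here is a minimal sketch of how the two loop variants could be compared with the standard timeit module. The sample word list is made up for illustration (the original test read /usr/share/dict/words), and the function names are hypothetical; absolute timings will differ from the table.

```python
import timeit
from collections import defaultdict

# Hypothetical sample data; the original benchmark used /usr/share/dict/words.
words = ['apple', 'Egg', 'ice', 'owl', 'Under', 'banana', 'avocado'] * 1000

def loop_setdefault():
    # Plain dict, creating each list on first use via setdefault.
    d = {}
    for word in words:
        c = word[0].lower()
        if c in 'aeiou':
            d.setdefault(c, []).append(word)
    return d

def loop_defaultdict():
    # defaultdict creates the empty list automatically on a missing key.
    d = defaultdict(list)
    for word in words:
        c = word[0].lower()
        if c in 'aeiou':
            d[c].append(word)
    return d

# Both variants must build the same index before timing means anything.
assert loop_setdefault() == dict(loop_defaultdict())

for func in (loop_setdefault, loop_defaultdict):
    seconds = timeit.timeit(func, number=20)
    print('{:<18} {:.4f} s for 20 runs'.format(func.__name__, seconds))
```

Returning the dict from each function (unlike f1 to f5 above) makes it easy to check that the variants agree before trusting the numbers.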

Thanks for the ideas!

+6
4 answers

This is not possible (at least not easily or directly) with a dict comprehension.

It is possible, though arguably an abuse of the syntax, with a set or list comprehension:

    # your code:
    d={}
    for word in words:
        if word[0].lower() in 'aeiou':
            d.setdefault(word[0].lower(),[]).append(word)

    # a side effect set comprehension:
    index={}
    r={index.setdefault(word[0].lower(),[]).append(word) for word in words if word[0].lower() in 'aeiou'}

    print r
    print [(k, len(d[k])) for k in sorted(d.keys())]
    print [(k, len(index[k])) for k in sorted(index.keys())]

Prints:

    set([None])
    [('a', 17094), ('e', 8734), ('i', 8797), ('o', 7847), ('u', 16385)]
    [('a', 17094), ('e', 8734), ('i', 8797), ('o', 7847), ('u', 16385)]

The set comprehension produces a set of the return values of the setdefault(...).append(...) calls after iterating over the words list. Since append() always returns None, the sum total is set([None]) in this case. It also produces the desired side effect of building your dict of lists.
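To see why the result collapses to a single None, here is a small Python 3 sketch of the same side-effect comprehension on a made-up word list (the sample words and the name leftovers are illustrative only):

```python
words = ['apple', 'Egg', 'ice', 'banana', 'avocado']

index = {}
# list.append() returns None, so every element the comprehension yields is None.
# A set keeps one copy of each distinct value, hence a single None.
leftovers = {index.setdefault(w[0].lower(), []).append(w)
             for w in words if w[0].lower() in 'aeiou'}

print(leftovers)  # {None} on Python 3, set([None]) on Python 2
print(index)      # the index was built purely as a side effect
```

The set itself is throwaway; only the mutation of index matters, which is what makes this form harder to read than the plain loop.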

This is not as readable (IMHO) as the straight loop construct, and should be avoided (IMHO). It is not shorter and probably not materially faster. It is more interesting Python trivia than a useful technique, IMHO... Maybe to win a bet?

+3

No. Dict comprehensions are designed to generate non-overlapping keys with each iteration; they do not support aggregation. For this particular use case, a loop is the proper way to accomplish the task efficiently (in linear time).
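A short sketch of what "no aggregation" means in practice, using a made-up word list. The second half shows one workaround that is not from this answer: sorting and then grouping with itertools.groupby, which does fit in a dict comprehension but costs O(n log n) because of the sort.

```python
from itertools import groupby

words = ['apple', 'Egg', 'ice', 'banana', 'avocado', 'eel']
vowel_words = [w for w in words if w[0].lower() in 'aeiou']

# A plain dict comprehension keeps only the last value seen for each key.
last_only = {w[0].lower(): w for w in vowel_words}
print(last_only)

def first_letter(word):
    return word[0].lower()

# Workaround (an assumption, not the answer's method): sort by key first,
# then let groupby hand each key its whole run of words.
grouped = {key: list(group)
           for key, group in groupby(sorted(vowel_words, key=first_letter),
                                     key=first_letter)}
print(grouped)
```

The straight loop still wins on both clarity and running time; the groupby form mainly shows that the grouping has to happen before the comprehension sees the data.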

+9

I would use filter:

    >>> words=['abcd','abdef','eft','egg','uck','ice']
    >>> index={k.lower():list(filter(lambda x:x[0].lower()==k.lower(),words)) for k in 'aeiou'}
    >>> index
    {'a': ['abcd', 'abdef'], 'i': ['ice'], 'e': ['eft', 'egg'], 'u': ['uck'], 'o': []}
+3

This is not exactly a dict comprehension, but:

    from functools import reduce  # a builtin on Python 2; this import is required on Python 3

    reduce(lambda d, w: d.setdefault(w[0], []).append(w[1]) or d,
           ((w[0].lower(), w) for w in words if w[0].lower() in 'aeiou'),
           {})
+1

Source: https://habr.com/ru/post/919396/

