Counting the occurrences of every pair of consecutive words in a text

Question

Given a text, count how many times each pair of consecutive words occurs.

Input:

Once upon a time a time this upon a

Output:

dictionary {
    'Once upon': 1,
    'upon a': 2,
    'a time': 2,
    'time a': 1,
    'time this': 1,
    'this upon': 1
}

CODE:

def countTuples(path):
    dic = dict()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range (0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

I get this error:

File "C:/Users/user/Anaconda3/hw2.py", line 100, in countTuples
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1
TypeError: list indices must be integers or slices, not str

If I replace += with a plain = 1, everything works fine. I think the problem is that I'm trying to read the value of an entry that does not exist yet?

What can I do to fix this?

python dictionary python-3.x n-gram

Tony tannous Apr 14 '17 at 12:33

3 answers

For this, you can use defaultdict:

from collections import defaultdict

line = 'Once upon a time a time this upon a'

dic = defaultdict(int)

s = line.split()

for i in range(0, len(s)-1):
    dic[str(s[i]) + ' ' + str(s[i+1])] += 1

Result:

dic

defaultdict(int,
            {'Once upon': 1,
             'a time': 2,
             'this upon': 1,
             'time a': 1,
             'time this': 1,
             'upon a': 2})

Applied to your function:

import codecs

def countTuples(path):
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range(0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

Kewl Apr 14 '17 at 12:43

You do not need to do it like that. Just use Counter, and use zip to feed the bigrams to the counter, for example:

import codecs
from collections import Counter

def countTuples(path):
    dic = Counter()
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            # zip(s, s[1:]) pairs each word with the word that follows it
            dic.update('%s %s' % t for t in zip(s, s[1:]))
    return dic
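
To see what zip(s, s[1:]) produces without reading a file, here is a minimal sketch run on the sample sentence from the question. The name count_bigrams is a hypothetical helper used only for illustration; it is not part of the answer above.

```python
from collections import Counter

def count_bigrams(line):
    # Hypothetical helper: counts adjacent word pairs in a single string.
    words = line.split()
    # zip(words, words[1:]) pairs each word with its successor,
    # so a line of n words yields n - 1 pairs.
    return Counter('%s %s' % pair for pair in zip(words, words[1:]))

counts = count_bigrams('Once upon a time a time this upon a')
print(counts['upon a'])   # 2
print(counts['a time'])   # 2
print(counts['time a'])   # 1
```

As in the file-based version, pairs are formed within one line only, so the last word of a line is never paired with the first word of the next line.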

Willem van onsem Apr 14 '17 at 12:53

pansen · Accepted Answer · 2017-04-14T12:43:43+0000

You can use defaultdict to make your solution work. With defaultdict, you specify the default value type for entries that do not exist yet. This lets you apply an augmented assignment such as += 1 to a key that has not been explicitly created:

import codecs
from collections import defaultdict

def countTuples(path):
    dic = defaultdict(int)
    with codecs.open(path, 'r', 'utf-8') as f:
        for line in f:
            s = line.split()
            for i in range (0, len(s)-1):
                dic[str(s[i]) + ' ' + str(s[i+1])] += 1
    return dic

>>> {'Once upon': 1,
     'a time': 2,
     'this upon': 1,
     'time a': 1,
     'time this': 1,
     'upon a': 2}
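
The difference between the two dictionary types can be sketched in isolation. This is only an illustration, and the variable names plain, auto and error_name are assumptions made for the example. On a plain dict, += has to read the current value first, which fails for a missing key; defaultdict(int) creates the entry with int() == 0 instead. The last line shows dict.get as an alternative that keeps a plain dict.

```python
from collections import defaultdict

plain = {}
try:
    plain['upon a'] += 1          # reads plain['upon a'] first; the key is missing
    error_name = None
except KeyError as exc:
    error_name = type(exc).__name__

auto = defaultdict(int)           # missing keys are created with int() == 0
auto['upon a'] += 1               # 0 + 1, no exception

# Alternative without defaultdict: dict.get supplies the default value
plain['upon a'] = plain.get('upon a', 0) + 1

print(error_name)                 # KeyError
print(auto['upon a'], plain['upon a'])
```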
