Boolean matrix from a Python dict of lists

I have a dict of lists, e.g.

dictionary_test = {'A': ['hello', 'byebye', 'howdy'], 'B': ['bonjour', 'hello', 'ciao'], 'C': ['ciao', 'hello', 'byebye']}

I want to convert it to a logical matrix for further analysis, preferably with the dict keys as column names and the list items as row names:

         A    B    C
  hello  1    1    1
 byebye  1    0    1
  howdy  1    0    0
bonjour  0    1    0
   ciao  0    1    1

Is it possible to do this in Python (ideally in a way that lets me write the matrix to a .csv file)? I assume it would involve numpy, right?

An additional problem is that the size of the dictionary is not known in advance (both the number of keys and the number of elements in each list can vary).

2 answers

You can use pandas. Here is an example.

>>> import pandas as pd
>>> dictionary_test = {'A': ['hello', 'byebye', 'howdy'], 'B': ['bonjour', 'hello', 'ciao'], 'C': ['ciao', 'hello', 'byebye']}
>>> values = list({x for y in dictionary_test.values() for x in y})
>>> data = {}
>>> for key, words in dictionary_test.items():
...     data[key] = [value in words for value in values]
...
>>> pd.DataFrame(data, index=values)
             A      B      C
ciao     False   True   True
howdy     True  False  False
bonjour  False   True  False
hello     True   True   True
byebye    True  False   True

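The rows come out in arbitrary order because values is built from a set; sort values if you need a fixed order.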
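The question also asks about writing the matrix to a .csv file, which the example above does not cover. A minimal sketch, assuming 1/0 cells and alphabetically sorted rows are acceptable; the names buffer, csv_text and matrix are illustrative only, and the in-memory buffer stands in for a real file path:

```python
import io

import pandas as pd

dictionary_test = {
    'A': ['hello', 'byebye', 'howdy'],
    'B': ['bonjour', 'hello', 'ciao'],
    'C': ['ciao', 'hello', 'byebye'],
}

# Sorting the unique words gives a reproducible row order.
values = sorted({x for y in dictionary_test.values() for x in y})

# One boolean column per key, then cast True/False to 1/0.
data = {key: [value in words for value in values]
        for key, words in dictionary_test.items()}
df = pd.DataFrame(data, index=values).astype(int)

# to_csv accepts a path or a file-like object; a StringIO buffer is
# used here only so that the sketch does not write to disk.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()

# The underlying NumPy array, in case further analysis needs it.
matrix = df.to_numpy()
```

Passing a file name instead of the buffer, e.g. df.to_csv('matrix.csv'), should write the same content to disk.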


Like Xin, I would use pandas; here the DataFrame is built directly from dictionary_test.

import pandas as pd

dictionary_test = {'A': ['hello', 'byebye', 'howdy'], 'B': ['bonjour', 'hello', 'ciao'], 'C': ['ciao', 'hello', 'byebye']}

df = pd.DataFrame(dictionary_test)

# all possible words (these become the row index)
words = {word for col in df.columns for word in df[col]}

# create a new DataFrame with the words as the index
d = pd.DataFrame(index = words)

# check whether a given column in your raw data contains a given index
# 1 if yes, 0 if no
for idx in d.index:
    for col in df.columns:
        d.loc[idx, col] = 1 if idx in set(df[col]) else 0

Output:

d
Out[6]: 
           A    B    C
hello    1.0  1.0  1.0
byebye   1.0  0.0  1.0
bonjour  0.0  1.0  0.0
howdy    1.0  0.0  0.0
ciao     0.0  1.0  1.0

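If you get ValueError: arrays must all be same length, the lists have different lengths; you can pad the shorter ones with None before building the DataFrame: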

# find how long the longest list is
longest_list_len = max(map(len, dictionary_test.values()))
dictionary_test = {key: value + [None] * (longest_list_len - len(value)) for key, value in dictionary_test.items()}

Then rebuild df from the padded dictionary_test and change the definition of words to:

# Exclude the `None`s we added above to ensure equal length
words = {word for col in df.columns for word in df[col] if word is not None}
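As an alternative to padding, the matrix can be built straight from the dict, so the lists never have to share a length. This is a sketch rather than the answer's own method; the input uneven is a hypothetical example with lists of different sizes. Building each column in one go also keeps the cells as integers instead of the floats shown in the output above.

```python
import pandas as pd

# Hypothetical input whose lists have different lengths.
uneven = {
    'A': ['hello', 'byebye', 'howdy'],
    'B': ['bonjour'],
    'C': ['ciao', 'hello'],
}

# Collect every distinct word; sorting fixes the row order.
words = sorted({word for lst in uneven.values() for word in lst})

# Each column has exactly len(words) entries, whatever the length
# of the original list, so no padding with None is required.
d = pd.DataFrame(
    {key: [int(word in lst) for word in words] for key, lst in uneven.items()},
    index=words,
)
```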


Source: https://habr.com/ru/post/1668438/

