Frequency paired values table in Python

Question

I am completely new to Python; most of my work has been done in R. I would like to know how to solve this problem in Python. Please refer to the linked question for a clear statement of the problem and a solution in R: "How to calculate a table of pairwise samples from a long form frame".

This is a dataset:

id  featureCode
5   PPLC
5   PCLI
6   PPLC
6   PCLI
7   PPL
7   PPLC
7   PCLI
8   PPLC
9   PPLC
10  PPLC

and this is what I want:

     PPLC  PCLI  PPL
PPLC  0     3     1
PCLI  3     0     1
PPL   1     1     0

I would like to count the number of times each feature code is used together with each of the other feature codes (the pairwise counts referred to in the title). I hope this makes sense. Please help me with this. Thanks.

python python-2.7

user3371626 Mar 04 '14 at 15:14

2 answers

Sudeep Juvekar · Answer 1 · 2014-03-04T16:01:31+0000

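This can be done with Pandas, which provides DataFrames similar to those in R. Assume the data is in a DataFrame named df. (You can read it in with pandas.read_table; see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_table.html.)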
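For readers who want to follow along, here is a minimal sketch of one way to get the question's sample data into a DataFrame named df. It uses pandas.read_csv with a whitespace separator rather than pandas.read_table, and the variable name raw is an assumption made for illustration, not something from the answer.

```python
import io

import pandas as pd

# Sample data copied from the question, whitespace-separated.
# `raw` is a hypothetical name used only for this illustration.
raw = """id featureCode
5 PPLC
5 PCLI
6 PPLC
6 PCLI
7 PPL
7 PPLC
7 PCLI
8 PPLC
9 PPLC
10 PPLC
"""

# sep=r"\s+" splits each line on any run of whitespace.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.shape)
```

Reading from a file instead should only require passing the file path in place of the io.StringIO object.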

First, group the rows by id.

import pandas

gps = df.groupby("id")
print(gps.groups)
Out: {5: [0, 1], 6: [2, 3], 7: [4, 5, 6], 8: [7], 9: [8], 10: [9]}

Here, groups maps each id to the row indices that belong to it.

Next, build a zero-filled DataFrame whose rows and columns are the unique featureCode values.

unqFet = list(set(df["featureCode"]))  # unique feature codes; set order is arbitrary
final = pandas.DataFrame(0, columns=unqFet, index=unqFet)  # integer zeros everywhere
print(final)
Out: 
            PCLI PPLC PPL
     PCLI    0    0   0
     PPLC    0    0   0
     PPL     0    0   0

Finally, loop over the groups and update the counts in final.

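for g in gps.groups.values():
    for i in range(len(g)):
        for j in range(len(g)):
            if i != j:
                # Look up the two feature codes that share this id.
                col = df.loc[g[i], "featureCode"]
                row = df.loc[g[j], "featureCode"]
                # .loc avoids chained indexing, which can update a copy.
                final.loc[row, col] += 1

print(final)
Out:
          PCLI PPLC PPL
   PCLI    0    3   1
   PPLC    3    0   1
   PPL     1    1   0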
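As an alternative to the explicit loops, the same table can probably be obtained from a matrix product. This is a sketch of a different technique, not the answer's method: it builds an id-by-code incidence table with pandas.crosstab, multiplies its transpose by itself, and zeroes the diagonal. It assumes each feature code appears at most once per id, as in the sample data; the names incidence, cooc, and pairwise are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [5, 5, 6, 6, 7, 7, 7, 8, 9, 10],
    "featureCode": ["PPLC", "PCLI", "PPLC", "PCLI", "PPL",
                    "PPLC", "PCLI", "PPLC", "PPLC", "PPLC"],
})

# Rows are ids, columns are feature codes, cells count occurrences.
incidence = pd.crosstab(df["id"], df["featureCode"])

# (codes x ids) times (ids x codes): each cell sums, over all ids,
# the product of the two codes' counts, i.e. their co-occurrences.
cooc = incidence.T.dot(incidence)

# The diagonal pairs each code with itself, so set it to zero.
values = cooc.to_numpy().copy()
np.fill_diagonal(values, 0)
pairwise = pd.DataFrame(values, index=cooc.index, columns=cooc.columns)

print(pairwise)
```

This avoids the Python-level triple loop, which may matter for larger inputs, at the cost of building the full incidence table in memory.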

sabbahillel · Answer 2 · 2014-03-04T16:05:43+0000

Here is an approach that does not rely on Pandas and uses only plain Python.

# Assume you have a list of (id, featureCode) tuples named lst
lst.sort()  # sort the list by id
mydict = {}
current_id = None  # renamed from id to avoid shadowing the built-in
tags = []
for ids in lst:
  if ids[0] == current_id:
    # Same id as the previous entry: collect its tag
    tags.append(ids[1])
  else:
    # This is a new id
    # count the pairs among the previous id's tags.
    for elem1 in tags:
      for elem2 in tags:
        if elem1 != elem2:
          if elem1 not in mydict:
            mydict[elem1] = {}
          if elem2 not in mydict[elem1]:
            mydict[elem1][elem2] = 0
          mydict[elem1][elem2] += 1
    # This is a different id, reset the indicators for the next loop
    current_id = ids[0]
    tags = [ids[1]]  # start a new list holding the first tag of this id
else:
  # for/else: this runs once the loop has finished.
  # The last id in lst has to be processed as well:
  # count the pairs among its tags.
  for elem1 in tags:
    for elem2 in tags:
      if elem1 != elem2:
        if elem1 not in mydict:
          mydict[elem1] = {}
        if elem2 not in mydict[elem1]:
          mydict[elem1][elem2] = 0
        mydict[elem1][elem2] += 1


# at this point, mydict holds the full pairwise count
for tag in mydict.keys():
  print("%s %s" % (tag, mydict[tag]))
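The same counting idea can be written more compactly with the standard library. The sketch below is an alternative, not the answer's code: it groups codes per id with collections.defaultdict and counts ordered pairs with itertools.permutations and collections.Counter. Storing codes in a set assumes that a code repeated under the same id should be counted once; the names codes_by_id and pair_counts are illustrative.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Sample data from the question as (id, featureCode) tuples.
lst = [
    (5, "PPLC"), (5, "PCLI"),
    (6, "PPLC"), (6, "PCLI"),
    (7, "PPL"), (7, "PPLC"), (7, "PCLI"),
    (8, "PPLC"), (9, "PPLC"), (10, "PPLC"),
]

# Collect the set of feature codes seen for each id (no sorting needed).
codes_by_id = defaultdict(set)
for row_id, code in lst:
    codes_by_id[row_id].add(code)

# Count every ordered pair of distinct codes that share an id.
pair_counts = Counter()
for codes in codes_by_id.values():
    pair_counts.update(permutations(sorted(codes), 2))

for (first, second), count in sorted(pair_counts.items()):
    print(first, second, count)
```

Because both orderings of each pair are counted, the result is symmetric, matching the table requested in the question; pairs that never occur are simply absent and read as 0 from the Counter.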
