Counting a unique identifier in a csv file using Pandas (python)

Question


I have a dataset with a column called "logid" that consists of 4-digit numbers. My csv files have about 200k lines, and I would like to count every unique logid and print the result in something like this format:

logid | number of occurrences of each unique identifier. For example, 1000 | 10 means that logid 1000 appears 10 times in that column of the csv file. The | separator is not required; it is only there to make this easier to read. This is my code:

import pandas as pd
import os, sys
import glob
count = 0
path = "C:\\Users\\cam19\\Desktop\\New folder\\*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname, dtype=None, names=['my_data'], low_memory=False)
    counts = df['my_data'].value_counts()
counts

Using this, I get a strange output that I don't quite understand:

4            16463
10013          490
pserverno        1
Name: my_data, dtype: int64

I know that I am doing something wrong in the last line:

counts = df['my_data'].value_counts()

The logid column is column C in Excel (so, column 3?).

+4

python pandas csv

Cameron, Jul 31 '17 at 2:45

3

I think you need to append each DataFrame to a list (dfs) and then use concat:

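import glob
import pandas as pd

dfs = []
path = "C:\\Users\\cam19\\Desktop\\New folder\\*.csv"
for fname in glob.glob(path):
    # read only the logid column from each file
    df = pd.read_csv(fname, dtype=None, usecols=['logid'], low_memory=False)
    dfs.append(df)

# stack the per-file DataFrames into one
df = pd.concat(dfs)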

value_counts returns a Series. For a two-column DataFrame, add rename_axis and reset_index:

counts = df['logid'].value_counts().rename_axis('logid').reset_index(name='count')
counts

Another solution is groupby with size:

counts = df.groupby('logid').size().reset_index(name='count')
counts
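As a rough end-to-end sketch of the approach above, the following uses two small in-memory "files" in place of the csv files on disk. The file contents and the names file_a, file_b and by_group are made up for illustration; only read_csv, concat, value_counts, rename_axis, reset_index and groupby come from the answer.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two csv files on disk (contents are invented).
file_a = io.StringIO("logid,header1\n1000,a\n1001,b\n1000,c\n")
file_b = io.StringIO("logid,header1\n1000,d\n1002,e\n")

# Read only the logid column from each file, then stack the pieces.
dfs = [pd.read_csv(f, usecols=["logid"]) for f in (file_a, file_b)]
df = pd.concat(dfs, ignore_index=True)

# value_counts gives a Series; turn it into a two-column DataFrame.
counts = df["logid"].value_counts().rename_axis("logid").reset_index(name="count")

# groupby + size gives the same numbers, ordered by logid rather than by count.
by_group = df.groupby("logid").size().reset_index(name="count")

# Print in the "logid | count" shape asked for in the question.
for logid, count in zip(counts["logid"], counts["count"]):
    print(f"{logid} | {count}")
```

Passing ignore_index=True to concat is optional here; it only avoids repeated row labels in the combined DataFrame.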

+1

jezrael, Jul 31 '17 at 5:06

Try selecting the column by name:

counts = df['logid'].value_counts()

0

Asela Dassanayake Jul 31 '17 at 2:50


R.A.Munna · Accepted Answer · 2017-07-31T04:53:39+0000

Suppose the csv file looks like this, with a few extra rows above the header:

row1,row1,row1
row2,row2,row2
row3,row3,row3
logid,header1,header2
1000,a,b
1001,c,d
1000,e,f
1001,g,h

Then read the csv, skipping those rows:

import pandas as pd

# skip the first three rows so that the logid line becomes the header
df = pd.read_csv("file_name.csv", skiprows=3)
print(df['logid'].value_counts())

Output:

1001    2
1000    2
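To try this without a file on disk, here is a small sketch that feeds the same sample text to read_csv through io.StringIO. The variable names are illustrative. It also shows header=3 as an alternative spelling: it tells pandas that the line at zero-based position 3 holds the column names, which for this sample should give the same result as skiprows=3.

```python
import io
import pandas as pd

# The sample csv from above, held in a string instead of a file.
sample = (
    "row1,row1,row1\n"
    "row2,row2,row2\n"
    "row3,row3,row3\n"
    "logid,header1,header2\n"
    "1000,a,b\n"
    "1001,c,d\n"
    "1000,e,f\n"
    "1001,g,h\n"
)

# Skip the three junk rows; the next line becomes the header.
df = pd.read_csv(io.StringIO(sample), skiprows=3)
counts = df["logid"].value_counts()
print(counts)

# Alternative: name the header line directly (zero-based line 3).
df_alt = pd.read_csv(io.StringIO(sample), header=3)
```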


1

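df = pd.read_csv(fname, dtype=None, names=['my_data'], low_memory=False)

names=['my_data'] is the problem. When names is passed, pandas treats the csv as having no header row, so the header line is read as ordinary data; that is presumably why pserverno shows up as a value with a count of 1. If the files also have extra rows above the header, like row1 to row3 in the sample csv, skip them with skiprows. Drop names, let pandas take the column names from the csv, and select the column by its real name.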
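A minimal sketch of that effect, assuming a one-column csv invented for the demonstration (the real files have more columns, which this example does not try to reproduce):

```python
import io
import pandas as pd

# Hypothetical one-column csv whose first line is the header.
text = "logid\n1000\n1001\n1000\n"

# With names= and no header=, pandas assumes there is no header row,
# so the string "logid" is read as a data value and the column holds strings.
with_names = pd.read_csv(io.StringIO(text), names=["my_data"])
print(with_names["my_data"].value_counts())

# Without names=, the first line is used as the header and the values are integers.
inferred = pd.read_csv(io.StringIO(text))
print(inferred["logid"].value_counts())

# To rename the column and still consume the header line, pass header=0 as well.
renamed = pd.read_csv(io.StringIO(text), names=["my_data"], header=0)
```

In the first read, the header text is counted once alongside the real values, which matches the kind of stray entry seen in the question's output.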
