So, I have a dataset with a column called "logid" that contains 4-digit numbers. I have about 200k rows across my CSV files, and I would like to count every unique logid and print the result something like this:
logid | number of occurrences for each unique identifier. So 1000 | 10 would mean that logid 1000 appears 10 times in that column of the CSV file. The | separator isn't required; it's only there to make it easier for you guys to read. This is my code:
import pandas as pd
import os, sys
import glob
count = 0
path = "C:\\Users\\cam19\\Desktop\\New folder\\*.csv"
for fname in glob.glob(path):
    df = pd.read_csv(fname, dtype=None, names=['my_data'], low_memory=False)
    counts = df['my_data'].value_counts()
counts
Using this, I get a strange result that I don't quite understand:
4 16463
10013 490
pserverno 1
Name: my_data, dtype: int64
I know that I am doing something wrong in the last line:
counts = df['my_data'].value_counts()
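For reference, here is a minimal sketch of the kind of count I am after. It rests on assumptions that may not hold for the real files: that each CSV has a header row, and that one of its columns is literally named logid. The function name count_logids is hypothetical and used only for illustration. Passing names=[...] to read_csv without a header argument makes pandas treat the header row as data, which may be why a header-like value such as pserverno shows up in the output above. The loop above also overwrites counts on every file, so the sketch combines all files before counting.

```python
import glob

import pandas as pd


def count_logids(pattern, column="logid"):
    """Count how often each value of `column` occurs across all CSVs matching `pattern`.

    Assumes every file has a header row that contains `column`.
    """
    # Read only the column of interest from each matching file.
    frames = [pd.read_csv(fname, usecols=[column]) for fname in glob.glob(pattern)]
    if not frames:
        # No files matched the pattern: return an empty result instead of failing.
        return pd.Series(dtype="int64")
    # Stack the rows from all files so the counts cover every file, not just the last one.
    combined = pd.concat(frames, ignore_index=True)
    # value_counts() returns a Series indexed by unique value, most frequent first.
    return combined[column].value_counts()
```

Calling count_logids(path) with the glob pattern from my code and then looping over counts.items() to print f"{logid} | {n}" should give the logid | count layout described at the top.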