How do I create lists of unique CSV column values in Python?

I have a CSV file that looks like this:

    1994, Category1, Something Happened 1
    1994, Category2, Something Happened 2
    1995, Category1, Something Happened 3
    1996, Category3, Something Happened 4
    1998, Category2, Something Happened 5

I want to create two lists,

    Category = [Category1, Category2, Category3]

and

    Year = [1994, 1995, 1996, 1998]

That is, duplicates within each column should be omitted. I read the file as follows:

    DataCaptured = csv.reader(DataFile, delimiter=',')
    next(DataCaptured)  # skip the first row

and then loop through it:

    for Column in DataCaptured:
3 answers

You can do:

    DataCaptured = csv.reader(DataFile, delimiter=',', skipinitialspace=True)
    Category, Year = [], []
    for row in DataCaptured:
        if row[0] not in Year:
            Year.append(row[0])
        if row[1] not in Category:
            Category.append(row[1])
    print(Category, Year)
    # ['Category1', 'Category2', 'Category3'] ['1994', '1995', '1996', '1998']

As pointed out in the comments, if the order doesn't matter, using sets is simpler and faster (a membership test on a set takes constant time on average, while `x in list` scans the whole list):

    Category, Year = set(), set()
    for row in DataCaptured:
        Year.add(row[0])
        Category.add(row[1])
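A self-contained sketch that puts the set approach together with the sample data from the question. It assumes an in-memory `io.StringIO` buffer in place of the real `DataFile` (so the snippet runs on its own), converts the year with `int()` so the result matches the `Year` list in the question, and uses `sorted()` to turn each set back into an ordered list. The helper name `unique_columns` is hypothetical.

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for the real file object.
SAMPLE = """1994, Category1, Something Happened 1
1994, Category2, Something Happened 2
1995, Category1, Something Happened 3
1996, Category3, Something Happened 4
1998, Category2, Something Happened 5
"""


def unique_columns(data_file):
    """Return sorted lists of the unique categories and years (hypothetical helper)."""
    # skipinitialspace=True strips the space that follows each comma.
    reader = csv.reader(data_file, skipinitialspace=True)
    years, categories = set(), set()
    for row in reader:
        years.add(int(row[0]))  # convert so years sort numerically
        categories.add(row[1])
    return sorted(categories), sorted(years)


Category, Year = unique_columns(io.StringIO(SAMPLE))
print(Category)  # ['Category1', 'Category2', 'Category3']
print(Year)      # [1994, 1995, 1996, 1998]
```

Sorting gives a stable, predictable order here because the values are comparable; if you need the order of first appearance instead, the list-based loop keeps it.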

A very concise way to do this is to use pandas. The benefits: it has a faster CSV parser, and it works column-wise, so you only need a single `df.apply(set)`:

    In [244]: import pandas as pd
              # Suppose the CSV is named temp.csv
              df = pd.read_csv('temp.csv', header=None)
              df.apply(set)
    Out[244]:
    0    set([1994, 1995, 1996, 1998])
    1    set([ Category2, Category3, Category1])
    2    set([ Something Happened 4, Something Happene...
    dtype: object

The downside is that it returns a `pandas.Series` of sets, so to get each column as a list you need something like `list(df.apply(set)[0])`.

Edit

If the order needs to be preserved, that is easy as well, because `Series.unique()` returns values in order of first appearance. For example:

    for i, item in df.items():
        print(item.unique())

Note that `item.unique()` returns a NumPy array rather than a list; call `.tolist()` on it if you need a plain list.
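A minimal sketch of the order-preserving pandas variant, assuming the sample data from the question (read from an `io.StringIO` buffer instead of `temp.csv`) and a current pandas release, where `DataFrame.iteritems()` has been removed in favour of `DataFrame.items()`. Passing `skipinitialspace=True` to `read_csv` strips the space after each comma, and `.tolist()` turns each result into a plain Python list.

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for temp.csv.
SAMPLE = """1994, Category1, Something Happened 1
1994, Category2, Something Happened 2
1995, Category1, Something Happened 3
1996, Category3, Something Happened 4
1998, Category2, Something Happened 5
"""

# header=None: the data has no header row, so columns are labelled 0, 1, 2.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, skipinitialspace=True)

# unique() keeps the order of first appearance; tolist() gives plain Python lists.
uniques = {label: column.unique().tolist() for label, column in df.items()}

Year = uniques[0]
Category = uniques[1]
print(Year)      # [1994, 1995, 1996, 1998]
print(Category)  # ['Category1', 'Category2', 'Category3']
```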


dawg pointed out one of the most useful tricks in Python: use `set()` to remove duplicates from a list. dawg shows how to build the unique collection from scratch by adding each element to a set, which is ideal. Here is another, equivalent way: first build a list with duplicates, then derive a list without duplicates using the `list(set())` approach (note that this does not keep the original order):

    import csv

    in_str = [
        'year, category, event',
        '1994, Category1, Something Happened 1',
        '1994, Category2, Something Happened 2',
        '1995, Category1, Something Happened 3',
        '1996, Category3, Something Happened 4',
        '1998, Category2, Something Happened 5'
    ]
    cdr = csv.DictReader(in_str, skipinitialspace=True)
    col = []
    for i in cdr:
        col.append(i['category'])

    # all items in the column...
    print(col)
    # only unique items in the column...
    print(list(set(col)))
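Because sets are unordered, `list(set(col))` can return the categories in any order. If order matters, a common alternative (not part of the answer itself) is `dict.fromkeys()`: dictionary keys are unique and, from Python 3.7 onward, keep insertion order. A minimal sketch, using a hypothetical `col` list shaped like the one built from the CSV:

```python
# Hypothetical column values, in the order they appear in the CSV.
col = ['Category1', 'Category2', 'Category1', 'Category3', 'Category2']

# Dict keys are unique and keep insertion order (Python 3.7+),
# so this drops duplicates while keeping each first appearance in place.
ordered_unique = list(dict.fromkeys(col))
print(ordered_unique)  # ['Category1', 'Category2', 'Category3']

# For comparison: list(set(col)) has the same elements, in arbitrary order.
print(sorted(set(col)) == sorted(ordered_unique))  # True
```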

Source: https://habr.com/ru/post/971394/

