Parsing a CSV file using python (to create a decision tree later)

First off, full disclosure: this is for a uni assignment, so I don't want to be given code. :) I'm looking more for approaches; I'm very new to Python, having read a book but not yet written any code.

The whole task is to import the contents of a CSV file, build a decision tree from them (using the ID3 algorithm), and then parse a second CSV file to run against the tree. There is a big (understandable) preference for being able to handle different CSV files (I asked whether we were allowed to hard-code the column names, mainly to rule it out as an option, but got no answer).

The CSV files are in a fairly standard format: the header row is marked with #, followed by the column names, and every row after that is a simple series of values. Example:

# Column1, Column2, Column3, Column4
Value01, Value02, Value03, Value04
Value11, Value12, Value13, Value14

At the moment I'm trying to work out the first part: parsing the CSV. To make the decisions for the decision tree, a dictionary structure seems the most logical, so I was thinking of doing something along these lines:

Read in each line, character by character
If the character is not a comma or a space
    Append character to temporary string
If the character is a comma
    Append the temporary string to a list
    Empty string
Once a line has been read
    Create a dictionary using the header row as the key (somehow!)
    Append that dictionary to a list

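Doing it this way, I think each row would become a dictionary keyed by column name, so when building the tree I could refer to values by name (for example, Column1 or Column4) rather than by position.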

Is this a sensible approach? If not, what would be better, and why?
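A minimal Python sketch of the pseudocode above, assuming the '#' header format from the example and no quoted fields. It deviates from the pseudocode in two small ways: it strips whitespace around each field instead of skipping every space (which would also delete spaces inside values), and it appends the last field of each line, which has no comma after it. The name parse_simple_csv is hypothetical.

```python
def parse_simple_csv(text):
    """Turn '#'-header CSV text into (header, list of row dictionaries).

    Assumes no field contains a quoted comma.
    """
    header = None
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue                     # skip blank lines
        fields = []
        current = ""                     # the "temporary string" from the pseudocode
        for ch in line:                  # read the line character by character
            if ch == ",":
                fields.append(current.strip())
                current = ""
            else:
                current += ch
        fields.append(current.strip())   # the last field has no comma after it
        if header is None:
            fields[0] = fields[0].lstrip("#").strip()  # "# Column1" -> "Column1"
            header = fields
        else:
            records.append(dict(zip(header, fields)))  # column name -> value
    return header, records


sample = """# Column1, Column2, Column3, Column4
Value01, Value02, Value03, Value04
Value11, Value12, Value13, Value14
"""
header, records = parse_simple_csv(sample)
print(header)                 # ['Column1', 'Column2', 'Column3', 'Column4']
print(records[1]["Column4"])  # Value14
```

Keeping the header as its own list means the same function works for any set of column names, which fits the "no hard-coded columns" requirement.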

---

Python has some powerful built-in constructs. For example, you can read a file line by line like this:

with open(name_of_file, "r") as f:
    for line in f:
        pass  # process the line

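You can use str.split to break a line apart at the commas, and str.strip to remove the surrounding whitespace. Python also has very capable lists and dictionaries.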
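A small sketch of split and strip applied to the example rows from the question. split_line is a hypothetical helper, and it assumes no field contains a quoted comma, since a plain split(",") cannot tell such commas apart from separators.

```python
def split_line(line):
    """Split one CSV line at every comma and trim the whitespace around each field."""
    return [field.strip() for field in line.split(",")]


# The header row starts with '#', so drop that character before splitting
header = split_line("# Column1, Column2, Column3, Column4".lstrip("#"))
row = split_line("Value01, Value02, Value03, Value04")

# zip pairs each column name with the value in the same position
record = dict(zip(header, row))
print(record)  # {'Column1': 'Value01', 'Column2': 'Value02', 'Column3': 'Value03', 'Column4': 'Value04'}
```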

To create an empty list, use empty brackets, []; for an empty dictionary, use braces, {}:

mylist = []  # Creates an empty list
mydict = {}  # Creates an empty dictionary

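You add items to a list with .append(), and to a dictionary by assigning to a key. For example, mylist.append(5) appends 5 to the list, while mydict[key] = value associates key with value. To check whether a key is present in a dictionary, use the in keyword. For example: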

if key in mydict:
    print("Present")
else:
    print("Absent")
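As an illustration, here is a sketch combining a dictionary, the in test and .append() to do the kind of counting an ID3 implementation typically needs (how many rows have each outcome). The column names "Outlook" and "Play" and their values are hypothetical; collections.Counter from the standard library does the same counting in one call.

```python
# Hypothetical rows: "Outlook" is an attribute, "Play" is the outcome to predict
rows = [
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Rain", "Play": "Yes"},
    {"Outlook": "Sunny", "Play": "Yes"},
]

# Count how often each outcome occurs, using `in` to decide whether to add or increment
counts = {}
for row in rows:
    outcome = row["Play"]
    if outcome in counts:
        counts[outcome] += 1
    else:
        counts[outcome] = 1

# Collect the distinct values of one attribute with a list and .append()
outlooks = []
for row in rows:
    if row["Outlook"] not in outlooks:
        outlooks.append(row["Outlook"])

print(counts)    # {'No': 1, 'Yes': 2}
print(outlooks)  # ['Sunny', 'Rain']
```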

To iterate over a list or a dictionary, use a for-loop:

for val in mylist:
    pass  # do something with val

for key in mydict:
    pass  # do something with key or with mydict[key]

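When you need both the index and the value while iterating over a list, the built-in enumerate function saves you from counting the indices yourself: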

for idx, val in enumerate(mylist):
    pass  # do something with val or with idx; note that val == mylist[idx]

That is equivalent to:

idx=0
for val in mylist:
   # process val, idx
   idx += 1

You can also iterate over the indices directly:

for idx in range(len(mylist)):
    pass  # do something with idx and possibly mylist[idx]

You can get the number of elements in a list, or the number of keys in a dictionary, with len.
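A short sketch putting enumerate and len together: pairing the example header with one data row by position. The variable names are illustrative only.

```python
header = ["Column1", "Column2", "Column3", "Column4"]
row = ["Value11", "Value12", "Value13", "Value14"]

record = {}
for idx, name in enumerate(header):
    # enumerate supplies the position, so the matching value is row[idx]
    record[name] = row[idx]

print(len(header))        # 4 elements in the list
print(len(record))        # 4 keys in the dictionary
print(record["Column2"])  # Value12
```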

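Lists can also be built very concisely with list comprehensions; I'd suggest getting comfortable with for-loops first, though. For example: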

>>> list1 = list(range(10))
>>> list1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list2 = [2*x for x in list1]
>>> list2
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
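Comprehensions are handy for the decision-tree part too. A sketch, assuming rows are stored as lists: pulling out one column, and keeping only the rows with a given value, which is roughly the kind of split a decision tree makes at each node. The data is the example from the question.

```python
rows = [
    ["Value01", "Value02", "Value03", "Value04"],
    ["Value11", "Value12", "Value13", "Value14"],
]

# Every value in the last column, in row order
last_column = [row[-1] for row in rows]

# Only the rows whose first column has a given value
matching = [row for row in rows if row[0] == "Value11"]

print(last_column)    # ['Value04', 'Value14']
print(len(matching))  # 1
```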

For more, see the official Python tutorial.

---

Have a look at the csv module documentation on docs.python.org:

import csv

with open("some.csv", newline="") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

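Instead of printing each row, you can store the rows in a list (for example, with database.append(row)) and pass that list to your ID3 code.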
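A sketch of csv.reader applied to the question's format. io.StringIO stands in for a real file here so the example is self-contained; skipinitialspace=True is a standard csv option that drops the space after each comma, and stripping the '#' from the first column name is an assumption about how you want to treat that header marker.

```python
import csv
import io

# Stand-in for open("some.csv", newline="") so the example runs on its own
data = io.StringIO(
    "# Column1, Column2, Column3, Column4\n"
    "Value01, Value02, Value03, Value04\n"
    "Value11, Value12, Value13, Value14\n"
)

reader = csv.reader(data, skipinitialspace=True)  # ignore the space after each comma
header = next(reader)                             # first row holds the column names
header[0] = header[0].lstrip("#").strip()         # "# Column1" -> "Column1"

database = []
for row in reader:
    if row:  # csv.reader returns [] for blank lines
        database.append(row)

print(header)       # ['Column1', 'Column2', 'Column3', 'Column4']
print(database[1])  # ['Value11', 'Value12', 'Value13', 'Value14']
```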

---

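Two reasons to use it: (1) the csv module is part of the standard library, and (2) it is implemented in C, so it is fast. Use it!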

---

Take a look at csv.DictReader.

Example:

import csv

with open('my_file.csv', newline='') as f:
    reader = csv.DictReader(f)
    for d in reader:
        print(d)  # a dictionary whose keys are taken from the first row of the file
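One catch with the question's format: by default DictReader takes the keys from the first row as-is, so the first key would be "# Column1" (and, without skipinitialspace, the others would keep their leading space). A sketch that reads the header line itself and passes clean names through the fieldnames parameter; it assumes the header names contain no quoted commas, and io.StringIO again stands in for a real file.

```python
import csv
import io

data = io.StringIO(
    "# Column1, Column2, Column3, Column4\n"
    "Value01, Value02, Value03, Value04\n"
)

# Read the header line ourselves so the '#' does not end up in the first key
first_line = data.readline()
fieldnames = [name.strip() for name in first_line.lstrip("#").split(",")]

# DictReader continues from the second line; extra keyword arguments go to csv.reader
reader = csv.DictReader(data, fieldnames=fieldnames, skipinitialspace=True)
rows = list(reader)

print(fieldnames)         # ['Column1', 'Column2', 'Column3', 'Column4']
print(rows[0]["Column2"]) # Value02
```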

---

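There's already a CSV module in the standard library, so there's no need to write your own parser.

Rolling your own parser with str.split() is a no-no, because it breaks as soon as a field contains a quoted comma.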

---

Pitfalls of parsing CSV yourself

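If you use str.split(), it will not work correctly, because str.split() knows nothing about quoting. CSV is more complex than it looks: http://en.wikipedia.org/wiki/Comma-separated_values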

For example, consider this line:

1997,Ford,E350,"Super, luxurious truck"

If you use str.split(), you get 5 values:

('1997', 'Ford', 'E350', '"Super', ' luxurious truck"')

but what you actually want is 4 values, because the comma is inside quotes:

('1997', 'Ford', 'E350', 'Super, luxurious truck')
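A quick way to see the difference is to run both on the same line. csv.reader accepts any iterable of lines, so a one-element list is enough for this sketch.

```python
import csv

line = '1997,Ford,E350,"Super, luxurious truck"'

naive = line.split(",")
proper = next(csv.reader([line]))

print(len(naive), naive)    # 5 fields: the quoted field is cut in two
print(len(proper), proper)  # 4 fields: quotes removed, inner comma kept
```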

Also, a quoted field can contain line breaks ("\r\n" or "\n"). For example:

1997,Ford,E350,"Super
luxurious truck"
1997,Ford,E250,"Ok? Truck"

so you cannot simply write:

f = open('filename.csv', 'r')
for line in f:
    # problem here: "line" may contain only part of a record
    pass
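For comparison, a sketch showing csv.reader reassembling the record that spans two physical lines. io.StringIO stands in for the file; with a real file, the csv documentation asks you to open it with newline="" so embedded line breaks are preserved.

```python
import csv
import io

# Stand-in for a file opened with open("filename.csv", newline="")
data = io.StringIO(
    '1997,Ford,E350,"Super\n'
    'luxurious truck"\n'
    '1997,Ford,E250,"Ok? Truck"\n'
)

physical_lines = len(data.getvalue().splitlines())  # 3 physical lines...
rows = list(csv.reader(data))                       # ...but only 2 records

print(physical_lines, len(rows))                    # 3 2
print(rows[0][3] == "Super\nluxurious truck")       # True
```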

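Also, inside a quoted CSV field a literal double quote is written as two double quotes, so this line:

1997,Ford,E350,"Super ""luxurious"" truck"

must be read as:

('1997', 'Ford', 'E350', 'Super "luxurious" truck')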
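The csv module applies this rule by default (doublequote is True in the standard "excel" dialect), as this small sketch shows:

```python
import csv

line = '1997,Ford,E350,"Super ""luxurious"" truck"'
fields = next(csv.reader([line]))  # "" inside quotes becomes a single "

print(fields[3])  # Super "luxurious" truck
```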

So if you do write your own parser, it has to keep track of some state, roughly:

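  • Read the input one character at a time.
  • If you are not inside quotes and the character is a comma, the current field ends.
  • If you are not inside quotes and the character is ", you enter "quoted" mode.
  • In "quoted" mode, a " followed by another " is a literal quote (so "" becomes "); any other " ends "quoted" mode.
  • In "quoted" mode, commas and line breaks are ordinary characters that belong to the field.
  • If you are not inside quotes and reach a line break, the current field and the current record end.
  • At the end of the file, finish the last field and the last record.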
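A minimal sketch of the state machine described in the list above, for illustration only. parse_csv is a hypothetical name; it drops bare \r characters outside quotes (so \r\n line endings work), gives no special meaning to #, and in practice the csv module remains the safer choice.

```python
def parse_csv(text):
    """Split CSV text into records (lists of fields) with a small state machine.

    Handles commas, quoted fields, "" escapes and line breaks inside quotes.
    """
    records = []       # finished records
    record = []        # fields of the record being built
    field = ""         # characters of the field being built
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(text) and text[i + 1] == '"':
                    field += '"'       # "" inside quotes is one literal quote
                    i += 1             # skip the second quote
                else:
                    in_quotes = False  # closing quote
            else:
                field += ch            # commas and line breaks are plain data here
        elif ch == '"':
            in_quotes = True           # opening quote
        elif ch == ",":
            record.append(field)       # end of field
            field = ""
        elif ch == "\n":
            record.append(field)       # end of field and of record
            records.append(record)
            record, field = [], ""
        elif ch != "\r":
            field += ch                # ordinary character (a bare \r is dropped)
        i += 1
    if field or record:                # last record when the text has no final newline
        record.append(field)
        records.append(record)
    return records


sample = '1997,Ford,E350,"Super, ""luxurious""\ntruck"\r\n1997,Ford,E250,"Ok? Truck"\n'
for rec in parse_csv(sample):
    print(rec)
```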

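Also note that the # in your header row is not part of the CSV format, so no CSV parser will treat it specially; you'll have to strip it off the first column name yourself.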

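If the CSV file is large (say, 10 to 100 thousand rows), keep memory use in mind. You'll need a list to hold the records (one entry per row). While scanning a line, keep a column index that you increment at each field separator and reset to 0 at the end of the line.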

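For example, with header = ['Column1', 'Column2'] and a record dictionary whose values start out as empty strings, you can append each character to the current field with: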

record[header[column_index]] += character

---

If you'd rather not use the csv module that @Kaloyan Todorov suggested, you can do the splitting yourself, for example:

for line in file:
    columns = line.split(',')
    for column in columns:
        print(column.strip())



Source: https://habr.com/ru/post/1743095/

