Reorganize CSV, so dates are not column headers

I am trying to reorganize an Excel (or CSV) table so that dates are no longer column headers. I have only limited Python knowledge and don't know where to start, so I could use some help.

Under each date is a record of what happened that day for a particular place. Empty values may be skipped. Some cells contain "--", which can be converted to 0. I would like one column for the date and one column for that day's numeric reading. The place name gets a new row for each day on which it was monitored.

Example (smh for the person who started it this way):

    Name,7/1/2009,7/2/2009,7/3/2009,7/4/2009..... (and so on to the present)
    Place A,,5,3,
    Place B,0,,23,--
    Place C,1,2,,35

I would like to get:

    Name, Date, Reading
    Place A, 7/2/2009, 5
    Place A, 7/3/2009, 3
    Place B, 7/1/2009, 0
    Place B, 7/4/2009, 0   <--- Even though this is a dash originally it can be converted to a 0 to keep the number an int.

There are hundreds of rows (places), and the columns (dates) run out to Excel column BPD (that's right, 1772 columns!).

3 answers

What you are trying to do is normalize a table.

In the general case, you do it like this: for each row in the denormalized table, you insert one row into the normalized table for each denormalized column.

How you do this specifically depends on how you are processing the tables. For example, if you use the csv module in Python 3.x with the default Excel CSV dialect, it will look something like this:

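    import csv

    # newline='' lets the csv module handle line endings itself
    with open('old.csv', newline='') as oldcsv, open('new.csv', 'w', newline='') as newcsv:
        r, w = csv.reader(oldcsv), csv.writer(newcsv)
        header = next(r)  # first row: 'Name' followed by the dates
        w.writerow(['Name', 'Date', 'Reading'])
        for row in r:
            # pair each date header with the value in the same column
            for colname, colval in zip(header[1:], row[1:]):
                w.writerow([row[0], colname, colval])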

If you want to use, for example, xlrd/xlwt, XlsxReader/XlsxWriter, win32com Excel scripting, etc., the details will be different, but the main idea will be the same: iterate over the rows, then iterate over the date columns, creating a new row for each one from the name in the row, the date in the column header, and the value in the row.

From here, you should be able to work out how to skip empty values, convert "--" to 0, and so on.
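As a rough sketch of those last steps, the same loop can skip empty cells and turn "--" into 0. The function name `normalize` and the in-memory sample are made up for illustration, and the `int()` conversion assumes every remaining reading is a whole number, as in the question's example:

```python
import csv
import io


def normalize(infile, outfile):
    """Write one (Name, Date, Reading) row per non-empty cell.

    `normalize` is a hypothetical helper name, not part of the csv module.
    """
    reader, writer = csv.reader(infile), csv.writer(outfile)
    header = next(reader)  # 'Name' followed by the date columns
    writer.writerow(["Name", "Date", "Reading"])
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        name = row[0]
        for date, value in zip(header[1:], row[1:]):
            value = value.strip()
            if value == "":
                continue  # empty cell: nothing recorded that day
            if value == "--":
                value = "0"  # a dash counts as a zero reading
            # Assumption: readings are integers; int() raises ValueError otherwise.
            writer.writerow([name, date, int(value)])


# Sample data taken from the question, held in memory instead of a file.
sample = (
    "Name,7/1/2009,7/2/2009,7/3/2009,7/4/2009\n"
    "Place A,,5,3,\n"
    "Place B,0,,23,--\n"
    "Place C,1,2,,35\n"
)
out = io.StringIO()
normalize(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the two `io.StringIO` objects would be replaced by files opened with `newline=''`, as in the snippet above.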


The code below should be fairly clear, even if you are just starting out with Python:

enumerate - returns an iterator of (index, item) pairs over an iterable

    >>> content = """Name,7/1/2009,7/2/2009,7/3/2009,7/4/2009
    ... Place A,,5,3,
    ... Place B,0,,23,--
    ... Place C,1,2,,35"""
    >>>
    >>> lines = [line.split(',') for line in content.split('\n')]
    >>>
    >>> for line in lines:
    ...     if 'Name' not in line[0]:
    ...         for count, date in enumerate(lines[0]):
    ...             if count >= 1:
    ...                 if not line[count] or line[count] == '--':
    ...                     line[count] = 0
    ...                 # write (line[0], date, line[count]) to a file or print it:
    ...                 print (line[0], date, line[count])
    ...
    ('Place A', '7/1/2009', 0)
    ('Place A', '7/2/2009', '5')
    ('Place A', '7/3/2009', '3')
    ('Place A', '7/4/2009', 0)
    ('Place B', '7/1/2009', '0')
    ('Place B', '7/2/2009', 0)
    ('Place B', '7/3/2009', '23')
    ('Place B', '7/4/2009', 0)
    ('Place C', '7/1/2009', '1')
    ('Place C', '7/2/2009', '2')
    ('Place C', '7/3/2009', 0)
    ('Place C', '7/4/2009', '35')
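To see what enumerate contributes here, a small standalone sketch may help. The `header` and `row` lists are made-up sample values, and `start=1` is used as one possible alternative to the `count >= 1` check: the header is sliced to drop "Name", and counting from 1 keeps the index aligned with the full row.

```python
header = ["Name", "7/1/2009", "7/2/2009"]
row = ["Place A", "", "5"]

# enumerate yields (index, item) pairs; start=1 makes the index match the
# position in the full row after the slice drops the "Name" column.
pairs = [(date, row[i]) for i, date in enumerate(header[1:], start=1)]
print(pairs)  # [('7/1/2009', ''), ('7/2/2009', '5')]
```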

The following code will normalize a CSV table of the format you described and write a new CSV file with one line for each (Place, Date) pair that has an entry. It will also change any entry given as "--" to 0.

    oldlist = []
    newlist = ['Name,Date,Reading']

    with open('path_to_csv.csv') as oldcsv, open('newcsv.csv', 'w') as newcsv:
        for line in oldcsv:
            line = line.strip('\n')
            oldlist.append(line.split(','))
        for (i, row) in enumerate(oldlist[1:]):
            for (j, column) in enumerate(row[1:]):
                if column != '':
                    newrow = []
                    newrow.append(row[0])  # Adds place name to each newlist row.
                    newrow.append(oldlist[0][j+1])  # Adds date to each newlist row.
                    if column == '--':
                        newrow.append('0')
                    else:
                        newrow.append(column)  # Adds reading to each newlist row.
                    newlist.append(",".join(newrow))
        for line in newlist:
            newcsv.write("%s\n" % line)

Source: https://habr.com/ru/post/1201441/

