Convert space delimited file to CSV

I have a text file containing tabular data. I need to automate writing it to a new text file that is delimited by commas instead of spaces, extracting only some of the columns from the existing data and changing their order.

This is a fragment of the first 4 lines of the source data:

  Number of rows: 8542
  Algorithm | Date | Time | Longitude | Latitude | Country    
  1 2000-01-03 215926.688 -0.262 35.813 Algeria 
  1 2000-01-03 215926.828 -0.284 35.817 Algeria

Here is what I want at the end:

  Longitude, Latitude, Country, Date, Time
 -0.262,35.813, Algeria, 2000-01-03,215926.688

Any tips on how to approach this?

4 answers

I think the file is separated by tabs, not spaces.

If so, you can try something like:

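    input_file = open('some_tab_separated_file.txt', 'r')
    output_file = open('some_tab_separated_file.csv', 'w')
    input_file.readline()  # skip the "Number of rows" line
    input_file.readline()  # skip the column header line
    for line in input_file:
        # unpack the six tab-separated fields of one data row
        (a, date, time, lon, lat, country) = line.strip().split('\t')
        # write only the wanted columns, in the new order
        output_file.write(','.join([lon, lat, country, date, time]) + '\n')
    input_file.close()
    output_file.close()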

This code has not been tested; fixing any errors is left to you as an exercise.


You can use the csv module: read your data with a reader that uses a space as the delimiter, and produce the output with a writer from the same module (with a comma delimiter).

In fact, the first example in the csv module documentation uses delimiter=' '.
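A rough sketch of that reader/writer approach, not taken from the answer itself. The function name `convert` and the `header_lines` parameter are made up for illustration, and it assumes the first five fields never contain spaces, so anything after them belongs to the country name:

```python
import csv
import io


def convert(src, dst, header_lines=2):
    """Copy space-delimited rows from src to dst as CSV, reordering columns.

    Assumption: each data row is
    Algorithm Date Time Longitude Latitude Country...
    """
    reader = csv.reader(src, delimiter=' ')
    writer = csv.writer(dst, lineterminator='\n')
    writer.writerow(['Longitude', 'Latitude', 'Country', 'Date', 'Time'])
    for i, row in enumerate(reader):
        if i < header_lines:
            continue  # skip the "Number of rows" and column-name lines
        # Leading, trailing or repeated spaces produce empty fields; drop them.
        fields = [f for f in row if f]
        if len(fields) < 6:
            continue  # not a complete data row
        algorithm, date, time, lon, lat = fields[:5]
        country = ' '.join(fields[5:])  # re-join multi-word country names
        writer.writerow([lon, lat, country, date, time])


# Quick check against in-memory data instead of real files.
sample = io.StringIO(
    "Number of rows: 8542\n"
    "Algorithm | Date | Time | Longitude | Latitude | Country\n"
    "1 2000-01-03 215926.688 -0.262 35.813 Algeria \n"
    "1 2000-01-03 215926.828 -0.284 35.817 Costa Rica\n"
)
out = io.StringIO()
convert(sample, out)
print(out.getvalue())
```

With real files you would pass two objects opened with `open(..., newline='')` instead of the `StringIO` buffers.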

You can use DictReader / DictWriter and specify the column order in the constructor (the fieldnames list, which differs between the reader and the writer if you want to reorder) to write the records in the order you want.

(When creating output, you may need to skip / ignore your first two lines.)
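The DictReader / DictWriter idea might look roughly like this. It is only a sketch: `IN_FIELDS`, `OUT_FIELDS`, `reorder` and `header_lines` are names invented here, and it assumes data lines do not start with a space (a leading space would shift every column by one).

```python
import csv
import io

# Column order in the input and the desired order in the output.
IN_FIELDS = ['Algorithm', 'Date', 'Time', 'Longitude', 'Latitude', 'Country']
OUT_FIELDS = ['Longitude', 'Latitude', 'Country', 'Date', 'Time']


def reorder(src, dst, header_lines=2):
    for _ in range(header_lines):
        next(src, None)  # skip the two non-data lines at the top
    reader = csv.DictReader(src, fieldnames=IN_FIELDS,
                            delimiter=' ', restkey='rest')
    # extrasaction='ignore' silently drops keys not in OUT_FIELDS
    # (here 'Algorithm' and 'rest').
    writer = csv.DictWriter(dst, fieldnames=OUT_FIELDS,
                            extrasaction='ignore', lineterminator='\n')
    writer.writeheader()
    for row in reader:
        # Fields beyond the sixth land in row['rest']; a trailing space
        # shows up there as an empty string, so filter those out.
        extra = [f for f in (row.get('rest') or []) if f]
        if extra:
            row['Country'] = ' '.join([row['Country']] + extra)
        writer.writerow(row)


# Quick check against in-memory data instead of real files.
sample = io.StringIO(
    "Number of rows: 8542\n"
    "Algorithm | Date | Time | Longitude | Latitude | Country\n"
    "1 2000-01-03 215926.688 -0.262 35.813 Algeria \n"
    "3 2000-01-04 010203.000 23.5 -4.3 Democratic Republic of the Congo\n"
)
out = io.StringIO()
reorder(sample, out)
print(out.getvalue())
```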

EDIT:

Here is an example of how this can work with multi-word country names:

    import csv
    import io  # Python 3 replacement for cStringIO

    f = io.StringIO("""A B C
    1 2 Costa Rica
    3 4 Democratic Republic of the Congo
    """)

    r = csv.DictReader(f, delimiter=' ', restkey='rest')
    for row in r:
        # extra fields beyond A, B, C are collected in the list row['rest']
        if row.get('rest'):
            row['C'] += " %s" % (" ".join(row['rest']))
        print('A: %s, B: %s, C: %s' % (row['A'], row['B'], row['C']))

Use restkey= and join the dict entry stored under that key, which is a list of the leftover fields (here restkey='rest'), onto the last column. This prints:

    A: 1, B: 2, C: Costa Rica
    A: 3, B: 4, C: Democratic Republic of the Congo

str.split() with no arguments splits on any run of whitespace. operator.itemgetter() accepts several indices and returns a callable that, applied to a sequence, returns those items as a tuple.
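Put together, those two pieces could be used something like this. This is a sketch, not the answerer's code: `pick` and `reorder_line` are hypothetical names, and it assumes the country is a single word, since `split()` cannot tell a multi-word name from extra columns.

```python
from operator import itemgetter

# Indices of Longitude, Latitude, Country, Date, Time in the input row
# (index 0 is the Algorithm column, which is dropped).
pick = itemgetter(3, 4, 5, 1, 2)


def reorder_line(line):
    fields = line.split()  # splits on any run of spaces or tabs
    return ','.join(pick(fields))


result = reorder_line("1 2000-01-03 215926.688 -0.262 35.813 Algeria \n")
print(result)
```

Because `split()` treats tabs and spaces alike, this works whether the file turns out to be tab- or space-separated.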


I assume the important idea is that you should use '\t' as the delimiter, as @Paulo Scardine suggested.

I just wanted to add that pandas is a very good library for processing column data.

    >>> import pandas as pd
    >>> src = 'path/to/file'
    >>> dest = 'path/to/dest_csv'
    >>> column_names = ['names', 'of', 'columns']
    >>> df = pd.read_csv(src, delimiter='\t', names=column_names)
    # Do something in pandas if you need to
    >>> df.to_csv(dest, index=False, sep=';')  # omit sep to get the default ','

Source: https://habr.com/ru/post/1390934/

