CSV data processing

Recently, I was asked to take a CSV file that looks like this:

[image: sample of the input CSV]

and turn it into something like this:

[image: sample of the desired output]

Keep in mind that there will be hundreds if not thousands of lines, because a new line is created each time a user logs in or out, and there will be more than just two users. My first thought was to load the .csv file into MySQL and then run a query on it. However, I really do not want to install MySQL on the machine that will be used for this.

I could do it manually for each agent in Excel / Open Office, but there is no room for error and there are far too many lines, so I want to automate the process. What is the best way to do this?

2 answers

This one-liner relies only on awk and date to convert the timestamps back and forth:

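    awk 'BEGIN { FS = OFS = "," }
    NR > 1 {
        au = $1 "," $2
        cmd = "date -u -d \"" $4 "\" +%s"
        cmd | getline ts; close(cmd)   # close so a repeated duration is re-read
        sum[au] += ts
    }
    END {
        for (a in sum) {
            cmd = "date -u -d \"@" sum[a] "\" +%T"
            cmd | getline h; close(cmd)
            print a, h
        }
    }' test.csv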

having test.csv as follows:

    Agent,Username,Project,Duration
    AAA,aaa,NBM,02:09:06
    AAA,aaa,NBM,00:15:01
    BBB,bbb,NBM,04:14:24
    AAA,aaa,NBM,00:00:16
    BBB,bbb,NBM,00:45:19
    CCC,ccc,NDB,00:00:01

leads to:

    CCC,ccc,00:00:01
    BBB,bbb,04:59:43
    AAA,aaa,02:24:23

You can use this with minor adjustments to extract the date from additional columns. Note that %T prints only the time of day, so a total of 24 hours or more wraps around.
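For comparison, the same per-agent sum can be sketched in plain Python without calling date at all. This is only an illustration, not part of the answer above: the helper names parse_hms, format_hms and total_durations are made up for this example, and it assumes the Agent, Username and Duration columns shown in test.csv, with Duration always in HH:MM:SS form.

```python
import csv
import io
from collections import defaultdict

# Sample data copied from test.csv in the answer above.
SAMPLE = """Agent,Username,Project,Duration
AAA,aaa,NBM,02:09:06
AAA,aaa,NBM,00:15:01
BBB,bbb,NBM,04:14:24
AAA,aaa,NBM,00:00:16
BBB,bbb,NBM,00:45:19
CCC,ccc,NDB,00:00:01
"""


def parse_hms(text):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Convert seconds back to 'HH:MM:SS'; the hour field may exceed 23."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def total_durations(lines):
    """Sum Duration per (Agent, Username) from an iterable of CSV lines."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[(row["Agent"], row["Username"])] += parse_hms(row["Duration"])
    return {key: format_hms(secs) for key, secs in totals.items()}


if __name__ == "__main__":
    # For a real file, pass open("test.csv", newline="") instead of StringIO.
    for (agent, user), total in total_durations(io.StringIO(SAMPLE)).items():
        print(f"{agent},{user},{total}")
```

Because the hours are computed with divmod rather than formatted as a time of day, totals above 24 hours are printed as-is instead of wrapping.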


Let me give you an example if you decide to use SQLite. You did not specify a language, but I will use Python because it can be read as pseudocode. This part creates your sqlite file:

    import csv
    import sqlite3

    # Open (or create) the database file
    con = sqlite3.connect('my_sqlite_file.sqlite')
    con.text_factory = str
    cur = con.cursor()
    cur.execute('CREATE TABLE "mytable" ("field1" varchar, '
                '"field2" varchar, "field3" varchar);')

and you use the command:

    cur.executemany('INSERT INTO mytable VALUES (?, ?, ?)', list_of_values)
    con.commit()  # persist the inserted rows

to insert rows into your database after reading them from the csv file. Please note that we only created three fields in the database, so each row in list_of_values must contain exactly 3 values. That is why we use (?, ?, ?).
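Putting the pieces together, a minimal end-to-end sketch might look like the following. It is an assumption-laden illustration rather than the answer's own code: the table name sessions, its four columns, the function name load_and_summarise and the in-memory database are all chosen for this example, and the sample rows are borrowed from the test.csv shown in the other answer. Durations are stored as seconds so that SQLite can sum them with a plain GROUP BY.

```python
import csv
import io
import sqlite3

# Sample rows borrowed from test.csv in the other answer.
SAMPLE = """Agent,Username,Project,Duration
AAA,aaa,NBM,02:09:06
AAA,aaa,NBM,00:15:01
BBB,bbb,NBM,04:14:24
AAA,aaa,NBM,00:00:16
BBB,bbb,NBM,00:45:19
CCC,ccc,NDB,00:00:01
"""


def load_and_summarise(lines):
    """Load CSV lines into SQLite and return total seconds per agent/user."""
    con = sqlite3.connect(":memory:")  # hypothetical choice; use a file path to persist
    cur = con.cursor()
    cur.execute(
        "CREATE TABLE sessions "
        "(agent TEXT, username TEXT, project TEXT, seconds INTEGER)"
    )

    rows = []
    for rec in csv.DictReader(lines):
        # Convert 'HH:MM:SS' to seconds before inserting.
        hours, minutes, seconds = (int(p) for p in rec["Duration"].split(":"))
        rows.append((rec["Agent"], rec["Username"], rec["Project"],
                     hours * 3600 + minutes * 60 + seconds))

    # One placeholder per column, as explained above.
    cur.executemany("INSERT INTO sessions VALUES (?, ?, ?, ?)", rows)
    con.commit()

    cur.execute(
        "SELECT agent, username, SUM(seconds) FROM sessions "
        "GROUP BY agent, username ORDER BY agent, username"
    )
    result = cur.fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for agent, username, seconds in load_and_summarise(io.StringIO(SAMPLE)):
        print(agent, username, seconds)
```

The sqlite3 module ships with Python, so nothing has to be installed on the target machine, which was the concern with MySQL.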


Source: https://habr.com/ru/post/1442552/

