With the common bash tools, it is easy to split a large file (in my case a MySQL dump, and therefore a TSV file) into smaller parts using the split command. This command also supports splitting a file after every n newlines (the -l argument). However, it does not distinguish between escaped and unescaped newline characters and can therefore split one table row into two incomplete parts.
Example (TSV with 2 columns)
cool	2014-12-15 17:31:00
do not censor it ...^M\\n	2016-01-24 22:33:00
watch out ari, you've got compeition! hahah	2001-12-05 19:11:01
Oh God, the poor guy! xD\\nCan't wait to watch this!	2011-07-11 22:01:20
wish i could do that.\\n	2001-02-07 00:24:11
Funny! I will use this reason when I drink something in other houses	2015-06-10 12:20:00
As you can see, there are two columns (the first contains a comment and the second a date), separated by a tab. I only made the escaped newlines visible (as \\n); tabs and unescaped newlines are not written out as symbols. If you put these rows in a file and split it (for example, split example.tsv -l 1), you will get 9 files, but there are only 6 comments (3 of them contain an escaped newline)! This is because split treats an escaped newline as a regular newline that just happens to be preceded by a backslash. This is a huge problem for me, because splitting the file can leave incomplete table rows in the output files.
Is it possible to make split ignore escaped newlines, or does someone know another command that can do this?
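To make the desired behavior concrete, here is a minimal Python sketch of a row-aware split. It is not a feature of split itself; as far as I know, split has no option for this. It assumes that an escaped newline in the dump is a backslash followed by a real newline character, and that a literal backslash is itself escaped as two backslashes, so a newline counts as escaped only when it is preceded by an odd number of backslashes. The names ends_with_escaped_newline, split_rows and the part_ prefix are made up for this illustration.

```python
def ends_with_escaped_newline(line: bytes) -> bool:
    """Return True if the newline ending `line` is escaped.

    Assumption: the dump escapes with a backslash, so the newline is
    escaped when an odd number of backslashes directly precedes it.
    """
    if not line.endswith(b"\n"):
        return False
    body = line[:-1]
    trailing_backslashes = len(body) - len(body.rstrip(b"\\"))
    return trailing_backslashes % 2 == 1


def split_rows(path, rows_per_file, prefix="part_"):
    """Split `path` into files holding `rows_per_file` complete rows each.

    Physical lines that end in an escaped newline are kept together with
    the line that follows them. Returns the list of output file names.
    """
    outputs = []
    out = None
    rows_in_file = 0
    with open(path, "rb") as src:  # binary mode: lines end at b"\n" only
        for line in src:
            if out is None:
                name = f"{prefix}{len(outputs):04d}"
                out = open(name, "wb")
                outputs.append(name)
            out.write(line)
            if ends_with_escaped_newline(line):
                continue  # the row continues on the next physical line
            rows_in_file += 1
            if rows_in_file == rows_per_file:
                out.close()
                out = None
                rows_in_file = 0
    if out is not None:
        out.close()
    return outputs
```

Under those assumptions, split_rows("example.tsv", 1) on the example above should write 6 files (one per comment) instead of 9, and a larger rows_per_file gives bigger chunks that still never cut a row in half.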