Flat file recovery tool

Scenario: trying to import many (> 100) large (> 1 million records) CSV files.

Problem: Many entries lack field separators.

Question: is there a parsing tool that will try to identify the format, validate the file, and let you make “inline” corrections?

ETA: I am trying to import these files into MS SQL Server using the DTS Import Wizard. The error message gives me the line number in the file where the import breaks. Fix, repeat.

+4
5 answers

Been there, done that. Wrote my own tool.

It’s amazing how many programs that supposedly output CSV don’t actually do it right.

A commercial tool would be nice, but given the many problems I have encountered in CSV files (missing separators, bad separator values, embedded CR/LF in the middle of fields, etc.), it was worth writing my own. That way, when I find a new problem, I simply extend the existing program to deal with it.

I should probably change my nickname to NIH (Not Invented Here), given my inclinations.
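As an illustration of one such repair, here is a minimal Python sketch for the embedded CR/LF case. It is not the answerer's tool; the function name `merge_broken_records` and the fixed field count are assumptions, and it only works when separators never appear inside quoted fields. It glues physical lines together until a record contains the expected number of separators.

```python
def merge_broken_records(lines, expected_fields, sep=","):
    """Yield logical records, joining physical lines until each record
    holds at least (expected_fields - 1) separators.

    Assumption: separators never occur inside quoted field values.
    """
    needed = expected_fields - 1
    buffer = ""
    for line in lines:
        line = line.rstrip("\r\n")
        # The stray line break inside a field is replaced by a space.
        buffer = line if not buffer else buffer + " " + line
        if buffer.count(sep) >= needed:
            yield buffer
            buffer = ""
    if buffer:
        # Trailing fragment that never reached the expected count;
        # emitted as-is so that nothing is silently dropped.
        yield buffer


if __name__ == "__main__":
    sample = ["1,Alice,first half\n", "second half,10\n", "2,Bob,ok,20\n"]
    for record in merge_broken_records(sample, expected_fields=4):
        print(record)
```

Each new kind of breakage would get its own small function like this, which matches the "extend the program when a new problem appears" approach.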

+4

I would probably just knock something up in Python (or Perl or Awk).
How do you know where the fields are if there are no separators?

Edit: I would probably read in all the lines, ignore the existing delimiters, split each line based on the known content of the fields, and write the lines out again. You only need to do this once, and it will be faster and easier than hitting an error and fixing one specific line at a time.
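A rough Python sketch of that idea follows. The record layout (ISO date, numeric id, name, amount with two decimals) is purely hypothetical, as are the names `RECORD`, `reparse`, and `to_csv`; a real pattern would have to be built from what the fields in the actual files look like. The regular expression recognizes each field by its content and treats any punctuation between fields as optional.

```python
import csv
import io
import re

# Hypothetical layout: date, numeric id, name, amount.
# \W* between groups means a separator may be present, wrong, or missing.
RECORD = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2})"   # date, e.g. 2009-01-15
    r"\W*(\d+)"                  # numeric id
    r"\W*([A-Za-z][A-Za-z ]*?)"  # name: letters and spaces
    r"\W*(\d+\.\d{2})\s*$"       # amount with two decimals
)


def reparse(lines):
    """Split each line by field content; return (good_rows, bad_lines)."""
    good, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        match = RECORD.match(line)
        if match:
            good.append(list(match.groups()))
        else:
            # Keep the line number so unmatched lines can be inspected.
            bad.append((lineno, line.rstrip("\r\n")))
    return good, bad


def to_csv(rows):
    """Write the rows back out as properly delimited CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        "2009-01-15,42,Alice Smith,19.99\n",
        "2009-01-1643Bob Jones5.00\n",  # separators missing entirely
        "garbage\n",
    ]
    rows, rejected = reparse(sample)
    print(to_csv(rows), end="")
    print(rejected)
```

This only works when adjacent fields are distinguishable by content; two free-text fields next to each other with no separator cannot be split this way.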

+2

If the flat files all come from the same source, I agree that creating your own tool is one of the best options, since the problems should be consistent from file to file.

OTOH, if you have an ongoing need to import data from different suppliers, buying an import tool may be more productive.

It has been almost ten years since I worked on ETL, so I cannot make any specific recommendations.

By the way, is it possible to have the flat files regenerated? The best solution is not to have broken data in the first place, rather than cleaning it up later.

+1

You can try Flat File Checker to solve this problem. It lets you easily reject badly formatted files and identify the lines where a file has problems.
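For comparison, the basic "identify the problem lines" step can be sketched in a few lines of Python. This is not how Flat File Checker works internally, just a minimal illustration under the assumption that every record should have a fixed number of fields; `find_bad_lines` is a hypothetical name. It reports every offending line in one pass instead of stopping at the first error.

```python
import csv


def find_bad_lines(lines, expected_fields, delimiter=","):
    """Return (line_number, field_count) for every record whose field
    count differs from expected_fields."""
    problems = []
    reader = csv.reader(lines, delimiter=delimiter)
    for row in reader:
        if len(row) != expected_fields:
            # reader.line_num is the number of source lines read so far,
            # i.e. the last physical line of the current record.
            problems.append((reader.line_num, len(row)))
    return problems


if __name__ == "__main__":
    sample = ["a,b,c\n", "d,e\n", "f,g,h\n"]
    print(find_bad_lines(sample, expected_fields=3))
```

With a real file, an open file object (opened with `newline=""`, as the `csv` module documentation recommends) can be passed in place of the list.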

+1

This message appears when you try to access data and indicates that there are no records. The source patch file can get you out of this distortion, where you will find all your software.

-1

Source: https://habr.com/ru/post/1277571/

