Parsing a space-delimited file

I'm trying to help my father: he gave me an export from a planning application used at his work. We are trying to find out whether we can import it into a MySQL database so that he and his staff can collaborate on it online.

I tried several different methods, but none of them worked correctly, and this is not my specialty.

You can view the export here: http://roikingon.com/export.txt

Any help / advice on how to do this would be greatly appreciated!

Thanks!

+6
4 answers

I tried to write a (somewhat dynamic) fixed-width column parser. Take a look: http://codepad.org/oAiKD0e7 (it's too long for SO, but it is mostly "data").

What I noticed:

  • Text data is left-aligned and padded with spaces on the right, like "hello___" ( _ = space)
  • Numeric data is right-aligned and padded with spaces on the left, like "___42"
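
As a rough illustration of these two observations, here is a minimal sketch of slicing one fixed-width line. The column names, offsets and widths in `$layout` are made up for the example and are not the real layout of the export.

```php
<?php
// Hypothetical layout: column name => [offset, width].
// These numbers are invented for illustration, NOT taken from the export.
$layout = [
    'code'  => [0, 8], // text: left-aligned, padded on the right
    'count' => [8, 5], // number: right-aligned, padded on the left
];

function parseFixedWidth(string $line, array $layout): array
{
    $record = [];
    foreach ($layout as $name => [$offset, $width]) {
        // substr() cuts out the column; trim() removes padding on both sides.
        $record[$name] = trim(substr($line, $offset, $width));
    }
    return $record;
}

// Build a sample line the same way the export seems to pad its columns.
$line = str_pad('hello', 8) . str_pad('42', 5, ' ', STR_PAD_LEFT);
$record = parseFixedWidth($line, $layout);
print_r($record);
```

Because `trim()` strips both sides, the same function handles left-aligned text and right-aligned numbers; only the layout table has to be correct.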

If you want to use my code, there is still something to do:

  • 12.x record types have a variable number of columns (after some static columns), so you will have to implement another "handler" for them
  • Some of my widths are most likely wrong. I think there is a system (for example, numbers that are 4 or 8 characters wide, with some options for special cases). Someone with domain knowledge and more than one sample can define the columns.
  • Getting the raw data is only the first step: you still need to map the raw data to some useful model and write that model to the database.
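
One possible shape for such a "handler" is sketched below. It works on whitespace-split fields rather than on fixed offsets, and it rests on a guess: in the single 12.1 sample line quoted in this thread, the field after the first six looks like a count followed by that many values. The function name and the `$countIndex` default are hypothetical.

```php
<?php
// ASSUMPTION: field 6 of a 12.x record is a count, followed by that many
// values. This is guessed from one sample line and may well be wrong.
function parseVariableRecord(array $fields, int $countIndex = 6): array
{
    $count = (int) $fields[$countIndex];
    return [
        'static' => array_slice($fields, 0, $countIndex),
        'values' => array_slice($fields, $countIndex + 1, $count),
        'rest'   => array_slice($fields, $countIndex + 1 + $count),
    ];
}

$sample = '12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17';
$fields = preg_split('/ +/', $sample, -1, PREG_SPLIT_NO_EMPTY);
$parsed = parseVariableRecord($fields);
print_r($parsed);
```

If more samples show that the count lives elsewhere, only `$countIndex` needs to change.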
+2

With this file structure, you basically need to reverse engineer a proprietary format. Yes, it is space-delimited, but the format does not conform to any standard such as CSV, YAML, etc. It is completely proprietary, with what looks like a header and separate sections that have their own headers.

I think your best bet is to see whether the application offers any other type of export, such as Excel or XML, and work from there. If not, see whether there is any HTML output that can be displayed on screen, paste it into Excel, and see what you get.

Because of everything I mentioned above, it will be very difficult to massage the file in its current form into something that can reasonably be imported into a database. (Note that the file structure implies that several tables are required.)

+2

You can use split with a regex (one or more spaces).

I will try and let you know.

It seems that your data has no structure.

    $data = "12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17 ";
    $arr = preg_split("/ +/", $data);
    print_r($arr);

    Array
    (
        [0] => 12.1
        [1] => 0
        [2] => 1144713
        [3] => 751
        [4] => 17
        [5] => Y
        [6] => 8
        [7] => 517
        [8] => 526
        [9] => 537
        [10] => 542
        [11] => 550
        [12] => 556
        [13] => 561
        [14] => 567
        [15] => 17
        [16] =>
    )

Try preg_split("/ +/", $data); which splits the string on one or more spaces, so you get a nice array that you can process. But looking at your data, there is no structure, so you will need to know which element of the array corresponds to which piece of data.
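
A small variation, in case it helps: a trailing space in the line yields an empty last element when splitting this way. The sketch below passes the `PREG_SPLIT_NO_EMPTY` flag so that empty pieces are dropped, and uses `\s+` so tabs are treated like spaces too. Whether the real export contains tabs is an assumption.

```php
<?php
$data = "12.1 0 1144713 751 17 Y 8 517 526 537 542 550 556 561 567 17 ";

// '/\s+/' matches one or more whitespace characters (spaces, tabs, newlines).
// PREG_SPLIT_NO_EMPTY drops empty pieces, such as the one produced by the
// trailing space at the end of $data.
$fields = preg_split('/\s+/', $data, -1, PREG_SPLIT_NO_EMPTY);

print_r($fields);
```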

Good luck.

+1

Open it with Excel, choosing whether or not to treat consecutive delimiters as one. Then save it from Excel as CSV, which is comma-separated and will be easier to import into MySQL.

EDIT: The guy who suggests preg_split on "/ +/" is giving you essentially the same answer as mine above.

The question is what to do after that.

Have you determined how many "row types" exist? Once you have determined this and worked out their characteristics, it will be much easier to write code to go through the file.

If you save it as CSV, you can use PHP's fgetcsv function and its related functions. For each row, you check its type and perform operations depending on that type.

I noticed that your data rows can be divided by whether the first column contains a ".", so here is an example of how you could loop through the file.

    // $file_handle is an open handle, e.g. from fopen() on the saved CSV
    while (($row = fgetcsv($file_handle)) !== false) {
        if (strpos($row[0], '.') === false) {
            // do something
        } else {
            // do something else
        }
    }

"do something" will be something like "CREATE TABLE table_$row[0] " or "INSERT INTO table " etc.
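
Before deciding on tables, it may help to first collect the rows per type. The following is a minimal sketch that groups lines by their first field; it splits on whitespace instead of reading CSV, and the sample lines are invented apart from the "12.1" and "caco" values mentioned in this thread.

```php
<?php
// Groups rows by record type (the first field) so that each group can later
// be mapped to its own table. groupByType() is a hypothetical helper name.
function groupByType(iterable $lines): array
{
    $groups = [];
    foreach ($lines as $line) {
        $fields = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
        if ($fields === []) {
            continue; // skip blank lines
        }
        $groups[$fields[0]][] = $fields;
    }
    return $groups;
}

// Made-up sample lines, for illustration only.
$groups = groupByType(['12.1 0 17', '', '4 caco', '12.1 1 18']);

foreach ($groups as $type => $rows) {
    echo $type, ': ', count($rows), " row(s)\n";
}
```

To run it over the real file, something like `groupByType(file('export.txt'))` should work, assuming the export has been saved locally under that name.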

Ok, and here are a few more observations:

Your file really looks like several files glued together. It contains several formats. Note that all lines starting with "4" have a 4-letter abbreviation for the company, followed by the full name of the company. One of them is "caco". If you search for "caco", you will find it in several "tables" inside the file.

I also noticed that "smuwtfa" (days of the week) is sprinkled throughout.
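
A company name can itself contain spaces, so a plain split would break it apart. Here is a sketch that uses the limit argument of `preg_split` to keep the name in one piece. It assumes the layout is "4", then the abbreviation, then the name; the function name and the sample company name are made up.

```php
<?php
// ASSUMED layout: "4", a 4-letter abbreviation, then the full company name.
// A limit of 3 keeps any spaces inside the company name intact.
function parseCompanyLine(string $line): ?array
{
    $parts = preg_split('/\s+/', trim($line), 3);
    if (count($parts) < 3 || $parts[0] !== '4') {
        return null; // not a company line under this assumption
    }
    return ['abbr' => $parts[1], 'name' => $parts[2]];
}

// "Example Company Name" is a placeholder, not a value from the export.
$company = parseCompanyLine('4 caco Example Company Name');
print_r($company);
```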

Use these hints to determine the processing logic for each row.

+1

Source: https://habr.com/ru/post/904645/

