Parsing analyzer

I need to write a parser/analyzer that imports data from Excel or CSV files into a database. Reading the data from the source is not a problem. What I need is to detect automatically which columns contain the price, the product name, and the description.

What would you suggest? Are there common approaches or libraries for this?

Sample data 1:

Intel Core 2 Duo E6300 (2.80GHz, 1066MHz, 2MB, S775) tray  |    83
Intel Core 2 Duo E6500 (2.93GHz, 1066MHz, 2MB, S775) tray  |    86

Sample data 2:

     Title                     Description                Guaranty     Price  
Intel Core 2 Duo E6300  |  2.80GHz, 1066MHz, 2MB, S775   |  12       |  83    
Intel Core 2 Duo E6500  |  2.93GHz, 1066MHz, 2MB, S775   |  6        |  86

Sample data 3:

 UPC                Title                      Price
 456546545     |  Intel Core 2 Duo E6300    |  83 
 4654654654    |  Intel Core 2 Duo E6500    |  out of stock
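Sample 3 shows that a price column can also hold text such as "out of stock", so whatever detects or imports prices should tolerate non-numeric cells. A minimal C# sketch, assuming prices use "." as the decimal separator; `PriceParser` and `TryParsePrice` are hypothetical names, not part of any library:

```csharp
using System.Globalization;

// Hypothetical helper: turns a raw price cell into a decimal, or null when the
// cell holds text such as "out of stock" instead of a number.
public static class PriceParser
{
    public static decimal? TryParsePrice(string cell)
    {
        if (string.IsNullOrWhiteSpace(cell))
            return null;

        // Assumption: "." is the decimal separator. Thousands separators are not
        // allowed, so "83,50" is rejected instead of being misread as 8350.
        var styles = NumberStyles.AllowDecimalPoint
                   | NumberStyles.AllowLeadingWhite
                   | NumberStyles.AllowTrailingWhite;

        if (decimal.TryParse(cell.Trim(), styles, CultureInfo.InvariantCulture,
                             out decimal price))
            return price;

        return null; // e.g. "out of stock"
    }
}
```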
3 answers
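One common approach is a two-pass heuristic: first match the header names against known keywords, then fall back to the cell contents (a mostly numeric column is a price candidate; among text columns, the shorter one is the name and the longer one the description). The C# sketch below is a hypothetical illustration of that idea, not a library API; the keyword lists, the 50% threshold, the "8+ digits means a UPC code" rule and the "rightmost numeric column is the price" rule are all assumptions tuned to the three samples in the question. It expects rows that are already split into cells, for example with `line.Split('|')`.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical result type: the index of each detected column, or -1 if none.
public record ColumnMap(int Name, int Description, int Price);

public static class ColumnDetector
{
    // Assumed header keywords (exact, case-insensitive match); extend as needed.
    static readonly string[] PriceWords = { "price", "cost" };
    static readonly string[] NameWords = { "title", "name", "product" };
    static readonly string[] DescWords = { "description", "desc", "specs" };

    public static ColumnMap Detect(IReadOnlyList<string[]> rows, string[]? header)
    {
        if (rows.Count == 0) return new ColumnMap(-1, -1, -1);
        int columns = rows.Max(r => r.Length);

        // Pass 1: trust the header row when it uses a known keyword.
        int price = FindByHeader(header, PriceWords);
        int name = FindByHeader(header, NameWords);
        int desc = FindByHeader(header, DescWords);

        // Pass 2a: price fallback = rightmost mostly-numeric column, skipping
        // columns of long digit strings that look like UPC/EAN codes.
        for (int c = columns - 1; price < 0 && c >= 0; c--)
        {
            if (c == name || c == desc) continue;
            var cells = Column(rows, c).ToList();
            var numbers = cells.Where(IsNumber).ToList();
            bool mostlyNumeric = numbers.Count >= cells.Count / 2.0;
            bool looksLikeCode = numbers.All(x => x.Trim().Length >= 8);
            if (numbers.Count > 0 && mostlyNumeric && !looksLikeCode)
                price = c;
        }

        // Pass 2b: remaining text columns ordered by average length; the
        // shortest is taken as the name, the longest as the description.
        var text = Enumerable.Range(0, columns)
            .Where(c => c != price && c != name && c != desc)
            .Where(c => Column(rows, c).Count(IsNumber) < rows.Count / 2.0)
            .OrderBy(c => Column(rows, c).Average(x => x.Trim().Length))
            .ToList();
        if (name < 0 && text.Count > 0) { name = text[0]; text.RemoveAt(0); }
        if (desc < 0 && text.Count > 0) desc = text[text.Count - 1];

        return new ColumnMap(name, desc, price);
    }

    static int FindByHeader(string[]? header, string[] words)
    {
        if (header == null) return -1;
        for (int i = 0; i < header.Length; i++)
            if (words.Contains(header[i].Trim().ToLowerInvariant()))
                return i;
        return -1;
    }

    // Cells missing from short rows are treated as empty strings.
    static IEnumerable<string> Column(IReadOnlyList<string[]> rows, int c) =>
        rows.Select(r => c < r.Length ? r[c] : "");

    // Digits with an optional "." decimal part, surrounding spaces allowed.
    static bool IsNumber(string cell) =>
        decimal.TryParse(cell, NumberStyles.AllowDecimalPoint | NumberStyles.AllowLeadingWhite
            | NumberStyles.AllowTrailingWhite, CultureInfo.InvariantCulture, out _);
}
```

For sample 2 without its header row, this picks column 0 as the name, column 1 as the description and column 3 as the price; the numeric warranty column is never chosen because the price scan starts from the right and numeric columns are excluded from the text pass.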



If your input is clean (every line is formatted the same way, as in sample 1), you can try plain string functions:

string s = "Intel Core 2 Duo E6300 (2.80GHz, 1066MHz, 2MB, S775) tray  |    83";
// Find each delimiter once instead of calling IndexOf repeatedly
int open = s.IndexOf('(');
int close = s.IndexOf(')');
int pipe = s.IndexOf('|');
string firstPart = s.Substring(0, open).Trim();                      //returns "Intel Core 2 Duo E6300"
string secondPart = s.Substring(open + 1, close - open - 1).Trim();  //returns "2.80GHz, 1066MHz, 2MB, S775"
string thirdPart = s.Substring(close + 1, pipe - close - 1).Trim();  //returns "tray"
string fourthPart = s.Substring(pipe + 1).Trim();                    //returns "83"

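This only works if the data is formatted uniformly; otherwise you will need some (possibly many) checks before calling these functions. For example, if a delimiter is missing, IndexOf returns -1 and one of the Substring calls throws an ArgumentOutOfRangeException.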
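One way to build those checks in is to replace the IndexOf/Substring arithmetic with a regular expression that either matches the whole line or reports failure. A minimal sketch for the sample 1 layout, assuming the price is a plain number with an optional "." decimal part; `Sample1LineParser` is a hypothetical name:

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for lines shaped like sample 1:
//   "<name> (<specs>) <extra> | <price>"
public static class Sample1LineParser
{
    // Named groups replace the index arithmetic; a line that does not fit the
    // pattern simply fails to match instead of throwing.
    static readonly Regex Line = new Regex(
        @"^\s*(?<name>[^(|]+?)\s*\((?<specs>[^)]*)\)\s*(?<extra>[^|]*?)\s*\|\s*(?<price>\d+(?:\.\d+)?)\s*$");

    public static bool TryParse(string line, out string name, out string specs,
                                out string extra, out decimal price)
    {
        name = specs = extra = "";
        price = 0m;
        Match m = Line.Match(line);
        if (!m.Success) return false;

        name = m.Groups["name"].Value;
        specs = m.Groups["specs"].Value.Trim();
        extra = m.Groups["extra"].Value;
        price = decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

A line in another layout, such as `4654654654 | Intel Core 2 Duo E6500 | out of stock`, returns false instead of throwing, so the caller can log it or try a different pattern.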


Source: https://habr.com/ru/post/1755743/

