I maintain a program that needs to parse data that appears in an "almost structured" form in text, i.e. the various programs that produce it use several slightly different formats, it may have been printed out and OCR'd back in (yes, I know) with errors, etc., so I need to use heuristics that guess how it was created and apply different quirks modes accordingly. This is frustrating, because I am somewhat familiar with the theory and practice of parsing when the input is well-behaved, and there are good parsing frameworks for that case, but the unreliability of the data has led me to write some very sloppy ad-hoc code. It is fine at the moment, but I'm worried that as I extend it to handle more variations and more complex data, things will get out of hand. So my question is:
There are quite a few existing commercial products that do related things ("quirks modes" in web browsers, error recovery in compilers, even natural language processing and data mining), so I am sure that some smart people have put thought into this and tried to develop a theory. What are the best sources for background reading on parsing unprincipled data in as principled a manner as possible?
I realize this is somewhat open-ended, but my problem is that I think I need more background to even know what the right questions to ask are.
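As a rough illustration of the kind of heuristic dispatch described above, here is a minimal sketch in Python. It is not the actual program: the two formats (`key: value` and `key=value` lines), the OCR digit fix, the 0.5 threshold, and all function names are hypothetical, chosen only to show the "score each candidate format, pick the best, apply its quirks" pattern.

```python
import re
from typing import Callable, Dict, List, Tuple

def _line_ratio(text: str, pattern: str) -> float:
    """Fraction of non-blank lines that match the given regex."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(1 for ln in lines if re.match(pattern, ln)) / len(lines)

def score_colon(text: str) -> float:
    # Hypothetical format A: "key: value" lines.
    return _line_ratio(text, r"^\s*\w[\w ]*\s*:\s*\S")

def score_equals(text: str) -> float:
    # Hypothetical format B: "key=value" lines.
    return _line_ratio(text, r"^\s*\w[\w ]*\s*=\s*\S")

_OCR_DIGITS = str.maketrans("OolI", "0011")

def fix_ocr_digits(value: str) -> str:
    """Quirk (assumed): undo letter/digit confusions, but only when the
    corrected value looks purely numeric; otherwise leave it alone."""
    candidate = value.translate(_OCR_DIGITS)
    return candidate if re.fullmatch(r"[\d.,-]+", candidate) else value

def parse_pairs(text: str, sep: str) -> Dict[str, str]:
    """Parse 'key<sep>value' lines, silently skipping lines without sep."""
    record: Dict[str, str] = {}
    for ln in text.splitlines():
        key, found, value = ln.partition(sep)
        if found and key.strip():
            record[key.strip()] = fix_ocr_digits(value.strip())
    return record

Scorer = Callable[[str], float]
Parser = Callable[[str], Dict[str, str]]

# Registry of candidate formats: (name, scorer, parser).
FORMATS: List[Tuple[str, Scorer, Parser]] = [
    ("colon", score_colon, lambda t: parse_pairs(t, ":")),
    ("equals", score_equals, lambda t: parse_pairs(t, "=")),
]

def parse_guessing(text: str, min_score: float = 0.5):
    """Pick the best-scoring format and parse with it, or fail loudly."""
    name, scorer, parser = max(FORMATS, key=lambda fmt: fmt[1](text))
    score = scorer(text)
    if score < min_score:
        raise ValueError("no candidate format matched confidently")
    return name, score, parser(text)

if __name__ == "__main__":
    print(parse_guessing("name: Widget\nqty: lO\nnoise line\n"))
```

Keeping the scorers, the quirk fixes, and the parsers as separate small functions in a registry is one way to stop the ad-hoc rules from tangling together as more formats are added; whether it scales to the real data is an open question.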
Source: https://habr.com/ru/post/1716711/