How do I get started with information extraction?

I am new to information extraction. Over the past few days I have read a lot of scientific papers and ordered a book on NLP. I want to find out how I can build a system like FlipDog.com (hopefully not from scratch). They extract vacancies from over 60,000 company websites. How do I get started?

I am open to learning any programming language. Has anyone used Mallet / GATE / MinorThird or RoadRunner? Ideally, I want to be able to train a system with a dataset related to my domain and extract information based on this. What platform would you recommend for this purpose?

Thanks!

+3
1 answer

Take a look at dapper.net (a web-scraping service).

If you would rather build it yourself, look at LingPipe. It is a Java library; GATE and Apache UIMA are alternatives in the same space.

In addition, you will probably need a crawler (for example, Nutch), a search engine (Yahoo, Google, Bing), or an indexer (for example, Apache Lucene).
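As a rough illustration of how the crawl, extract, and index pieces fit together, here is a minimal Python sketch that uses only the standard library. It is not how Nutch, Lucene, or any of the toolkits above work internally. The names `TextExtractor`, `VACANCY`, `extract_vacancies`, `build_index`, and `PAGES` are hypothetical, the pages are hard-coded strings standing in for crawled HTML, and the regular expression assumes vacancies are labelled "Vacancy:", "Job:", or "Position:", which real sites will not do consistently.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks from HTML, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.chunks.append(text)


# Assumed labelling convention; a trained extractor would replace this rule.
VACANCY = re.compile(r"^(?:Vacancy|Job|Position)\s*:\s*(.+)$", re.IGNORECASE)


def extract_vacancies(html):
    """Return the vacancy titles found in one HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    titles = []
    for chunk in parser.chunks:
        match = VACANCY.match(chunk)
        if match:
            titles.append(match.group(1).strip())
    return titles


def build_index(pages):
    """Map each lowercased title word to the set of URLs that mention it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for title in extract_vacancies(html):
            for word in re.findall(r"[a-z0-9]+", title.lower()):
                index[word].add(url)
    return index


# Stand-in for crawler output: URL -> raw HTML (made-up example pages).
PAGES = {
    "http://example.com/careers": (
        "<html><body><h1>Careers</h1>"
        "<p>Vacancy: Java Developer</p><p>Position: Data Analyst</p>"
        "<script>var x = 'Job: hidden';</script></body></html>"
    ),
    "http://example.org/jobs": (
        "<ul><li>Job: Python Developer</li><li>About us</li></ul>"
    ),
}

if __name__ == "__main__":
    index = build_index(PAGES)
    for word in sorted(index):
        print(word, sorted(index[word]))
```

In a real system, the crawler would supply `PAGES`, a search library would replace the dictionary index, and a model trained on labelled pages from your domain would replace the hand-written pattern.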

Update:

For Python, there is NLTK: http://www.nltk.org/

+3

Source: https://habr.com/ru/post/1766877/

