I am new to information extraction. Over the past few days, I have read a lot of scientific papers and ordered a book about NLP. I want to find out how I can build a system like FlipDog.com (hopefully not from scratch). They extract job vacancies from over 60,000 company websites. How do I get started?
I am open to learning any programming language. Has anyone used Mallet, GATE, MinorThird, or RoadRunner? Ideally, I want to be able to train a system on a dataset from my domain and have it extract information based on that training. What platform would you recommend for this purpose?
Thanks!