I am starting a project for a nonprofit organization to help process and classify the 1000 reports we receive each year from our field workers / contractors around the world. I am relatively new to NLP, so I would like this group's guidance on how to approach our problem.
I will describe the current process and our problem, and would appreciate your help in finding the best way to solve it.
Current process: Field workers submit reports from local projects in the form of best practices. These reports are then processed by a full-time team of editors who (i) ensure that they adhere to the best practice template and (ii) edit the documents to improve language / style / grammar.
Problem: As the number of field workers has increased, the volume of reports has grown, and our editors are now a bottleneck.
Solution: We would like to automate the first stage of our process, that is, checking each document for compliance with the organization's best practice template.
Essentially, we need to make sure that each report has 3 components, namely: 1. States its goal: what topic / problem does this best practice address? 2. Identifies the audience: who is it for? 3. Highlights relevance: what can the reader do after reading it?
Here is an example of a good report presentation.
" . , ., ".
RegEx . :
1 " " = "", "",
2 " " = "identits", "is for"
3 " " = "", "", ""
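To make the RegEx idea concrete, here is a minimal sketch of the kind of check I have in mind. Only "is for" comes from my list above; every other cue phrase, the component names (`goal`, `audience`, `relevance`), the helper functions, and the sample text are placeholders I made up for illustration, not our real template or a real report.

```python
import re

# Hypothetical cue phrases per component. Only "is for" is taken from the
# keyword list above; the rest are invented placeholders.
COMPONENT_PATTERNS = {
    "goal": [r"\bthis (?:report|best practice) (?:describes|addresses|explains)\b"],
    "audience": [r"\bis for\b", r"\bis intended for\b"],
    "relevance": [r"\bafter reading\b", r"\bwill be able to\b"],
}

# Compile once, case-insensitively, so each report is only scanned.
COMPILED = {
    name: [re.compile(p, re.IGNORECASE) for p in patterns]
    for name, patterns in COMPONENT_PATTERNS.items()
}


def check_report(text):
    """Map each component to True if any of its cue phrases matches."""
    return {
        name: any(rx.search(text) for rx in regexes)
        for name, regexes in COMPILED.items()
    }


def missing_components(text):
    """List the components for which no cue phrase was found."""
    return [name for name, found in check_report(text).items() if not found]


# Invented sample text, for illustration only.
sample = (
    "This best practice addresses water access in rural clinics. "
    "It is for project managers. "
    "After reading, you will be able to plan a maintenance schedule."
)

if __name__ == "__main__":
    print(missing_components(sample))
    print(missing_components("It is for nurses."))
```

A report with an empty result would pass the first stage, and anything else would be sent back with the list of missing components. The obvious weakness is that this only finds the exact phrasings I thought of in advance.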
RegEx , , , - NLTK, CoreNLP.
.