How to avoid sequential processing in NLP?

The general approach in NLP is a chain of processes that looks like:

  • Tokenization
  • Morphological analysis
  • POS tagging
  • Parsing, or Named Entity Recognition, or noun phrase chunking, etc.
  • Classification (or any "ultimate goal" of the program)

It has always seemed strange to me that every step makes decisions without "consulting" the later stages. For example, a word might be POS-tagged as a noun even if that tag makes any parse impossible further down the pipeline.

I was wondering whether there are any approaches to this general NLP problem that take the later stages into account. Belief propagation, if you will.
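To make the problem concrete, here is a minimal toy sketch (hypothetical scores, a trivial one-verb "grammar") contrasting a greedy pipeline, where the tagger commits to one tag per word, with a k-best variant, where the parser is allowed to veto tag sequences:

```python
from itertools import product

# Toy POS lexicon: per-word tag scores (hypothetical numbers).
TAG_SCORES = {
    "I":   {"PRON": 0.9, "NOUN": 0.1},
    "saw": {"NOUN": 0.6, "VERB": 0.4},   # a greedy tagger prefers NOUN
    "her": {"PRON": 0.8, "DET": 0.2},
}

def greedy_tags(words):
    """Each word gets its single best tag; no later stage can revise it."""
    return [max(TAG_SCORES[w], key=TAG_SCORES[w].get) for w in words]

def parseable(tags):
    """Toy grammar: a sentence must contain exactly one verb."""
    return tags.count("VERB") == 1

def kbest_pipeline(words):
    """Enumerate tag sequences in score order; let the parser veto them."""
    candidates = product(*(TAG_SCORES[w].items() for w in words))
    ranked = sorted(candidates, key=lambda seq: -sum(s for _, s in seq))
    for seq in ranked:
        tags = [t for t, _ in seq]
        if parseable(tags):
            return tags
    return None

words = ["I", "saw", "her"]
print(greedy_tags(words))     # ['PRON', 'NOUN', 'PRON'] -- unparseable
print(kbest_pipeline(words))  # ['PRON', 'VERB', 'PRON'] -- parser-approved
```

The greedy tagger commits to NOUN for "saw" and the parser is stuck; passing ranked alternatives downstream lets the parser pick the best sequence it can actually use.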

2 answers

You might want to take a look at "Pipeline Iteration" by Hollingshead and Roark (http://acl.ldc.upenn.edu/P/P07/P07-1120.pdf), and Kristy Hollingshead's subsequent work on pipelines in general and the relationships between pipeline stages.


The pipeline you described is how many applications are structured, but it is not the only possible architecture. Some approaches make several passes over the pipeline, feeding information from one stage back into an earlier one. Other work merges some of the steps you listed, such as morphological analysis and POS tagging. I recently read a paper titled "Hierarchical Dirichlet Process Model for Joint POS and Morphology Induction", where POS tags and morphology are induced jointly, because the two are interdependent.
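The gain from joint inference can be shown with a toy sketch (all scores and the tag/morphology compatibility table are hypothetical): a pipeline picks the best tag and the best morphological analysis independently, while joint decoding scores every (tag, morphology) pair together.

```python
from itertools import product

# Hypothetical per-word scores for an English word ending in "-s".
TAG_SCORES   = {"NOUN": 0.55, "VERB": 0.45}
MORPH_SCORES = {"stem+s(plural)": 0.3, "stem+s(3sg)": 0.7}

# Joint compatibility factor: the plural suffix goes with NOUN,
# the 3rd-person-singular -s goes with VERB; other pairs score 0.
COMPAT = {
    ("NOUN", "stem+s(plural)"): 1.0,
    ("VERB", "stem+s(3sg)"):    1.0,
}

def pipeline_decode():
    """Each stage commits independently; the result may be inconsistent."""
    tag = max(TAG_SCORES, key=TAG_SCORES.get)        # NOUN wins alone
    morph = max(MORPH_SCORES, key=MORPH_SCORES.get)  # 3sg wins alone
    return tag, morph, COMPAT.get((tag, morph), 0.0)

def joint_decode():
    """Maximize the combined score over (tag, morphology) pairs."""
    return max(
        product(TAG_SCORES, MORPH_SCORES),
        key=lambda tm: TAG_SCORES[tm[0]] * MORPH_SCORES[tm[1]]
                       * COMPAT.get(tm, 0.0),
    )

print(pipeline_decode())  # ('NOUN', 'stem+s(3sg)', 0.0) -- incompatible pair
print(joint_decode())     # ('VERB', 'stem+s(3sg)') -- consistent joint choice
```

The sequential decoder combines two locally optimal but mutually incompatible choices; the joint decoder trades a slightly worse tag score for a globally consistent analysis, which is exactly the motivation for joint POS/morphology induction.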


Source: https://habr.com/ru/post/1439033/
