Named Entity Recognition (NER) Features

I am new to Named Entity Recognition and I'm having some trouble understanding which features are used for this task and how.

Some of the papers I have read so far mention the features used but don't really explain them. For example, Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition mentions the following features:

Main features used by the sixteen systems that participated in the CoNLL-2003 shared task, sorted by performance on the English test data. aff: affix information (n-grams); bag: bag of words; cas: global case information; chu: chunk tags; doc: global document information; gaz: gazetteers; lex: lexical features; ort: orthographic information; pat: orthographic patterns (like Aa0); pos: part-of-speech tags; pre: previously predicted NE tags; quo: flag signaling that the word is between quotes; tri: trigger words.

I am a little confused by some of them. For instance:

  • Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word, as in BOW, in addition to all the other features mentioned?
  • How can a gazetteer be a feature?
  • How exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?
  • What is global document information?
  • What is the feature "trigger words"?

I think all I need is to look at a sample table with each of these features as a column and see their values to understand how they really work, but so far I haven't been able to find an easy-to-read dataset.
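Because a table with one column per feature is exactly what is being asked for, here is a minimal Python sketch that builds one for a few of the cheap, token-local features from the list (lex, pat, aff, ort, quo). It assumes pre-tokenized input with straight double quotes as separate tokens. The function and column names are hypothetical, and real systems combine many more feature templates than this.

```python
# Minimal per-token feature table for a few CoNLL-2003 feature types.
# Column names (lex, pat, pre3, suf3, ...) are illustrative, not a standard.


def word_shape(token):
    """'pat': map uppercase -> A, lowercase -> a, digit -> 0, keep other chars."""
    out = []
    for ch in token:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


def short_shape(token):
    """Collapse runs of identical shape characters, e.g. 'Aaaaa' -> 'Aa'."""
    shape = word_shape(token)
    return "".join(c for i, c in enumerate(shape) if i == 0 or c != shape[i - 1])


def token_features(tokens):
    """Return one feature row (a dict) per token: each token is one instance."""
    rows = []
    inside_quotes = False
    for tok in tokens:
        if tok == '"':
            inside_quotes = not inside_quotes  # the quote mark itself is not "inside"
            in_quotes = False
        else:
            in_quotes = inside_quotes
        rows.append({
            "word": tok,
            "lex": tok.lower(),                     # lexical: the word itself
            "pat": word_shape(tok),                 # orthographic pattern
            "pat_short": short_shape(tok),
            "pre3": tok[:3],                        # affix information
            "suf3": tok[-3:],
            "init_cap": tok[:1].isupper(),          # orthographic flags
            "all_caps": tok.isupper(),
            "has_digit": any(c.isdigit() for c in tok),
            "quo": in_quotes,                       # between quotation marks?
        })
    return rows


if __name__ == "__main__":
    sentence = 'Mr. Smith flew to " New York " in 2003 .'.split()
    rows = token_features(sentence)
    columns = list(rows[0])
    print("\t".join(columns))
    for row in rows:
        print("\t".join(str(row[c]) for c in columns))
```

Each printed line is one training instance (one token), and each column is one feature; a learner sees the whole row for every token.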

Can someone clarify this, or point me to an explanation or an example of these features in use?

Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
How can a gazetteer be a feature?

BOW [...] feature extraction [...]. IMO, BOW [...] ([...]). Using n-grams [...] BOW [...].
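One common reading, assumed here rather than taken from the paper, is that "bag" is a feature template: the words in a window around the current token, ignoring their order, each become their own binary feature. In that sense BOW is not one feature but a generator of many sparse ones. A minimal sketch with a hypothetical ±2 window and a hand-rolled vectorizer that shows each distinct word becoming its own column:

```python
def bag_of_words_features(tokens, i, window=2):
    """Binary 'bag' features: words within `window` of position i, order ignored."""
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    return {"bag=" + tokens[j].lower(): 1 for j in range(lo, hi) if j != i}


def vectorize(feature_dicts):
    """Turn sparse feature dicts into a dense 0/1 matrix, one column per feature name."""
    names = sorted({name for d in feature_dicts for name in d})
    index = {name: k for k, name in enumerate(names)}
    matrix = []
    for d in feature_dicts:
        row = [0] * len(names)
        for name, value in d.items():
            row[index[name]] = value
        matrix.append(row)
    return names, matrix


if __name__ == "__main__":
    tokens = "the mayor of Paris said".split()
    dicts = [bag_of_words_features(tokens, i) for i in range(len(tokens))]
    names, matrix = vectorize(dicts)
    print(names)
    for tok, row in zip(tokens, matrix):
        print(f"{tok:>6}", row)
```

On real data, scikit-learn's `DictVectorizer` does the same job as the toy `vectorize` above, with a sparse output.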

How exactly can POS tags be used as features? Don't we have a POS tag for each word?

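In token-level NER each instance is a token, not a whole text, so "the POS tag of this word" is simply one more column in that token's row; the tags of neighboring words are often added as well. The CoNLL-2003 data files already provide POS and chunk tags next to each word, in four columns: word, POS tag, chunk tag, NE tag. A minimal sketch, using a few illustrative rows in that layout and hypothetical feature names:

```python
# A few illustrative rows in the four-column CoNLL-2003 layout:
# word, POS tag, chunk tag, NE tag (the NE tag is the label to predict).
SAMPLE = """EU NNP I-NP I-ORG
rejects VBZ I-VP O
German JJ I-NP I-MISC
call NN I-NP O
. . O O"""


def parse_conll(text):
    """Parse whitespace-separated rows into (word, pos, chunk, ne) tuples."""
    return [tuple(line.split()) for line in text.strip().splitlines()]


def pos_chunk_features(rows, i):
    """POS/chunk features for token i; the neighbors' POS tags are included too."""
    _, pos, chunk, _ne = rows[i]  # the NE tag is the target, never a feature
    prev_pos = rows[i - 1][1] if i > 0 else "<S>"
    next_pos = rows[i + 1][1] if i + 1 < len(rows) else "</S>"
    return {
        "pos[0]=" + pos: 1,
        "pos[-1]=" + prev_pos: 1,
        "pos[+1]=" + next_pos: 1,
        "chu[0]=" + chunk: 1,
    }


if __name__ == "__main__":
    rows = parse_conll(SAMPLE)
    for i, (word, *_rest, ne) in enumerate(rows):
        print(f"{word:<8} {ne:<7} {pos_chunk_features(rows, i)}")
```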

Isn't each object/instance a "text"?

What is global document information?

[...] NLP [...]. [...] Paris [...]? [...] 5 [...].
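Global document information is usually understood as features computed over the whole document rather than only the current sentence. Under that assumption, here is a minimal sketch of two such features: how often the word occurs in the document, and whether it appears capitalized somewhere other than at the start of a sentence. A sentence-initial capital says little, while a mid-sentence capital elsewhere in the same document suggests a name; this is close to what the list calls global case information. The feature names are hypothetical.

```python
from collections import Counter


def document_features(document):
    """document: list of sentences (token lists) belonging to ONE document."""
    counts = Counter(tok.lower() for sent in document for tok in sent)
    # Lowercased words that appear capitalized somewhere other than sentence start.
    cap_mid_sentence = {
        tok.lower() for sent in document for tok in sent[1:] if tok[:1].isupper()
    }
    return [
        [
            {
                "doc_count": counts[tok.lower()],
                "doc_cap_mid_sentence": tok.lower() in cap_mid_sentence,
            }
            for tok in sent
        ]
        for sent in document
    ]


if __name__ == "__main__":
    doc = [["Paris", "is", "busy", "."], ["Tourists", "love", "Paris", "."]]
    for sent, feats in zip(doc, document_features(doc)):
        for tok, f in zip(sent, feats):
            print(f"{tok:<9} {f}")
```

Here the sentence-initial "Paris" gets `doc_cap_mid_sentence=True` from its other occurrence, while "Tourists" does not.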

What is the feature "trigger words"?

, , . , . .

, , , .

, - , .

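Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?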

How can a gazetteer be a feature?

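A gazetteer is just a list of known names (cities, countries, companies, first names and so on). It becomes a feature when "is this token part of a gazetteer entry, and of which type?" is turned into a flag on each token. A minimal sketch with a tiny hypothetical gazetteer and greedy longest-match lookup, which is only one of several possible matching strategies:

```python
# Hypothetical tiny gazetteer; real lists contain thousands of entries.
GAZETTEER = {
    ("new", "york"): "LOC",
    ("paris",): "LOC",
    ("united", "nations"): "ORG",
}
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def gazetteer_features(tokens):
    """Mark tokens covered by a gazetteer entry, preferring the longest match."""
    feats = [{} for _ in tokens]
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            kind = GAZETTEER.get(tuple(lowered[i:i + n]))
            if kind:
                for k in range(i, i + n):
                    feats[k]["gaz=" + kind] = 1
                    feats[k]["gaz_pos=" + ("B" if k == i else "I")] = 1
                i += n
                break
        else:  # no entry starts at position i
            i += 1
    return feats


if __name__ == "__main__":
    tokens = "He moved from Paris to New York".split()
    for tok, f in zip(tokens, gazetteer_features(tokens)):
        print(f"{tok:<6} {f}")
```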

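How exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?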

[...] (e.g., CRF): [...] NE [...].
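A CRF scores whole label sequences, so dependencies between neighboring NE tags are part of the model. A system that classifies tokens one at a time, left to right, can instead feed the tag it has just predicted back in as a feature, which matches "pre: previously predicted NE tags" in the CoNLL-2003 list. A minimal sketch, where `toy_classify` is a hypothetical hand-written rule standing in for a trained classifier:

```python
def token_feats(tokens, i, prev_tag):
    """Features for token i, including the tag predicted for token i-1 ('pre')."""
    tok = tokens[i]
    return {
        "word=" + tok.lower(): 1,
        "init_cap": int(tok[:1].isupper()),
        "pre=" + prev_tag: 1,
    }


def greedy_tag(tokens, classify):
    """Tag left to right; each decision can see the previous decision."""
    tags = []
    for i in range(len(tokens)):
        prev_tag = tags[-1] if tags else "<S>"
        tags.append(classify(token_feats(tokens, i, prev_tag)))
    return tags


def toy_classify(feats):
    """Hypothetical stand-in for a trained model: capitalized words are PER,
    and a capitalized word right after a PER tag continues that entity."""
    if not feats["init_cap"]:
        return "O"
    if feats.get("pre=B-PER") or feats.get("pre=I-PER"):
        return "I-PER"
    return "B-PER"


if __name__ == "__main__":
    tokens = "yesterday John Smith left".split()
    print(list(zip(tokens, greedy_tag(tokens, toy_classify))))
```

During training the gold previous tag is usually used for this feature, while at prediction time only the model's own, possibly wrong, output is available, so errors can propagate.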

What is global document information?

What is the feature "trigger words"?

[...] For example, "Mr" [...].
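Trigger words are words that are not part of an entity themselves but hint at one nearby, such as a title like "Mr" before a person's name or a suffix like "Inc." after a company name. A minimal sketch with small hypothetical trigger lists and a hypothetical ±2 window:

```python
# Hypothetical trigger lists, for illustration only.
LEFT_TRIGGERS = {"mr": "PER", "mr.": "PER", "mrs.": "PER", "dr.": "PER"}
RIGHT_TRIGGERS = {"inc.": "ORG", "corp.": "ORG", "ltd": "ORG"}


def trigger_features(tokens, i, window=2):
    """Flag triggers that occur within `window` tokens before or after token i."""
    feats = {}
    for j in range(max(0, i - window), i):
        kind = LEFT_TRIGGERS.get(tokens[j].lower())
        if kind:
            feats["tri_left=" + kind] = 1
    for j in range(i + 1, min(len(tokens), i + window + 1)):
        kind = RIGHT_TRIGGERS.get(tokens[j].lower())
        if kind:
            feats["tri_right=" + kind] = 1
    return feats


if __name__ == "__main__":
    tokens = "Mr. Smith works at Acme Inc.".split()
    for i, tok in enumerate(tokens):
        print(f"{tok:<6} {trigger_features(tokens, i)}")
```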

When doing NER in Python, I used:

  • n-grams (via CountVectorizer)
  • Viterbi or beam search over the probability of the label sequence
  • part of speech (POS), word length, number of words, is_capitalized, is_stopword
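Viterbi decoding picks the label sequence with the highest total score, given per-token label scores and label-to-label transition scores; beam search approximates the same search by keeping only the best k partial sequences. A minimal sketch with hand-picked, hypothetical probabilities (not model output) for three tokens, where the transition table forbids O followed by I-PER:

```python
import math


def viterbi(emissions, transitions, labels):
    """
    emissions:   one dict per token, label -> log-score
    transitions: (prev_label, label) -> log-score; missing pairs are forbidden
    Returns the label sequence with the highest total log-score.
    (No separate start-transition scores, for brevity.)
    """
    best = [{y: emissions[0][y] for y in labels}]  # best path score ending in y
    back = [{}]                                     # back-pointers for traceback
    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            score, prev = max(
                (best[i - 1][p] + transitions.get((p, y), -math.inf), p) for p in labels
            )
            best[i][y] = score + emissions[i][y]
            back[i][y] = prev
    path = [max(labels, key=lambda y: best[-1][y])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


LABELS = ["O", "B-PER", "I-PER"]
ALLOWED = [("O", "O"), ("O", "B-PER"), ("B-PER", "O"), ("B-PER", "B-PER"),
           ("B-PER", "I-PER"), ("I-PER", "O"), ("I-PER", "B-PER"), ("I-PER", "I-PER")]
TRANSITIONS = {pair: 0.0 for pair in ALLOWED}  # log(1): allowed, no preference


def log_scores(probs):
    return {label: math.log(p) for label, p in probs.items()}


if __name__ == "__main__":
    emissions = [
        log_scores({"O": 0.5, "B-PER": 0.4, "I-PER": 0.1}),   # "John"
        log_scores({"O": 0.2, "B-PER": 0.2, "I-PER": 0.6}),   # "Smith"
        log_scores({"O": 0.9, "B-PER": 0.05, "I-PER": 0.05}), # "left"
    ]
    print("greedy: ", [max(e, key=e.get) for e in emissions])
    print("viterbi:", viterbi(emissions, TRANSITIONS, LABELS))
```

Picking the best label per token independently gives the invalid sequence O, I-PER, O here, while Viterbi respects the transition table and returns B-PER, I-PER, O.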

Source: https://habr.com/ru/post/1668614/