Named Entity Recognition (NER) Features

Question

I am new to Named Entity Recognition and I am having trouble understanding which features are used for this task, and how.

Some of the papers I have read so far mention the features used but do not really explain them. For example, Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition mentions the following features:

Main features used by the sixteen systems that participated in the CoNLL-2003 shared task, sorted by performance on the English test data. aff: affix information (n-grams); bag: bag of words; cas: global case information; chu: chunk tags; doc: global document information; gaz: gazetteers; lex: lexical features; ort: orthographic information; pat: orthographic patterns (like Aa0); pos: part-of-speech tags; pre: previously predicted NE tags; quo: flag signing that the word is between quotes; tri: trigger words.

I am a little confused by some of them. For instance:

- Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
- How can a gazetteer be a feature?
- How exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?
- What is global document information?
- What is the "trigger words" feature?

I think all I need is to look at an example table with each of these features as columns and see their values to understand how they really work, but so far I have not been able to find an easy-to-read dataset.

Could someone clarify, or point me to an explanation or example of these features being used?

machine-learning nlp classification named-entity-recognition feature-selection

Mr. Phil 02 . '17 12:08

markg · Answer 1 · 2017-02-02T19:21:40+0000

(, , ).

isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?
how can a gazetteer be a feature?

BOW Feature Extraction . , IMO BOW , ( , ). Using NGrams , BOW .

how can POS tags exactly be used as features ? Don't we have a POS tag for each word?

POS , " " ( , ). , "" , POS . , POS , " ", , POS.

Isn't each object/instance a "text"?

, , , , , "" - ( ).

what is global document information?

: NLP . - . , , , Paris, ? , 5 , , , , . " ", , ( "" "" ..).

what is the feature trigger words?

, , . , . .

, , , .

, - , .

eldams · Answer 2 · 2017-02-03T09:04:08+0000

, , NER / , . ( , , , POS), - ( , ).

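Isn't bag of words supposed to be a method to generate features (one for each word)? How can BOW itself be a feature? Or does this simply mean we have a feature for each word as in BOW, besides all the other features mentioned?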

, BOW , , (, )

How can a gazetteer be a feature?

, , ( ). : " " : " " "" .

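How exactly can POS tags be used as features? Don't we have a POS tag for each word? Isn't each object/instance a "text"?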

- . (, CRF): . , NE .

What is global document information?

(, , ), ( ), ..

What is the feature trigger words?

- , , . , "Mr" , , .

Vadim smolyakov · Answer 3 · 2017-10-24T21:38:45+0000

NER python, :

ngrams ( CountVectorizer)
(.. )
Viterbi or beam search over the probability of the label sequence
part of speech (pos), word length, number of words, is_capitalized, is_stopword
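
To make the per-token features above concrete, here is a minimal sketch of what one row per word might look like. This is only an illustration, not a definitive implementation: the helper names (`word_shape`, `token_features`, `sentence_features`), the tiny stopword list, the two-entry gazetteer, and the hand-written POS tags are all assumptions made for the example. In practice the POS tags would come from a tagger and the gazetteer from a real list of place names.

```python
import re

# Hypothetical resources, for illustration only.
STOPWORDS = {"the", "a", "an", "in", "of", "to", "is"}
GAZETTEER = {"paris", "london"}


def word_shape(token):
    """Orthographic pattern: map characters to A/a/0 and collapse repeated runs."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, pos_tags, i):
    """Build the feature dict (one table row) for the token at position i."""
    token = tokens[i]
    return {
        "word": token.lower(),
        "pos": pos_tags[i],
        "length": len(token),
        "is_capitalized": token[:1].isupper(),
        "is_stopword": token.lower() in STOPWORDS,
        "shape": word_shape(token),
        "in_gazetteer": token.lower() in GAZETTEER,
        # Context feature: the previous word, or a start marker.
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
    }


def sentence_features(tokens, pos_tags):
    """One feature dict per token; each dict is one instance for the classifier."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]


tokens = ["Mr", "Smith", "lives", "in", "Paris"]
pos_tags = ["NNP", "NNP", "VBZ", "IN", "NNP"]  # assumed tagger output
rows = sentence_features(tokens, pos_tags)
for row in rows:
    print(row)
```

Each printed dict is one row of the kind of table the question asks about: the instance is a single token, and the columns are its features. A list of dicts like this can then be vectorized or passed to a sequence model.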
