Can anyone explain a "tokenized field" in database terms?

I am reading about Solr and about indexing a MySQL database in Solr.

What do they mean by tokenize and un-tokenize?

And what does it mean when the fields are "normalized"?

I know what it means to normalize a database and how to do it, but a field? How can an ordinary field be normalized?

Thanks

+3
2 answers

What do they mean by tokenize and un-tokenize?

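Tokenizing a field means splitting its text into separate terms (tokens), usually words, and indexing each term on its own, so a search for any single word can match the field. An untokenized field is indexed as one term, exactly as stored, so it matches only on the whole value.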
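The difference shows up at search time. Below is a minimal sketch of a toy inverted index, not Solr's actual implementation; `build_index`, the `tokenized` flag, and the sample documents are all hypothetical names and data chosen for illustration.

```python
from collections import defaultdict


def build_index(docs, tokenized):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Tokenized: one lowercased term per word.
        # Untokenized: the whole value is a single term, stored as-is.
        terms = text.lower().split() if tokenized else [text]
        for term in terms:
            index[term].add(doc_id)
    return index


# Hypothetical sample data.
docs = {1: "Red Toyota Corolla", 2: "Blue Honda Civic"}

tokenized_index = build_index(docs, tokenized=True)
untokenized_index = build_index(docs, tokenized=False)

print(tokenized_index.get("toyota", set()))                # {1}
print(untokenized_index.get("toyota", set()))              # set()
print(untokenized_index.get("Red Toyota Corolla", set()))  # {1}
```

A single word finds the document only in the tokenized index; the untokenized index answers only to the complete, exact value, which is usually what you want for ids, codes, and sort keys.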

And what does it mean when the fields are "normalized"?

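One example is Unicode normalization, where the same character can be encoded in more than one way. U+0060 is the standalone grave accent (`), while è can be stored either as a single code point (U+00E8) or as the letter e followed by a combining grave accent (U+0065 U+0300). Normalizing the field converts every such variant to one form, so all of them match a search for è.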
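To see why this matters for matching, here is a small sketch using Python's standard `unicodedata` module. It only illustrates Unicode normalization in general and assumes nothing about how Solr itself is configured.

```python
import unicodedata

composed = "\u00e8"     # è as one precomposed code point
decomposed = "e\u0300"  # e followed by COMBINING GRAVE ACCENT

# The two strings render identically but compare as different.
print(composed == decomposed)  # False

# After normalizing both to the same form, they compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed)     # True
print(nfd == decomposed)   # True
print(len(nfc), len(nfd))  # 1 2
```

Whichever form is chosen, the same normalization has to be applied both when indexing and when querying, otherwise visually identical text can fail to match.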

+5

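Tokenizing means breaking text into tokens, i.e. individual words, and discarding the punctuation between them (commas, periods, etc.). The index then stores the tokens rather than the raw string.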
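As a rough sketch of that splitting step, assuming a simple "runs of word characters" rule rather than any particular Solr tokenizer (the `tokenize` function and the sample sentence are made up for illustration):

```python
import re


def tokenize(text):
    """Lowercase the text and return its runs of word characters.

    Punctuation and whitespace are not word characters, so they are dropped.
    """
    return re.findall(r"\w+", text.lower())


# Hypothetical sample sentence.
print(tokenize("The cat sleeps!"))  # ['the', 'cat', 'sleeps']
```

Real analyzers typically go further than this, for instance by removing very common words or reducing words to a stem, but the basic idea of turning one string into a list of terms is the same.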

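Example: a three-word sentence ending in "!" produces three tokens, one per word, and the "!" is dropped.

Each token, such as "the" or "cat", is then indexed and matched on its own.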

+3

Source: https://habr.com/ru/post/1729537/

