Anyone want to explain the "Tokenized Field" in terms of databases?

Question

I am reading about SOLR and indexing a MySQL database in SOLR.

What do they mean by tokenize and un-tokenize?

And what does it mean when the fields are "normalized"?

I know what it means to normalize a database, and how to do it, but a field? How can a single field be normalized?

Thanks

+3

pesar Jan 22 '10 at 8:56

2 answers

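Tokenization means splitting a string into parts (tokens), usually words, at certain boundaries (whitespace, punctuation, etc.). Normalization means converting each token into a canonical form, so that different variants of the same word match each other.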
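A minimal sketch of the tokenization step, assuming a deliberately simple rule: any run of letters or digits is a token, and everything else is a delimiter. This is a toy stand-in, not Solr's or Lucene's actual tokenizer, and the function name `tokenize` is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (toy tokenizer, hypothetical name).

    Whitespace and punctuation act as delimiters and are discarded;
    case is left untouched because that is a normalization concern.
    """
    return re.findall(r"\w+", text)


print(tokenize("The cats jumped over the fence!"))
# ['The', 'cats', 'jumped', 'over', 'the', 'fence']
```

Real tokenizers handle things like hyphens, apostrophes, e-mail addresses and URLs much more carefully, but the idea is the same: one string in, a list of tokens out.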

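For example, "The cats jumped!" is tokenized into: 1) The 2) cats 3) jumped

After normalization you get "the" (lowercased, and often removed entirely as a stop word), cat (plural reduced to the singular), jump (suffix stemmed away). The "!" is simply dropped by the tokenizer.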
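A rough sketch of a tokenize-then-normalize chain, assuming a tiny hypothetical stop-word list and a crude suffix-stripping `naive_stem` in place of a real stemmer such as Porter. Solr builds the equivalent pipeline from a tokenizer plus filters in a field type's analyzer; the names here are illustrative only.

```python
import re

# Hypothetical, tiny stop-word list; real analyzers ship much larger ones.
STOP_WORDS = {"a", "an", "the", "over"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter itself."""
    for suffix in ("ed", "s"):
        # Only strip when a reasonable stem (3+ chars) would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text: str) -> list[str]:
    tokens = re.findall(r"\w+", text)                     # 1. tokenize
    tokens = [t.lower() for t in tokens]                  # 2. lowercase
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. drop stop words
    return [naive_stem(t) for t in tokens]                # 4. stem


print(analyze("The cats jumped!"))  # ['cat', 'jump']
```

The same analysis is normally applied both when indexing and when querying, which is why a search for "Cat" can find a document containing "cats".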

+3

jspcal Jan 22 '10 at 9:04

Michael Borgwardt · Accepted Answer · 2010-01-22T09:09:56+0000

What do they mean by “tokenization” and “un-tokenize”?

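Tokenization means splitting the text of a field into separate tokens, typically words, each of which is indexed and can be matched on its own. Untokenized fields are indexed as a single token containing the whole value, so only an exact match on the entire value will find them. For example, "New York" in an untokenized field is one term, while a tokenized field indexes "New" and "York" separately.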
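To put it in database terms, a search index is essentially a lookup table from terms to row (document) ids. The sketch below, assuming a hypothetical `build_index` helper and made-up sample values, builds such an inverted index both ways so the difference is visible; it is not how Solr stores its index internally.

```python
import re
from collections import defaultdict


def build_index(docs: dict[int, str], tokenized: bool) -> dict[str, set[int]]:
    """Map each indexed term to the ids of the documents containing it (hypothetical helper)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        # Tokenized: one lowercased term per word. Untokenized: the whole value is one term.
        terms = re.findall(r"\w+", value.lower()) if tokenized else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


cities = {1: "New York", 2: "York", 3: "New Delhi"}
tok = build_index(cities, tokenized=True)
untok = build_index(cities, tokenized=False)

print(tok["york"])         # {1, 2}: matches any document containing the word
print(untok.get("York"))   # {2}: only the exact whole value matches
print(untok["New York"])   # {1}
```

Untokenized fields behave much like an indexed `VARCHAR` column queried with `=`, which is why they suit IDs, exact-match filters, sorting and faceting, while tokenized fields suit full-text search.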

And what does it mean when fields are “normalized”?

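This most likely refers to Unicode normalization. In Unicode, some characters can be encoded in more than one way. For example, è can be stored as the single precomposed code point U+00E8, or as a plain e (U+0065) followed by the combining grave accent U+0300 (not the spacing grave accent ` at U+0060). Both forms look identical but are different code point sequences, so normalization converts text into one canonical form; a search for è then matches no matter which form was indexed.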
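The two representations of è can be checked with Python's standard `unicodedata` module. This only illustrates the Unicode concept; in Solr the equivalent step would be configured as a filter in a field type's analysis chain, not written by hand.

```python
import unicodedata

precomposed = "\u00e8"   # è as one code point (U+00E8)
decomposed = "e\u0300"   # e (U+0065) + COMBINING GRAVE ACCENT (U+0300)

print(precomposed == decomposed)              # False: different code points
print(len(precomposed), len(decomposed))      # 1 2

# NFC composes, NFD decomposes; after normalizing, both strings compare equal.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)  # True

print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", precomposed)])
# ['U+0065', 'U+0300']
```

Which form you pick (NFC or NFD) matters less than applying the same one to both the indexed text and the query.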