Text search vectors cannot handle data that large (see the documented limits). Their strength is fuzzy matching: a search for "swim" can also find "swims", "swimming", and "Swim" in the same call. They are not intended to replace grep.
The reason for the restrictions can be found in the source code as MAXSTRLEN (and MAXSTRPOS). Text search vectors are stored in one long, continuous array up to 1 MB in length (the total of all characters for all unique lexemes). To index into it, the tsvector structure allows 11 bits for a word's length and 20 bits for its position in the array, so each per-lexeme entry fits into a 32-bit unsigned int.
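The arithmetic behind those two limits can be worked out directly; as a quick sketch, PostgreSQL's integer shift operator gives the size of each bit field:

```sql
-- 11 bits of word length and 20 bits of position, as described above:
SELECT 1 << 11 AS word_length_values,  -- 2048: lexemes longer than ~2 KB cannot be stored
       1 << 20 AS max_positions;      -- 1048576: the 1 MB cap on the lexeme array
```

Together with a flag bit, 11 + 20 bits is what packs into the 32-bit entry.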
You are probably running into one or both of these limits: either you have too many unique words in the file, or the words repeat very often. Both are quite possible with a 50 MB log file of quasi-random data.
Are you sure you need to store log files in a database? You are essentially duplicating the file system, and grep or python can search it quite well. If you really do need to, though, you could consider something like this:
CREATE TABLE errorlogs (
    id SERIAL PRIMARY KEY,
    archive_id INTEGER NOT NULL REFERENCES archives,
    filename VARCHAR(256) NOT NULL
);

CREATE TABLE log_lines (
    id SERIAL PRIMARY KEY,
    line INTEGER NOT NULL,               -- line number within the file
    errorlog INTEGER REFERENCES errorlogs (id),
    context TEXT,
    tsv TSVECTOR
);

CREATE INDEX log_lines_tsv_idx ON log_lines USING gin (tsv);
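Loading a line then means computing its tsvector at insert time. A minimal sketch, assuming the schema above and the built-in 'english' configuration (the sample line and ids are made up for illustration):

```sql
-- Hypothetical load of one log line into the schema above:
INSERT INTO log_lines (line, errorlog, context, tsv)
VALUES (42,
        1,
        'ERROR: connection refused at host db1',
        to_tsvector('english', 'ERROR: connection refused at host db1'));
```

Because each "document" is a single line rather than a 50 MB file, the per-tsvector limits discussed above are no longer a concern.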
Here you treat each line of the log as a "document". To search, you would do something like:
SELECT e.id, e.filename, g.line, g.context
FROM errorlogs e
JOIN log_lines g ON e.id = g.errorlog
WHERE g.tsv @@ to_tsquery('some & error');
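If you also want matches ordered by relevance, or the matched words highlighted, PostgreSQL's standard ts_rank and ts_headline functions can be layered on the same query; a sketch, reusing the schema above:

```sql
-- Rank matches and show a highlighted snippet of the matched line:
SELECT e.filename,
       g.line,
       ts_rank(g.tsv, to_tsquery('english', 'some & error')) AS rank,
       ts_headline('english', g.context,
                   to_tsquery('english', 'some & error')) AS snippet
FROM errorlogs e
JOIN log_lines g ON e.id = g.errorlog
WHERE g.tsv @@ to_tsquery('english', 'some & error')
ORDER BY rank DESC;
```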