Solr request - is there a way to limit the size of a text field in a response

Is there a way to limit the amount of text returned for a text field in a query response? Here is a quick sketch of my setup.

I have 2 fields:

  • docId - int
  • text - string

I will request the docId field and also want a preview of the text field, limited to 200 characters. On average, the text field holds anything from 600 to 2000 characters, but I only need a preview.

e.g. [MySolrCore]/select?q=docId:123&fl=text

Is there a way to do this? I see no reason to return the entire text field when I only need a small preview.

I haven't looked at hit highlighting, since I'm not searching for specific text within the text field, but if there is functionality similar to the hl.fragsize parameter, that would be great!

Hope someone can point me in the right direction!

Cheers!

+4
5 answers

You will need to benchmark this against simply returning the entire field, but it may work for your situation. Basically, turn on highlighting on a field that won't match the query, then use the alternate-field mechanism to return the limited number of characters you want.

http://solr:8080/solr/select/?q=*:*&rows=10&fl=author,title&hl=true&hl.snippets=0&hl.fl=sku&hl.fragsize=0&hl.alternateField=description&hl.maxAlternateFieldLength=50

Notes:

  • Make sure your alternate field is not included in the field list parameter (fl).
  • Make sure the highlight field (hl.fl) does not actually contain the text you are searching for, so that the highlight fails and the alternate field is returned instead.

I find that the CPU cost of the tokenizing work highlighting does is sometimes higher than the CPU and bandwidth cost of just returning the entire field. You will have to experiment.
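A minimal sketch of building the request URL above with Python's standard library. The host, core, and field names (sku, description, author, title) are taken straight from the example URL and are placeholders, not a known schema:

```python
from urllib.parse import urlencode

def build_preview_query(base_url, preview_field, preview_len, rows=10):
    """Build a Solr select URL that uses the highlighting fallback to
    return only the first `preview_len` chars of `preview_field`."""
    params = {
        "q": "*:*",
        "rows": rows,
        "fl": "author,title",           # must NOT include the preview field
        "hl": "true",
        "hl.snippets": 0,
        "hl.fl": "sku",                 # a field the query will not match
        "hl.fragsize": 0,
        "hl.alternateField": preview_field,
        "hl.maxAlternateFieldLength": preview_len,
    }
    return base_url + "/select?" + urlencode(params)

url = build_preview_query("http://solr:8080/solr", "description", 50)
```

The truncated preview then comes back in the highlighting section of the response rather than in the normal document fields, so the client has to read it from there.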

+4

I decided to post my comment as an answer.

I would suggest that you do not store your text data in Solr/Lucene. Index only the data needed for search, and store a unique identifier or URL to identify each document. The document content itself should be retrieved from a separate storage system.

Solr/Lucene are optimized for search. They are not your data warehouse or database and should not be used that way. When you store more data in Solr than necessary, you negatively impact your entire search system: you inflate index size, increase replication time between masters and slaves, duplicate data that needs only one copy, and waste space in the document caches that should be speeding up searches.

So, I would suggest two things.

First, optimally: remove the full text from your search index entirely. Retrieve both the preview text and the full text from a secondary system optimized for storing documents, such as a file server.

Second, suboptimally: store only the preview text in your search index, and keep the entire document somewhere else, such as a file server.
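A minimal sketch of the second option, assuming the field names from the question (docId, text); a plain dict stands in for the external document store:

```python
def make_index_doc(doc_id, full_text, preview_len=200):
    """Split one document into the record to index in Solr (id plus a
    200-char preview) and the full-text record for the external store."""
    index_doc = {"docId": doc_id, "preview": full_text[:preview_len]}
    store_record = {doc_id: full_text}
    return index_doc, store_record

doc, store = make_index_doc(123, "x" * 2000)
```

Search results then carry the preview directly, and a hit on docId is a single lookup in the document store when the user opens the full text.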

+3

You can add an additional field, such as excerpt/summary, which consists of the first 200 characters of the text, and return this field instead.
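One way to fill such an excerpt field is in the indexing client before the document is sent to Solr; a naive `text[:200]` can cut a word in half, so this sketch trims back to the last whitespace:

```python
def make_excerpt(text, limit=200):
    """Return the first `limit` characters of `text`, cut back to the
    last whitespace so the excerpt ends on a whole word."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if " " in cut:
        # drop the trailing partial word, if any
        cut = cut.rsplit(" ", 1)[0]
    return cut
```

The excerpt is computed once at index time, so queries pay nothing extra at search time.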

0

My desire, which I suspect is shared by many sites, is to show a snippet of text with every search result. This improves on showing the user plain titles or the equivalent. It is a common (see Google for an example) and productive technique. Currently, we cannot easily justify sending all the content from Solr/Lucene to a web front-end and building the snippet there, along with the many others in the result set, since this is a significant network, CPU and memory hog (think of handling many multi-MB files).

It would be reasonable for Solr/Lucene to support sending only the first N bytes of content on demand, saving a lot of trouble in the field. Highlighting kludges and the like are exactly that, and get in the way of proper use. Remember that the feeding mechanisms into Solr/Lucene may not analyze the files, so those feeders cannot create fragments.

0

LinkedIn real-time search: http://snaprojects.jira.com/browse/ZOIE

For storing big data: http://project-voldemort.com/

-1

Source: https://habr.com/ru/post/1336951/

