Tika Solr integration

I am trying to index a PDF by posting it to Solr's extract handler with curl.

Query:

curl "http://localhost:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F "myfile=@/root/apache-solr-3.1.0/docs/who.pdf"

When I send the request, I get this error:

HTTP Status 400 - ERROR:unknown field 'ignored_meta'
type: Status report
message: ERROR:unknown field 'ignored_meta'
description: The request sent by the client was syntactically incorrect (ERROR:unknown field 'ignored_meta').
Apache Tomcat/6.0.18
1 answer

The problem is that the default configuration of the ExtractingRequestHandler, defined in solrconfig.xml, puts every field that Tika extracts but your schema does not define into a field named "ignored_XXXXX".

To solve this, add a dynamic field named "ignored_*" to your Solr schema (schema.xml) as follows:

 <dynamicField name="ignored_*" type="ignored"/> 

Remember to add the "ignored" field type as well if you removed it from the default configuration:

 <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" /> 

This stops Solr from rejecting the request when Tika extracts fields that Solr is not aware of.
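
For orientation, here is a rough sketch of where the two declarations might sit in schema.xml. It assumes the stock Solr 3.1 layout with a types section and a fields section. The schema name and version attributes are assumptions copied from the example configuration, and all unrelated entries are omitted, so treat it as an illustration rather than a complete file.

```xml
<!-- schema.xml (sketch; name/version are assumed, other entries omitted) -->
<schema name="example" version="1.3">
  <types>
    <!-- Neither indexed nor stored: values sent to this type are discarded -->
    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />
  </types>
  <fields>
    <!-- Catch-all for fields such as ignored_meta that the schema does not define -->
    <dynamicField name="ignored_*" type="ignored"/>
  </fields>
</schema>
```

Solr has to reload the schema (for example, by restarting Tomcat) before the change takes effect.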


Source: https://habr.com/ru/post/889439/

