Solr, Sunspot, bad request, illegal character

I am introducing Sunspot search into my project. I got a proof of concept working by searching only the name field. When I added the description field and reindexed, I received the following error.

** Invoke sunspot:reindex (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute sunspot:reindex
Skipping progress bar: for progress reporting, add gem 'progress_bar' to your Gemfile
rake aborted!
RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request
Error: {'responseHeader'=>{'status'=>400,'QTime'=>18},'error'=>{'msg'=>'Illegal character ((CTRL-CHAR, code 11))
 at [row,col {unknown-source}]: [42,1]','code'=>400}}

Request Data: "<?xml version=\"1.0\" encoding=\"UTF-8\"?><add><doc><field name=\"id\">ItemsDesign 1322</field><field name=\"type\">ItemsDesign</field><field name=\"type\">ActiveRecord::Base</field><field name=\"class_name\">ItemsDesign</field><field name=\"name_text\">River City Clocks Musical Multi-Colored Quartz Cuckoo Clock</field><field name=\"description_text\">This colorful chalet style German quartz cuckoo clock accurately keeps time and plays 12 different melodies. Many colorful flowers are painted on the clock case and figures of a Saint Bernard and Alpine horn player are on each side of the clock dial. Two decorative pine cone weights are suspended beneath the clock case by two chains. The heart shaped pendulum continously swings back and forth.&#13;On every

I assume the problem is a bad character that you can see at the end of the request data above; it is scattered through many of the descriptions. I'm not even sure what this character is.

What can I do to make Solr ignore it, or to clean the data so that Solr can handle it?

Thanks.

+4
3 answers

Put the following in an initializer to automatically strip any UTF-8 control characters from Sunspot calls:

# config/initializers/sunspot.rb
module Sunspot
  # 
  # DataExtractors present an internal API for the indexer to use to extract
  # field values from models for indexing. They must implement the #value_for
  # method, which takes an object and returns the value extracted from it.
  #
  module DataExtractor #:nodoc: all
    # 
    # AttributeExtractors extract data by simply calling a method on the block.
    #
    class AttributeExtractor
      def initialize(attribute_name)
        @attribute_name = attribute_name
      end

      def value_for(object)
        Filter.new( object.send(@attribute_name) ).value
      end
    end

    # 
    # BlockExtractors extract data by evaluating a block in the context of the
    # object instance, or if the block takes an argument, by passing the object
    # as the argument to the block. Either way, the return value of the block is
    # the value returned by the extractor.
    #
    class BlockExtractor
      def initialize(&block)
        @block = block
      end

      def value_for(object)
        Filter.new( Util.instance_eval_or_call(object, &@block) ).value
      end
    end

    # 
    # Constant data extractors simply return the same value for every object.
    #
    class Constant
      def initialize(value)
        @value = value
      end

      def value_for(object)
        Filter.new(@value).value
      end
    end

    # 
    # A Filter to allow easy value cleaning
    #
    class Filter
      def initialize(value)
        @value = value
      end
      def value
        strip_control_characters @value
      end
      def strip_control_characters(value)
        return value unless value.is_a? String

        value.chars.inject("") do |str, char|
          unless char.ascii_only? && (char.ord < 32 || char.ord == 127)
            str << char
          end
          str
        end

      end
    end

  end
end
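
To see what the filter does before wiring it into the app, the stripping logic can be run on its own. This is a minimal sketch that copies only the strip_control_characters method from the initializer above; the sample strings are made up, with a vertical tab (code 11, the character named in the error) inserted by hand.

```ruby
# Standalone copy of the stripping rule, runnable without Sunspot or Solr.
def strip_control_characters(value)
  return value unless value.is_a?(String)

  # +"" gives an unfrozen accumulator string.
  value.chars.inject(+"") do |str, char|
    # Drop ASCII control characters (0..31) and DEL (127); keep everything else.
    str << char unless char.ascii_only? && (char.ord < 32 || char.ord == 127)
    str
  end
end

# Made-up sample with a vertical tab (code 11) between the sentences.
puts strip_control_characters("swings back and forth.\vOn every hour").inspect
# prints "swings back and forth.On every hour"

# Tabs and newlines are below 32 as well, so they are removed too.
puts strip_control_characters("line one\nline two").inspect
# prints "line oneline two"
```

Because newlines and tabs are removed along with the vertical tab, adjacent words can end up joined. If that matters for your search results, it may be worth replacing those characters with a space instead.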

(Related discussion on the Sunspot GitHub: Sunspot Solr -)

+7

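@thekingoftruth, I used a variant of the Filter from GitHub that also cleans arrays and hashes.

My model has HABTM associations, so some of my blocks return arrays.

This is my searchable block: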

  searchable do
    text :name, :description, :excerpt
    text :venue_name do
      venue.name if venue.present?
    end
    text :artist_name do
      artists.map { |a| a.name if a.present? } if artists.present?
    end
  end

And this is the initializer (in config/initializers/sunspot.rb):

module Sunspot
  #
  # DataExtractors present an internal API for the indexer to use to extract
  # field values from models for indexing. They must implement the #value_for
  # method, which takes an object and returns the value extracted from it.
  #
  module DataExtractor #:nodoc: all
    #
    # AttributeExtractors extract data by simply calling a method on the block.
    #
    class AttributeExtractor
      def initialize(attribute_name)
        @attribute_name = attribute_name
      end

      def value_for(object)
        Filter.new( object.send(@attribute_name) ).value
      end
    end

    #
    # BlockExtractors extract data by evaluating a block in the context of the
    # object instance, or if the block takes an argument, by passing the object
    # as the argument to the block. Either way, the return value of the block is
    # the value returned by the extractor.
    #
    class BlockExtractor
      def initialize(&block)
        @block = block
      end

      def value_for(object)
        Filter.new( Util.instance_eval_or_call(object, &@block) ).value
      end
    end

    #
    # Constant data extractors simply return the same value for every object.
    #
    class Constant
      def initialize(value)
        @value = value
      end

      def value_for(object)
        Filter.new(@value).value
      end
    end

    #
    # A Filter to allow easy value cleaning
    #
    class Filter
      def initialize(value)
        @value = value
      end

      def value
        if @value.is_a? String
          strip_control_characters_from_string @value
        elsif @value.is_a? Array
          @value.map { |v| strip_control_characters_from_string v }
        elsif @value.is_a? Hash
          @value.inject({}) do |hash, (k, v)|
            hash.merge( strip_control_characters_from_string(k) => strip_control_characters_from_string(v) )
          end
        else
          @value
        end
      end

      def strip_control_characters_from_string(value)
        return value unless value.is_a? String

        value.chars.inject("") do |str, char|
          unless char.ascii_only? && (char.ord < 32 || char.ord == 127)
            str << char
          end
          str
        end

      end
    end

  end
end
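
The array and hash branches can be checked without Solr. The sketch below mirrors the logic of Filter#value in a plain method; the name clean and the sample data are hypothetical, chosen only for illustration.

```ruby
# Same rule as strip_control_characters_from_string in the initializer above.
def strip_control_characters_from_string(value)
  return value unless value.is_a?(String)

  value.each_char.reject { |c| c.ascii_only? && (c.ord < 32 || c.ord == 127) }.join
end

# Hypothetical helper mirroring Filter#value: strings, arrays and hashes are
# cleaned, anything else is returned untouched.
def clean(value)
  case value
  when String then strip_control_characters_from_string(value)
  when Array  then value.map { |v| strip_control_characters_from_string(v) }
  when Hash
    value.each_with_object({}) do |(k, v), hash|
      hash[strip_control_characters_from_string(k)] = strip_control_characters_from_string(v)
    end
  else
    value
  end
end

# A block such as `artists.map { |a| a.name if a.present? }` can yield nils,
# which pass through unchanged.
puts clean(["Daft\vPunk", nil, "Air"]).inspect
# prints ["DaftPunk", nil, "Air"]
```

Note that only one level is handled: an array nested inside another array would be returned as-is.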
+3

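You need to strip the control characters from your UTF-8 content before it is indexed. Solr rejects them with this error.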
http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
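
Code 11 in the error message is the vertical tab (U+000B). If you are not sure which characters your data contains, a small helper along these lines can list them. This is only a sketch; control_characters_in is a hypothetical name and the sample string is made up.

```ruby
# Return the position and code point of every control character in the text.
def control_characters_in(text)
  text.each_char.with_index
      .select { |char, _index| char.match?(/\p{Cc}/) }
      .map { |char, index| { index: index, code: char.ord } }
end

sample = "forth.\vOn every"
control_characters_in(sample).each do |hit|
  puts "index #{hit[:index]}: code #{hit[:code]}"
end
# prints "index 6: code 11"
```

Running this over the description of a record that fails to index should show whether the culprit is the vertical tab or something else.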

Something like this should work:

name.gsub!(/\p{Cc}/, "") 
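
Two things to keep in mind with the one-liner: gsub! changes the string in place and returns nil when nothing was replaced, and \p{Cc} also matches tab, newline and carriage return, which XML 1.0 does allow. A slightly more conservative variant could look like the sketch below. The method name sanitize_for_index and the constant name are assumptions made for this example.

```ruby
# Control characters below 0x20 other than tab (0x09), newline (0x0A) and
# carriage return (0x0D), plus DEL (0x7F).
STRIPPED_CONTROL_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/

# Non-mutating: returns a cleaned copy and leaves non-strings alone.
def sanitize_for_index(text)
  return text unless text.is_a?(String)

  text.gsub(STRIPPED_CONTROL_CHARS, "")
end

puts sanitize_for_index("forth.\vOn every\nhour").inspect
# prints "forth.On every\nhour"
```

You could call it from wherever the attribute is assigned or saved, so the stored data is already clean by the time it is reindexed.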

edit: If you want to apply this globally, I think it is possible by overriding the value_for methods in AttributeExtractor and, if necessary, BlockExtractor: https://github.com/sunspot/sunspot/blob/master/sunspot/lib/sunspot/data_extractor.rb I have not tested this. If you manage to put together a global patch, let me know. I have run into the same issue lately.
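
One way such an override might be structured is with Module#prepend, so the original value_for still runs and only its result is cleaned. The sketch below is untested against Sunspot itself: StandInAttributeExtractor is a hypothetical stand-in so the example runs without the gem, and in a real app the prepend target would presumably be Sunspot::DataExtractor::AttributeExtractor (and BlockExtractor).

```ruby
# Wraps value_for and strips control characters from string results.
module ControlCharacterStripping
  CONTROL_CHARS = /\p{Cc}/

  def value_for(object)
    value = super
    value.is_a?(String) ? value.gsub(CONTROL_CHARS, "") : value
  end
end

# Hypothetical stand-in for an extractor class; not part of Sunspot.
class StandInAttributeExtractor
  def initialize(attribute_name)
    @attribute_name = attribute_name
  end

  def value_for(object)
    object.send(@attribute_name)
  end
end

# Prepending puts the module's value_for in front of the class's own method.
StandInAttributeExtractor.prepend(ControlCharacterStripping)

Item = Struct.new(:description)
extractor = StandInAttributeExtractor.new(:description)
puts extractor.value_for(Item.new("forth.\vOn every")).inspect
# prints "forth.On every"
```

Compared with redefining the classes, this leaves the gem's own implementation in place, which may make upgrades less fragile.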

+2

Source: https://habr.com/ru/post/1538640/

