Writing a custom analyzer in pylucene / inheritance using JCC?

I want to write my own analyzer in pylucene. Usually, in Java Lucene, when you write an analyzer class, your class inherits from Lucene's Analyzer class.

But pylucene uses JCC, the Java to C++/Python compiler.

So how do you make a Python class inherit from a Java class using JCC, and in particular, how do you write a custom pylucene analyzer?

Thanks.

2 answers

Here is an example analyzer that wraps an EdgeNGram filter.

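import lucene

# Note: the JVM must be started with lucene.initVM() before this class is instantiated.
# The constructor signatures below follow the original answer; they differ between Lucene versions.


class EdgeNGramAnalyzer(lucene.PythonAnalyzer):
    '''
    This is an example of a custom Analyzer (in this case an edge-n-gram analyzer).
    EdgeNGram analyzers are good for type-ahead.
    '''

    def __init__(self, side, minlength, maxlength):
        '''
        Args:
            side[enum] Can be one of lucene.EdgeNGramTokenFilter.Side.FRONT or lucene.EdgeNGramTokenFilter.Side.BACK
            minlength[int] Length of the shortest n-gram to emit
            maxlength[int] Length of the longest n-gram to emit
        '''
        lucene.PythonAnalyzer.__init__(self)
        self.side = side
        self.minlength = minlength
        self.maxlength = maxlength

    def tokenStream(self, fieldName, reader):
        # Split the text into lowercase tokens.
        result = lucene.LowerCaseTokenizer(lucene.Version.LUCENE_CURRENT, reader)
        result = lucene.StandardFilter(result)
        # Drop English stop words.
        result = lucene.StopFilter(True, result, lucene.StopAnalyzer.ENGLISH_STOP_WORDS_SET)
        # Fold accented characters to their ASCII equivalents.
        result = lucene.ASCIIFoldingFilter(result)
        # Emit the edge n-grams of each remaining token.
        result = lucene.EdgeNGramTokenFilter(result, self.side, self.minlength, self.maxlength)
        return result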
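
To make it concrete why edge n-grams suit type-ahead, here is a small pure-Python sketch of what the last filter in the chain roughly produces for a single token. It does not use Lucene at all; the function name `edge_ngrams` and its parameters are made up for illustration, and the handling of short tokens is an assumption rather than a statement about Lucene's exact behavior.

```python
def edge_ngrams(token, side="front", min_len=1, max_len=3):
    """Return the edge n-grams of one token, shortest first.

    Hypothetical helper for illustration only; not part of pylucene.
    side="front" takes prefixes, side="back" takes suffixes.
    """
    if side not in ("front", "back"):
        raise ValueError("side must be 'front' or 'back'")
    if min_len < 1 or min_len > max_len:
        raise ValueError("need 1 <= min_len <= max_len")

    grams = []
    # Never ask for a gram longer than the token itself.
    longest = min(max_len, len(token))
    for size in range(min_len, longest + 1):
        if side == "front":
            grams.append(token[:size])
        else:
            grams.append(token[-size:])
    return grams


if __name__ == "__main__":
    print(edge_ngrams("lucene", "front", 1, 3))
    print(edge_ngrams("lucene", "back", 1, 3))
```

Indexing the prefixes of each word is what lets a partially typed query such as "luc" match a document containing "lucene".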

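You can inherit from any class in pylucene, but the classes whose names start with Python also extend the underlying Java class, i.e. the relevant methods become "virtual" when called from Java code. So for a custom analyzer, inherit from PythonAnalyzer and implement the tokenStream method.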


Source: https://habr.com/ru/post/1727481/

