I started playing with Lucene.NET today and wrote a simple test method for indexing and searching source code files. The problem is that the standard analyzers and tokenizers treat a whole camel-case source code identifier as a single token.
I'm looking for a way to split camel-case identifiers, for example MaxWidth, into three tokens: MaxWidth, max and width. I searched for such a tokenizer but could not find one. Before I write my own: does anything like this already exist? Or is there a better approach than writing a tokenizer from scratch?
UPDATE: In the end, I decided to get my hands dirty and wrote my own CamelCaseTokenFilter. I will write a post about it on my blog and update the question.
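For anyone who wants to see the splitting rule itself, here is a minimal sketch in plain Python. It is not the CamelCaseTokenFilter mentioned above and it does not use any Lucene.NET API; the function name camel_case_tokens and the regular expression are assumptions made purely for illustration. It only shows how an identifier such as MaxWidth could be expanded into the original token plus its lowercased parts.

```python
import re

# Hypothetical pattern (an assumption, not taken from Lucene):
#   1. a run of capitals that is followed by a capitalized word ("XML" in "XMLParser")
#   2. an optional capital followed by lowercase letters ("Max", "width")
#   3. any remaining run of capitals ("HTTP")
#   4. a run of digits
_PART = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def camel_case_tokens(identifier):
    """Return the identifier itself followed by its lowercased camel-case parts.

    Identifiers that consist of a single part are returned unchanged,
    so no duplicate tokens are produced for them.
    """
    parts = [p.lower() for p in _PART.findall(identifier)]
    if len(parts) <= 1:
        return [identifier]
    return [identifier] + parts


if __name__ == "__main__":
    for name in ("MaxWidth", "XMLParser", "width"):
        print(name, "->", camel_case_tokens(name))
```

Keeping the original identifier as the first token means an exact search for MaxWidth still matches, while searches for max or width can also find the file. A real token filter would additionally have to decide how the extra tokens are positioned in the token stream, which this sketch does not attempt.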