Bayesian classification for semi-structured data in Java

I would like to train and use a Bayesian classifier in the following situation:

  • The data is semi-structured - basically documents that follow an XML schema.
  • Information is contained in several text fields.
  • Some fields / parts of the schema can be repeated an arbitrary number of times.

The classification itself is quite simple - basically I need the probability that a document belongs to a given category.
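For concreteness, the computation described above can be sketched as a small multinomial naive Bayes classifier with add-one smoothing. This is only an illustration, not an existing library: the class `FieldNaiveBayes`, its methods, and the choice to prefix each word with its field name (so that repeated fields simply contribute more tokens) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: multinomial naive Bayes over tokens taken from text fields. */
class FieldNaiveBayes {
    private final Map<String, Integer> docsPerCategory = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> tokensPerCategory = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Flattens field name -> values (a repeated field has several values) into tokens like "body:cheap". */
    static List<String> tokenize(Map<String, List<String>> fields) {
        List<String> tokens = new ArrayList<>();
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            for (String value : field.getValue()) {
                for (String word : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (!word.isEmpty()) {
                        tokens.add(field.getKey() + ":" + word);
                    }
                }
            }
        }
        return tokens;
    }

    /** Adds one labelled document to the counts. */
    void train(String category, Map<String, List<String>> fields) {
        totalDocs++;
        docsPerCategory.merge(category, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(fields)) {
            counts.merge(token, 1, Integer::sum);
            tokensPerCategory.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns P(category | document) for every trained category. */
    Map<String, Double> posterior(Map<String, List<String>> fields) {
        List<String> tokens = tokenize(fields);
        Map<String, Double> logScores = new HashMap<>();
        double max = Double.NEGATIVE_INFINITY;
        for (String category : docsPerCategory.keySet()) {
            // log prior
            double score = Math.log(docsPerCategory.get(category) / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(category);
            double denom = tokensPerCategory.getOrDefault(category, 0) + vocabulary.size();
            for (String token : tokens) {
                if (!vocabulary.contains(token)) {
                    continue; // tokens never seen in training carry no information here
                }
                // add-one (Laplace) smoothed log likelihood
                score += Math.log((counts.getOrDefault(token, 0) + 1) / denom);
            }
            logScores.put(category, score);
            max = Math.max(max, score);
        }
        // Normalize in a numerically stable way by subtracting the largest log score.
        double sum = 0.0;
        for (double s : logScores.values()) {
            sum += Math.exp(s - max);
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : logScores.entrySet()) {
            result.put(e.getKey(), Math.exp(e.getValue() - max) / sum);
        }
        return result;
    }

    public static void main(String[] args) {
        FieldNaiveBayes nb = new FieldNaiveBayes();
        nb.train("spam", Map.of("body", List.of("cheap pills", "buy now")));
        nb.train("ham", Map.of("body", List.of("meeting at noon")));
        System.out.println(nb.posterior(Map.of("body", List.of("cheap pills now"))));
    }
}
```

Extracting the field values from the XML itself is left out of the sketch; the `Map<String, List<String>>` input stands in for whatever the parser produces.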

Design constraints:

  • The solution must be either open source or available under another royalty-free license.
  • It should be possible to save / load classifiers for future use.
  • It should be possible to embed the library in a larger Java application (that is, it must work as a Java / JVM library).

Are there any libraries / tools that meet these requirements?

1 answer

I'm not sure whether you already have a classifier, but I have used Apache UIMA on a couple of projects. UIMA is "just" a framework, but it comes with some built-in logic. Some heavy searching turned up an example of a Bayesian classifier using UIMA.

It has mechanisms for changing your configuration at runtime, though I'm a bit unclear about what you mean by “save and load classifiers”. Does this mean that you have an array of binary classifiers that you want to load (and unload) at runtime, or do you have different models that you would like to load / unload?
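If "save and load" simply means persisting a trained model between runs, plain Java serialization may already be enough, independent of any framework. A minimal sketch, assuming a hypothetical `TrainedModel` class that holds the learned counts and a hypothetical `ModelStore` helper (neither is part of UIMA):

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;

/** Hypothetical container for a trained model's counts. */
class TrainedModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final HashMap<String, Integer> docsPerCategory = new HashMap<>();
    final HashMap<String, HashMap<String, Integer>> tokenCounts = new HashMap<>();
}

/** Hypothetical helper that writes a model to disk and reads it back. */
class ModelStore {
    static void save(TrainedModel model, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException("could not save model to " + file, e);
        }
    }

    static TrainedModel load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (TrainedModel) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not load model from " + file, e);
        }
    }

    public static void main(String[] args) throws IOException {
        TrainedModel model = new TrainedModel();
        model.docsPerCategory.put("spam", 3);

        Path file = Files.createTempFile("model", ".ser");
        save(model, file);
        TrainedModel restored = load(file);
        System.out.println(restored.docsPerCategory.equals(model.docsPerCategory));
        Files.delete(file);
    }
}
```

Java deserialization should only be used on files you wrote yourself; for models exchanged with other parties, a plain data format such as JSON would be a safer choice.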

Answers to your other questions:

  • Yes, UIMA is open source, released under the Apache License, Version 2.0 (ASLv2).
  • Yes, you can embed UIMA as a library in your application.

Source: https://habr.com/ru/post/1433522/

