Is there any open source library that identifies data patterns in a table?

Here's the situation: we have a table of about 50 columns (created by joining database tables) and several thousand rows. We need to find the pattern in several records of this data that are known to be erroneous. Here is a really contrived example. Given the table:

-----------------------
| id | title | date   |
-----------------------
| 01 | c     | 2009-01|
| 02 | a     | 2009-02|
| 03 | a     | 2009-02|
| 04 | b     | 2009-03| 
| 05 | b     | 2009-03| 
| 06 | a     | 2009-04| 
-----------------------

Suppose I ask the library how rows 1, 4 and 5 are related, or how they differ from all the other rows. The library would say:

  • All selected rows have an odd month number
  • None of the selected rows has title = 'a'
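The kind of answer described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing library: the function names `features` and `separating_predicates` are hypothetical, and the derived attribute "month parity" is hand-picked here, whereas a real tool would have to generate candidate attributes itself.

```python
rows = [
    {"id": "01", "title": "c", "date": "2009-01"},
    {"id": "02", "title": "a", "date": "2009-02"},
    {"id": "03", "title": "a", "date": "2009-02"},
    {"id": "04", "title": "b", "date": "2009-03"},
    {"id": "05", "title": "b", "date": "2009-03"},
    {"id": "06", "title": "a", "date": "2009-04"},
]
selected_ids = {"01", "04", "05"}


def features(row):
    """Derive candidate attributes (assumption: month parity is worth testing)."""
    month = int(row["date"].split("-")[1])
    return {
        "title": row["title"],
        "month_parity": "odd" if month % 2 else "even",
    }


def separating_predicates(rows, selected_ids):
    """Return statements that hold for every selected row and for no other row."""
    selected = [features(r) for r in rows if r["id"] in selected_ids]
    others = [features(r) for r in rows if r["id"] not in selected_ids]
    if not selected or not others:
        return []
    findings = []
    for name in selected[0]:
        sel_vals = {f[name] for f in selected}
        oth_vals = {f[name] for f in others}
        if sel_vals & oth_vals:
            continue  # the two sides share a value, so this attribute does not separate them
        if len(sel_vals) == 1:
            value = next(iter(sel_vals))
            findings.append(f"all selected rows have {name} = {value!r}")
        if len(oth_vals) == 1:
            value = next(iter(oth_vals))
            findings.append(f"no selected row has {name} = {value!r}")
    return findings


for finding in separating_predicates(rows, selected_ids):
    print(finding)
```

On a 50-column table the hard part is choosing which derived attributes to try; the check itself stays this simple.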

Perhaps the library iterates through a series of pivot-table groupings, as you would in Excel. Whenever it finds combinations of groups and calculations that are interesting, it tells you.
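A rough sketch of that pivot-style search, assuming "interesting" means "a group of at least two rows that consists entirely of flagged rows". The names `group_rates` and `interesting` and the `min_size` threshold are illustrative assumptions, not part of any existing tool.

```python
from collections import defaultdict
from itertools import combinations

rows = [
    {"id": "01", "title": "c", "date": "2009-01"},
    {"id": "02", "title": "a", "date": "2009-02"},
    {"id": "03", "title": "a", "date": "2009-02"},
    {"id": "04", "title": "b", "date": "2009-03"},
    {"id": "05", "title": "b", "date": "2009-03"},
    {"id": "06", "title": "a", "date": "2009-04"},
]
flagged_ids = {"01", "04", "05"}


def group_rates(rows, flagged_ids, columns, max_width=2):
    """Group by every combination of up to max_width columns and count flagged rows."""
    results = []
    for width in range(1, max_width + 1):
        for cols in combinations(columns, width):
            groups = defaultdict(lambda: [0, 0])  # key -> [flagged, total]
            for row in rows:
                key = tuple(row[c] for c in cols)
                groups[key][1] += 1
                if row["id"] in flagged_ids:
                    groups[key][0] += 1
            for key, (flagged, total) in groups.items():
                results.append((cols, key, flagged, total))
    return results


def interesting(results, min_size=2):
    """Keep groups with at least min_size rows in which every row is flagged."""
    return [r for r in results if r[3] >= min_size and r[2] == r[3]]


for cols, key, flagged, total in interesting(
    group_rates(rows, flagged_ids, ["title", "date"])
):
    print(dict(zip(cols, key)), f"{flagged}/{total} flagged")
```

The number of column combinations grows quickly, so with about 50 columns `max_width` would have to stay small or the candidate columns would need pruning first.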

Here is another example table:

-----------------------------------------------------
| id | user  | created_on| facility | review_status |
-----------------------------------------------------
| 01 | tom   | 2009-01   | Bay      | Locked        |  
| 02 | berry | 2009-02   | Inner    |               |
| 03 | jan   | 2009-02   | Hamming  | Submitted     |
| 04 | bernie| 2009-03   | Youth    | Accepted      |
| 05 | jack  | 2009-03   | Johnson  | Locked        |
| 06 | frank | 2009-04   | Baber St.|               |
-----------------------------------------------------


Does anything like this exist? A data mining tool would be fine, as long as it is open source or "free as in beer".



One option to look at: the Weka Machine Learning Library.

http://www.cs.waikato.ac.nz/ml/weka/

It is also part of Pentaho, where Weka serves as the data mining component.

There is also a Ruby binding library on RubyForge, Rarff.
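Weka's native input format is ARFF, so one way to put the question to it is to export the table with an extra class attribute that marks the known bad records, then let a classifier look for rules that predict that attribute. Below is a minimal sketch of such an export. The function name `to_arff`, the attribute name `flagged`, and the choice to treat every exported column as nominal are assumptions made for illustration.

```python
def quote(value):
    """Wrap a nominal value in single quotes (assumes it contains no quote or newline)."""
    if "'" in value or "\n" in value:
        raise ValueError(f"cannot quote value: {value!r}")
    return f"'{value}'"


def to_arff(relation, rows, nominal_columns, flagged_ids):
    """Render rows as ARFF text with a yes/no class attribute named 'flagged'."""
    lines = [f"@relation {relation}", ""]
    for col in nominal_columns:
        values = sorted({row[col] for row in rows})
        joined = ",".join(quote(v) for v in values)
        lines.append(f"@attribute {col} {{{joined}}}")
    lines.append("@attribute flagged {yes,no}")
    lines += ["", "@data"]
    for row in rows:
        cells = [quote(row[col]) for col in nominal_columns]
        cells.append("yes" if row["id"] in flagged_ids else "no")
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


rows = [
    {"id": "01", "title": "c", "date": "2009-01"},
    {"id": "02", "title": "a", "date": "2009-02"},
    {"id": "03", "title": "a", "date": "2009-02"},
    {"id": "04", "title": "b", "date": "2009-03"},
    {"id": "05", "title": "b", "date": "2009-03"},
    {"id": "06", "title": "a", "date": "2009-04"},
]

print(to_arff("records", rows, ["title", "date"], {"01", "04", "05"}))
```

The `id` column is deliberately left out of the export: a unique identifier would let a classifier memorise the flagged rows instead of finding a pattern.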


If the data is in MySQL, procedure_analyse() may be worth a look.



“Data errors” are a consequence of something that was overlooked in the initial analysis or implementation. Always. Therefore, if you want to find “patterns” in the flawed data you have, go back to the initial analysis and implementation and try to work out which mistakes were made there.


Source: https://habr.com/ru/post/1715025/
