Well, here's the situation: we have a table of about 50 columns (created by joining database tables) and several thousand rows. We need to work out what a handful of known erroneous records in this data have in common. Here is a heavily simplified example. Given the table:
------------------------
| id | title | date    |
------------------------
| 01 | c     | 2009-01 |
| 02 | a     | 2009-02 |
| 03 | a     | 2009-02 |
| 04 | b     | 2009-03 |
| 05 | b     | 2009-03 |
| 06 | a     | 2009-04 |
------------------------
Suppose I ask the library how rows 1, 4 and 5 are related, or how they differ from all the other rows. The library would say:
- All selected rows have an odd month number
- None of the selected rows has title = 'a'
Perhaps the library iterates through a series of groupings, the way you would with pivot tables in Excel. Whenever it finds a combination of grouping and calculation that looks interesting, it reports it.
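To make the idea concrete, here is a rough brute-force sketch of that kind of search on the toy table above. It is not an existing library: the function names (`derived_features`, `separating_patterns`) and the choice of derived features (year, month, month parity) are assumptions made up for this illustration. It only checks single-column rules, which is far less than a real data mining tool would do.

```python
# Toy data copied from the example table above.
rows = [
    {"id": "01", "title": "c", "date": "2009-01"},
    {"id": "02", "title": "a", "date": "2009-02"},
    {"id": "03", "title": "a", "date": "2009-02"},
    {"id": "04", "title": "b", "date": "2009-03"},
    {"id": "05", "title": "b", "date": "2009-03"},
    {"id": "06", "title": "a", "date": "2009-04"},
]


def derived_features(row):
    """Expand a row into candidate features (a hypothetical, hand-picked set)."""
    year, month = row["date"].split("-")
    return {
        "title": row["title"],
        "year": year,
        "month": month,
        "month_parity": "odd" if int(month) % 2 else "even",
    }


def separating_patterns(rows, selected_ids):
    """Report single-feature rules that split the selected rows from the rest."""
    selected = [derived_features(r) for r in rows if r["id"] in selected_ids]
    others = [derived_features(r) for r in rows if r["id"] not in selected_ids]
    if not selected or not others:
        return []

    patterns = []
    for feature in selected[0]:
        sel_values = {f[feature] for f in selected}
        other_values = {f[feature] for f in others}
        if sel_values & other_values:
            continue  # the two groups share a value, so this feature does not separate them
        if len(sel_values) == 1:
            # Every selected row shares one value that no other row has.
            value = next(iter(sel_values))
            patterns.append(f"all selected rows have {feature} = {value!r}")
        elif len(other_values) == 1:
            # Every non-selected row shares one value that no selected row has.
            value = next(iter(other_values))
            patterns.append(f"no selected row has {feature} = {value!r}")
    return patterns


for pattern in separating_patterns(rows, {"01", "04", "05"}):
    print(pattern)
```

On the toy table this prints one rule about `title` and one about `month_parity`, matching the two statements above. With 50 columns you would also want to test combinations of columns and tolerate a few exceptions, which is roughly what decision-tree and rule-induction learners do in a more general way.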
--------------------------------------------------------
| id | user   | created_on | facility  | review_status |
--------------------------------------------------------
| 01 | tom    | 2009-01    | Bay       | Locked        |
| 02 | berry  | 2009-02    | Inner     |               |
| 03 | jan    | 2009-02    | Hamming   | Submited      |
| 04 | bernie | 2009-03    | Youth     | Accepted      |
| 05 | jack   | 2009-03    | Johnson   | Locked        |
| 06 | frank  | 2009-04    | Baber St. |               |
--------------------------------------------------------
Ideally this would be a DATA MINING tool that is Open Source OR "free as in beer".
P.S. Petitio principii , , ( , , , ).