I'm building a data classification tool. Basically, I will regularly receive data rows in a flat file that look like this:
    a:b:c:d:e
    a:b:c:d:e
    a:b:c:d:e
    a:b:c:d:e
And I have a list of categories to break these lines down into, for example:
    Original     Cat1   Cat2   Cat3   Cat4   Cat5
    ----------------------------------------------
    a:b:c:d:e    a      b      c      d      e
As you can see on the right side of this second example, the category names are known, as is the number of categories used to break down the data. But this may change over time (for example, categories being added or removed, so the total number of categories changes).
That's why I'm not looking for help on how to parse the strings or get the data into a db or anything like that ... I know how to do all of that, and the core script is basically already written to handle the string parsing and the splitting into a variable number of categories.
I am mainly looking for advice on how to structure my database for storing this data. I've thought about it, and here is what I came up with:
Table: Generated

    generated_id int - unique id for each generated row
    generated_timestamp datetime - timestamp of when the row was generated
    last_updated datetime - timestamp of when the row was last updated
    generated_method varchar(6) - method by which the row was generated (manual or auto)
    original_string varchar(255) - the original string

Table: Categories

    category_id int - unique id for the category
    category_name varchar(20) - name of the category

Table: Category_Values

    category_map_id int - unique id for each value (not sure if I actually need this)
    category_id int - id linking to the Categories table
    generated_id int - id linking to the Generated table
    category_value varchar(255) - value for the category
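In DDL terms I'm picturing something roughly like this (MySQL-ish syntax; the NOT NULLs, auto-increment ids and foreign keys are just my assumptions from the descriptions above):

    CREATE TABLE Generated (
        generated_id        INT AUTO_INCREMENT PRIMARY KEY,
        generated_timestamp DATETIME NOT NULL,
        last_updated        DATETIME NOT NULL,
        generated_method    VARCHAR(6) NOT NULL,   -- 'manual' or 'auto'
        original_string     VARCHAR(255) NOT NULL
    );

    CREATE TABLE Categories (
        category_id   INT AUTO_INCREMENT PRIMARY KEY,
        category_name VARCHAR(20) NOT NULL
    );

    CREATE TABLE Category_Values (
        category_map_id INT AUTO_INCREMENT PRIMARY KEY,  -- possibly redundant, see note above
        category_id     INT NOT NULL,
        generated_id    INT NOT NULL,
        category_value  VARCHAR(255) NOT NULL,
        FOREIGN KEY (category_id)  REFERENCES Categories (category_id),
        FOREIGN KEY (generated_id) REFERENCES Generated (generated_id)
    );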
Basically, the idea is that when I parse a row, I will insert one new record into the Generated table, plus X records into the Category_Values table, where X is however many categories currently exist. The category names themselves are stored in the separate Categories table.
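So for one parsed row, with the five example categories above, the inserts would be something like this (the id values are hardcoded here just for illustration; in reality I'd pick up the new generated_id from the first insert, e.g. via LAST_INSERT_ID() in MySQL):

    -- one record for the row itself
    INSERT INTO Generated (generated_timestamp, last_updated, generated_method, original_string)
    VALUES (NOW(), NOW(), 'auto', 'a:b:c:d:e');

    -- then X records for its category values, assuming the new generated_id came back as 42
    INSERT INTO Category_Values (category_id, generated_id, category_value)
    VALUES (1, 42, 'a'),
           (2, 42, 'b'),
           (3, 42, 'c'),
           (4, 42, 'd'),
           (5, 42, 'e');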
What my script will do right away is process the lines with the raw values and output the generated category values to a new file that gets sent off somewhere. But then I have this db, which is there to store the generated data, so that I can write another script that lets me search and pull up previously generated values, or update previously generated records with new values, or whatever else.
Does this sound like a good database structure? Anything obvious that I'm missing, or anything that's going to come back to bite me? For example, with this structure ... well ... I'm not an SQL expert, but I think I'd have to do something like
select * from Generated where original_string = '$string'  -- and grab the id into $id
and then
select * from Category_Values where generated_id = '$id'
... and then I'd have the data to work with, either for search results or for a form for changing the data ... I'm pretty sure I could even combine those into one query with a join or something, I just don't know how that's actually done. But I do know that I can get what I need out of this db structure. Am I making this harder than it needs to be? Making an obvious noob mistake?
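For what it's worth, I'm guessing the combined version of those two lookups would be a join along these lines (just my attempt, I haven't tested it):

    SELECT g.generated_id, g.original_string, c.category_name, cv.category_value
    FROM Generated g
    JOIN Category_Values cv ON cv.generated_id = g.generated_id
    JOIN Categories c       ON c.category_id   = cv.category_id
    WHERE g.original_string = '$string';

That should give me one result row per category value, with the category name attached, instead of having to run the two separate queries.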