Opinions and recommendations on the structure of the database

I am creating this data classification tool. Basically, I will regularly receive data rows in a flat file that look like this:

a:b:c:d:e a:b:c:d:e a:b:c:d:e a:b:c:d:e 

And I have a list of categories to break these lines, for example:

 Original Cat1 Cat2 Cat3 Cat4 Cat5 --------------------------------------- a:b:c:d:eabcde 

On the right side of this second, category names are known, as well as the number of categories for breaking down the data. But this may change over time (for example, categories added / removed ... the total number of categories changed).

Well, thatโ€™s why Iโ€™m not looking for help on how to parse strings or get data in db or something else ... I know how to do all this, and the kernel script is basically written to handle string parsing and split into variables number of categories.

I am mainly looking for advice on how to structure my database for storing this material. So I thought about it, and here is what I came up with:

 Table: Generated generated_id int - unique id for each row generated generated_timestamp datetime - timestamp of when row was generated last_updated datetime - timestamp of when row last updated generated_method varchar(6) - method in which row was generated (manual or auto) original_string varchar (255) - the original string Table: Categories category_id int - unique id for category category_name varchar(20) - name of category Table: Category_Values category_map_id int - unique id for each value (not sure if I actually need this) category_id int - id value to link to table Categories generated_id int - id value to link to table Generated category_value varchar (255) - value for the category 

Basically, the idea is when I parse the row, I will insert a new record in the Generated table, as well as X records in the Category_Values table, where X, however, there are many categories that currently exist. And category names are stored in another Categories table.

What my script will immediately execute are the lines of the process with raw values โ€‹โ€‹and output the generated category values โ€‹โ€‹to a new file that will be sent somewhere. But then I have this db, which I do to store the generated data, so that I can make another script where I can search and write off previously created values, or update previously created records with new values โ€‹โ€‹or something else.

Does this sound like a good database structure? Anything obvious that I'm missing, or maybe I'm feeling for myself? For example, with this structure ... well ... I'm not an sql expert, but I think I should have done as

 select * from Generated where original_string = '$string' // id is put into $id 

and then

 select * from Category_Values where generated_id = '$id' 

... and then I will have data to work with search results or a form for changing data ... well, I'm quite sure that I can even combine this into one query with a connection or something, I donโ€™t know how it is actually done, but I donโ€™t know how to do it. But I know that I can do what I need from this db structure. But am I making it harder than necessary? Make an obvious noob mistake?

+6
source share
2 answers

My suggestion:

 Table: Generated id unsigned int autoincrement primary key generated_timestamp timestamp last_updated timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP generated_method ENUM('manual','auto') original_string varchar (255) Table: Categories id unsigned int autoincrement primary key category_name varchar(20) Table: Category_Values id unsigned int autoincrement primary key category_id int generated_id int category_value varchar (255) - value for the category FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id 

References
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enumerations: http://dev.mysql.com/doc/refman/5.1/en/enum.html

+1
source

I think this solution is perfect for what you want to do. The list of categories is now flexible, so you can add new categories or pensions of the old ones (I would recommend thinking long and hard about it before agreeing to delete the category. - you are an orphan record or delete them too, etc.)

Basically, I say that you are right. The structure is simple, but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).

+1
source

Source: https://habr.com/ru/post/888124/


All Articles