Database table design dilemma, a lot of flags?

I want to start with Thank you, you guys were kind to me.

I will immediately go to the question.

Having a table with over 400 columns is that bad?

I have web forms that consist mainly of questions requiring answers in a block. The total number of flags may exceed 400, if not more.

I actually modeled one of the forms and placed each checkbox in a column (took several hours). Due to my unfamiliarity with database design, I did not feel that this was the right way.

So, I read somewhere that some people use the serialization function to save a group of checkboxes as text in a column.

I just want to know that this is the best way to save these checkboxes.

Oh, and some more information I will use cakephp orm with these tables.

Thanks again.

My database looks something like this:

Table: Patients, table: admitForm, Table: SomeOtherFOrm

each form table will have a PatientId

As I said above, I first tried to create a table for each form, and then placing each checkbox in a column. It took me forever.

so I read some where serializing checkboxes to a question would be a good idea

So I would ask if there would be a good approach.

+4
source share
4 answers

== Edit # 3 == Updated ERD with the ability to store free answers to forms, and also linked patient_reponse_option to the question_option_link website so that the patients answer is saved with the correct options context (we know which question the answer is too). I will send some inquiries soon.

enter image description here

== Edit # 2 ==

Updated ERD with form data

enter image description here

== Edit # 1 ==

A short answer to your question: no, 400 columns is the wrong approach. Alternatively, check the following diagram:

enter image description here

== Original ==

According to your recent board, you will want to include a pivot table. The summary table breaks the M: M relationship between “patients” and “options,” for example, many patients may have many options. For this you do not need a table with 400 columns, you just need to include the aforementioned pivot table.

Example circuit:

// patient table tableName: patient id: int(11), autoincrement, unsigned, not null, primary key name_first: varchar(100), not null name_last: varshar(100), not null // Options table tableName: option id: int(11), autoincrement, unsigned, not null, primary key name: varchar(100), not null, unique key // pivot table tableName: patient_option_link id: int(11), autoincrement, unsigned, not null, primary key patient_id: Foreign key to patient (`id`) table option_id: Foreign key to option (`id`) table 

With this scheme, you can have any number of “options” without adding a new column to the patient table. Which, if you have a large number of rows, will crush your database if you ever need to run the alter table add column command.

I added the identifier to the pivot table, so if you ever need to process individual rows, it will be easier to work with them, and also to know that the patient has _and_and_and_and_and_and_and_and_and_and_and_complex.

+1
source

For questions with multiple parameters, just add another table.

The question that no one has asked you yet is whether you need to do data mining or put the answers to these questions in the where clause in the request. If you do not need to make any queries on the data that look at the data contained in these answers, you can simply serialize them into several fields. You can even pack them in numbers. (everyone who comes after you hates you if you pack the data though)

Here is my circuit idea.

The Database Schema Pictorial

+2
source

I think I would divide this into 3 tables. One table representing any object answers questions. The second table contains the questions themselves. Finally, the table of the third join, which will be filled with the primary key of the first table and the identifier of the question from the second table, whenever the entity from the first table selects a check box for this question.

+1
source

Usually 400 columns mean that your data can be normalized better and divided into several tables. 400 columns may really fit, depending on the use case. An example where this might be appropriate is if you need these fields for each individual query And you need to filter the records using these columns (i.e.: use them in your WHERE clause) ... in this case, SQL JOINs will probably be more expensive than having a sparsely populated "wide" table.

If you never need to use SQL to filter records based on these "flags" (I assume that they are yes / no values ​​like boolean / tinyint), then serialization is a valid approach. I would go along this route if I needed to use the checkbox values ​​most of the time when I query the table but don't need to use them in the WHERE clause.

If you do not need these checkboxes or only a small number of them are required, on most queries to your table, then most likely you should work on splitting the table into several tables. One approach is to have a table with checkbox values ​​(id, record_id, checkbox_name, checkbox_value), where record_id is the id your table master record. This implies a one-to-many relationship between your primary records and your checkbox values.

+1
source

Source: https://habr.com/ru/post/1383752/


All Articles