Is it possible to have an SQL table with more than a million columns?

I am creating a database for microarray data. Each patient sample has more than 1,000,000 features, and I would like to store the patient samples as rows in an SQL table, with each feature as a column.

HuEX Microarray Data

    +----+----------+----------+-----+------------------+
    | ID | Feature1 | Feature2 | ... | Feature1,000,000 |
    +----+----------+----------+-----+------------------+
    | 1  | 2.3543   | 10.5454  | ... | 5.34333          |
    | 2  | 13.4312  | 1.3432   | ... | 40.23422         |
    +----+----------+----------+-----+------------------+

I know that most relational database systems have limits on the number of columns in a table.

    +------------+-----------------------+
    | DBMS       | Max columns per table |
    +------------+-----------------------+
    | SQL Server | 1,024 - 30,000        |
    | MySQL      | 65,535 bytes          |
    | PostgreSQL | 250 - 1,600           |
    | Oracle     | 1,000                 |
    +------------+-----------------------+

Obviously, these limits are far too low for my task. Is there any way to raise the column limit, or is there another DBMS that can handle this many columns in a table?

Update

Note that all columns will have values for all rows.

+6
5 answers

No.

Even if you could make it work, it would be very slow and awkward.

Instead, you should create a separate table with columns for PatientID, Feature, and Value.
This table will have one row for each cell in the proposed table.

It also allows you to add additional information about each patient/feature pair.
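A minimal sketch of that layout (the table name, column types, and constraints are my assumptions, not part of the answer):

    -- One row per cell of the proposed wide table.
    CREATE TABLE PatientFeature (
        PatientID INTEGER NOT NULL,
        Feature   INTEGER NOT NULL,  -- feature number, 1 .. 1,000,000
        Value     REAL    NOT NULL,  -- measured value for that feature
        PRIMARY KEY (PatientID, Feature)
    );

Depending on the queries, an additional index on (Feature, PatientID) may also be worth having.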

+12

Usually you would split up (normalize) the tables:

    Sample:        ID, PatientID
    Feature:       ID, Name
    SampleFeature: SampleID, FeatureID, Value

SQL databases cannot handle that many columns, but they can handle a huge number of rows.
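A rough DDL sketch of that schema (column types and foreign keys are assumptions added for illustration):

    CREATE TABLE Sample (
        ID        INTEGER PRIMARY KEY,
        PatientID INTEGER NOT NULL
    );

    CREATE TABLE Feature (
        ID   INTEGER PRIMARY KEY,
        Name VARCHAR(100) NOT NULL
    );

    CREATE TABLE SampleFeature (
        SampleID  INTEGER NOT NULL REFERENCES Sample(ID),
        FeatureID INTEGER NOT NULL REFERENCES Feature(ID),
        Value     REAL    NOT NULL,
        PRIMARY KEY (SampleID, FeatureID)
    );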

+4

Try rearranging the table:

    CREATE TABLE MicroarrayData (
        SampleID  INTEGER,
        FeatureID INTEGER,
        Value     REAL,
        PRIMARY KEY (SampleID, FeatureID)
    );
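For example, pulling back all feature values for one sample would then look like this (a hypothetical query, just to show how the restructured table is used):

    SELECT FeatureID, Value
    FROM MicroarrayData
    WHERE SampleID = 1
    ORDER BY FeatureID;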
+4

This is actually a use case for the entity-attribute-value (EAV) model and may be better suited to a non-RDBMS/SQL solution in some data-intensive environments. (Relational databases are workhorses, though... you might as well use one until it proves insufficient ;-)

From the Wikipedia article:

The entity-attribute-value model (EAV) is a data model for describing entities where the number of attributes (properties, parameters) that could be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix.

Happy coding.

+2

Well, given the new information that this is a dense array of homogeneous numeric (double) values and that query access matters (i.e., I will ignore denormalizing into blobs/XML and using special UDFs), I suggest the following:

Divide each result into several records, where each record has the form:

    ID, SEGMENT, VALUEx ...   // where x is in [0, q]

The value of q is arbitrary, but should be chosen based on the specific database implementation (for example, try to fit within SQL Server's roughly 8 KB record size) for performance/efficiency reasons.

Each result is split across records such that SEGMENT identifies the segment a record covers. That is, the "absolute index" of a feature is n = SEGMENT * q + x, and feature n is found in the record where SEGMENT = n / q (integer division). It follows that the primary key is (ID, SEGMENT).
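A sketch of what such a segmented table could look like, here with a deliberately tiny q of 4 (table and column names are illustrative assumptions):

    -- Each row packs q = 4 consecutive feature values of one sample.
    CREATE TABLE MicroarraySegment (
        ID      INTEGER NOT NULL,  -- sample ID
        SEGMENT INTEGER NOT NULL,  -- which block of q features this row holds
        VALUE0  REAL,
        VALUE1  REAL,
        VALUE2  REAL,
        VALUE3  REAL,
        PRIMARY KEY (ID, SEGMENT)
    );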

Queries thus remain simple; the only change is converting to/from the segment index, and the only extra column required is SEGMENT (which can also participate in an index).
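Continuing the q = 4 sketch above: to read feature n = 10 of sample 1, compute SEGMENT = 10 / 4 = 2 and x = 10 % 4 = 2, so the value sits in column VALUE2:

    SELECT VALUE2
    FROM MicroarraySegment
    WHERE ID = 1 AND SEGMENT = 2;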

(A separate table can be used to map features to SEGMENT/x, or the mapping can be computed on the fly. In that respect this is similar to the EAV model.)

So while this is similar in some ways to the fully normalized form, it exploits the packed/homogeneous/static nature of the original matrix to significantly reduce the number of records: whereas 2 million records is perhaps a "small" table and 20 million records only a "medium" one, 200 million records (200 chips x 1 million features per chip, if every feature becomes its own record) starts to become unwieldy. A q of 200, on the other hand, reduces the record count to just 1 million. (Each packed record is also far more efficient in terms of its data-to-overhead ratio.)

Happy coding.


Although I have offered the "what if" suggestion above, for my part I would recommend studying the problem in more detail, in particular the exact data access patterns required. I'm not sure this is a "typical" use of a standard RDBMS, and an RDBMS may not even be a good way to approach this problem.

+1

Source: https://habr.com/ru/post/891575/

