Matlab prepared database statements and bind variables

I'm new to Matlab, but I managed to use it to interact with the database (PostgreSQL in my case), following the official documentation on how to connect to the database using JDBC drivers , and then execute the query with exec . However, the example given in the latter case involves embedding data in a query, which can be inefficient and open to SQL injection.

I would really like to use bind variables, for example, in JDBC PreparedStatement . I can not find anything about this in the documentation. Is there any way to do this?

+5
source share
2 answers

Yes, it can be done as you indicated, but you can also do it all with the Matlab Database Toolbox.

Inserting data through the Matlab database toolbar

In fact, this toolkit works with PostgreSQL through a direct JDBC connection. Moreover, both the embedded data insertion methods datainsert and fastinsert work through specially created prepared statements. The only difference between the two is that the relevant prepared statements are populated with data. fastinsert does this in Matlab using exactly the same technique you mentioned (using different setters like setDouble , setBoolean , setTimestamp , etc., and of course setObject as well). But it turns out that this method is difficult to use in the case of large amounts of data, because in such cases fastinsert becomes very slow. datainsert populates the generated prepared statement with data in Java through a specific com.mathworks.toolbox.database.writeTheData object. This class implements two methods: doubleWrite and cellWrite . doubleWrite allows filling the prepared statement with numerical scalar data. cellWrite assumes that the data is transmitted as a matrix of cells with a scalar object in each cell (the object is either a numeric or logical Matlab scalar, or a Matlab string, or a Java scalar object supported by the JDBC driver, say org.postgresql.jdbc.PgArray in case of an array) for the corresponding field and tuple. Thus, instead of calling setObject from Matlab (this can lead to significant overhead), it is more efficient to populate the specified matrix of cells with the necessary objects as the first step and then call the datainsert from the Matlab Database Toolbox as the second step.

Performance of data insertion methods

But in case all inserted objects have Matlab types (scalars, rows, matrices, multidimensional arrays, structures and arbitrary other Matlab types), there is at least one more effective way to insert data using a special high-performance PostgreSQL client library written 100% in C and based on libpq. It is called PgMex . It is especially useful for large amounts of data (about 1 GB or more) or when it is necessary to insert non-scalar data (arrays). In such cases, both fastinsert and datainsert exhibit deteriorating performance and persistent running out of Java heap memory caused by JDBC driver restrictions for a large dataset. This can be easily seen from the following images:

The case of scalar numeric data Array Case

Here, the performance of fastinsert and datainsert compared with the performance of batchParamExec from PgMex (for more details see https://pgmex.alliedtesting.com/#batchparamexec ). The first image refers to scalar numerical data, the second refers to arrays. The endpoint of each graph corresponds to a certain maximum amount of data transferred to the database by the appropriate method without any error. The amount of data that exceeds this maximum (specific for each method) causes the "out of Java heap memory" problem (The size of the Java heap for each experiment is indicated at the top of each figure). For more information about experiments, see the following paper with complete benchmarking results for inserting data .

+7
source

It seems that I managed to answer my own question, so I share love. AFAICS database toolkit is very simple, but, fortunately, you can directly use the Java JDBC API, as I watched the support team answer this question .

Thus, prepared statements can use the following bind variables:

 % Assume schema % CREATE TABLE (idcol SERIAL PRIMARY KEY, colX INTEGER, colY INTEGER); jdbcconn = conn.Handle stmt = jdbcconn.prepareStatement(['INSERT INTO mytable ('... 'colX, colY) '... 'VALUES (?,?) '... 'RETURNING idcol']); stmt.setObject(1, x); stmt.setObject(2, y); rs = stmt.executeQuery(); success = rs.next(); newentry_id = getInt(1); close(rs); close(stmt); 
+5
source

Source: https://habr.com/ru/post/1240676/


All Articles