Postgres Performance Tips for Loading Billions of Rows

I am in the middle of a project that extracts many pieces of information from 70 GB of XML documents and loads them into a relational database (in this case PostgreSQL). I am currently using Python and psycopg2 scripts for these inserts and much more. I have found that as the number of rows in some tables grows (the largest is about 5 million rows), the insert speed slows to a crawl. What once took a couple of minutes now takes about an hour.

What can I do to speed this up? Am I mistaken in using Python and psycopg2 for this task? Is there anything I can do on the database side to speed up this process? I have a feeling I am going about this completely the wrong way.

7 answers

What are your settings for wal_buffers and checkpoint_segments? For large transactions you have to tune some of these options. Check out the manual.

Also consider the book PostgreSQL 9.0 High Performance; there is much more to getting high performance than tuning the database configuration.





Commit in batches, and try different batch sizes (1K, 10K, 100K, ...) to see which one performs best.
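Committing in fixed-size batches, rather than once per row or once for the whole file, makes it easy to time different batch sizes against each other. This is a minimal sketch. It assumes a DB-API connection such as one returned by psycopg2.connect(). The function name and the `sql` argument are hypothetical.

```python
from itertools import islice


def insert_in_batches(conn, rows, batch_size, sql):
    """Insert `rows`, committing once per `batch_size` rows.

    Assumes `conn` is a DB-API connection (e.g. from psycopg2.connect()) and
    `sql` is a parameterized INSERT using %s placeholders (hypothetical).
    Returns the number of commits issued.
    """
    it = iter(rows)
    commits = 0
    with conn.cursor() as cur:
        while True:
            batch = list(islice(it, batch_size))  # next chunk of rows
            if not batch:
                break
            cur.executemany(sql, batch)
            conn.commit()  # one transaction per batch instead of per row
            commits += 1
    return commits
```

To compare sizes, run it once per candidate size against an emptied table. For example: `insert_in_batches(conn, rows, 10_000, "INSERT INTO items (id, name) VALUES (%s, %s)")`. Note that psycopg2's executemany() still sends one statement per row, so this only changes how often you commit.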


Use COPY instead of INSERT; it is much faster for bulk loads.
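Here is a minimal sketch of replacing row-by-row INSERTs with COPY from Python. It assumes psycopg2, whose cursor.copy_from() wraps COPY ... FROM STDIN and by default uses tab separators and \N for NULL. The helper names are hypothetical. The escaping handles the characters that COPY's text format treats specially: backslash, tab, newline and carriage return.

```python
import io


def _copy_field(value):
    """Encode one value for PostgreSQL's COPY text format."""
    if value is None:
        return "\\N"  # COPY's default NULL marker
    text = str(value)
    # Escape backslashes first, then the characters COPY treats specially.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_copy_buffer(rows):
    """Build an in-memory, tab-separated buffer for cursor.copy_from()."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_copy_field(v) for v in row))
        buf.write("\n")
    buf.seek(0)  # rewind so copy_from() reads from the start
    return buf


def copy_rows(cur, table, columns, rows):
    """Load `rows` into `table` with a single COPY instead of many INSERTs.

    Assumes `cur` is a psycopg2 cursor (copy_from wraps COPY ... FROM STDIN).
    """
    cur.copy_from(rows_to_copy_buffer(rows), table, columns=columns)
```

For a 70 GB source, you would call copy_rows() once per chunk of parsed rows (say, 100k at a time) rather than building one giant buffer.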


checkpoint_segments defaults to 3 (3 * 16 MB = 48 MB), which is small for a load of this size; try raising it to 32 (512 MB).

If you can accept the risk, you can also start Postgres with the "-F" option, which turns off fsync.


Read "Populating a Database" in the PostgreSQL documentation.

With 70 GB of data, the biggest gain will probably come from using COPY instead of individual INSERT statements.

In psql, you can look up a table's constraints and drop them before the load:

\d tablename
ALTER TABLE tablename DROP CONSTRAINT constraint_name;

, , - :

ALTER TABLE tablename ADD CONSTRAINT constraint_name FOREIGN KEY (column_name) REFERENCES other_table (join_column);

Before dropping anything, save the definitions with pg_dump --schema-only so you can recreate the constraints exactly.
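The drop-then-restore cycle can be scripted so the exact definitions are never lost. This is a minimal sketch and assumes a psycopg2-style cursor. It reads foreign keys from the pg_constraint catalog and captures each definition with pg_get_constraintdef(). The function names are hypothetical. `table` is interpolated directly into the SQL, so it must be a trusted identifier.

```python
FK_QUERY = """
    SELECT conname, pg_get_constraintdef(oid)
    FROM pg_constraint
    WHERE conrelid = %s::regclass AND contype = 'f'
"""


def drop_foreign_keys(cur, table):
    """Drop every foreign key on `table`; return (name, definition) pairs.

    Assumes `table` is a trusted identifier and constraint names contain
    no double quotes, since both are interpolated into ALTER TABLE.
    """
    cur.execute(FK_QUERY, (table,))
    saved = cur.fetchall()
    for name, _definition in saved:
        cur.execute(f'ALTER TABLE {table} DROP CONSTRAINT "{name}"')
    return saved


def restore_foreign_keys(cur, table, saved):
    """Re-create the constraints returned by drop_foreign_keys()."""
    for name, definition in saved:
        # pg_get_constraintdef() yields e.g. 'FOREIGN KEY (a) REFERENCES b(c)'
        cur.execute(f'ALTER TABLE {table} ADD CONSTRAINT "{name}" {definition}')
```

Call drop_foreign_keys() before the load and keep the returned list. Once the data is in, pass that list to restore_foreign_keys(). Re-adding a foreign key checks every existing row, so expect that step to take a while on large tables.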


5- , - 100k 1 mil; 1-2 ( 70-90, , 1/10 ).

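Python and psycopg2 are fine for this. Alternatively, you could parse the XML inside PostgreSQL with the xml2 module or the built-in XML functions, as described here: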

https://dba.stackexchange.com/questions/8172/sql-to-read-xml-from-file-into-postgresql-database

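As duffymo said, commit in batches (for example, every 10000 rows). Disable autovacuum during the load, raise work_mem and maintenance_work_mem, and increase wal_buffers (from 9.1 on, -1 sizes it automatically). You could also turn off fsync or change wal_sync_method, at the risk of losing data if the server crashes.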
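The per-session and per-table part of this advice could look like the following minimal sketch. It only generates SQL strings. `table` is assumed to be a trusted identifier, and the 1GB/256MB sizes are placeholders, not recommendations. The sketch does not cover wal_buffers, fsync and wal_sync_method, because those are server settings in postgresql.conf. The closing ANALYZE follows the "Populating a Database" advice to refresh planner statistics after a large load, which matters here because autovacuum is switched off for the table.

```python
def bulk_load_settings(table, maintenance_work_mem="1GB", work_mem="256MB"):
    """Return SQL to run before a bulk load into `table`.

    The memory sizes are placeholder values; tune them to the machine.
    """
    return [
        f"SET maintenance_work_mem = '{maintenance_work_mem}'",  # this session only
        f"SET work_mem = '{work_mem}'",
        f"ALTER TABLE {table} SET (autovacuum_enabled = false)",  # per-table switch
    ]


def after_load(table):
    """Return SQL to re-enable autovacuum and refresh statistics afterwards."""
    return [
        f"ALTER TABLE {table} RESET (autovacuum_enabled)",
        f"ANALYZE {table}",
    ]
```

Each string would be passed to cur.execute() on the same connection that performs the load, since SET only affects the current session.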

A few more things to check:

Use prepared statements for the inserts, and cast the variables.

Try inserting the data into an unlogged table used as temporary storage.

Do your inserts have conditions, or take values from subqueries, functions, or the like?
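One way to use prepared statements from psycopg2 is a server-side PREPARE whose parameter types act as the casts, followed by an EXECUTE for each row. This is a minimal sketch. The table items(id integer, name text, price numeric) and the statement name insert_item are hypothetical.

```python
def prepared_insert(cur, rows):
    """Insert rows through a server-side prepared statement.

    Assumes a psycopg2-style cursor and a hypothetical table
    items(id integer, name text, price numeric).
    """
    # The declared parameter types serve as the casts; the statement is
    # parsed once for these types and then executed once per row.
    cur.execute(
        "PREPARE insert_item (integer, text, numeric) AS "
        "INSERT INTO items (id, name, price) VALUES ($1, $2, $3)"
    )
    cur.executemany("EXECUTE insert_item (%s, %s, %s)", rows)
    cur.execute("DEALLOCATE insert_item")  # free the statement in this session
```

This reduces per-row parsing work but still costs one round trip per row. For the bulk of a 70 GB load, COPY remains the faster option.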


Source: https://habr.com/ru/post/1791462/

