How to ignore some lines when importing from a tab-separated text file in PostgreSQL?

I have a 30 GB tab-delimited text file that contains more than 100 million rows. When I try to import it into a PostgreSQL table using the \copy command, some lines cause an error. How can I skip these lines, and also keep a record of the skipped lines, when importing into PostgreSQL?

I connect to the machine over SSH, so I cannot use pgAdmin.

It is very difficult to edit the text file before importing, because many different lines have different problems. If there is a way to check the lines one by one before importing and then run the \copy command on the individual lines, that would be useful.

Below is the code that creates the table:

CREATE TABLE Papers(
    Paper_ID CHARACTER(8) PRIMARY KEY,
    Original_paper_title TEXT,
    Normalized_paper_title TEXT,
    Paper_publish_year INTEGER, 
    Paper_publish_date DATE,
    Paper_Document_Object_Identifier TEXT,
    Original_venue_name TEXT,
    Normalized_venue_name TEXT,
    Journal_ID_mapped_to_venue_name CHARACTER(8),
    Conference_ID_mapped_to_venue_name CHARACTER(8),
    Paper_rank BIGINT,
    FOREIGN KEY(Journal_ID_mapped_to_venue_name) REFERENCES Journals(Journal_ID),
    FOREIGN KEY(Conference_ID_mapped_to_venue_name) REFERENCES Conferences(Conference_ID));

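Do not load the file directly into the target table. Load it into a staging table with a single text column first: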

create table Papers_stg (rec text);

Once the data is loaded, you can validate it with SQL.

Find the records that do not have exactly 11 fields:

select  rec
from    Papers_stg
where   cardinality(string_to_array(rec, E'\t')) <> 11;

Then create a table in which every field of the well-formed records is a text column:

create table Papers_fields_text
as
select  fields[1]  as Paper_ID                          
       ,fields[2]  as Original_paper_title              
       ,fields[3]  as Normalized_paper_title            
       ,fields[4]  as Paper_publish_year                
       ,fields[5]  as Paper_publish_date                
       ,fields[6]  as Paper_Document_Object_Identifier  
       ,fields[7]  as Original_venue_name               
       ,fields[8]  as Normalized_venue_name             
       ,fields[9]  as Journal_ID_mapped_to_venue_name   
       ,fields[10] as Conference_ID_mapped_to_venue_name
       ,fields[11] as Paper_rank                        

from   (select  string_to_array(rec, E'\t')  as fields
        from    Papers_stg
        ) t
where   cardinality(fields) = 11;


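One option is row-by-row processing. Write a script (a shell script, for example) that loops over the input file, sends each line to "copy", checks the result, and writes the failed lines to "err_input.txt".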

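To speed this up, you can load the file in segments instead of "row-by-row", and fall back to "row-by-row" processing only for the segments that fail.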
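A cheaper variant of the same idea is to check each line before it ever reaches "copy". The following is only a minimal Python sketch, not a tested production loader. It assumes the only problem being screened for is a wrong number of tab-separated fields (11 for the Papers table). The function names `split_lines` and `filter_file`, and the file names in the usage note, are made up for illustration. Undecodable bytes are passed through unchanged via `surrogateescape`.

```python
from typing import Iterable, Iterator, TextIO

EXPECTED_FIELDS = 11  # number of columns in the Papers table


def split_lines(lines: Iterable[str], rejects: TextIO,
                expected: int = EXPECTED_FIELDS) -> Iterator[str]:
    """Yield lines that have `expected` tab-separated fields.

    Every other line is written to `rejects`, prefixed with its
    1-based line number and a tab, so it can be inspected later.
    """
    for lineno, line in enumerate(lines, start=1):
        row = line.rstrip("\r\n")  # drop the line terminator only
        if len(row.split("\t")) == expected:
            yield row + "\n"
        else:
            rejects.write(f"{lineno}\t{row}\n")


def filter_file(src_path: str, good_path: str, bad_path: str) -> None:
    """Stream src_path once; the 30 GB file is never held in memory."""
    # newline="\n" disables newline translation; surrogateescape lets
    # bytes that are not valid UTF-8 round-trip unchanged.
    opts = dict(encoding="utf-8", errors="surrogateescape", newline="\n")
    with open(src_path, "r", **opts) as src, \
         open(good_path, "w", **opts) as good, \
         open(bad_path, "w", **opts) as bad:
        good.writelines(split_lines(src, bad))
```

Calling `filter_file("papers.txt", "papers_good.txt", "err_input.txt")` would leave the well-formed lines in papers_good.txt for \copy and the rejected ones, with their line numbers, in err_input.txt. A field-count check does not catch values that fail type conversion (an invalid date, for example), so \copy can still stop on such rows.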


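Another option is a BEFORE INSERT trigger. In the trigger function, check the row, and return null for a row (record) that should not be inserted. That row is then skipped, and the import continues.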

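Note that a trigger only sees rows that COPY has already parsed, so lines that are malformed (wrong number of fields, values that do not match the column type, etc.) will still raise an error before the trigger runs.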

An example found on StackExchange, attributed to Bartosz Dmytrak, for PostgreSQL:

CREATE OR REPLACE FUNCTION "myschema"."checkTriggerFunction" ()
RETURNS TRIGGER
AS
$BODY$
BEGIN
IF EXISTS (SELECT 1 FROM "myschema".mytable WHERE "MyKey" = NEW."MyKey")
THEN
 RETURN NULL;
ELSE
 RETURN NEW;
END IF;
END;
$BODY$
LANGUAGE plpgsql;

And the trigger:
CREATE TRIGGER "checkTrigger"
  BEFORE INSERT
  ON "myschema".mytable
  FOR EACH ROW
  EXECUTE PROCEDURE "myschema"."checkTriggerFunction"();

Source: https://habr.com/ru/post/1664921/

