How to import from a mixed encoding file into a PostgreSQL table

I have a 30 gigabyte text file. The file is mostly UTF8, but it also contains some Windows-1252 characters. Because of that, when I try to import it, I get the following error:

ERROR:  invalid byte sequence for encoding "UTF8": 0x9b

How can I fix this?

The file is already in UTF8 format: when I run the `file` command on it, it reports the encoding as UTF8. But it also contains some byte sequences that are not valid UTF8. For example, when I run the \copy command, after a while it fails with the above error on this line:

0B012234    Basic study of <img src="/fulltext-image.asp?format=htmlnonpaginated&src=323K744431152658_html\233_2    basic study of img src fulltext image asp format htmlnonpaginated src 323k744431152658_html 233_2   1975        Semigroup Forum semigroup forum 04861B53        19555
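Not part of the original post, but a quick way to see why the server complains about this particular byte: 0x9b can never start a valid UTF-8 sequence, while in Windows-1252 it is an ordinary printable character.

```python
# Sketch: the byte 0x9b is invalid as UTF-8 but valid as Windows-1252.
data = bytes([0x9b])

try:
    data.decode('utf-8')
except UnicodeDecodeError as e:
    print(e)  # 0x9b is an invalid start byte in UTF-8

print(data.decode('cp1252'))  # in Windows-1252, 0x9b is the character '›'
```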
1 answer

The byte 0x9b is most likely not in your file at all: the line contains the literal four characters `\233`, and COPY's default text format interprets a backslash followed by octal digits as an escape sequence. So `\233` is expanded into the single byte 0x9b (233 octal = 9b hex), which is not valid UTF8.

The CSV format, by contrast, does not interpret backslash escapes. Load the file as CSV, choosing a quote character and a delimiter that cannot occur in the data (here the control characters 0x01 and 0x02):

\copy t from myfile.txt with csv quote E'\x1' delimiter E'\x2'
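To see the mechanism for yourself, here is a sketch (not from the original post) that mimics how the text format of COPY expands `\NNN` octal escapes; the function name and the simplified regex are illustrative only, and real COPY also handles other escapes such as `\t` and `\n`:

```python
# Sketch: mimic COPY text-format octal-escape expansion, which turns the
# literal characters \233 into the raw byte 0x9b before the encoding check.
import re

def expand_octal_escapes(raw: bytes) -> bytes:
    # Replace \NNN (one to three octal digits) with the corresponding byte.
    return re.sub(rb'\\([0-7]{1,3})',
                  lambda m: bytes([int(m.group(1), 8) & 0xFF]),
                  raw)

line = rb'src=323K744431152658_html\233_2'
expanded = expand_octal_escapes(line)
print(b'\x9b' in expanded)  # True: octal 233 is hex 9b
```

Because CSV mode skips this expansion, the `\233` stays as four harmless ASCII characters and the import succeeds.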

Source: https://habr.com/ru/post/1664922/

