SQL-like JOIN of two text files in Python: is there a built-in way?

A common task I have to accomplish is an SQL-like JOIN of two text files, i.e. creating a new file from the "left-hand" and "right-hand" files by joining them on an identifier column they share. Sometimes variants such as outer joins are required.

Of course, I could write a simple script to do this in a general way, but is there a Python module, built-in or installable, that can do it? Something that can handle huge files would be ideal.

EDIT:

  • I know about PyTables, but is it the easiest solution for flat text files?
  • By "huge files" I mean that sometimes the "left" file is too large to hold in memory.
  • The lack of a Python answer (so far) bothers me. Am I using the wrong tool or paradigm for this? The reason I asked for a Python library is that it makes it easy to apply other transformations to each line (checking identifiers, etc.).
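For comparison, the "simple script" mentioned in the question could look like the following minimal hash-join sketch, using only the standard library. It assumes tab-separated files, a single key column per file, and that the right-hand file fits in memory; the left file is streamed row by row, so it may be larger than RAM. The name hash_join and its parameters are illustrative, not part of any library.

```python
import csv
from collections import defaultdict
from typing import Iterator


def hash_join(left_path: str, right_path: str, left_key: int = 0,
              right_key: int = 0, how: str = "inner",
              delimiter: str = "\t") -> Iterator[list[str]]:
    """Join two delimited text files on one key column each.

    The right file is loaded into a dict; the left file is streamed row by
    row, so only the right file has to fit in memory. `how` is "inner" or
    "left" (left outer join).
    """
    # Index the right file: key value -> all right rows with that key.
    index: dict[str, list[list[str]]] = defaultdict(list)
    right_width = 0
    with open(right_path, newline="") as rf:
        for row in csv.reader(rf, delimiter=delimiter):
            if row:  # skip blank lines
                index[row[right_key]].append(row)
                right_width = max(right_width, len(row))

    # Stream the left file and emit one output row per matching pair.
    with open(left_path, newline="") as lf:
        for row in csv.reader(lf, delimiter=delimiter):
            if not row:
                continue
            matches = index.get(row[left_key])
            if matches:
                for match in matches:
                    yield row + match
            elif how == "left":
                # No partner: pad the right-hand side with empty fields.
                yield row + [""] * right_width
```

To write the result to a new file, pass the generator to csv.writer(out, delimiter="\t").writerows(...). Putting the smaller file on the right keeps memory use proportional to that file alone, and per-line checks can be added inside either loop.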
2 answers

[wild idea]

Will these files fit into your system's memory with room to spare? If so, you could load them into tables in SQLite and then join them to your heart's content with SQL.

[/ wild idea]
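A minimal sketch of this idea with Python's built-in sqlite3 module, assuming tab-separated files in which every line has the same number of fields and the join key is the first column. The table and column names (lhs, rhs, c0, c1, ...) and both helper functions are hypothetical. Passing a path to a new database file instead of ":memory:" keeps the tables on disk, which matters if the files do not fit in RAM.

```python
import csv
import itertools
import sqlite3


def load_tsv(conn: sqlite3.Connection, table: str, path: str) -> None:
    """Stream a tab-separated file into a new table with columns c0, c1, ..."""
    with open(path, newline="") as f:
        reader = (row for row in csv.reader(f, delimiter="\t") if row)
        first = next(reader)  # assumes the file is non-empty
        cols = ", ".join(f"c{i}" for i in range(len(first)))
        # The table name is interpolated into SQL, so pass trusted names only.
        conn.execute(f"CREATE TABLE {table} ({cols})")
        marks = ", ".join("?" * len(first))
        # executemany consumes the iterator lazily; the file is not read whole.
        conn.executemany(f"INSERT INTO {table} VALUES ({marks})",
                         itertools.chain([first], reader))


def left_join(db_path: str, left_path: str, right_path: str) -> sqlite3.Cursor:
    """LEFT OUTER JOIN two files on their first column using SQLite."""
    # Assumes a fresh database: a file path keeps the tables on disk.
    conn = sqlite3.connect(db_path)
    load_tsv(conn, "lhs", left_path)
    load_tsv(conn, "rhs", right_path)
    conn.execute("CREATE INDEX rhs_c0 ON rhs (c0)")
    conn.commit()
    # SQLite before 3.39.0 has no RIGHT/FULL OUTER JOIN; swap the
    # table order to get a right join on those versions.
    return conn.execute(
        "SELECT * FROM lhs LEFT JOIN rhs ON lhs.c0 = rhs.c0 "
        "ORDER BY lhs.rowid, rhs.rowid")
```

The returned cursor can be iterated row by row, so the joined result does not have to be materialized in Python either.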

Update

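Given the OP's edit (i.e. the "left" file may be too large for memory), I'd go with @Dave Kirby's answer. SQLite can also keep its database in a file on disk rather than in memory.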


If you are on a unixy system or Cygwin, the standard join command does exactly this:

[26] % join --help
Usage: join [OPTION]... FILE1 FILE2
For each pair of input lines with identical join fields, write a line to
standard output.  The default join field is the first, delimited
by whitespace.  When FILE1 or FILE2 (not both) is -, read standard input.

  -a FILENUM        print unpairable lines coming from file FILENUM, where
                      FILENUM is 1 or 2, corresponding to FILE1 or FILE2
  -e EMPTY          replace missing input fields with EMPTY
  -i, --ignore-case ignore differences in case when comparing fields
  -j FIELD          equivalent to `-1 FIELD -2 FIELD'
  -o FORMAT         obey FORMAT while constructing output line
  -t CHAR           use CHAR as input and output field separator
  -v FILENUM        like -a FILENUM, but suppress joined output lines
  -1 FIELD          join on this FIELD of file 1
  -2 FIELD          join on this FIELD of file 2
      --help     display this help and exit
      --version  output version information and exit

Unless -t CHAR is given, leading blanks separate fields and are ignored,
else fields are separated by CHAR.  Any FIELD is a field number counted
from 1.  FORMAT is one or more comma or blank separated specifications,
each being `FILENUM.FIELD' or `0'.  Default FORMAT outputs the join field,
the remaining fields from FILE1, the remaining fields from FILE2, all
separated by CHAR.

Important: FILE1 and FILE2 must be sorted on the join fields.

Report bugs to <bug-coreutils@gnu.org>.
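The "Important" line above is the crux: join performs a sort-merge join, walking both sorted files in step, so it never needs a whole file in memory. Below is a rough pure-Python sketch of the same technique, handy if you want to apply per-line transformations in Python. It is an illustration, not join's actual implementation; merge_join is a hypothetical name, and it assumes tab-separated files sorted on their first field in code-point order (for example with LC_ALL=C sort -t $'\t' -k1,1).

```python
import csv
from itertools import groupby
from typing import Iterator


def merge_join(left_path: str, right_path: str, left_outer: bool = False,
               delimiter: str = "\t") -> Iterator[list[str]]:
    """Sort-merge join of two files sorted on their first field.

    Only the current run of equal keys from each file is held in memory.
    Output rows follow join's default FORMAT: the key, the remaining left
    fields, then the remaining right fields.
    """
    def runs(f):
        # Skip blank lines; group consecutive rows that share a key.
        rows = (r for r in csv.reader(f, delimiter=delimiter) if r)
        return groupby(rows, key=lambda r: r[0])

    with open(left_path, newline="") as lf, open(right_path, newline="") as rf:
        right = runs(rf)
        rkey, rrows = next(right, (None, None))
        for lkey, lrows in runs(lf):
            lrows = list(lrows)
            # Skip right-hand keys that sort before the current left key.
            while rkey is not None and rkey < lkey:
                rkey, rrows = next(right, (None, None))
            if rkey == lkey:
                rlist = list(rrows)
                # Every left row pairs with every right row sharing the key.
                for lrow in lrows:
                    for rrow in rlist:
                        yield [lkey] + lrow[1:] + rrow[1:]
                rkey, rrows = next(right, (None, None))
            elif left_outer:
                # Unpaired left rows, roughly like join -a 1.
                for lrow in lrows:
                    yield lrow
```

If the files are sorted with a different collation than Python's string comparison, keys can be skipped silently, which is the same failure mode join has with unsorted input.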

It's not Python, but it does the job without loading the files into SQLite or writing any SQL.

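Edit: because join reads its sorted inputs sequentially, it does not need to hold whole files in memory, so it copes with files too large for RAM. Unsorted files can be sorted first with sort, which also handles input larger than memory. That avoids loading everything into SQLite.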


Source: https://habr.com/ru/post/1762832/

