Fast conversion of numeric data to a fixed-width format file in Python

What is the fastest way to convert records containing only numeric data into fixed-format strings and write them to a file in Python? For example, suppose records is a huge list of objects with attributes id, x, y and wt, and we frequently need to flush them to an external file. The flushing can be done with the following snippet:

with open(serial_fname(), "w") as f: 
    for r in records:
        f.write("%07d %11.5e %11.5e %7.5f\n" % (r.id, r.x, r.y, r.wt))
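One variation worth timing is to pass a generator of formatted lines to writelines in a single call. The file object then drives the loop instead of a Python-level `f.write` call per record. This is a sketch, not a measured improvement: the generator still formats each record in Python, so any gain may be small. The Record namedtuple and the dump_records name are hypothetical stand-ins for the objects and file name in the snippet above.

```python
from collections import namedtuple

# Hypothetical record type with the attributes used in the question.
Record = namedtuple("Record", "id x y wt")

FMT = "%07d %11.5e %11.5e %7.5f\n"

def dump_records(path, records, fmt=FMT):
    """Write records as fixed-width lines using a single writelines call."""
    with open(path, "w") as f:
        # writelines iterates the generator and calls write for each line.
        f.writelines(fmt % (r.id, r.x, r.y, r.wt) for r in records)
```

The output is byte-for-byte the same as the loop above, so the two versions can be benchmarked against each other directly.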

However, my code spends too much time writing the external files, leaving too little time to do what it should be doing between flushes.

Edit to the original question:

I ran into this problem while writing server software that tracks a global record set. It pulls information from several "producer" systems and relays any changes to the record set to "consumer" systems in real time or near real time, in preprocessed form. Many of the consumer systems are Matlab applications.

Listed below are some of the suggestions I have received so far (thanks) with some comments:

  • Dump only the changes, not the entire data set: I already do this. The resulting change sets are still huge.
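  • Use a binary (or some other more efficient) file format: I'm pretty much constrained by what Matlab can read reasonably efficiently, and in addition the format should be platform independent.
  • Use a database: I'm actually trying to bypass the current database solution, which is deemed too slow and cumbersome, especially on the Matlab side.
  • Split the task into separate processes: at the moment the dumping code runs in its own thread. However, because of the GIL it still uses the same core. I guess I could move it to a completely separate process.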

Trying to check whether numpy.savetxt could speed things up somewhat, I wrote the following code:

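import os
import sys
import numpy as np

# Python 2.5 benchmark: three ways to write the same rows as fixed-width text.
fmt = '%7.0f %11.5e %11.5e %7.5f'
records = 10000

np.random.seed(1234)
aray = np.random.rand(records, 4)

def writ(f, aray=aray, fmt=fmt + '\n'):
  # explicit loop; f.write adds no newline, so fmt carries one
  fw = f.write
  for row in aray:
    fw(fmt % tuple(row))

def prin(f, aray=aray, fmt=fmt):
  # explicit loop; print appends the newline itself
  for row in aray:
    print>>f, fmt % tuple(row)

def stxt(f, aray=aray, fmt=fmt):
  # let numpy run the loop
  np.savetxt(f, aray, fmt)

nul = open(os.devnull, 'w')
def tonul(func, nul=nul):
  # discard the output so the timings are not dominated by disk I/O
  func(nul)

def main():
  print 'looping:'
  writ(sys.stdout)
  print 'savetxt:'
  stxt(sys.stdout)

if __name__ == '__main__':
  main()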

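I timed this on a 2.4 GHz Core Duo Macbook Pro (Mac OS X 10.5.8, Python 2.5.4 from the DMG on python.org, numpy 1.4 rc1 built from sources). The results are slightly surprising but quite repeatable, so I thought they might be of interest: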

$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.writ)'
10 loops, best of 3: 101 msec per loop
$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.prin)'
10 loops, best of 3: 98.3 msec per loop
$ py25 -mtimeit -s'import ft' 'ft.tonul(ft.stxt)'
10 loops, best of 3: 104 msec per loop

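So savetxt seems to be a few percent slower than a loop calling write. Good old print (also in a loop) seems to be a few percent faster than write; I guess it avoids some kind of call overhead. I realize that a difference of 2.5% or so isn't very important, but it is not in the direction I intuitively expected, so I thought I'd report it. (BTW, using a real file instead of /dev/null only uniformly adds 6 or 7 milliseconds, so it doesn't change things much one way or the other.)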
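For readers on Python 3, where `print>>f` no longer exists, a comparable harness might look like the sketch below. It is a port, not a rerun of the measurements above. It uses the standard random module instead of numpy and compares the write loop, the print loop and a writelines variant. It leaves out numpy.savetxt so that it runs with the standard library alone.

```python
import os
import random
import timeit

FMT = "%7.0f %11.5e %11.5e %7.5f"

random.seed(1234)
ROWS = [tuple(random.random() for _ in range(4)) for _ in range(10000)]

def writ(f, rows=ROWS, fmt=FMT + "\n"):
    fw = f.write  # bind the method once, as in the Python 2 version
    for row in rows:
        fw(fmt % row)

def prin(f, rows=ROWS, fmt=FMT):
    for row in rows:
        print(fmt % row, file=f)  # print appends the newline

def wlines(f, rows=ROWS, fmt=FMT + "\n"):
    f.writelines(fmt % row for row in rows)

def bench(number=10):
    """Return the best-of-3 seconds per call for each writer, writing to os.devnull."""
    results = {}
    with open(os.devnull, "w") as nul:
        for func in (writ, prin, wlines):
            times = timeit.repeat(lambda: func(nul), number=number, repeat=3)
            results[func.__name__] = min(times) / number
    return results

if __name__ == "__main__":
    for name, secs in bench().items():
        print("%-7s %.1f msec per loop" % (name, secs * 1000))
```

All three functions produce identical text, so any timing difference comes from how the loop and the write calls are driven.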


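I don't see anything in your snippet of code that I could optimize. So I think we need to do something completely different to solve your problem.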

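Your problem seems to be that you are chewing through large amounts of data, and formatting that data into strings and writing the strings to a file is slow. You said "flush", which implies that you need to save the data regularly.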

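Are you saving all of the data regularly, or just the changed data? If you are dealing with a very large data set, changing only some of it, and writing all of it out... that is an angle we could attack to solve your problem.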

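If you have a large data set and want to update it from time to time... you are a candidate for a database. A real database, written in C for speed, will let you throw lots of data updates at it and will keep all the records in a consistent state. Then, at intervals, you can run a "report" that pulls the records and writes your fixed-width text file from them.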

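In other words, I'm proposing that you split the problem into two parts:

  • updating the data set piecemeal as you compute or receive more data;
  • dumping the entire data set into your fixed-width text format for further processing.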
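The two-part split above can be sketched with the standard library's sqlite3 module standing in for PostgreSQL and SqlAlchemy. This is a swap made only so the example runs without a database server. The table name records and the functions open_db, upsert and write_report are hypothetical. The report reuses the fixed-width format from the question.

```python
import sqlite3

FMT = "%07d %11.5e %11.5e %7.5f\n"

def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, x REAL, y REAL, wt REAL)"
    )
    return conn

def upsert(conn, rows):
    """Part 1: insert new records or overwrite existing ones, keyed by id."""
    with conn:  # commits the whole batch as one transaction
        conn.executemany(
            "INSERT OR REPLACE INTO records (id, x, y, wt) VALUES (?, ?, ?, ?)",
            rows,
        )

def write_report(conn, f):
    """Part 2: dump the whole table as fixed-width text, ordered by id."""
    cur = conn.execute("SELECT id, x, y, wt FROM records ORDER BY id")
    f.writelines(FMT % row for row in cur)
```

With a file path instead of ":memory:", the report can be produced by a separate process reading the same database.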

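Note that you could actually generate the text file from the database without stopping the Python process that is updating it. You would get an incomplete snapshot, but if the records are independent, that should be fine.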

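If your further processing is also in Python, you could just leave the data in the database permanently and not bother round-tripping it through a fixed-width text file. I'm assuming you use a fixed-width text file because it makes it easy to extract the data again for later processing.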

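If you go with the database idea, try PostgreSQL. It is free, and it is a real database. To use a database from Python, you should use an ORM. One of the best is SqlAlchemy.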

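Another thing to consider: you may be saving the data in a fixed-width text format so that another application can parse and use it later. If that application can read JSON as well as fixed-width text, maybe you could use a C module that writes JSON. It might not be any faster, but it might be; you could benchmark it and see.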
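The JSON route can be sketched with the standard json module, which in CPython has a C accelerator. One detail worth knowing when benchmarking: as far as I know, current CPython only takes the C encoder path in json.dumps, and only when indent is left at None. json.dump streams through the pure-Python encoder instead, so building the string once and writing it is usually the faster of the two. The [[id, x, y, wt], ...] layout and the dump_json name are assumptions for illustration.

```python
import json

def dump_json(path, rows):
    """Write rows as one JSON array of [id, x, y, wt] lists."""
    # json.dumps builds the whole string in memory in a single call.
    text = json.dumps([list(r) for r in rows])
    with open(path, "w") as f:
        f.write(text)
```

Building the whole string trades memory for speed; for change sets that do not fit comfortably in memory, writing in chunks would be the safer choice.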

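Other than the above, my only other idea is to split your program into a "worker" part and an "updater" part. The worker generates updated records, and the updater saves the records to disk. Perhaps they could communicate like this: the worker writes the updated records, in text format, to standard output, and the updater reads from standard input and updates its copy of the data. Instead of an SQL database, the updater could store the text records in a dictionary and simply update it as new ones arrive. Something like this: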

import sys

records = {}  # maps the 7-character id to the latest line for that record
for line in sys.stdin:
    rec_id = line[:7]  # fixed width: id is 7 wide
    records[rec_id] = line  # will insert or update as needed

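You could even have the updater keep two dictionaries, updating one while the other is being written out to disk.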
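The two-dictionary idea can be sketched as follows. The sketch assumes the updater only needs to flush the records that changed since the previous flush (the change sets mentioned in the question). The class name DoubleBufferedUpdater and the background-thread write are illustrative choices. The thread only lets the disk write overlap with further updates; it does not move any Python-level work off the GIL.

```python
import threading

class DoubleBufferedUpdater:
    """Collects fixed-width lines keyed by their 7-character id."""

    def __init__(self):
        self.active = {}  # buffer currently receiving updates

    def update(self, line):
        self.active[line[:7]] = line  # insert or overwrite by id

    def flush(self, path):
        """Swap in a fresh buffer and write the retired one in a background thread."""
        retired, self.active = self.active, {}
        t = threading.Thread(target=self._write, args=(retired, path))
        t.start()
        return t  # join() it before the next flush if writes must not overlap

    @staticmethod
    def _write(buf, path):
        with open(path, "w") as f:
            f.writelines(buf[k] for k in sorted(buf))
```

Because update only ever touches the new active dictionary after the swap, the writer thread reads a dictionary that nothing else mutates.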

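Splitting into a worker and an updater is a good way to make sure the worker doesn't spend all its time updating, and a great way to balance the work across multiple CPU cores.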
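The worker/updater split can be sketched as follows. The updater runs as a separate Python process that reads fixed-width lines on its standard input and, when its input closes, writes the latest line per id to a file. Because the two run in different processes, the worker's formatting and the updater's dictionary work and disk I/O are not serialized by a single GIL. UPDATER_SRC and run_pipeline are hypothetical names for this illustration.

```python
import subprocess
import sys

# Updater program: keeps the newest line per 7-character id and dumps on EOF.
UPDATER_SRC = r"""
import sys
records = {}
for line in sys.stdin:
    records[line[:7]] = line  # fixed width: id is 7 wide
with open(sys.argv[1], "w") as out:
    out.writelines(records[k] for k in sorted(records))
"""

FMT = "%07d %11.5e %11.5e %7.5f\n"

def run_pipeline(rows, out_path):
    """Worker side: format rows and stream them to the updater process."""
    proc = subprocess.Popen(
        [sys.executable, "-c", UPDATER_SRC, out_path],
        stdin=subprocess.PIPE,
        text=True,
    )
    for row in rows:
        proc.stdin.write(FMT % row)
    proc.stdin.close()  # EOF tells the updater to write its file
    return proc.wait()
```

In a long-running server the updater would flush periodically instead of only at EOF, for example with the two-dictionary scheme described above.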

.


, . . .

: , . .


, , , .

, " , ", , , .

Run your Python code to collect the data, and use an ORM module to insert/update it in the database. Then run a separate process that generates a "report", which will be your fixed-width text files. The database will do all the work of producing the text file. If necessary, put the database on its own server, since hardware is pretty cheap these days.


You could try pushing your loop down to C using ctypes.


Source: https://habr.com/ru/post/1724851/

