Create a layered structured array from a list of strings

I am working on a python utility to get data from a Tycho 2 star directory. One of the functions that I am working on is catalog requests and returns all the information for a given star identifier (or a set of star identifiers).

I am currently doing this by going through the lines of a directory file, and then try to parse the line into a layered structured array if one was requested. (note, if there is a better way to do this, you can tell me, although this is not what this question is about - I do it this way because the directory is too large to load all this into memory at one time)

In any case, as soon as I determined the record I want to save, I ran into a problem ... I cannot figure out how to parse it into a structured array.

For example, say the record I want to save:

record = '0002 00038 1| |  3.64121230|  1.08701186|   14.1|  -23.0| 69| 82| 1.8| 1.9|1968.56|1957.30| 3|1.0|3.0|0.9|3.0|12.444|0.213|11.907|0.189|999| |         |  3.64117944|  1.08706861|1.83|1.73| 81.0|104.7| | 0.0'

Now, I'm trying to parse this into a multicore structured array with dtype:

        dform = [('starid', [('TYC1', int), ('TYC2', int), ('TYC3', int)]),
             ('pflag', str),
             ('starBearing', [('rightAscension', float), ('declination', float)]),
             ('properMotion', [('rightAscension', float), ('declination', float)]),
             ('uncertainty', [('rightAscension', int), ('declination', int), ('pmRA', float), ('pmDc', float)]),
             ('meanEpoch', [('rightAscension', float), ('declination', float)]),
             ('numPos', int),
             ('fitGoodness', [('rightAscension', float), ('declination', float), ('pmRA', float), ('pmDc', float)]),
             ('magnitude', [('BT', [('mag', float), ('err', float)]), ('VT', [('mag', float), ('err', float)])]),
             ('starProximity', int),
             ('tycho1flag', str),
             ('hipparcosNumber', str),
             ('observedPos', [('rightAscension', float), ('declination', float)]),
             ('observedEpoch', [('rightAscension', float), ('declination', float)]),
             ('observedError', [('rightAscension', float), ('declination', float)]),
             ('solutionType', str),
             ('correlation', float)]

It seems like this should be a fairly simple task, but everything I try breaks down ...

I tried:

np.genfromtxt(BytesIO(record.encode()),dtype=dform,delimiter=(' ','|'))
np.genfromtxt(BytesIO(record.encode()),dtype=dform,delimiter=(' ','|'),missing_values=' ',filling_values=None)

both of which give me

{TypeError}cannot perform accumulate with flexible type

which makes no sense, since it should not make any accumulation.

I also tried

np.array(re.split('\|| ',record),dtype=dform)

who complains

{TypeError}a bytes-like object is required, not 'str'

and another option

np.array([x.encode() for x in re.split('\|| ',record)],dtype=dform)

which does not throw errors, but also, of course, does not return the correct results:

[ ((842018864, 0, 0), '', (0.0, 0.0), (0.0, 0.0), (0, 0, 0.0, 0.0), (0.0, 0.0), 0, (0.0, 0.0, 0.0, 0.0), ((0.0, 0.0), (0.0, 0.0)), 0, '', '', (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), '', 0.0)...

, ? , genfromtxt - ( , ), , . -, ?

0
1

, , , , , . dtype .


TypeError: cannot perform accumulate with flexible type, delimiter. , LineSplitter. , ( "" ).

genfromtxt docs:

: str, int ,         , .          .          .

genfromtxt , .split, loadtxt, , re.

{TypeError}a bytes-like object is required, not 'str', dtype 'str'. , record unicode ( Py3). , BytesIO(record.encode()).

genfromtxt :

record = b'....'
np.genfromtxt([record], ....)

records = b"""one line
tow line
three line
"""
np.genfromtxt(records.splitlines(), ....)

genfromtxt , 32 :

In [19]: A=np.genfromtxt([record],dtype=None,delimiter='|')
In [20]: len(A.dtype)
Out[20]: 32
In [21]: A
Out[21]: 
array((b'0002 00038 1', False, 3.6412123, 1.08701186, 14.1, -23.0, 69, 82, 1.8, 1.9, 1968.56, 1957.3, 3, 1.0, 3.0, 0.9, 3.0, 12.444, 0.213, 11.907, 0.189, 999, False, False, 3.64117944, 1.08706861, 1.83, 1.73, 81.0, 104.7, False, 0.0), 
      dtype=[('f0', 'S12'), ('f1', '?'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<f8'), ... ('f26', '<f8'), ('f27', '<f8'), ('f28', '<f8'), ('f29', '<f8'), ('f30', '?'), ('f31', '<f8')])

np.array([x for x in re.split(b'\|| ',record)],dtype=dform)

. , dform , .

, , .

np.array([(record1...), (record2...), ....], dtype([(field1),(field2 ),...]))

. , dform, 66 v 17. , dform 66 , .

dtype, , .

In [41]: np.zeros((1,),dform)
Out[41]: 
array([ ((0, 0, 0), '', (0.0, 0.0), (0.0, 0.0), (0, 0, 0.0, 0.0), (0.0, 0.0), 0, (0.0, 0.0, 0.0, 0.0), ((0.0, 0.0), (0.0, 0.0)), 0, '', '', (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), '', 0.0)], 
      dtype=[('starid', [('TYC1', '<i4'), ('TYC2', '<i4'), ('TYC3', '<i4')]), ('pflag', '<U'), ('starBearing', [('rightAscension', '<f8'), ('declination', '<f8')]), ('properMotion', [('rightAscension', '<f8'), ('declination', '<f8')]), ('uncertainty', [('rightAscension', '<i4'), ('declination', '<i4'), ('pmRA', '<f8'), ('pmDc', '<f8')]), ('meanEpoch', ....('solutionType', '<U'), ('correlation', '<f8')])

In [64]: for name in A.dtype.names:
    print(A[name].dtype)
   ....:     
[('TYC1', '<i4'), ('TYC2', '<i4'), ('TYC3', '<i4')]
<U1
[('rightAscension', '<f8'), ('declination', '<f8')]
[('rightAscension', '<f8'), ('declination', '<f8')]
[('rightAscension', '<i4'), ('declination', '<i4'), ('pmRA', '<f8'), ('pmDc', '<f8')]
[('rightAscension', '<f8'), ('declination', '<f8')]
int32
[('rightAscension', '<f8'), ('declination', '<f8'), ('pmRA', '<f8'), ('pmDc', '<f8')]
[('BT', [('mag', '<f8'), ('err', '<f8')]), ('VT', [('mag', '<f8'), ('err', '<f8')])]
int32
<U1
<U1
[('rightAscension', '<f8'), ('declination', '<f8')]
[('rightAscension', '<f8'), ('declination', '<f8')]
[('rightAscension', '<f8'), ('declination', '<f8')]
<U1
float64

34 dtype. "", 2-4 , .

2 - |, record.split(b'|') 34 .

genfromtxt:

In [79]: np.genfromtxt([record],delimiter='|',dtype=dform)
Out[79]: 
array(((2, 38, 1), '', (3.6412123, 1.08701186), (14.1, -23.0), 
   (69, 82, 1.8, 1.9), (1968.56, 1957.3), 3, (1.0, 3.0, 0.9, 3.0),
   ((12.444, 0.213), (11.907, 0.189)), 999, '', '', 
   (3.64117944, 1.08706861), (1.83, 1.73), (81.0, 104.7), '', 0.0), 
      dtype=[('starid', [('TYC1', '<i4'), ('TYC2', '<i4'), ('TYC3', '<i4')]), 
 ('pflag', '<U'), 
 ('starBearing', [('rightAscension', '<f8'), ('declination', '<f8')]),  
 ('properMotion', [('rightAscension', '<f8'), ('declination', '<f8')]),
 ('uncertainty', [('rightAscension', '<i4'), ('declination', '<i4'), ('pmRA', '<f8'), ('pmDc', '<f8')]), 
 ('meanEpoch', [('rightAscension', '<f8'), ('declination', '<f8')]),   
 ('numPos', '<i4'), 
 ('fitGoodness', [('rightAscension', '<f8'), ('declination', '<f8'), ('pmRA', '<f8'), ('pmDc', '<f8')]), 
 ('magnitude', [('BT', [('mag', '<f8'), ('err', '<f8')]), ('VT', [('mag', '<f8'), ('err', '<f8')])]), 
 ('starProximity', '<i4'), ('tycho1flag', '<U'), ('hipparcosNumber', '<U'), 
 ('observedPos', [('rightAscension', '<f8'), ('declination', '<f8')]),
 ('observedEpoch', [('rightAscension', '<f8'), ('declination', '<f8')]), 
 ('observedError', [('rightAscension', '<f8'), ('declination', '<f8')]), ('solutionType', '<U'), ('correlation', '<f8')])

. genfromtxt . , np.array().

, /, genfromtxt .

+3

Source: https://habr.com/ru/post/1621324/


All Articles