Convert Cassandra OrderedMapSerializedKey to a Python Dictionary

I have a column in Cassandra whose type is a map of lists. When queried through the Python driver, it comes back as an OrderedMapSerializedKey structure, that is, a map whose values are lists. I would like to load the whole query result into pandas.

To extract data from this OrderedMapSerializedKey structure, I take each key as the label of a new column and keep only the first element of the corresponding list as its value. Currently I do this with some complicated, dirty manipulations in the row factory before returning the built DataFrame.

A similar question was asked here, with no response.

Is there a better way to turn such an OrderedMapSerializedKey structure into a Python dictionary that can be easily loaded into a pandas DataFrame?

+4
2 answers

I think the ultimate solution is to store the Cassandra OrderedMapSerializedKey structure as a dict in your DataFrame column; you can then transform that value or column however you like. I call it ultimate because you may not know the actual keys in the Cassandra rows (different rows may have been inserted with different keys).

So, here is the solution I tested; you only need to improve the pandas_factory function:


EDIT:

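In the previous version I replaced only the first (0th) row of the Cassandra result (rows is a list of tuples, where each tuple is one Cassandra row).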

import pandas as pd
from cassandra.util import OrderedMapSerializedKey

def pandas_factory(colnames, rows):

    # Rows arrive as tuples, which are immutable, so copy each one into a list
    rows = [list(row) for row in rows]

    # Replace every OrderedMapSerializedKey value with a plain dict
    for row in rows:
        for idx, value in enumerate(row):
            if isinstance(value, OrderedMapSerializedKey):
                row[idx] = dict(value)

    return pd.DataFrame(rows, columns=colnames)

Of course, test and verify this solution before using it in your script.
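Once the map is a plain dict, the per-key columns the question asks for (key as column label, first list element as value) can be derived separately. The following is only a minimal sketch: `flatten_first_elements`, the column name `readings`, and the sample row are hypothetical, and an `OrderedDict` stands in for the driver's map type so the snippet runs without a Cassandra cluster.

```python
from collections import OrderedDict

def flatten_first_elements(row, map_column):
    """Return a copy of `row` in which the map column is replaced by one
    entry per map key, holding only the first element of each list."""
    flat = {key: val for key, val in row.items() if key != map_column}
    # A null map in Cassandra comes back as None, so treat it as empty
    for key, values in dict(row.get(map_column) or {}).items():
        # Empty lists have no first element, so fall back to None
        flat[key] = values[0] if values else None
    return flat

# Hypothetical stand-in for a row returned by the driver
row = {"id": 1, "readings": OrderedDict([("temp", [21.5, 22.0]), ("hum", [40])])}
flat_row = flatten_first_elements(row, "readings")
print(flat_row)  # {'id': 1, 'temp': 21.5, 'hum': 40}
```

A list of such flattened rows can be passed straight to pandas.DataFrame; keys missing from some rows simply become NaN in those rows.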

+2

Here is a simpler approach that also makes pandas ingestion easier.

When connecting to Cassandra, use dict_factory as the row_factory:

from cassandra.query import (
    dict_factory,
    SimpleStatement
    )

from cassandra.cluster import (
    Cluster,
    ExecutionProfile,
    EXEC_PROFILE_DEFAULT
    )

profile = ExecutionProfile(
    row_factory=dict_factory
    )

hosts = ["127.0.0.1"]
port = 9042

cluster = Cluster(
    hosts,
    port=port,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile}
    )

Then run the query:

src_keyspace = "your_keyspace"
src_tbl = "your_table"
N_ROWS = 100

with cluster.connect(src_keyspace) as cass_session:

    res = cass_session.execute(
        SimpleStatement("SELECT * FROM {} LIMIT {}".format(src_tbl,
                                                           N_ROWS))
        )

Convert each OrderedMapSerializedKey value (imported from cassandra.util) to a dict:

    rows_as_dict = [
        { key: (val if not isinstance(val, OrderedMapSerializedKey)
                else dict(val)) for key, val in row.items() }
                    for row in res.current_rows
                    ]

Finally, load rows_as_dict with pandas.DataFrame.from_dict.
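The conversion step can also be written without importing the driver class, which makes it easy to try offline. This is a sketch under an assumption: it relies on the driver's map type behaving as a `collections.abc.Mapping`, which should be checked against your driver version. The helper `to_plain` and the sample rows are hypothetical, with `OrderedDict` standing in for OrderedMapSerializedKey, and map keys are assumed to be hashable.

```python
from collections import OrderedDict
from collections.abc import Mapping

def to_plain(value):
    """Recursively turn mapping-like values into plain dicts."""
    if isinstance(value, Mapping):
        return {key: to_plain(val) for key, val in value.items()}
    if isinstance(value, list):
        # Lists may themselves contain maps, so convert their items too
        return [to_plain(item) for item in value]
    return value

# Hypothetical stand-in for rows produced with dict_factory
rows = [
    {"id": 1, "readings": OrderedDict([("temp", [21.5, 22.0])])},
    {"id": 2, "readings": None},
]
plain_rows = [to_plain(row) for row in rows]
print(plain_rows)
```

The resulting list of plain dicts can then be handed to pandas in the same way as rows_as_dict above.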

+1

Source: https://habr.com/ru/post/1685022/

