Adding meta information / metadata to pandas DataFrame

Is it possible to add meta information / metadata to a pandas DataFrame?

For example, the name of the instrument used to measure the data, who is responsible for the instrument, etc.

One workaround would be to create a column with this information, but it seems wasteful to store a single piece of information in every row!

+66
python pandas
04 Feb '13 at 13:59
8 answers

Sure, as with most Python objects, you can attach new attributes to a pandas.DataFrame:

    import pandas as pd

    df = pd.DataFrame([])
    df.instrument_name = 'Binky'

Note, however, that while you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc, to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
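
A minimal sketch of the loss described above. The column names `val` and `reading` and the choice of `groupby(...).sum()` are only illustrative assumptions; any operation that builds a new DataFrame would do.

```python
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({"val": [1, 2, 2, 3, 3], "reading": [10, 20, 30, 40, 50]})
df.instrument_name = "Binky"  # plain Python attribute on this one object

# groupby(...).sum() builds a brand-new DataFrame.
summed = df.groupby("val").sum()

print(hasattr(df, "instrument_name"))      # the original object keeps it
print(hasattr(summed, "instrument_name"))  # the result does not carry it
```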

Preserving the metadata in a file is possible, though. You can find an example of how to store metadata in an HDF5 file here.

+62
Feb 04 '13 at 14:03

Just ran into this issue myself. As of pandas 0.13, DataFrames have a _metadata attribute that persists through functions that return new DataFrames. It also seems to survive serialization just fine (I only tried JSON, but I imagine HDF is covered as well).

+11
Sep 07 '14 at 23:31

Not really. Although you can add attributes containing metadata to a DataFrame, as @unutbu mentions, many DataFrame methods return a new DataFrame, so your metadata would be lost. If you need to manipulate your DataFrame, the best option is to wrap your metadata and DataFrame in another class. See this discussion on GitHub: https://github.com/pydata/pandas/issues/2485

There is currently an open pull request to add a MetaDataFrame object, which would support metadata better.
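
A rough sketch of the wrapping idea from this answer. `AnnotatedFrame` and its `pipe` method are hypothetical names made up for illustration; they are not part of pandas and are not the proposed MetaDataFrame.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

import pandas as pd


@dataclass
class AnnotatedFrame:
    """Hypothetical wrapper that keeps a DataFrame and its metadata together."""

    frame: pd.DataFrame
    meta: Dict[str, Any] = field(default_factory=dict)

    def pipe(self, func: Callable[[pd.DataFrame], pd.DataFrame]) -> "AnnotatedFrame":
        """Apply func to the wrapped DataFrame and carry a copy of the metadata along."""
        return AnnotatedFrame(func(self.frame), dict(self.meta))


raw = AnnotatedFrame(pd.DataFrame({"val": [3, 1, 2]}), {"instrument_name": "Binky"})

# The DataFrame inside is replaced by the sorted one; the metadata travels with it.
ordered = raw.pipe(lambda f: f.sort_values("val"))
print(ordered.meta)
```

The cost of this design is that every DataFrame operation has to go through the wrapper (here via `pipe`), which is the trade-off for never losing the metadata silently.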

+10
04 Feb '13 at 14:12

Coming to this pretty late, I thought this might be helpful if you need the metadata to persist over I/O. There is a relatively new package called h5io that I have been using to accomplish this.

It should let you do a quick read/write from HDF5 for a few common formats, one of them being a DataFrame. So you can, for example, put a DataFrame in a dictionary and include the metadata as fields in that dictionary. For example:

    import h5io

    # my_df is the DataFrame you want to store
    save_dict = dict(data=my_df, name='chris', record_date='1/1/2016')
    h5io.write_hdf5('path/to/file.hdf5', save_dict)

    in_data = h5io.read_hdf5('path/to/file.hdf5')
    df = in_data['data']
    name = in_data['name']
    # etc.

Another option would be to look into a project like xray (since renamed xarray), which is more complex in some ways, but I think it lets you use metadata and is pretty easy to convert to a DataFrame.

+3
Jan 13 '16 at 21:53

As mentioned in other answers and comments, _metadata is not part of the public API, so it is definitely not a good idea to use it in a production environment. But you may still want to use it in research prototyping and replace it if it stops working. And right now it works with groupby/apply, which is helpful. Here is an example (which I could not find in the other answers):

    import pandas as pd

    df = pd.DataFrame([1, 2, 2, 3, 3], columns=['val'])
    df.my_attribute = "my_value"
    df._metadata.append('my_attribute')
    df.groupby('val').apply(lambda group: group.my_attribute)

Output:

    val
    1    my_value
    2    my_value
    3    my_value
    dtype: object

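
One caveat with the snippet above: as far as I can tell, `df._metadata` is a list defined on the class, so appending to it changes it for other DataFrames in the same process too. A variation, sketched here under the assumption that you are willing to subclass, is to declare `_metadata` on your own subclass and override `_constructor`, which is the pattern described in the pandas subclassing documentation. `InstrumentFrame` and `instrument_name` are illustrative names, and not every pandas operation is guaranteed to carry the attribute over.

```python
import pandas as pd


class InstrumentFrame(pd.DataFrame):
    """Illustrative subclass; the class and attribute names are assumptions."""

    # Names listed here are copied to results by pandas' __finalize__ hook.
    _metadata = ["instrument_name"]

    @property
    def _constructor(self):
        # Make pandas build InstrumentFrame results instead of plain DataFrames.
        return InstrumentFrame


df = InstrumentFrame({"val": [3, 1, 2], "other": [30, 10, 20]})
df.instrument_name = "Binky"

subset = df[["val"]]             # column selection returns a new object
ordered = df.sort_values("val")  # so does sorting

print(type(subset).__name__, subset.instrument_name)
print(type(ordered).__name__, ordered.instrument_name)
```
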
+3
Nov 09 '16 at 19:35

As mentioned by @choldgraf, I have found xarray to be an excellent tool for attaching metadata when comparing data and plotting results between several DataFrames.

In my work, we often compare the results of several firmware revisions and different test scenarios. Adding this information is as simple as this:

    import pandas as pd
    import xarray as xr

    # meaningless_test, foo, bar and sc_01 are placeholders
    df = pd.read_csv(meaningless_test)
    metadata = {'fw': foo, 'test_name': bar, 'scenario': sc_01}
    ds = xr.Dataset.from_dataframe(df)
    ds.attrs = metadata

+2
Sep 28 '18 at 1:04

The top answer on attaching arbitrary attributes to a DataFrame is a good one, but if you assign a dictionary, list, or tuple, pandas emits the warning "Pandas doesn't allow columns to be created via a new attribute name". The following solution works for storing arbitrary attributes.

    from types import SimpleNamespace

    import pandas as pd

    df = pd.DataFrame()
    df.meta = SimpleNamespace()
    df.meta.foo = [1, 2, 3]

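
A small check of the difference, as a sketch rather than a statement about every pandas version: it records warnings while assigning a list directly and while going through a `SimpleNamespace`, then counts the ones whose text matches the message quoted above. The attribute name `tags` and the helper `column_warnings` are made up for this example.

```python
import warnings
from types import SimpleNamespace

import pandas as pd

df = pd.DataFrame()

# Assigning a list directly: pandas suspects an attempt to create a column.
# The attribute is still set on the object.
with warnings.catch_warnings(record=True) as direct:
    warnings.simplefilter("always")
    df.tags = [1, 2, 3]

# Assigning a namespace first, then hanging the list off the namespace.
with warnings.catch_warnings(record=True) as wrapped:
    warnings.simplefilter("always")
    df.meta = SimpleNamespace()
    df.meta.foo = [1, 2, 3]


def column_warnings(caught):
    """Keep only the 'new attribute name' warnings from a recorded list."""
    return [w for w in caught if "new attribute name" in str(w.message)]


print(len(column_warnings(direct)), len(column_warnings(wrapped)))
```
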
+2
Jan 10 '19 at 22:00

I had the same problem, and my workaround was to create a new, smaller DataFrame from a dictionary holding the metadata I wanted to keep, oriented by index:

    import pandas as pd

    meta = {"name": "Sample Dataframe", "Created": "19/07/2019"}
    dfMeta = pd.DataFrame.from_dict(meta, orient='index')

This dfMeta can then be saved alongside your original DataFrame in a pickle, etc.
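
One possible way to do that saving step, sketched with the standard-library `pickle` module. The file name `bundle.pkl`, the dictionary keys `data` and `meta`, and the sample `df` are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import pandas as pd

# Stand-in for "your original DF"; the contents are made up for the sketch.
df = pd.DataFrame({"val": [1, 2, 3]})
meta = {"name": "Sample Dataframe", "Created": "19/07/2019"}
dfMeta = pd.DataFrame.from_dict(meta, orient="index")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bundle.pkl")  # hypothetical file name

    # Write both objects into a single pickle file.
    with open(path, "wb") as fh:
        pickle.dump({"data": df, "meta": dfMeta}, fh)

    # Read them back together.
    with open(path, "rb") as fh:
        bundle = pickle.load(fh)

restored_df = bundle["data"]
restored_meta = bundle["meta"]
print(restored_meta)
```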

0
Jul 19 '19 at 13:18


