If you really want the DataFrame to behave as immutable, instead of using the copy solution by @Joop (which I would recommend), you could build on the following structure.
Please note that this is just a starting point.
Basically, it is a proxy for a data object that hides everything that would change state and allows hashing, and all instances wrapping the same source data will have the same hash. There are probably modules that do this in a cooler way, but I figured it could be educational as an example.
Some warnings:
Depending on how the string representation of the proxied object is built, two different proxied objects can get the same hash. However, the implementation is compatible with DataFrames, among other objects.
Changes to the source object will affect the proxy object.
Equality will lead to some nasty infinite recursion if the other object tosses the equality question back (which is why list has a special case).
The DataFrame proxy helper is just a start. The problem is that any method that changes the state of the original object must either be overridden manually in the helper or be masked entirely through the extraFilter parameter when the _ReadOnly instance is created. See DataFrameProxy.sort.
Proxies will not show up as derived from the proxied object's type.
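The first two warnings can be reproduced without the proxy at all. The sketch below uses a hypothetical helper, `repr_hash` (not part of the code in this answer), that hashes the MD5 digest of `str(obj)` in the same spirit as the proxy. It is only an illustration of the pitfalls, not the proxy itself.

```python
import hashlib


def repr_hash(obj):
    """Hypothetical helper: hash an object via the MD5 of its string form."""
    return int(hashlib.md5(str(obj).encode("utf-8")).hexdigest(), 16)


# Two different objects with the same string form get the same hash.
print(repr_hash(1) == repr_hash("1"))

# A hash computed up front goes stale once the source object changes.
data = [1, 2, 3]
cached = repr_hash(data)
data.append(4)
print(cached == repr_hash(data))
```

The first comparison is true because `str(1)` and `str("1")` are both `"1"`. The second is false because the proxy computes its hash once, at construction time, while the source object remains mutable.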
The generic read-only proxy
This can be used on any object.
```python
import hashlib  # replaces the old Python 2 md5 module
import warnings  # used by DataFrameProxy


class _ReadOnly(object):

    def __init__(self, obj, extraFilter=tuple()):
        # Write through __dict__ because __setattr__ is blocked below.
        self.__dict__['_obj'] = obj
        self.__dict__['_d'] = None
        self.__dict__['_extraFilter'] = extraFilter
        # The hash is derived from the string representation of the object.
        self.__dict__['_hash'] = int(
            hashlib.md5(str(obj).encode('utf-8')).hexdigest(), 16)

    @staticmethod
    def _cloak(obj):
        # Hashable objects pass through; unhashable ones get wrapped.
        try:
            hash(obj)
            return obj
        except TypeError:
            return _ReadOnly(obj)

    def __getitem__(self, value):
        return _ReadOnly._cloak(self._obj[value])

    def __setitem__(self, key, value):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __delitem__(self, key):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __getattr__(self, value):
        if value in self.__dir__():
            return _ReadOnly._cloak(getattr(self._obj, value))
        elif value in dir(self._obj):
            raise AttributeError("{0} attribute {1} is cloaked".format(
                type(self._obj), value))
        else:
            raise AttributeError("{0} has no {1}".format(
                type(self._obj), value))

    def __setattr__(self, key, value):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __delattr__(self, key):
        raise TypeError(
            "{0} has a _ReadOnly proxy around it".format(type(self._obj)))

    def __dir__(self):
        # Cache the visible names: no setters and nothing in extraFilter.
        if self._d is None:
            self.__dict__['_d'] = [
                i for i in dir(self._obj)
                if not i.startswith('set') and i not in self._extraFilter]
        return self._d

    def __repr__(self):
        return repr(self._obj)

    def __call__(self, *args, **kwargs):
        if callable(self._obj):
            return self._obj(*args, **kwargs)
        else:
            raise TypeError("{0} not callable".format(type(self._obj)))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        try:
            return hash(self) == hash(other)
        except TypeError:
            # Lists are unhashable, so compare element by element instead
            # of handing the question back (which could recurse forever).
            if isinstance(other, list):
                try:
                    return all(a == b for a, b in zip(self, other))
                except Exception:
                    return False
            return other == self
```
DataFrame Proxy
It should really be extended with more overrides like sort, and with filters for all the other state-changing methods that are of no interest.
You can either instantiate it with a DataFrame instance as the only argument, or give it the same arguments you would use to create a DataFrame.
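```python
import warnings

import pandas as pd


class DataFrameProxy(_ReadOnly):

    # State-changing methods that should be hidden from the proxy.
    EXTRA_FILTER = ('drop', 'drop_duplicates', 'dropna')

    def __init__(self, *args, **kwargs):
        # Wrap an existing DataFrame, or build one from the arguments.
        if (len(args) == 1 and
                not len(kwargs) and
                isinstance(args[0], pd.DataFrame)):
            super(DataFrameProxy, self).__init__(
                args[0], DataFrameProxy.EXTRA_FILTER)
        else:
            super(DataFrameProxy, self).__init__(
                pd.DataFrame(*args, **kwargs), DataFrameProxy.EXTRA_FILTER)

    def sort(self, *args, **kwargs):
        # Drop any inplace request so the source is never sorted in place.
        if kwargs.pop('inplace', False):
            warnings.warn("Inplace sorting overridden")
        # DataFrame.sort was removed in pandas 0.20; sort_values replaces it.
        return self._obj.sort_values(*args, **kwargs)
```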
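To see why `EXTRA_FILTER` needs extending by hand, the name filtering can be sketched in isolation. The helper `visible_names` and the `Account` class below are hypothetical and only mirror the filtering rule used in `_ReadOnly.__dir__`: drop names that start with `set`, plus anything listed explicitly.

```python
def visible_names(obj, extra_filter=()):
    """Names that survive the 'set' prefix rule and an explicit deny list."""
    return [name for name in dir(obj)
            if not name.startswith('set') and name not in extra_filter]


class Account(object):
    """Hypothetical class with one obvious and one less obvious mutator."""

    def __init__(self):
        self.balance = 0

    def set_balance(self, value):
        self.balance = value

    def deposit(self, amount):
        self.balance += amount


account = Account()

# The prefix rule hides set_balance but lets deposit through.
names = visible_names(account)
print('set_balance' in names, 'deposit' in names)

# Listing deposit explicitly closes that gap.
names = visible_names(account, extra_filter=('deposit',))
print('deposit' in names)
```

A prefix rule only catches mutators that happen to follow a naming convention, so every other state-changing method has to be listed or overridden one by one.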
Finally:
However, although building this contraption is fun, why not simply have a DataFrame that you do not alter? If it is only exposed to you, it is better to just make sure that you do not change it...