Pandas DataFrame: full specification for __getitem__()?

Short version

For pandas DataFrame.__getitem__(), what are the valid inputs (input types, really), and what does each of them return?

More details

Description of the problem

I would like to write code that makes full use of DataFrame[], which is essentially DataFrame.__getitem__(). To do that, I would like a description of its inputs and outputs at the level of detail found on the API page, but none is available there for this method.

What has been done so far to resolve it

I searched for a full specification of this function on the Pandas API page. Although many other methods are documented there, DataFrame.__getitem__() is not.

I also reviewed the tutorial, but I do not think it attempts to be exhaustive.

I looked at the source code for DataFrame.__getitem__() (I initially skipped this, as described in my own answer below). Clearly it accepts many different input types, but reverse-engineering the code to work out what happens when each of these types is passed does not seem like the intended way to learn this method.

Additional background

Pandas is one of the most important libraries for Python's role in science and statistics, the DataFrame is perhaps the most central object in Pandas, and the [] operator is arguably the most central method of the DataFrame. The answer to this question therefore has high pedagogical value beyond its usefulness to me.

1 answer

Now that I look at it, I suspect that part of the reason this function is undocumented is the lack of comments in the source. Unless someone comes up with something more user-friendly, here is the actual DataFrame.__getitem__() method:

    def __getitem__(self, key):
        # shortcut if we are an actual column
        is_mi_columns = isinstance(self.columns, MultiIndex)
        try:
            if key in self.columns and not is_mi_columns:
                return self._getitem_column(key)
        except:
            pass

        # see if we can slice the rows
        indexer = _convert_to_index_sliceable(self, key)
        if indexer is not None:
            return self._getitem_slice(indexer)

        if isinstance(key, (Series, np.ndarray, list)):
            # either boolean or fancy integer index
            return self._getitem_array(key)
        elif isinstance(key, DataFrame):
            return self._getitem_frame(key)
        elif is_mi_columns:
            return self._getitem_multilevel(key)
        else:
            return self._getitem_column(key)

... which, at the very least, gives a top-level breakdown of the key (index) types that DataFrame[] accepts.
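To make those branches concrete, here is a small sketch that passes one key of each kind to DataFrame[]. The frames `df` and `mi` and their labels are made up for illustration, and the comments describe the behavior I would expect from a reasonably recent pandas; it is not a substitute for a real specification, and details may differ between versions.

```python
import pandas as pd

# Hypothetical example data; the names and labels are illustrative only.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

# Single column label: returns that column as a Series.
col = df["a"]

# List of column labels: returns a DataFrame with those columns.
sub = df[["a", "b"]]

# Slice: selects rows, not columns.
rows = df[0:2]

# Boolean Series (same index as df): keeps the rows where it is True.
mask_rows = df[df["a"] > 1]

# Boolean DataFrame: same shape as df, with NaN where the mask is False.
masked = df[df > 2]

# MultiIndex columns: a top-level label returns a DataFrame of its sub-columns.
mi = pd.DataFrame(
    [[1, 2, 3]],
    columns=pd.MultiIndex.from_tuples([("g", "p"), ("g", "q"), ("h", "p")]),
)
group = mi["g"]

print(type(col).__name__, type(sub).__name__)
print(list(rows.index), list(mask_rows.index))
print(masked.shape, list(group.columns))
```

Note the asymmetry this exposes: a scalar or a list of labels selects columns, while a slice or a boolean Series selects rows. For code where that ambiguity matters, the explicit .loc and .iloc indexers are usually the safer choice.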


Source: https://habr.com/ru/post/981176/

