Case Insensitive pandas.concat

How can I perform a case-insensitive pandas.concat?

    df1 = pd.DataFrame({"a":[1,2,3]},index=["a","b","c"])
    df2 = pd.DataFrame({"b":[1,2,3]},index=["a","b","c"])
    df1a = pd.DataFrame({"A":[1,2,3]},index=["A","B","C"])

    pd.concat([df1, df2],axis=1)

       a  b
    a  1  1
    b  2  2
    c  3  3

but this does not work:

    pd.concat([df1, df1a],axis=1)

         a    A
    A  NaN    1
    B  NaN    2
    C  NaN    3
    a    1  NaN
    b    2  NaN
    c    3  NaN

Is there an easy way to do this?

I have the same question for concat on a Series.

This works for a DataFrame:

 pd.DataFrame([11,21,31],index=pd.MultiIndex.from_tuples([("A",x) for x in ["a","B","c"]])).rename(str.lower) 

but this does not work for a Series:

    pd.Series([11,21,31],index=pd.MultiIndex.from_tuples([("A",x) for x in ["a","B","c"]])).rename(str.lower)

    TypeError: descriptor 'lower' requires a 'str' object but received a 'tuple'

For renaming, DataFrame uses:

    def rename_axis(self, mapper, axis=1):
        index = self.axes[axis]
        if isinstance(index, MultiIndex):
            new_axis = MultiIndex.from_tuples([tuple(mapper(y) for y in x) for x in index], names=index.names)
        else:
            new_axis = Index([mapper(x) for x in index], name=index.name)

whereas renaming a Series uses:

 result.index = Index([mapper_f(x) for x in self.index], name=self.index.name) 

So my updated question is: how do I do the rename / case-insensitive concat with a MultiIndexed Series?

2 answers

You can do this with rename:

 pd.concat([df1, df1a.rename(index=str.lower)], axis=1) 
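As a minimal sketch of the same idea, using the frames from the question: lowercasing the index of every input, rather than only df1a, is an assumption added here for the case where you do not know which side carries the uppercase labels.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2, 3]}, index=["a", "b", "c"])
df1a = pd.DataFrame({"A": [1, 2, 3]}, index=["A", "B", "C"])

# Normalize the index labels of every frame to lowercase first,
# so that "A" and "a" end up on the same row after concat.
frames = [df.rename(index=str.lower) for df in (df1, df1a)]
result = pd.concat(frames, axis=1)
print(result)
```

Only the index labels are changed; the column names "a" and "A" stay distinct.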

EDIT

If you want to do this with a MultiIndexed Series, you will need to set the index manually, for now. There is a bug report in the pandas GitHub repository awaiting a fix (thanks @ViktorKerkez).

 s.index = pd.MultiIndex.from_tuples(s.index.map(lambda x: tuple(map(str.lower, x)))) 

You can replace str.lower with any function that you want to use to rename your index.
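One way to package this is a small helper that takes the renaming function as a parameter. This is only a sketch: map_index_labels is a hypothetical name, not a pandas function, and it assumes every index label is something the function can accept (for example, all strings for str.lower).

```python
import pandas as pd

def map_index_labels(obj, func):
    """Hypothetical helper: return a copy of obj with func applied to every index label."""
    out = obj.copy()
    idx = out.index
    if isinstance(idx, pd.MultiIndex):
        # Apply func to each level value inside each tuple, keeping level names.
        out.index = pd.MultiIndex.from_tuples(
            [tuple(func(label) for label in tup) for tup in idx],
            names=idx.names,
        )
    else:
        # Flat index: apply func to each label, keeping the index name.
        out.index = pd.Index([func(label) for label in idx], name=idx.name)
    return out

s = pd.Series(
    [11, 21, 31],
    index=pd.MultiIndex.from_tuples([("A", x) for x in ["a", "B", "c"]]),
)
lowered = map_index_labels(s, str.lower)
uppered = map_index_labels(s, str.upper)
print(lowered)
```

Because the helper works on a copy, the original Series keeps its mixed-case index.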

Note that reindex cannot be used here in general, because it looks up values under the renamed labels in the existing index, so it returns NaN values unless your rename leaves the original index unchanged.
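The difference can be seen on a small flat-indexed Series. This is an illustrative sketch with made-up data, not part of the original answer.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["A", "B", "c"])

# reindex looks the new labels up in the existing index; it does not relabel.
# "a" and "b" do not exist in s, so those rows come back as NaN.
looked_up = s.reindex([label.lower() for label in s.index])

# rename relabels the existing rows and keeps their values.
relabeled = s.rename(str.lower)

print(looked_up)
print(relabeled)
```

Only the label "c", which the lowercase mapping leaves unchanged, keeps its value under reindex.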


For MultiIndexed Series objects, if this is not a bug, you can do:

    s.index = pd.MultiIndex.from_tuples(
        s.index.map(lambda x: tuple(map(str.lower, x)))
    )

Source: https://habr.com/ru/post/1497717/