There is one thing I have to do quite often, and it surprises me how difficult it is to achieve in Pandas. Suppose I need to create an empty DataFrame with a specified index type and name, as well as specified column types and names. (I might want to fill it in later, for example in a loop.) The easiest way I have found is to create an empty pandas.Series object for each column, specifying its dtype, put the series in a dictionary that defines their names, and pass the dictionary to the DataFrame constructor. Something like the following.
import pandas

def create_empty_dataframe():
    # Empty, named index with an integer dtype.
    index = pandas.Index([], name="id", dtype=int)
    column_names = ["name", "score", "height", "weight"]
    # One empty Series per column, each carrying the desired dtype.
    series = [pandas.Series(dtype=str), pandas.Series(dtype=int), pandas.Series(dtype=float), pandas.Series(dtype=float)]
    columns = dict(zip(column_names, series))
    return pandas.DataFrame(columns, index=index, columns=column_names)
First question: is this really the easiest way to do this? So many things about it are confusing. What I really want to do, and what I'm sure many other people want to do, is something like the following.
df = pandas.DataFrame(columns=["id", "name", "score", "height", "weight"], dtypes=[int, str, int, float, float], index_column="id")
Second question: is this kind of syntax possible at all in Pandas? If not, would the developers be willing to support something like it? It seems to me that it really should be as simple as the syntax above.
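To make the request concrete, here is a minimal sketch of a wrapper that approximates the syntax above. The function name empty_dataframe and its parameters (columns, dtypes, index_column) are hypothetical and not part of pandas; it only combines the per-column empty Series approach with DataFrame.set_index. How a str column is reported in df.dtypes may depend on the pandas version.

```python
import pandas


def empty_dataframe(columns, dtypes, index_column=None):
    """Hypothetical helper (not a pandas API): build an empty, typed DataFrame."""
    # One empty Series per column carries the requested dtype.
    data = {name: pandas.Series(dtype=dtype) for name, dtype in zip(columns, dtypes)}
    df = pandas.DataFrame(data, columns=list(columns))
    if index_column is not None:
        # Move the chosen column into the index; the index takes the column's name.
        df = df.set_index(index_column)
    return df


df = empty_dataframe(
    columns=["id", "name", "score", "height", "weight"],
    dtypes=[int, str, int, float, float],
    index_column="id",
)
print(df.dtypes)
print(df.index.name, df.index.dtype)
```

This keeps the column names and dtypes side by side in a single call, which is the main thing the desired syntax is after.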