If I have a DataFrame where each row is an individual and each column is an attribute, how can I get a new DataFrame in which each person appears in multiple rows?
I tried to do this with DataFrame.apply(), which seems the most intuitive approach, but it raises exceptions, as in the example below. Adding broadcast=False or reduce=False does not help.
The following is obviously a trivial example, but consider any scenario in which each row maps to multiple rows. What is the best way to handle this? In fact, each row may produce a different number of results. This is basically a one-to-many mapping.
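One common way to express a one-to-many mapping, where each row may yield a different number of results, is to have the per-row function return a list and then use DataFrame.explode to give each list element its own row. The sketch below assumes pandas 0.25 or later (when explode was added); the `df`, `n`, and `expand` names are hypothetical illustration data, not part of the question.

```python
import pandas as pd

# Hypothetical illustration data: each row should expand into `n` output rows.
df = pd.DataFrame({"name": ["John", "Jane", "Max"], "n": [1, 3, 2]})

def expand(row):
    # Return a list of results for one input row; its length may vary per row.
    return [f"{row['name']}-{i}" for i in range(row["n"])]

# apply(axis=1) collects one list per row into a Series of lists;
# explode() then turns each list element into its own row, repeating
# the other columns. (A row returning [] would become a single NaN row.)
out = (
    df.assign(item=df.apply(expand, axis=1))
      .explode("item")[["name", "item"]]
      .reset_index(drop=True)
)
print(out)
```

Returning plain lists avoids asking apply() to fit a multi-row DataFrame into a single output cell.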
Example: I have a DataFrame with the following structure, and for each person I want to get three upcoming birthdays (a trivial example, I know). So from:
+---+-------+------------+
| | name | birthdate |
+---+-------+------------+
| 1 | John | 1990-01-01 |
| 2 | Jane | 1957-04-03 |
| 3 | Max | 1987-02-03 |
| 4 | David | 1964-02-12 |
+---+-------+------------+
to something like:
+-------+------------+
| name | birthday |
+-------+------------+
| John | 2016-01-01 |
| John | 2017-01-01 |
| John | 2018-01-01 |
| Jane | 2016-04-03 |
| Jane | 2017-04-03 |
| Jane | 2018-04-03 |
| Max | 2016-02-03 |
| Max | 2017-02-03 |
| Max | 2018-02-03 |
| David | 2016-02-12 |
| David | 2017-02-12 |
| David | 2018-02-12 |
+-------+------------+
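For this particular example the expansion can also be done without a per-row function: pair every person with every target year via a cross join, then assemble each birthday from the year plus the original month and day. This is a swapped-in technique (a Cartesian merge), not the apply() approach from the question, and it assumes pandas 1.2 or later for `merge(how="cross")`. The `years` and `pairs` names are illustrative.

```python
import pandas as pd

# Input data from the question, with birthdate parsed as datetime64.
data = pd.DataFrame(
    {
        "name": ["John", "Jane", "Max", "David"],
        "birthdate": pd.to_datetime(
            ["1990-01-01", "1957-04-03", "1987-02-03", "1964-02-12"]
        ),
    },
    index=[1, 2, 3, 4],
)

years = pd.DataFrame({"year": range(2016, 2019)})

# Cartesian product: every person paired with every year (pandas >= 1.2).
pairs = data.merge(years, how="cross")

# Build each birthday from the target year and the original month/day.
# A 29 February birthdate would raise here for non-leap target years.
pairs["birthday"] = pd.to_datetime(
    pd.DataFrame(
        {
            "year": pairs["year"],
            "month": pairs["birthdate"].dt.month,
            "day": pairs["birthdate"].dt.day,
        }
    )
)

result = pairs[["name", "birthday"]]
print(result)
```

Because the cross join keeps the left frame's row order and repeats all years for each person, the rows come out grouped by person, as in the target table.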
Intuitively, I would try to do something like this:
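import pandas as pd

def get_birthdays(person):
    # Build one record per target year. person['name'] is the column value;
    # person.name would be the row's index label instead.
    birthdays = []
    for year in range(2016, 2019):
        birthdays.append({
            'name': person['name'],
            'birthday': person['birthdate'].replace(year=year)
        })
    return pd.DataFrame(birthdays)

data.apply(get_birthdays, axis=1)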
However, this raises:
ValueError: could not broadcast input array from shape (3,2) into shape (3)
During handling of the above exception, another exception occurred:
[...]
ValueError: cannot copy sequence with size 2 to array axis with dimension 3
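One workaround that keeps a per-row function returning a DataFrame is to skip apply() entirely: call the function once per row and stack the resulting frames with pd.concat. The error messages suggest apply() is trying to fit each 3x2 result into a single row slot; concatenating sidesteps that. This is a minimal sketch, assuming `birthdate` holds datetime values (so each element is a pandas Timestamp with a `replace` method) and that no birthdate falls on 29 February.

```python
import pandas as pd

# Input data from the question, with birthdate parsed as datetime64.
data = pd.DataFrame(
    {
        "name": ["John", "Jane", "Max", "David"],
        "birthdate": pd.to_datetime(
            ["1990-01-01", "1957-04-03", "1987-02-03", "1964-02-12"]
        ),
    },
    index=[1, 2, 3, 4],
)

def get_birthdays(person):
    # One output row per target year; the scalar name is broadcast
    # to the length of the birthday list.
    return pd.DataFrame(
        {
            "name": person["name"],
            "birthday": [
                person["birthdate"].replace(year=year) for year in range(2016, 2019)
            ],
        }
    )

# Build one small DataFrame per input row, then stack them vertically
# with a fresh 0..n-1 index.
result = pd.concat(
    [get_birthdays(person) for _, person in data.iterrows()],
    ignore_index=True,
)
print(result)
```

Because each row's function can return any number of rows, this also covers the case where different people produce different numbers of results, at the cost of a Python-level loop over rows.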