I have a custom CSV file that looks something like this:
x,y
1,"(5, 27, 4)"
2,"(3, 1, 6, 2)"
3,"(4, 5)"
Using pd.read_csv() does not give a very useful result, because the tuples are not parsed: the y column just contains strings. There are existing answers that relate to this (1, 2), but since these tuples vary in length, those answers do not fully solve the problem I am facing.
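For reference, one way to get real tuples instead of strings is to parse the column while reading. The minimal sketch below passes `ast.literal_eval` through `read_csv`'s `converters` parameter. Unlike `eval`, `literal_eval` only accepts Python literals. The inline `csv_text` is an assumption standing in for the actual file.

```python
import ast
import io

import pandas as pd

# Stand-in for the CSV file shown above (assumed contents)
csv_text = '''x,y
1,"(5, 27, 4)"
2,"(3, 1, 6, 2)"
3,"(4, 5)"
'''

# converters applies ast.literal_eval to each raw string in column y,
# turning "(5, 27, 4)" into the tuple (5, 27, 4) without calling eval()
df = pd.read_csv(io.StringIO(csv_text), converters={'y': ast.literal_eval})

print(df['y'].tolist())  # [(5, 27, 4), (3, 1, 6, 2), (4, 5)]
```

This only fixes the parsing. The y column is still of object dtype (one tuple per cell), so it is still not numeric for plotting.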
What I would like to do is plot x vs. y using pandas routines. The naive approach fails because the tuples are stored as strings:
>>> df = pd.DataFrame({'x': [1, 2, 3],
...                    'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})
>>> df.plot.scatter('x', 'y')
[...]
ValueError: scatter requires y column to be numeric
The result I hope for looks something like this:
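import numpy as np
import matplotlib.pyplot as plt

# One scatter call per row: repeat x once for each element of the parsed tuple
for x, y in zip(df['x'], df['y']):
    y = eval(y)  # turn the string "(5, 27, 4)" into the tuple (5, 27, 4)
    plt.scatter(x * np.ones_like(y), y, color='blue')

plt.show()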
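The same picture can probably be drawn with a single scatter call instead of one call per row. The minimal sketch below flattens the data first: `np.repeat` repeats each x once per tuple element, and `np.concatenate` joins all the tuples. It swaps `eval` for `ast.literal_eval` and selects the headless `Agg` backend so that it runs without a display. Both of those choices are assumptions, not part of the original code.

```python
import ast

import matplotlib
matplotlib.use('Agg')  # headless backend; remove to show the plot interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})

# Parse each string into a tuple of ints (safer alternative to eval)
ys = df['y'].map(ast.literal_eval)

# Repeat each x as many times as its tuple has elements, then flatten the tuples
lengths = ys.map(len).to_numpy()
x_flat = np.repeat(df['x'].to_numpy(), lengths)  # [1 1 1 2 2 2 2 3 3]
y_flat = np.concatenate(ys.to_list())            # [5 27 4 3 1 6 2 4 5]

# One scatter call for all points
fig, ax = plt.subplots()
ax.scatter(x_flat, y_flat, color='blue')
```

A single call produces one collection, so all points share the same styling and legend entry.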

Is there a way to produce this plot directly with Pandas df.plot.scatter() (preferably without eval())?
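One pandas-only route might look like the minimal sketch below. It assumes pandas 0.25 or later, which added `DataFrame.explode` (this method also expands tuples). The strings are parsed with `ast.literal_eval` rather than `eval`. Each tuple is then exploded into one row per element, and y is cast to an integer dtype, because `explode` leaves an object column that `plot.scatter` would still reject. The `Agg` backend line is only there so the sketch runs headless.

```python
import ast

import matplotlib
matplotlib.use('Agg')  # headless backend; remove to show the plot interactively
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3],
                   'y': ["(5, 27, 4)", "(3, 1, 6, 2)", "(4, 5)"]})

long_df = (
    df.assign(y=df['y'].map(ast.literal_eval))  # "(5, 27, 4)" -> (5, 27, 4)
      .explode('y')                             # one row per tuple element; x is repeated
      .astype({'y': int})                       # explode leaves y as object dtype
)

# y is now numeric, so the pandas scatter call no longer raises ValueError
ax = long_df.plot.scatter('x', 'y', c='blue')
```

The exploded frame repeats the original index labels (0, 0, 0, 1, ...), which does not matter for a scatter plot. Call `reset_index(drop=True)` if unique labels are needed.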