I am working through the ipython.org notebook "Introduction to Python for Data Mining".
The following code:
data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original", delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name'])
produces the following error:
TypeError: read_csv() got an unexpected keyword argument 'delim_whitespace'
Unfortunately, the dataset file itself is not really CSV, and I don't know why they used read_csv() to load it.
The data looks like this:
14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"
My environment is Python 2.7 on Debian stable with ipython 0.13.
After searching here, I understand that this is most likely a version issue: the 'delim_whitespace' argument was probably added in a later version of the pandas library than the one available through the APT package manager.
I tried several workarounds without success.
First, I tried updating pandas by building it from the latest source, but I found that this would set off a cascade of other dependencies that also need to be built or updated, which could break my environment. For example, I had to install Cython, and then the build reported that the Cython version in the APT package manager is also too old, so I would have to rebuild Cython plus other libraries and modules, and so on.
Then, after looking through the API, I tried other arguments. Using delimiter=' ' in the read_csv() call splits the strings inside the quotes into several columns:
ValueError: Expecting 9 columns, got 13 in row 0
I tried using the read_csv() quotechar='"' argument as described in the API, but it was not recognized either (unexpected keyword argument).
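To illustrate what the quotechar argument is meant to do, here is a minimal sketch that uses only the standard library csv module rather than pandas. It is written for Python 3 and runs on the single sample line shown above, assuming the fields are separated by exactly one space; the real file mixes several spaces and tabs, so the naive count there is higher.

```python
import csv
import io

# Sample line from the question; single spaces between fields are an assumption.
line = '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"'

# Naive split on a single space: the quoted car name is broken in two.
naive_fields = line.split(' ')

# csv.reader with quotechar keeps the quoted car name together as one field.
reader = csv.reader(io.StringIO(line), delimiter=' ', quotechar='"')
quoted_fields = next(reader)

print(len(naive_fields))   # 10
print(len(quoted_fields))  # 9
print(quoted_fields[-1])   # chevrolet impala
```

The extra columns in the ValueError come from the same effect: every space inside a quoted car name adds one more field.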
Finally, I tried a different way to load the file:
data = DataFrame()
data.from_csv(url)
I got:
Out[18]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [19]: print(data.shape)
(0, 9)
With the sep argument to from_csv(),
In [20]: data.from_csv(url,sep=' ')
gives an error
ValueError: Expecting 31 columns, got 35 in row 1
In [21]: print(data.shape)
(0, 9)
Also, with the same negative result:
In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])
In [33]: data.from_csv(url,sep=', \t')
Out[33]:
<class 'pandas.core.frame.DataFrame'>
Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
Empty DataFrame
In [34]: data.head()
Out[34]: Empty DataFrame
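One possible workaround that does not depend on the installed pandas version is to split each line with the standard library shlex module, which treats any run of spaces or tabs as a separator and keeps double-quoted text together. The sketch below is only an illustration under assumptions: parse_rows and to_number are hypothetical helper names, the mixed spacing in the second sample line is assumed, and non-numeric tokens are mapped to None.

```python
import shlex

COLUMNS = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model', 'origin', 'car_name']


def to_number(token):
    """Convert a token such as '8.' to float; return None if it is not numeric."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_rows(lines):
    """Hypothetical helper: turn whitespace-separated lines into a list of dicts."""
    rows = []
    for line in lines:
        # shlex.split handles spaces, tabs and double-quoted fields.
        fields = shlex.split(line)
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError('expected %d fields, got %d: %r'
                             % (len(COLUMNS), len(fields), line))
        numeric = [to_number(token) for token in fields[:-1]]
        rows.append(dict(zip(COLUMNS, numeric + [fields[-1]])))
    return rows


# Sample lines taken from the question; the exact spacing is an assumption.
sample = [
    '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"',
    '31.0   4.   119.0      82.00      2720.      19.4   82.  1.\t"chevy s-10"',
]
rows = parse_rows(sample)
print(rows[0]['car_name'])
print(rows[1]['weight'])
```

The resulting list of dicts could then be handed to the DataFrame constructor, for example DataFrame(rows, columns=COLUMNS), although I have not checked that against the old pandas version packaged for Debian stable.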
I also tried using ipython3 instead, but it cannot find or load matplotlib, since there is no matplotlib package for Python 3 on my system.
Any help with this issue would be greatly appreciated.