IPython pandas TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace'

I am working through the ipython.org notebook "Introduction to Python for Data Mining".

The following code:

data = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original", delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name']) 

produces the following error:

  TypeError: read_csv() got an unexpected keyword argument 'delim-whitespace' 

Unfortunately, the dataset file itself is not really CSV, and I don't know why they used read_csv() to load it.

The data looks like this:

  14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala" 

My environment is Python 2.7 on Debian stable with IPython 0.13. After searching here, I understand that this is most likely a version issue: the 'delim-whitespace' argument was possibly added in a later version of the pandas library than the one available from the APT package manager.

I tried several workarounds without success.

  • First, I tried updating pandas by building it from the latest source, but this cascades into other dependencies that also need newer versions, which could break my environment. For example, I had to install Cython, and then the build reported that the Cython version in the APT package manager is too old as well, so I would have to rebuild Cython plus other libraries and modules, and so on.

  • Then, after looking at the API, I tried other arguments: using delimiter=' ' in the read_csv() call splits the strings inside the quotes into several columns,

     ValueError: Expecting 9 columns, got 13 in row 0 
  • I tried using the read_csv() quotechar='"' argument as described in the API, but again it was not recognized (unexpected keyword argument)

  • Finally, I tried a different way of loading the file,

     data = DataFrame()
     data.from_csv(url)

    I got,

     Out[18]:
     <class 'pandas.core.frame.DataFrame'>
     Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
     Empty DataFrame

     In [19]: print(data.shape)
     (0, 9)

  • With the sep argument for from_csv(),

     In [20]: data.from_csv(url,sep=' ') 

    gives an error

     ValueError: Expecting 31 columns, got 35 in row 1

     In [21]: print(data.shape)
     (0, 9)

  • Also, with the same negative result:

     In [32]: data = DataFrame( columns = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration','model', 'origin', 'car_name'])

     In [33]: data.from_csv(url,sep=', \t')
     Out[33]:
     <class 'pandas.core.frame.DataFrame'>
     Index: 405 entries, 15.0 8. 350.0 165.0 3693. 11.5 70. 1."buick skylark 320" to 31.0 4. 119.0 82.00 2720. 19.4 82. 1. "chevy s-10"
     Empty DataFrame

     In [34]: data.head()
     Out[34]: Empty DataFrame

  • I tried using ipython3 instead, but it cannot find or load matplotlib, since no matplotlib package for Python 3 is available for my system.

Any help with this issue would be greatly appreciated.

+1
2 answers

Oddly enough, the delim_whitespace parameter appears in the method summary of the pandas documentation, but not in the parameter list. Try replacing it with delimiter=r'\s+', which is equivalent to what I assume the authors intended.

CSV stands for comma-separated values, but the term is often used for delimited text formats in general. TSV (tab-separated values) is another variant; in this case, the file is basically whitespace-separated values.
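To make "whitespace-separated" concrete, here is a minimal sketch that uses only the Python standard library, not pandas. The sample row is hypothetical: it takes the values shown in the question and pads them with runs of spaces, which is an assumption about the file layout. It compares splitting on every single space with a whitespace-aware split that keeps the quoted name together.

```python
import shlex

# Hypothetical row: the question's values, padded with runs of spaces
# (the padding is an assumption made for illustration).
row = '14.0   8.   454.0   220.0   4354.   9.0   70.  1.  "chevrolet impala"'

# Splitting on a single space produces empty fields between padded values
# and cuts the quoted car name in two.
naive = row.split(' ')

# shlex.split collapses runs of whitespace and keeps quoted text as one token.
tokens = shlex.split(row)

print(len(naive))
print(len(tokens), tokens[-1])
```

The naive split yields more than nine fields, which is the same kind of mismatch the "Expecting 9 columns" error in the question reports.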

+2

Your code uses delim_whitespace, but the error message mentions delim-whitespace. The former exists; the latter does not.

If the data file contains

  14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala" 

and you define data with

 data = pd.read_csv('data', delim_whitespace = True, header=None, names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model', 'origin', 'car_name']) 

then the DataFrame is successfully parsed:

     mpg  cylinders  displacement  horsepower  weight  acceleration  model  \
  0   14          8           454         220    4354             9     70

     origin          car_name
  0       1  chevrolet impala

So you just need to change the hyphen to an underscore.
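As an illustration of why the two spellings are treated as different keyword names, here is a small sketch in plain Python. read_stub is a hypothetical stand-in function, not the pandas implementation; it only shows how Python itself reports a keyword it does not know.

```python
# Hypothetical stand-in for a reader function; NOT the pandas implementation.
def read_stub(path, delim_whitespace=False):
    return {'path': path, 'delim_whitespace': delim_whitespace}

# The underscore spelling matches the parameter name and is accepted.
ok = read_stub('data', delim_whitespace=True)

# A hyphen is not legal inside a Python identifier, so the hyphenated name
# can only reach the function through dictionary unpacking.
try:
    read_stub('data', **{'delim-whitespace': True})
    message = None
except TypeError as exc:
    message = str(exc)

print(ok)
print(message)
```

The TypeError text names the hyphenated keyword, just like the message quoted in the question.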


Note that specifying delim_whitespace=True causes the pure Python parser to be used. In this case, I do not think that is necessary. Using delimiter=r'\s+', as suggested by Steve Howard, would probably be better. (The source code says: "The C engine is faster, while the python engine is currently more feature-complete," but I think the only feature that the python engine has and the C engine lacks is skipfooter.)
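A minimal sketch of the delimiter=r'\s+' variant, assuming Python 3 and a pandas version that accepts a regular-expression separator. The in-memory sample is a stand-in for the real file: it reuses two rows quoted in the question, and the tab before the second car name is an assumption.

```python
import io

import pandas as pd

# Column names taken from the question.
names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model', 'origin', 'car_name']

# In-memory stand-in for the data file (an assumption for illustration):
# two rows from the question, with a tab before the second quoted name.
sample = (
    '14.0 8. 454.0 220.0 4354. 9.0 70. 1. "chevrolet impala"\n'
    '15.0 8. 350.0 165.0 3693. 11.5 70. 1.\t"buick skylark 320"\n'
)

# r'\s+' treats any run of spaces or tabs as a single separator;
# the default quotechar '"' keeps each quoted car name in one column.
data = pd.read_csv(io.StringIO(sample), delimiter=r'\s+',
                   header=None, names=names)

print(data.shape)
print(data['car_name'].tolist())
```

To read the real dataset, the io.StringIO(sample) argument would be replaced by the file path or URL from the question.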

+2

Source: https://habr.com/ru/post/981447/

