Pandas.read_csv on Spark (IBM Bluemix)

I use IPython in the Spark/Bluemix environment.

I have a CSV file loaded in the object store and I can read it fine using sc.textFile, but I get "file does not exist" when I use pandas pd.read_csv:

  • data = sc.textFile("swift://notebooks.books/rtenews.csv")

  • import pandas as pd
    data = pd.read_csv('swift://notebooks.books/rtenews.csv')

IOError: File swift://notebooks.books/rtenews.csv does not exist

Why is this? How can I read the CSV file into a pandas DataFrame?

1 answer

After you have uploaded the CSV file to the Bluemix object store, you can read it directly using Spark:

data = sc.textFile("swift://notebooks.books/rtenews.csv")

This works because the Spark environment on Bluemix has been configured to access the object store through swift:// paths.

If you try to read the CSV file with the following code using pandas:

import pandas as pd 
data = pd.read_csv('swift://notebooks.books/rtenews.csv')

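it will not work, because pandas has no such configuration for accessing the Bluemix object store. The API documentation for pandas.read_csv() (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) describes which file paths and URLs it accepts, and a swift:// URL is not one of them.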
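The pandas documentation lists http, ftp, s3, and file as valid URL schemes for read_csv. A string with any other scheme is presumably treated as a local file path, which would explain the "does not exist" error. The following is only a rough sketch of that check, not pandas' actual implementation. The names DOCUMENTED_SCHEMES and has_documented_scheme are made up for illustration, and the import assumes Python 3.

```python
from urllib.parse import urlparse

# Assumption: the schemes the pandas documentation lists for read_csv URLs.
DOCUMENTED_SCHEMES = {"http", "ftp", "s3", "file"}


def has_documented_scheme(path):
    """Return True if the path's URL scheme is one of the documented ones."""
    return urlparse(path).scheme in DOCUMENTED_SCHEMES


# The object store path from the question uses the swift scheme.
swift_ok = has_documented_scheme("swift://notebooks.books/rtenews.csv")
# A hypothetical http URL, for comparison.
http_ok = has_documented_scheme("http://example.com/rtenews.csv")
print(swift_ok, http_ok)
```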

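As a workaround, you can fetch the CSV content from the Bluemix object store yourself, wrap it in a StringIO object, and load it into a pandas.DataFrame.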
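A minimal sketch of the StringIO step, assuming the CSV content is already available as a string. The helper name csv_text_to_dataframe and the sample data are hypothetical, and the step that fetches the text from the object store is deliberately left out.

```python
import io

import pandas as pd


def csv_text_to_dataframe(csv_text):
    """Parse CSV text held in memory into a pandas DataFrame."""
    # io.StringIO wraps the string in a file-like object that read_csv accepts.
    return pd.read_csv(io.StringIO(csv_text))


# Stand-in for the content fetched from the object store (made-up data).
sample_text = "title,views\nfirst,10\nsecond,20\n"
df = csv_text_to_dataframe(sample_text)
print(df)
```

In the notebook, one possible source for the text is "\n".join(sc.textFile("swift://notebooks.books/rtenews.csv").collect()). This pulls every line to the driver, so it only suits files small enough to fit in memory there.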


Source: https://habr.com/ru/post/1622270/

