Pandas.read_csv on Spark (IBM Bluemix)

Question

I use IPython in the Spark/Bluemix environment.

I have a CSV file loaded in the object store and I can read it fine using sc.textFile, but I get "file does not exist" when I use pandas pd.read_csv.

data = sc.textFile("swift://notebooks.books/rtenews.csv")
import pandas as pd
data = pd.read_csv('swift://notebooks.books/rtenews.csv')

IOError: File swift://notebooks.books/rtenews.csv does not exist

Why is this? How can I read the CSV file into a pandas DataFrame?

subiman Dec 30 '15 at 20:34

1 answer

Sven Hafeneger · Accepted Answer · 2016-01-04T10:11:38+0000

After you have uploaded the CSV file to the Bluemix object store, you can read it directly with Spark:

data = sc.textFile("swift://notebooks.books/rtenews.csv")

This works because the Spark environment has been configured to access the object store through swift:// URLs.

If you try to read the CSV file with the following code using pandas:

import pandas as pd 
data = pd.read_csv('swift://notebooks.books/rtenews.csv')

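it fails, because pandas has no equivalent configuration and treats the swift:// path as a local file that does not exist. You can still get the CSV out of the Bluemix object store, wrap its contents in StringIO, and load that into a pandas.DataFrame.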
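
The StringIO route can be sketched roughly as follows. This is a minimal illustration, not the exact code from the original answer: the helper names fetch_object_text and csv_text_to_dataframe are made up here, and the object URL and auth token are placeholders that would have to come from your own object storage credentials. The only firm parts are that Swift accepts a token in the X-Auth-Token header and that pd.read_csv accepts any file-like object.

```python
import io
import urllib.request

import pandas as pd


def fetch_object_text(object_url, auth_token):
    """Download one object over HTTP and return its body as text.

    Assumption: object_url is the object's HTTP(S) URL in the Swift API
    (storage URL + container + object name) and auth_token is a valid
    token for the service. Both values are placeholders here.
    """
    request = urllib.request.Request(
        object_url, headers={"X-Auth-Token": auth_token}
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def csv_text_to_dataframe(csv_text):
    """Wrap CSV text in a file-like StringIO so pandas can parse it."""
    return pd.read_csv(io.StringIO(csv_text))


# Offline demonstration with made-up CSV content (not the real rtenews.csv).
sample = "id,headline\n1,first\n2,second\n"
df = csv_text_to_dataframe(sample)
print(df.shape)
```

In a notebook you would pass the result of fetch_object_text(...) to csv_text_to_dataframe(...) instead of the sample string. Keeping the download and the parsing separate makes it easy to check the parsing step without network access.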