Failed to read a CSV file uploaded to a Google Cloud Storage bucket

Purpose: to read a CSV file uploaded to a Google Cloud Storage bucket.

Environment: Jupyter Notebook, launched over SSH on the master node. Using Python in a Jupyter notebook, I am trying to access a simple CSV file uploaded to a Google Cloud Storage bucket.

Approaches:

First approach: a simple Python program

I wrote the following program:

import csv
# open() only reads the local filesystem, so this gs:// path fails
f = open('gs://python_test_hm/train.csv', 'rb')
csv_f = csv.reader(f)
for row in csv_f:
    print row

Result: a "No such file or directory" error message.
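The built-in open() knows nothing about the gs:// scheme; it passes the string to the operating system as a local path. On Linux the repeated slash collapses, so the name is looked up as the relative path gs:/python_test_hm/train.csv, which normally does not exist. A minimal Python 3 sketch that reproduces the error (the helper name try_local_open is hypothetical):

```python
import errno


def try_local_open(path):
    """Try the built-in open() on `path`; return None on success, else the errno."""
    try:
        with open(path, "rb"):
            return None  # a local file with this name happens to exist
    except OSError as exc:
        return exc.errno


# open() treats the gs:// URI as an ordinary (relative) local path.
result = try_local_open("gs://python_test_hm/train.csv")
print(result == errno.ENOENT)  # True unless a local "gs:" directory exists
```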

Second approach: I tried to access train.csv using the gcloud package. Example code is shown below; it is not my actual code. In my version, the file in Google Cloud Storage was given as "gs:///Filename.csv". Result: the same "No such file or directory" error message.

Load data from CSV (example):

from gcloud import bigquery
from gcloud.bigquery import SchemaField

client = bigquery.Client()
dataset = client.dataset('dataset_name')
dataset.create()  # API request

SCHEMA = [
    SchemaField('full_name', 'STRING', mode='required'),
    SchemaField('age', 'INTEGER', mode='required'),
]
table = dataset.table('table_name', SCHEMA)
table.create()  # API request

# open() reads a local file; this is where a gs:// path fails
with open('csv_file', 'rb') as readable:
    table.upload_from_file(
        readable, source_format='CSV', skip_leading_rows=1)
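A gs:// URI names an object rather than a file path: it packs a bucket name and an object name together, and the Cloud Storage client libraries take those two parts separately. A minimal Python 3 sketch of splitting such a URI (split_gcs_uri is a hypothetical helper, not part of any Google library); a URI with an empty bucket part, such as gs:///Filename.csv, is rejected:

```python
from urllib.parse import urlparse


def split_gcs_uri(uri):
    """Split gs://bucket/path/to/object into (bucket, object_name)."""
    parsed = urlparse(uri)
    # For gs://bucket/obj, urlparse puts the bucket in netloc and "/obj" in path.
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError("not a gs:// URI with a bucket name: %r" % uri)
    object_name = parsed.path.lstrip("/")
    if not object_name:
        raise ValueError("URI has no object name: %r" % uri)
    return parsed.netloc, object_name


print(split_gcs_uri("gs://python_test_hm/train.csv"))
```

The resulting pair is what you would pass to the client library as the bucket name and the blob name.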

Third approach:

import csv
import urllib

url = 'https://storage.cloud.google.com/<bucket>/train.csv'


response = urllib.urlopen(url)
cr = csv.reader(response)
print cr

for row in cr:
    print row

Result: the code runs without errors, but instead of the CSV data from train.csv it prints the HTML of a Google page (the "Sign in - Google Accounts" page), as shown below.

['<!DOCTYPE html>']
['<html lang="en">']
['  <head>']
['  <meta charset="utf-8">']
['  <meta content="width=300', ' initial-scale=1" name="viewport">']
['  <meta name="google-site-verification" content="LrdTUW9psUAMbh4Ia074-BPEVmcpBxF6Gwf0MSgQXZs">']
['  <title>Sign in - Google Accounts</title>']
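The page shown is Google's sign-in page: storage.cloud.google.com URLs are meant for an authenticated browser session, so a plain urlopen() gets redirected to the login form. Objects that have been made publicly readable (an assumption here; train.csv may well be private) can be fetched from the storage.googleapis.com endpoint instead. A minimal Python 3 sketch; public_object_url and read_public_csv are hypothetical names:

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen


def public_object_url(bucket, object_name):
    """Direct download URL for a publicly readable GCS object."""
    # quote() percent-encodes characters such as spaces; "/" is kept as a separator.
    return "https://storage.googleapis.com/%s/%s" % (bucket, quote(object_name))


def read_public_csv(bucket, object_name):
    """Fetch and parse a public CSV object into a list of rows (lists of str)."""
    with urlopen(public_object_url(bucket, object_name)) as response:
        text = response.read().decode("utf-8")  # assumes the file is UTF-8
    return list(csv.reader(io.StringIO(text)))


# Only works if train.csv is publicly readable:
# rows = read_public_csv("python_test_hm", "train.csv")
```

For a private object, this URL returns an access error rather than the data, and an authenticated client library call is needed instead.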

Can someone shed light on what may be wrong here, and how I can achieve my goal? Your help is much appreciated.

Many thanks for your help!


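Are you running Jupyter on Google Cloud Platform (GCP)? If so, you can use the Google Cloud SDK's Python client library for Cloud Storage.

To write to Google Cloud Storage (GCS):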

 
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = bucket.blob('train.csv')
blob.upload_from_string('this is test content!')  # replaces the current contents of train.csv
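Uploading the string 'this is test content!' to train.csv, as above, overwrites the CSV data in that object. A minimal sketch of uploading real CSV rows instead, assuming the google-cloud-storage package is installed and credentials are available; upload_csv, rows_to_csv_text and the object name upload_test.csv are hypothetical:

```python
import csv
import io


def rows_to_csv_text(rows):
    """Serialize a list of rows into CSV text in memory."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def upload_csv(bucket_name, object_name, rows):
    """Upload rows as a CSV object (needs google-cloud-storage and credentials)."""
    # Imported here so rows_to_csv_text() works even without the library installed.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_string(rows_to_csv_text(rows), content_type="text/csv")


# A separate object name, so the existing train.csv is not overwritten:
# upload_csv("python_test_hm", "upload_test.csv", [["full_name", "age"], ["Ann", 30]])
```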

To read from GCS:

 
from google.cloud import storage
client = storage.Client()
bucket = client.get_bucket('python_test_hm')
blob = storage.Blob('train.csv', bucket)
content = blob.download_as_string()
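download_as_string() returns the object's contents as bytes, not parsed rows. A minimal sketch of turning them into CSV rows with the standard csv module, assuming the google-cloud-storage package and working credentials; in recent releases of that package download_as_string() is deprecated in favor of download_as_bytes(), which this sketch uses. The function names are hypothetical:

```python
import csv
import io


def parse_csv_bytes(content, encoding="utf-8"):
    """Turn the bytes returned by a blob download into a list of CSV rows."""
    return list(csv.reader(io.StringIO(content.decode(encoding))))


def read_gcs_csv(bucket_name, object_name):
    """Download a CSV object from GCS and parse it (needs credentials)."""
    # Lazy import: parse_csv_bytes() has no dependency on the library.
    from google.cloud import storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    return parse_csv_bytes(blob.download_as_bytes())


# rows = read_gcs_csv("python_test_hm", "train.csv")
# for row in rows:
#     print(row)
```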


Source: https://habr.com/ru/post/1652082/

