How to connect to a Netezza database from Spark SQLContext

Question

I have a Spark instance and am trying to connect to an existing Netezza database to retrieve some data.

Using Spark SQL's SQLContext, and according to the Spark SQL Programming Guide, this is possible with the read method. I've determined that I need to provide the JDBC driver using the --jars flag rather than SPARK_CLASSPATH as the documentation describes. The operation looks like this:

# pyspark
df = sqlContext.read.format('jdbc').options( ... ).load()

// spark-shell
val df = sqlContext.read.format("jdbc").options( ... ).load()

I can find documentation on connecting to Netezza using JDBC, but not how to pass the username and password correctly. What "options" do I need to pass here?

jdbc apache-spark apache-spark-sql netezza

Kirk Broadhurst 25 sept. '15 at 17:01

1 answer

Kirk Broadhurst · Accepted Answer · 2015-09-25T17:01:56+0000

In pyspark:

df = sqlContext.read.format('jdbc').options(
    url='jdbc:netezza://server1:5480/DATABASE',
    user='KIRK', password='****', dbtable='SCHEMA.MYTABLE',
    driver='org.netezza.Driver').load()

and in spark-shell

val df = sqlContext.read.format("jdbc").options(Map(
             "url" -> "jdbc:netezza://server1:5480/DATABASE", 
             "user" -> "KIRK", 
             "password" -> "****", 
             "dbtable" -> "SCHEMA.MYTABLE", 
             "driver" -> "org.netezza.Driver")).load()

Note that Netezza likes things in ALL CAPS. I don't know whether that is actually required, but it doesn't hurt.
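
If you connect to several tables, it may help to build the options in one place. Here is a minimal sketch, not part of the original answer: the helper names netezza_jdbc_options and read_netezza_table and the uppercase flag are hypothetical, and 5480 is simply the port used in the URLs above. Upper-casing is applied only as a convenience, since I am not sure it is required.

```python
def netezza_jdbc_options(host, database, table, user, password,
                         port=5480, uppercase=True):
    """Build the option dict for sqlContext.read.format('jdbc').

    Hypothetical helper: 'uppercase' upper-cases the database, table and
    user names (an assumption based on the ALL CAPS note above); the
    password is never modified.
    """
    if uppercase:
        database, table, user = database.upper(), table.upper(), user.upper()
    return {
        "url": "jdbc:netezza://{}:{}/{}".format(host, port, database),
        "user": user,
        "password": password,
        "dbtable": table,
        "driver": "org.netezza.Driver",
    }


def read_netezza_table(sqlContext, **kwargs):
    """Load one Netezza table as a DataFrame using the options above."""
    opts = netezza_jdbc_options(**kwargs)
    # DataFrameReader.options accepts the settings as keyword arguments.
    return sqlContext.read.format("jdbc").options(**opts).load()
```

In the pyspark shell this would be called as read_netezza_table(sqlContext, host='server1', database='database', table='schema.mytable', user='kirk', password='****'), assuming the Netezza JDBC driver jar was passed with --jars as described in the question.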
