I have a Spark (v1.2.1) job that inserts the contents of an RDD into Postgres using postgresql.Driver for Scala:
import java.sql.{Connection, DriverManager}
import org.postgresql.util.PSQLException

rdd.foreachPartition(iter => {
  // One connection per partition: opened here, closed once the partition is written.
  Class.forName("org.postgresql.Driver")
  val connection: Connection = DriverManager.getConnection(url, username, password)
  val statement = connection.createStatement()
  iter.foreach(row => {
    val mapRequest = Utils.getInsertMap(row)
    val query = Utils.getInsertRequest(squares_table, mapRequest)
    try { statement.execute(query) }
    catch {
      case pe: PSQLException => println("exception caught: " + pe)
    }
  })
  statement.close()
  connection.close()
})
In the above code, I open a new Postgres connection for each RDD partition and then close it. I think the correct way would be to use a connection pool for Postgres, from which I could take connections (as described here), but it's just pseudo-code:
rdd.foreachPartition { partitionOfRecords =>
  val connection = ConnectionPool.getConnection()
  partitionOfRecords.foreach(record => connection.send(record))
  ConnectionPool.returnConnection(connection)
}
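For what it's worth, here is a rough sketch of what I imagine that ConnectionPool could look like, assuming Apache Commons DBCP2 is on the executors' classpath; the ConnectionPool object, its settings, and the connection details are my own placeholders, not something Spark provides:

import java.sql.Connection
import org.apache.commons.dbcp2.BasicDataSource

// Hypothetical helper: a Scala object is initialized once per executor JVM,
// so every partition processed on that executor would share the same pool.
object ConnectionPool {
  // Placeholder settings; in practice these would come from configuration.
  private val url = "jdbc:postgresql://localhost:5432/mydb"
  private val username = "user"
  private val password = "secret"

  private lazy val dataSource: BasicDataSource = {
    val ds = new BasicDataSource()
    ds.setDriverClassName("org.postgresql.Driver")
    ds.setUrl(url)
    ds.setUsername(username)
    ds.setPassword(password)
    ds.setInitialSize(2)  // pool sizing here is purely illustrative
    ds.setMaxTotal(10)
    ds
  }

  def getConnection(): Connection = dataSource.getConnection()

  // With DBCP2, "returning" a pooled connection is just closing it:
  // close() hands it back to the pool instead of tearing it down.
  def returnConnection(connection: Connection): Unit = connection.close()
}

If I understand correctly, the pool would then be built lazily on each executor the first time a partition asks for a connection, and reused by all later partitions on that executor instead of opening a fresh connection every time.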
What is the correct way to connect to Postgres from Spark using a connection pool?