How to import libraries into Spark Notebook

I am having problems importing magellan-1.0.4-s_2.11 in Spark Notebook. I downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11 in the "Start of Customized Settings" section of the spark-notebook file in the bin folder.

Here are my imports:

 import magellan.{Point, Polygon, PolyLine}
 import magellan.coord.NAD83
 import org.apache.spark.sql.magellan.MagellanContext
 import org.apache.spark.sql.magellan.dsl.expressions._
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.types._

And the errors I get:

 <console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
 import magellan.{Point, Polygon, PolyLine}
 ^
 <console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
 import magellan.coord.NAD83
 ^
 <console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
 import org.apache.spark.sql.magellan.MagellanContext

Then I tried adding the new library the same way as any other library, by placing it in the main script like this:

 $lib_dir/magellan-1.0.4-s_2.11.jar" 

This did not work, and I was left scratching my head, wondering what I had done wrong. How do I import libraries like magellan into Spark Notebook?

3 answers

Try to evaluate something like

 :dp "harsha2010" % "magellan" % "1.0.4-s_2.11" 

It will load the library into Spark, allowing it to be imported, assuming that it can be obtained from a Maven repository. In my case, this failed with the message:

 failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786 

I think the file was large and the connection was interrupted before the entire file could be downloaded.

Workaround

So, I downloaded the JAR manually from:

 http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/ 

and copied it to:

 /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11 

And then the :dp command worked. Try calling it first, and if it fails, copy the JAR to the correct path so that it can work.
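The manual workaround above can be sketched as a small shell helper. This is only a sketch: the function name `aether_dest_dir` is hypothetical, the cache directory is the one from the error message in this answer (yours will have a different random ID), and it assumes the cache follows the standard Maven repository layout, where dots in the group ID become directory separators.

```shell
#!/bin/sh
# Hypothetical helper: print the directory inside the notebook's download
# cache where a JAR with the given coordinates is expected to live.
# Assumes the standard Maven layout: <root>/<group as path>/<artifact>/<version>
aether_dest_dir() {
    root=$1
    group=$2
    artifact=$3
    version=$4
    # Dots in the group ID become directory separators.
    group_path=$(printf '%s' "$group" | tr '.' '/')
    printf '%s/%s/%s/%s\n' "$root" "$group_path" "$artifact" "$version"
}

# Example values taken from this answer; replace CACHE with the path shown
# in your own error message.
CACHE=/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786
DEST=$(aether_dest_dir "$CACHE" harsha2010 magellan 1.0.4-s_2.11)
echo "$DEST"

# After downloading the JAR manually, copy it into place:
#   mkdir -p "$DEST" && cp magellan-1.0.4-s_2.11.jar "$DEST"/
```

The copy step is left as a comment because the cache path changes from one notebook session to the next.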

A better solution

I should find out why the download failed in the first place and fix that ... or put this library in a local M2 repo. But this should get you going.
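One way to put the library into a local M2 repo is Maven's `install:install-file` goal. The sketch below assumes that Maven (`mvn`) is installed and that the JAR was downloaded to the current directory; neither is stated in this answer. The error message above does list the local Maven2 repository among the places searched, so an artifact installed there should be found.

```shell
#!/bin/sh
# Coordinates from the question; JAR is assumed to be in the current directory.
JAR=magellan-1.0.4-s_2.11.jar
GROUP=harsha2010
ARTIFACT=magellan
VERSION=1.0.4-s_2.11

if command -v mvn >/dev/null 2>&1 && [ -f "$JAR" ]; then
    # Installs the JAR into the local repository (~/.m2/repository by default).
    mvn install:install-file \
        -Dfile="$JAR" \
        -DgroupId="$GROUP" \
        -DartifactId="$ARTIFACT" \
        -Dversion="$VERSION" \
        -Dpackaging=jar
else
    echo "skipped: need mvn on PATH and $JAR in the current directory"
fi
```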


I would suggest checking this out:

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#import-download-dependencies

and

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#add-spark-packages

I think the :dp magic command is being deprecated; you should add your custom dependencies to the notebook metadata instead. Go to the menu "Edit" > "Edit metadata" and add something like:

 "customDeps": [ "harsha2010 % magellan % 1.0.4-s_2.11" ] 

Once that is done, restart the kernel. You can check the browser console to see whether the package loaded properly.


The simple way is to set the EXTRA_CLASSPATH environment variable so that it points to the .jar file you downloaded: export EXTRA_CLASSPATH=</path/to/your.jar> on Linux or macOS, or set EXTRA_CLASSPATH=</path/to/your.jar> on Windows. Here you will find a detailed solution.
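As a minimal sketch of the EXTRA_CLASSPATH approach on Linux or macOS: the path below is a hypothetical placeholder, not something given in this answer. The check for the file is an extra safeguard, since a wrong path would otherwise only show up later as a failed import.

```shell
#!/bin/sh
# Hypothetical path: replace with the location of the JAR you downloaded.
JAR_PATH="$HOME/lib/magellan-1.0.4-s_2.11.jar"

if [ -f "$JAR_PATH" ]; then
    # Shell assignments must not have spaces around '='.
    export EXTRA_CLASSPATH="$JAR_PATH"
    echo "EXTRA_CLASSPATH=$EXTRA_CLASSPATH"
else
    echo "JAR not found: $JAR_PATH" >&2
fi
```

Start the notebook from the same shell session so that it inherits the variable.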


Source: https://habr.com/ru/post/1265217/

