I am trying to start the Spark shell on Amazon's Hadoop cluster, but I always get the following error and do not know how to fix it or which configuration is missing.
spark.yarn.jars, spark.yarn.archive
spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/12 07:47:26 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
16/08/12 07:47:28 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Thanks!
Error 1
I am trying to run an SQL query, something completely simple:
val sqlDF = spark.sql("SELECT col1 FROM tabl1 limit 10")
sqlDF.show()
WARN YarnScheduler: The initial job did not receive any resources; Check your cluster user interface to make sure workers are registered and have sufficient resources.
Error 2
Then I try to run a Scala script, a simple one taken from:
https://blogs.aws.amazon.com/bigdata/post/Tx2D93GZRHU3TES/Using-Spark-SQL-for-ETL
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import com.amazonaws.services.dynamodbv2.model.AttributeValue
import org.apache.hadoop.dynamodb.read.DynamoDBInputFormat
import org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat
import org.apache.hadoop.mapred.JobConf
import org.apache.hadoop.io.LongWritable
import java.util.HashMap
var ddbConf = new JobConf(sc.hadoopConfiguration)
ddbConf.set("dynamodb.output.tableName", "tableDynamoDB")
ddbConf.set("dynamodb.throughput.write.percent", "0.5")
ddbConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
ddbConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
var genreRatingsCount = sqlContext.sql("SELECT col1 FROM table1 LIMIT 1")
var ddbInsertFormattedRDD = genreRatingsCount.map(a => {
  var ddbMap = new HashMap[String, AttributeValue]()
  var col1 = new AttributeValue()
  col1.setS(a.get(0).toString)
  ddbMap.put("col1", col1)
  var item = new DynamoDBItemWritable()
  item.setItem(ddbMap)
  (new Text(""), item)
})
ddbInsertFormattedRDD.saveAsHadoopDataset(ddbConf)
scala.reflect.internal.Symbols$CyclicReference: InterfaceAudience
  at scala.reflect.internal.Symbols$Symbol$$anonfun$info$3.apply(Symbols.scala:1502)
  at scala.reflect.internal.Symbols$Symbol$$anonfun$info$3.apply(Symbols.scala:1500)
  at scala.Function0$class.apply$mcV$sp(Function0.scala:34)