How to upsert into Elasticsearch from Spark?

Question

Using HTTP POST, the following request inserts a new field createtime if the document does not exist yet, or updates lastupdatetime if it does:

curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
  "doc": {
    "lastupdatetime": "2015-09-16T18:00:00"
  },
  "upsert": {
    "createtime": "2015-09-16T18:00:00",
    "lastupdatetime": "2015-09-16T18:00"
  }
}'
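To make the intended behaviour concrete, here is a minimal sketch in plain Python that mimics how the doc and upsert parts of such a request are applied. The apply_update function and the in-memory index dict are hypothetical stand-ins for illustration only, not part of any Elasticsearch client.

```python
def apply_update(index, doc_id, body):
    """Mimic the doc/upsert handling of an _update request on a dict-based index."""
    if doc_id in index:
        # Document exists: merge the partial "doc" into it.
        index[doc_id].update(body.get("doc", {}))
    else:
        # Document is missing: store the "upsert" content as the new document.
        index[doc_id] = dict(body["upsert"])
    return index[doc_id]


index = {}  # hypothetical in-memory stand-in for the test/type1 index

first_seen = {
    "doc": {"lastupdatetime": "2015-09-16T18:00:00"},
    "upsert": {"createtime": "2015-09-16T18:00:00",
               "lastupdatetime": "2015-09-16T18:00:00"},
}
seen_again = {
    "doc": {"lastupdatetime": "2015-09-17T09:30:00"},
    "upsert": {"createtime": "2015-09-17T09:30:00",
               "lastupdatetime": "2015-09-17T09:30:00"},
}

apply_update(index, "1", first_seen)   # inserts createtime and lastupdatetime
apply_update(index, "1", seen_again)   # only lastupdatetime changes
```

After both calls the stored document keeps the createtime of the first request and carries the lastupdatetime of the second.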

But in the Spark script, after setting "es.write.operation": "upsert", I do not know how to insert createtime at all. The documentation only mentions es.update.script.* ... So, can anyone give me an example?

UPDATE: In my case, I want to save information about Android devices from the log into one Elasticsearch type and set the time of first appearance as createtime. If the device appears again, I update only lastupdatetime, but leave createtime as it is.

So the document id is the Android identifier: if it exists, update lastupdatetime; otherwise insert createtime and lastupdatetime. Here is the configuration (in Python):

conf = {
    "es.resource.write": "stats-device/activation",
    "es.nodes": "NODE1:9200",
    "es.write.operation": "upsert",
    "es.mapping.id": "id"
    # ???
}

rdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf
)

I just don't know how to insert a new field if the id doesn't exist.
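One direction that might fill the "# ???" gap is the es.update.script.* family of settings mentioned above. The sketch below is untested and rests on several assumptions: that the connector sends the whole record as the upsert document when a script is configured, that the cluster allows inline Groovy scripts, and that the setting names match the installed elasticsearch-hadoop version (newer releases use different names for the script setting). The helper name scripted_upsert_conf is hypothetical.

```python
def scripted_upsert_conf(resource, nodes):
    """Build a connector configuration for a script-based upsert (assumed behaviour)."""
    return {
        "es.resource.write": resource,
        "es.nodes": nodes,
        "es.write.operation": "upsert",
        "es.mapping.id": "id",
        # Assumption: for an existing document only this script runs,
        # so createtime is left untouched.
        "es.update.script": "ctx._source.lastupdatetime = lastupdatetime",
        "es.update.script.lang": "groovy",
        # Assumption: "param:field" binds the script parameter to the
        # record's lastupdatetime field.
        "es.update.script.params": "lastupdatetime:lastupdatetime",
    }


conf = scripted_upsert_conf("stats-device/activation", "NODE1:9200")
```

Under these assumptions every record in the RDD would carry both createtime and lastupdatetime, so that a device seen for the first time is stored with both fields.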

+4

hadoop elasticsearch apache-spark pyspark

Terran Sep 16 '15 at 10:20

2 answers

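Without seeing your Spark script, it is hard to give a detailed answer. In general, you would use elasticsearch-hadoop (add the dependency to your Build.sbt, for example) and then, in your script: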

import org.elasticsearch.spark._

// Each Map is one document; the "id" key is used as the document id below.
val documents = sc.parallelize(Seq(
  Map("id" -> 1,
      "createtime" -> "2015-09-16T18:00:00",
      "lastupdatetime" -> "2015-09-16T18:00"),
  Map(<next document>), ...))

documents.saveToEs("test/type1", Map("es.mapping.id" -> "id"))

The second argument to saveToEs specifies which key in your RDD of Maps is used as the Elasticsearch document id.

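Of course, if you are doing this with Spark, you are presumably not building each document by hand but producing them through RDD transformations in your script. Without the script, though, it is hard to be more specific.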

+3

Metropolis 19 . '15 14:16

Terran · Accepted Answer · 2015-09-22T03:07:22+0000

Finally, I got a solution that is not perfect:

add createtime to all source documents;
save to es using the create operation and ignore the "already created" errors;
delete the createtime field;
save to es again using the update operation.
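The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, createtime is assumed to equal the record's lastupdatetime on first appearance, and how the "already created" errors are ignored is not specified in the answer and is left out here.

```python
def add_createtime(doc):
    """Step 1: copy the document and set createtime (assumed: same as lastupdatetime)."""
    out = dict(doc)
    out["createtime"] = out["lastupdatetime"]
    return out


def drop_createtime(doc):
    """Step 3: copy the document without the createtime field."""
    return {k: v for k, v in doc.items() if k != "createtime"}


def conf_for(base_conf, operation):
    """Steps 2 and 4: same connection settings, different es.write.operation."""
    conf = dict(base_conf)
    conf["es.write.operation"] = operation
    return conf


base_conf = {
    "es.resource.write": "stats-device/activation",
    "es.nodes": "NODE1:9200",
    "es.mapping.id": "id",
}
create_conf = conf_for(base_conf, "create")   # first pass: new devices only
update_conf = conf_for(base_conf, "update")   # second pass: refresh lastupdatetime

doc = {"id": "android-1", "lastupdatetime": "2015-09-16T18:00:00"}
first_pass_doc = add_createtime(doc)
second_pass_doc = drop_createtime(first_pass_doc)
```

In a PySpark job the two helpers would typically be applied to the values of the (key, dict) pairs before each saveAsNewAPIHadoopFile call, once with create_conf and once with update_conf.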

(2015-09-27), 2 .
