Livy server: return a DataFrame as JSON?

I execute a statement on the Livy server with an HTTP POST to localhost:8998/sessions/0/statements, with the following body:

 { "code": "spark.sql(\"select * from test_table limit 10\")" } 

I need the response in the following format:

 (...) "data": { "application/json": "[ {"id": "123", "init_date": 1481649345, ...}, {"id": "133", "init_date": 1481649333, ...}, {"id": "155", "init_date": 1481642153, ...}, ]" } (...) 

but I get:

 (...) "data": { "text/plain": "res0: org.apache.spark.sql.DataFrame = [id: string, init_date: timestamp ... 64 more fields]" } (...) 

That is just the toString() representation of the DataFrame in the data block, not the data itself.

Is there a way to return a DataFrame as JSON using the Livy server?

EDIT

I found a JIRA issue that tracks this problem: https://issues.cloudera.org/browse/LIVY-72

Judging by the comments there, can we conclude that Livy does not, and will not, support such a feature?

3 answers

I do not have much experience with Livy, but as far as I know this endpoint is used as an interactive shell, so the output is the string the shell would display. With that in mind, one way to imitate the desired result (though it may not be the best one) is:

 { "code": "println(spark.sql(\"select * from test_table limit 10\").toJSON.collect.mkString(\"[\", \",\", \"]\"))" } 

That way you get the JSON wrapped in a string, which your client can then parse.
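
As a rough client-side sketch of that idea (assuming Livy on localhost:8998, an already created and idle interactive session 0, and that the printed JSON is the only thing Livy captures under text/plain for this statement), the flow would look something like this:

    import json
    import time

    import requests

    host = "http://localhost:8998"
    headers = {"Content-Type": "application/json"}

    code = ('println(spark.sql("select * from test_table limit 10")'
            '.toJSON.collect.mkString("[", ",", "]"))')

    # Submitting a statement returns immediately with the statement's id.
    r = requests.post(host + "/sessions/0/statements",
                      data=json.dumps({"code": code}), headers=headers)
    statement_url = "{}/sessions/0/statements/{}".format(host, r.json()["id"])

    # Poll until the statement has finished running.
    while True:
        statement = requests.get(statement_url, headers=headers).json()
        if statement["state"] in ("available", "error"):
            break
        time.sleep(1)

    output = statement["output"]
    if output["status"] != "ok":
        raise RuntimeError(output)  # ename/evalue/traceback describe the failure

    # The printed shell output lands under data["text/plain"]; parse it as JSON.
    rows = json.loads(output["data"]["text/plain"].strip())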


I recommend using the built-in (though hard to find in the documentation) %json and %table magics:

%json

    import json
    import textwrap

    import requests

    # host and headers are assumed to be defined elsewhere
    # (Livy expects the Content-Type: application/json header)
    session_url = host + "/sessions/1"
    statements_url = session_url + '/statements'
    data = {
        'code': textwrap.dedent("""\
            val d = spark.sql("SELECT COUNT(DISTINCT food_item) FROM food_item_tbl")
            val e = d.collect
            %json e
            """)}
    r = requests.post(statements_url, data=json.dumps(data), headers=headers)
    print(r.json())

%table

    session_url = host + "/sessions/21"
    statements_url = session_url + '/statements'
    data = {
        'code': textwrap.dedent("""\
            val x = List((1, "a", 0.12), (3, "b", 0.63))
            %table x
            """)}
    r = requests.post(statements_url, data=json.dumps(data), headers=headers)
    print(r.json())
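
In both cases the POST only submits the statement, so the printed response will normally still show a waiting or running state; the result appears once you poll the statement until it is available. A minimal polling sketch, assuming the %json result lands under the application/json key of output.data and the %table result under application/vnd.livy.table.v1+json (key names taken from Livy's magics, worth verifying against your version):

    import time

    import requests


    def wait_for_output(statement_url, headers, poll_seconds=1):
        """Poll a Livy statement until it finishes and return its output block."""
        while True:
            stmt = requests.get(statement_url, headers=headers).json()
            if stmt["state"] in ("available", "error"):
                return stmt["output"]
            time.sleep(poll_seconds)


    # e.g. out = wait_for_output(statements_url + "/0", headers)
    #      rows = out["data"]["application/json"]                     # %json
    #      table = out["data"]["application/vnd.livy.table.v1+json"]  # %table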

Related: Apache Livy: Spark SQL query via REST: is it possible?


I think in general it is best to write your output to some kind of storage. If you write it to a table with an agreed-upon name, you can read it back after the script has run.
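
As a sketch of that approach (assuming the same kind of Python/requests client as in the other answers, an existing session 0, and an arbitrary table name livy_results; for another client to see the table you also need a shared metastore/warehouse):

    import json

    import requests

    # host and headers as in the other answers
    write_code = ('spark.sql("select * from test_table limit 10")'
                  '.write.mode("overwrite").saveAsTable("livy_results")')

    requests.post(host + "/sessions/0/statements",
                  data=json.dumps({"code": write_code}), headers=headers)

    # Once the statement has finished, anything that can reach the same
    # metastore (another Livy statement, beeline, JDBC, ...) can simply run:
    #   select * from livy_results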


Source: https://habr.com/ru/post/1261209/
