We are building a reporting platform as a data warehouse, for which we have been using Shark. Since Shark's development has been discontinued, we are evaluating Spark SQL as a replacement. Based on our use cases, we have several questions.
1) We have data in several different sources (MySQL, Oracle, Cassandra, MongoDB). How can we load this data into Spark SQL? Is there a utility we can use for this? Also, does Spark SQL support continuous updates, i.e. synchronizing new inserts/updates/deletes from the sources into the Spark SQL data warehouse?
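To make question 1 concrete: for one-shot pulls from a JDBC source we have found Spark's `JdbcRDD`. The sketch below assumes a MySQL JDBC driver on the classpath and the `sc` provided by spark-shell; the URL, credentials, and table/column names are placeholders. What we are unsure about is the continuous-synchronization part.

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.rdd.JdbcRDD

// Placeholder connection details -- replace with real ones.
val url = "jdbc:mysql://dbhost:3306/warehouse"

val rows = new JdbcRDD(
  sc,                                            // SparkContext from spark-shell
  () => DriverManager.getConnection(url, "user", "password"),
  // The query must contain exactly two '?' placeholders,
  // which JdbcRDD fills with per-partition id bounds.
  "SELECT id, name FROM customers WHERE id >= ? AND id <= ?",
  1L, 1000000L, 10,                              // lower bound, upper bound, partitions
  (rs: ResultSet) => (rs.getLong("id"), rs.getString("name"))
)

println(rows.count())
```

This covers a bulk snapshot of one table; it does not by itself give us the add/update/delete synchronization we asked about.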
2) Is there a way to create multiple databases in Spark SQL?
3) We use JasperReports for the reporting front end and would like to connect from Jasper to Spark SQL. In our initial research we learned that there is currently no supported way to connect to Spark SQL via JDBC, but that this is planned for future releases. When will Spark SQL have a stable version with JDBC support? In the meantime we checked out the source from https://github.com/amplab/shark/tree/sparkSql , but had difficulty building and running it locally for evaluation. It would be great if you could point us to installation instructions. (I can share the errors we ran into; please let me know where I should post the error logs.)
4) We will also need a SQL prompt in which we can execute queries interactively. Currently the Spark shell provides a Scala prompt where Scala code is executed, and SQL queries can only be run from within that Scala code. Like Shark, we would like a dedicated SQL prompt for Spark SQL. In our research we found that this is planned for a future version of Spark; it would be great if you could tell us which Spark release will include it.
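For clarity, this is the workflow we mean by "running SQL from Scala code" in spark-shell today (Spark 1.0-era API; the file path and schema are made up for illustration):

```scala
// In spark-shell, sc is predefined; build a SQLContext by hand.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._  // brings in the implicit RDD -> SchemaRDD conversion

case class Person(name: String, age: Int)

// Load a CSV-like file and register it as a table.
val people = sc.textFile("people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
people.registerAsTable("people")

// The SQL is embedded in Scala rather than typed at a SQL prompt.
val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
```

What we would like instead is a prompt where the `SELECT` above can be typed directly, as Shark provided.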