Spark SQL: unable to query for multiple possible values in an array

I have a LinkedIn dataset with the schema shown below. I need to query the skills field, which is an array that can contain JAVA, Java, JAVA developer, or Java developer.

(Image: LinkedIn data schema)

Dataset<Row> sqlDF = spark.sql("SELECT * FROM people"
            + " WHERE ARRAY_CONTAINS(skills,'Java') "
            + " OR ARRAY_CONTAINS(skills,'JAVA')"
            + " OR ARRAY_CONTAINS(skills,'Java developer') "
            + "AND ARRAY_CONTAINS(experience['description'],'Java developer')"  );

The query above is what I tried. Please suggest a better way. Also, how can I make the query case-insensitive?

1 answer
Assume a DataFrame with the following schema and data:

df.printSchema()

root
 |-- skills: array (nullable = true)
 |    |-- element: string (containsNull = true)


df.show()

+--------------------+
|              skills|
+--------------------+
|        [Java, java]|
|[Java Developer, ...|
|               [dev]|
+--------------------+

Now let's register it as a temporary table:

>>> df.registerTempTable("t")

Then explode the array and filter each element with lower() and LIKE:

>>> res = sqlContext.sql("select skills, lower(skill) as skill from (select skills, explode(skills) skill from t) a where lower(skill) like '%java%'")
>>> res.show()
+--------------------+--------------+
|              skills|         skill|
+--------------------+--------------+
|        [Java, java]|          java|
|        [Java, java]|          java|
|[Java Developer, ...|java developer|
|[Java Developer, ...|      java dev|
+--------------------+--------------+
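Note that the result contains one row per matching element, which is why [Java, java] appears twice. The following is a minimal plain-Python sketch of that behaviour, not Spark code: the sample rows and both function names are hypothetical, and the casing of the truncated second element ("Java Dev") is an assumption. It contrasts the explode-style result with a version that keeps each input row at most once.

```python
# Plain-Python model of the query above; no Spark required.
# Sample rows mirror df.show(); "Java Dev" is an assumed value for the
# element that the truncated output hides.
rows = [
    ["Java", "java"],
    ["Java Developer", "Java Dev"],
    ["dev"],
]


def explode_and_match(rows, needle="java"):
    """One (skills, skill) pair per matching element,
    like explode() followed by lower(skill) LIKE '%java%'."""
    return [
        (skills, skill.lower())
        for skills in rows
        for skill in skills
        if needle in skill.lower()
    ]


def rows_with_match(rows, needle="java"):
    """Each input row at most once: keep it if any element matches."""
    return [
        skills
        for skills in rows
        if any(needle in skill.lower() for skill in skills)
    ]


exploded = explode_and_match(rows)
matched = rows_with_match(rows)

print(len(exploded))  # number of (skills, skill) pairs
print(len(matched))   # number of input rows with at least one match
```

If only the matching rows are needed in Spark, one possible approach is to select DISTINCT skills from the exploded result, with the caveat that this also merges input rows whose arrays are identical.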


Source: https://habr.com/ru/post/1655021/

