In a Python notebook on the Databricks Community Edition, I explore the city of San Francisco by analyzing 911 emergency calls to the fire department. (An old copy of the data from 2016 was used in “Using Apache Spark 2.0 to Analyze the City of San Francisco Open Data” (YouTube) and made available on S3 for that tutorial.)
After loading the data and reading it with an explicitly defined schema into the DataFrame fire_service_calls_df, I registered the DataFrame as a SQL table:
sqlContext.registerDataFrameAsTable(fire_service_calls_df, "fireServiceCalls")
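For context, the load step might have looked roughly like the following minimal sketch. The S3 path is a placeholder and the schema is truncated to three columns for illustration; the real dataset has many more.

from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Truncated schema for illustration only; the full dataset has far more columns.
fire_schema = StructType([
    StructField('CallNumber', IntegerType(), True),
    StructField('CallType', StringType(), True),
    StructField('CallDate', StringType(), True),
    # ... remaining columns omitted
])

fire_service_calls_df = spark.read.csv(
    's3://some-bucket/Fire_Department_Calls_for_Service.csv',  # assumed path
    header=True,
    schema=fire_schema,
)

Note that registerDataFrameAsTable belongs to the pre-2.0 SQLContext API; on Spark 2.x and later the same effect is achieved with fire_service_calls_df.createOrReplaceTempView('fireServiceCalls').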
With that in place, I can count how many different types of calls occurred, first via the DataFrame API:
fire_service_calls_df.select('CallType').distinct().count()
Out[n]: 34
... or with SQL in Python:
spark.sql("""
SELECT count(DISTINCT CallType)
FROM fireServiceCalls
""").show()
+------------------------+
|count(DISTINCT CallType)|
+------------------------+
| 33|
+------------------------+
... or using an SQL cell:
%sql
SELECT count(DISTINCT CallType)
FROM fireServiceCalls

Did you spot it? (Once 34, once 33.) SQL's count(DISTINCT CallType) ignores NULL values, while the DataFrame API's distinct() treats NULL as a value of its own.
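To make the culprit visible, a quick check helps (a sketch, assuming the CallType column really does contain a NULL):

from pyspark.sql.functions import col

# List all distinct values; one of the 34 rows should be NULL.
fire_service_calls_df.select('CallType').distinct().show(40, truncate=False)

# Rows with a NULL CallType explain the off-by-one between the two counts.
fire_service_calls_df.filter(col('CallType').isNull()).count()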