Short answer
There is a quirk in argument ordering: spark-submit does not accept --packages if it comes after the my_job.py argument. To work around this, you can do the following when submitting from the Dataproc CLI:
gcloud beta dataproc jobs submit pyspark --cluster <my-dataproc-cluster> \
    --properties spark.jars.packages=com.databricks:spark-csv_2.11:1.2.0 my_job.py
Basically, just add --properties spark.jars.packages=com.databricks:spark-csv_2.11:1.2.0 before the .py file in your command.
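For reference, here is a minimal sketch of what a job like my_job.py might look like once the package is on the classpath. It assumes the Spark 1.x SQLContext API that spark-csv 1.2.0 targets, and the gs:// input path is only a placeholder:

# Hypothetical my_job.py: relies on the spark-csv package pulled in via
# spark.jars.packages; replace the gs:// path with your own input.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlContext = SQLContext(sc)

# spark-csv registers the 'com.databricks.spark.csv' data source.
df = (sqlContext.read
      .format('com.databricks.spark.csv')
      .option('header', 'true')
      .load('gs://my-bucket/input/*.csv'))

df.printSchema()
print(df.count())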
Long answer
So this is actually a different issue from the well-known lack of support for --jars in gcloud beta dataproc jobs submit pyspark; because Dataproc does not explicitly recognize --packages as a special spark-submit-level flag, it tries to pass it after the application arguments, so spark-submit lets --packages fall through as an application argument rather than parsing it as a submission-level option. Indeed, in an SSH session, the following does not work:

# Doesn't work if job.py depends on that package.
spark-submit job.py --packages com.databricks:spark-csv_2.11:1.2.0
But switching the order of the arguments makes it work; in the case of pyspark, both orderings happen to work:
# Works with dependencies on that package.
spark-submit --packages com.databricks:spark-csv_2.11:1.2.0 job.py
pyspark job.py --packages com.databricks:spark-csv_2.11:1.2.0
pyspark --packages com.databricks:spark-csv_2.11:1.2.0 job.py
Thus, even though spark-submit job.py is supposed to be a drop-in replacement for everything that previously called pyspark job.py, the difference in parsing order for things like --packages means the migration is not really 100% compatible. This might be something worth following up on the Spark side.
Anyway, fortunately there is a workaround, since --packages is just an alias for the Spark property spark.jars.packages, and Dataproc's CLI supports properties just fine. So you can simply do the following:
gcloud beta dataproc jobs submit pyspark --cluster <my-dataproc-cluster> \
    --properties spark.jars.packages=com.databricks:spark-csv_2.11:1.2.0 my_job.py
Note that --properties must come before my_job.py, otherwise it will be sent as an application argument rather than as a configuration flag. Hope this works for you! Note that the equivalent in an SSH session would be:

spark-submit --packages com.databricks:spark-csv_2.11:1.2.0 job.py
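If you want to double-check from inside the job that the property actually took effect, a quick sanity check is to read it back from the SparkContext's configuration (a sketch, assuming SparkContext.getConf() is available in your PySpark version):

# Prints the packages that spark-submit resolved via spark.jars.packages,
# or '<not set>' if the property never made it into the Spark config.
from pyspark import SparkContext

sc = SparkContext()
print(sc.getConf().get('spark.jars.packages', '<not set>'))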