Merge more than 32 files in Google Cloud Storage

I have an Apache Spark script running on Google Compute Engine that writes its output to Google Cloud Storage. The output consists of over 300 part-00XXX files in a Cloud Storage folder. I would like to combine them into a single file.

I tried:

poiuytrez@spark-m :~$ gsutil compose gs://mybucket/data/* gs://mybucket/myfile.csv 

But I got this error:

 CommandException: "compose" called with too many component objects. Limit is 32. 

Any ideas on how to merge all of these part files?

1 answer

You can only compose 32 objects in a single request, but a composite object can contain up to 1024 components. In particular, you could compose objects 0-31 into some object 0', objects 32-63 into 1', and so on. Each of these composite objects can then be composed again, by composing (0', 1', ..., floor(300/32)') into the final object.
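The batching described above can be sketched in Python. This is only a minimal illustration under some assumptions: the helper names `plan_compose` and `to_gsutil`, the temporary prefix `gs://mybucket/tmp/compose`, and the `part-00000`-style object names are hypothetical, and the script only builds the `gsutil compose` command lines rather than running them. The source list should be in the order you want the data concatenated.

```python
from typing import List, Tuple

MAX_SOURCES = 32  # per-request limit quoted in the error message


def plan_compose(sources: List[str], dest: str,
                 tmp_prefix: str) -> List[Tuple[List[str], str]]:
    """Return (batch, target) compose steps that merge `sources` into `dest`.

    `tmp_prefix` is a hypothetical location for intermediate objects.
    """
    if not sources:
        raise ValueError("nothing to compose")
    steps = []
    current = list(sources)
    level = 0
    # Keep folding batches of up to 32 objects until one request is enough.
    while len(current) > MAX_SOURCES:
        next_level = []
        for i in range(0, len(current), MAX_SOURCES):
            batch = current[i:i + MAX_SOURCES]
            target = f"{tmp_prefix}-{level}-{i // MAX_SOURCES}"
            steps.append((batch, target))
            next_level.append(target)
        current = next_level
        level += 1
    # Final request: compose the remaining (at most 32) objects into dest.
    steps.append((current, dest))
    return steps


def to_gsutil(steps: List[Tuple[List[str], str]]) -> List[List[str]]:
    """Turn each step into an argv list for `gsutil compose`."""
    return [["gsutil", "compose", *batch, target] for batch, target in steps]


if __name__ == "__main__":
    # Hypothetical object names; in practice, list and sort the real ones.
    parts = [f"gs://mybucket/data/part-{i:05d}" for i in range(300)]
    plan = plan_compose(parts, "gs://mybucket/myfile.csv",
                        "gs://mybucket/tmp/compose")
    for argv in to_gsutil(plan):
        print(" ".join(argv))
```

Each printed line could then be run in order (for example with `subprocess.run`), and the intermediate objects under the temporary prefix deleted afterwards with `gsutil rm`.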


Source: https://habr.com/ru/post/1203934/

