Amazon S3 file renaming and overwriting: recommendations and risks

I have a bucket with two types of file names:

  • [Bucket]/[file]
  • [Bucket]/[folder]/[file]

For example, I could have:

  • MyBucket/bar
  • MyBucket/foo/bar

I want to rename every [Bucket]/[folder]/[file] object to [Bucket]/[file], overwriting (and thus discarding) any existing [Bucket]/[file] object.
In the example above, I want MyBucket/foo/bar to become MyBucket/bar, overwriting and discarding the original MyBucket/bar.

I tried two methods:

  • Using the s3cmd move command: s3cmd mv s3://MyBucket/foo/bar s3://MyBucket/bar
  • Using the Amazon SDK for PHP: rename('s3://MyBucket/foo/bar', 's3://MyBucket/bar')

Both methods seem to work, but since I have to do this as a batch process on thousands of files, I have some questions:

  • Which method is preferred?
  • Are there any other better methods?
  • Should I delete the old files before moving / renaming? (It seems to work fine without doing so, but I may not be aware of the risks.)

Thanks.

+6
2 answers

I asked this question about 5 months ago and have had time to gather some experience since then, so I will answer it myself:

From what I have seen, there is no significant difference in performance. I would expect calling s3cmd from PHP to be expensive, because it spawns an external process for each request; then again, the Amazon SDK uses cURL to send its requests, so the difference is not very big.

One difference I did notice is that the Amazon SDK tends to throw cURL exceptions (rarely, and seemingly at random), whereas s3cmd did not fail at all. My scripts ran on about 10 thousand files, so I had to learn about these cURL exceptions the hard way.
My theory is that cURL fails when there is contention on the server (for example, when two processes try to use the same resource). I am working on a development server where several processes sometimes access S3 through cURL at the same time; these are the only situations in which cURL exhibits this behavior.

Conclusion:
Using s3cmd may be more stable, but the SDK provides more flexibility and integrates better with your PHP code, as long as you do not forget to handle the rare cases (I would say 1 in every 1000 requests when several processes are running at the same time) in which the SDK throws a cURL exception.
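
One way to handle those rare failures is to retry the move a few times with a growing delay. The following is only a minimal, language-agnostic sketch written in Python rather than PHP; `TransientTransferError`, `with_retries` and `flaky_move` are hypothetical names standing in for the SDK's exception and the real move call, not part of any SDK.

```python
import time


class TransientTransferError(Exception):
    """Hypothetical stand-in for a sporadic transport failure (e.g. a cURL error)."""


def with_retries(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on a transient failure, wait and try again.

    The delay doubles after each failed attempt. If the final attempt
    also fails, its exception is re-raised to the caller.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientTransferError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Usage: a simulated move that fails twice before succeeding.
calls = {"count": 0}


def flaky_move():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientTransferError("simulated network hiccup")
    return "moved"


# Delays are recorded in a list instead of actually sleeping.
delays = []
result = with_retries(flaky_move, attempts=5, base_delay=0.5, sleep=delays.append)
```

In a real batch job the wrapped operation would be the actual rename call, and only the exception types known to be transient should be retried.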

+4

Since s3cmd and the SDK methods both end up making the same REST calls, you can safely choose whichever suits you best.

When you move a file and the target already exists, the target is always replaced. If you do not want this behavior, you need to check whether the target file name exists before deciding whether to perform the move.
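
For a batch of thousands of files, that check can be done up front on the list of keys before any move is issued. The sketch below is an illustration only: it assumes the key listing has already been fetched into a Python list, and `plan_moves` is a hypothetical helper, not an s3cmd or SDK function. It also flags a second risk, where two folders contain the same file name and one move would silently overwrite the other.

```python
from collections import defaultdict


def plan_moves(keys):
    """Plan moving every 'folder/file' key to the top-level 'file' key.

    Returns (moves, conflicts):
      moves     - list of (source, target, overwrites) tuples; overwrites is
                  True when a top-level object named target already exists
      conflicts - dict of target -> list of sources, for targets claimed by
                  more than one source (no move is planned for these)
    """
    existing = set(keys)
    by_target = defaultdict(list)
    for key in sorted(existing):
        if "/" in key:
            target = key.rsplit("/", 1)[1]
            if target:  # skip "folder/" placeholder keys
                by_target[target].append(key)

    moves, conflicts = [], {}
    for target, sources in by_target.items():
        if len(sources) > 1:
            conflicts[target] = sources
            continue
        moves.append((sources[0], target, target in existing))
    return moves, conflicts


# Usage with the example from the question plus one colliding pair.
keys = ["bar", "foo/bar", "foo/baz", "a/dup", "b/dup"]
moves, conflicts = plan_moves(keys)
```

Here `foo/bar` is planned as an overwrite of `bar`, `foo/baz` as a plain move, and the two `dup` files are reported as a conflict to be resolved by hand.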

+2

Source: https://habr.com/ru/post/914631/