amazon s3 - Faster s3 bucket duplication

Problem description:

I have been trying to find a better command line tool for duplicating buckets than s3cmd. s3cmd can duplicate buckets without having to download and upload each file. The command I normally run to duplicate buckets using s3cmd is:

s3cmd cp -r --acl-public s3://bucket1 s3://bucket2

This works, but it is very slow, as it copies each file via the API one at a time. If s3cmd could run in parallel mode, I'd be very happy.

Are there other options available, as command line tools or code, that people use to duplicate buckets faster than s3cmd?

Edit: Looks like s3cmd-modification is exactly what I'm looking for. Too bad it does not work. Are there any other options?

Answer:

AWS CLI seems to do the job perfectly, and has the bonus of being an officially supported tool.

aws s3 sync s3://mybucket s3://backup-mybucket

http://docs.aws.amazon.com/cli/latest/reference/s3/sync.html

Supports concurrent transfers by default. See http://docs.aws.amazon.com/cli/latest/topic/s3-config.html#max-concurrent-requests

To quickly transfer a huge number of small files, run the sync from an EC2 instance to decrease latency, and increase max_concurrent_requests to reduce the impact of that latency, e.g.:

aws configure set default.s3.max_concurrent_requests 200

Answer:

If you don't mind using the AWS console, you can:

  1. Select all of the files/folders in the first bucket
  2. Click Actions > Copy
  3. Create a new bucket and select it
  4. Click Actions > Paste

It's still fairly slow, but you can leave it alone and let it do its thing.

Answer:

I have tried cloning two buckets using the AWS web console, s3cmd and the AWS CLI. Although these methods work most of the time, they are painfully slow.

Then I found s3s3mirror - a specialized tool for syncing two S3 buckets. It's multithreaded and a lot faster than the other approaches I have tried. I quickly moved GBs of data from one AWS region to another.

Check it out at https://github.com/cobbzilla/s3s3mirror, or download a Docker container from https://registry.hub.docker.com/u/pmoust/s3s3mirror/
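For reference, the basic invocation follows the pattern in the project's README (bucket names here are placeholders, and exact options may vary by version):

# copy everything from one bucket to another, server-side and multithreaded
./s3s3mirror.sh source-bucket destination-bucket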

Answer:

I don't know of any other S3 command line tools, but if nothing comes up here, it might be easiest to write your own.

Pick whatever language and Amazon SDK/Toolkit you prefer. Then you just need to list the source bucket contents and copy each file, in parallel obviously; a rough sketch follows below.

Looking at the source for s3cmd-modification (and I admit I know nothing about Python), it looks like they have not parallelised the bucket-to-bucket code, but perhaps you could use the standard upload/download parallel code as a starting point to do this.
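As a rough sketch of that list-and-copy-in-parallel idea, using nothing but the AWS CLI (bucket names and the concurrency level are placeholders, and keys containing whitespace would need extra quoting):

# list every key in the source bucket, then issue server-side copies, 10 at a time
aws s3api list-objects-v2 --bucket bucket1 --query 'Contents[].Key' --output text \
  | tr '\t' '\n' \
  | xargs -P 10 -I{} aws s3api copy-object \
      --copy-source 'bucket1/{}' --bucket bucket2 --key '{}'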

Answer:

As this is among the first Google hits on this subject, I'm adding some extra information.

'Cyno' made a newer version of s3cmd-modification, which now supports parallel bucket-to-bucket syncing. Exactly what I was waiting for as well.

The pull request is at https://github.com/pcorliss/s3cmd-modification/pull/2, and his version is at https://github.com/pearltrees/s3cmd-modification

Answer:

For an ad hoc solution, use the AWS CLI to sync between buckets:

aws s3 sync speed depends on:
- the latency of an API call to the S3 endpoint
- the number of API calls made concurrently

To increase sync speed:
- run aws s3 sync from an EC2 instance (a c3.large on FreeBSD is OK ;-) )
- update ~/.aws/config as shown below with:
  - max_concurrent_requests = 128
  - max_queue_size = 8096
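Those two keys live under the s3 section of the relevant profile; assuming the default profile, ~/.aws/config would contain roughly:

[default]
s3 =
  max_concurrent_requests = 128
  max_queue_size = 8096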

With this config and instance type, I was able to sync a bucket (309 GB, 72K files, us-east-1) within 474 seconds.

For a more generic solution, consider AWS Data Pipeline or S3 cross-region replication.
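If you go the replication route, a minimal sketch with the AWS CLI might look like the following (the role ARN and bucket names are placeholders, and both buckets need versioning enabled first):

# versioning is a prerequisite for replication, on both buckets
aws s3api put-bucket-versioning --bucket bucket1 --versioning-configuration Status=Enabled
aws s3api put-bucket-versioning --bucket bucket2 --versioning-configuration Status=Enabled

# replication.json (the IAM role ARN below is a placeholder):
# {
#   "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
#   "Rules": [{"Status": "Enabled", "Prefix": "",
#              "Destination": {"Bucket": "arn:aws:s3:::bucket2"}}]
# }
aws s3api put-bucket-replication --bucket bucket1 --replication-configuration file://replication.json

Note that replication only applies to objects written after it is enabled, so it complements rather than replaces a one-time sync.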
