├── images
│   ├── cloud_comparison_vm_1.png
│   ├── cloud_comparison_vm_4.png
│   ├── cloud_comparison_linux_1.png
│   └── cloud_comparison_linux_4.png
├── tabulate.py
├── LICENSE
├── vm-backup-cloud.sh
├── linux-backup-cloud.sh
├── results.md
└── README.md

/images/cloud_comparison_vm_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gilbertchen/cloud-storage-comparison/HEAD/images/cloud_comparison_vm_1.png
--------------------------------------------------------------------------------
/images/cloud_comparison_vm_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gilbertchen/cloud-storage-comparison/HEAD/images/cloud_comparison_vm_4.png
--------------------------------------------------------------------------------
/images/cloud_comparison_linux_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gilbertchen/cloud-storage-comparison/HEAD/images/cloud_comparison_linux_1.png
--------------------------------------------------------------------------------
/images/cloud_comparison_linux_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/gilbertchen/cloud-storage-comparison/HEAD/images/cloud_comparison_linux_4.png
--------------------------------------------------------------------------------
/tabulate.py:
--------------------------------------------------------------------------------
#!/usr/bin/python

import os
import sys
import re

#
# This script extracts the elapsed times from the output of linux-backup-cloud.sh or vm-backup-cloud.sh
#
# Usage:
#
#     ./linux-backup-cloud.sh &> linux-backup-cloud.results
#     python tabulate.py linux-backup-cloud.results

def getTime(minute, second):
    t = int(minute) * 60 + float(second)
    return "%.1f" % t

if len(sys.argv) <= 1:
    print "usage:", sys.argv[0], "<results file>"
    sys.exit(1)

i = 0
for line in open(sys.argv[1]).readlines():
    if line.startswith("====") and "init" in line:
        print "\n| | ",
        i += 1
        continue
    m = re.match(r"real\s+(\d+)m([\d.]+)s", line)
    if m:
        print getTime(m.group(1), m.group(2)), "|",
        continue

print ""
--------------------------------------------------------------------------------
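For context, a minimal sketch of how this script is typically driven; the test directory, storage URL, and log file name below are illustrative placeholders:

```bash
# Capture everything a test run prints (stdout and stderr) into a log file,
# then convert the `time` measurements in that log into a markdown table row
# (one row per captured run, one cell per timed backup or restore).
./linux-backup-cloud.sh /tmp/test sftp://user@backup-host/duplicacy &> linux-backup-cloud.results
python tabulate.py linux-backup-cloud.results
```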
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Acrosync LLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/vm-backup-cloud.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# The first argument is the directory where snapshots of VirtualBox vms are stored.
# We expect three files there: CentOS7-1.vdi, CentOS7-2.vdi, and CentOS7-3.vdi
if [ "$#" -lt 3 ]; then
    echo "Usage: $0 <vm dir> <test dir> <storage url> [number of threads]"
    exit 1
fi

if [ -z "$DUPLICACY_PATH" ]; then
    echo "DUPLICACY_PATH must be set to the path to the Duplicacy executable"
    exit 1
fi

# Set up directories
VM_DIR=$1
TEST_DIR=$2
BACKUP_DIR=${TEST_DIR}/cloud
STORAGE_URL=$3

if [[ $4 ]]; then
    THREADS="-threads $4"
fi

PASSWORD=12345678

# Save passwords in these variables; only fill in those for the storages that you want to test
# See https://github.com/gilbertchen/duplicacy/blob/master/GUIDE.md#managing-passwords
export DUPLICACY_SSH_PASSWORD=
export DUPLICACY_DROPBOX_TOKEN=
export DUPLICACY_S3_ID=
export DUPLICACY_S3_SECRET=
export DUPLICACY_B2_ID=
export DUPLICACY_B2_KEY=
export DUPLICACY_AZURE_KEY=
export DUPLICACY_GCD_TOKEN=
export DUPLICACY_GCS_TOKEN=
export DUPLICACY_ONE_TOKEN=
export DUPLICACY_HUBIC_TOKEN=

# Wasabi and Amazon S3 share the same variables; add Wasabi credentials here
if [[ "${STORAGE_URL}" == *"wasabi"* ]]; then
    export DUPLICACY_S3_ID=
    export DUPLICACY_S3_SECRET=
fi

# DigitalOcean and Amazon S3 share the same variables; add DigitalOcean credentials here
if [[ "${STORAGE_URL}" == *"digitalocean"* ]]; then
    export DUPLICACY_S3_ID=
    export DUPLICACY_S3_SECRET=
fi

# Or you can set those variables in a passwords.sh file
if [ -f "${PWD}/passwords.sh" ]; then
    echo "loading passwords"
    source ${PWD}/passwords.sh
fi

function duplicacy_backup()
{
    pushd ${BACKUP_DIR}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} backup -stats -hash ${THREADS} | grep -v Uploaded | grep -v Skipped
    popd
}

function duplicacy_restore()
{
    pushd ${BACKUP_DIR}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} restore -r $1 -stats -overwrite ${THREADS} | grep -v "Downloaded chunk"
    popd
}

echo =========================================== init ========================================
rm -rf ${BACKUP_DIR}/*
rm -rf ${BACKUP_DIR}/.duplicacy
mkdir -p ${BACKUP_DIR}/.duplicacy

pushd ${BACKUP_DIR}
env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${STORAGE_URL} -e

cp ${VM_DIR}/CentOS7-1.vdi ${BACKUP_DIR}/CentOS7.vid
duplicacy_backup

cp ${VM_DIR}/CentOS7-2.vdi ${BACKUP_DIR}/CentOS7.vid
duplicacy_backup

cp ${VM_DIR}/CentOS7-3.vdi ${BACKUP_DIR}/CentOS7.vid
duplicacy_backup

rm -rf ${BACKUP_DIR}/*

for i in `seq 1 3`; do
    duplicacy_restore $i
done

popd
--------------------------------------------------------------------------------
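A hypothetical invocation might look like the following; the Duplicacy path, directories, and storage URL are placeholders to adapt to your environment:

```bash
# Back up and restore the three CentOS7-*.vdi snapshots stored in ~/vms,
# using /tmp/test/cloud as the local repository and 4 threads, and capture
# the output so it can later be fed to tabulate.py.
export DUPLICACY_PATH=/usr/local/bin/duplicacy
./vm-backup-cloud.sh ~/vms /tmp/test sftp://user@backup-host/duplicacy 4 &> vm-backup-cloud.results
```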
/linux-backup-cloud.sh:
--------------------------------------------------------------------------------
#!/bin/bash

if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <test dir> <storage url> [number of threads]"
    exit 1
fi

if [ -z "$DUPLICACY_PATH" ]; then
    echo "DUPLICACY_PATH must be set to the path to the Duplicacy executable"
    exit 1
fi

# Set up directories
TEST_DIR=$1
BACKUP_DIR=${TEST_DIR}/linux
STORAGE_URL=$2

if [[ $3 ]]; then
    THREADS="-threads $3"
fi

PASSWORD=12345678

# Save passwords in these variables; only fill in those for the storages that you want to test
# See https://github.com/gilbertchen/duplicacy/blob/master/GUIDE.md#managing-passwords
export DUPLICACY_SSH_PASSWORD=
export DUPLICACY_DROPBOX_TOKEN=
export DUPLICACY_S3_ID=
export DUPLICACY_S3_SECRET=
export DUPLICACY_B2_ID=
export DUPLICACY_B2_KEY=
export DUPLICACY_AZURE_KEY=
export DUPLICACY_GCD_TOKEN=
export DUPLICACY_GCS_TOKEN=
export DUPLICACY_ONE_TOKEN=
export DUPLICACY_HUBIC_TOKEN=

# Wasabi and Amazon S3 share the same variables; add Wasabi credentials here
if [[ "${STORAGE_URL}" == *"wasabi"* ]]; then
    export DUPLICACY_S3_ID=
    export DUPLICACY_S3_SECRET=
fi

# Or you can set those variables in a passwords.sh file
if [ -f "${PWD}/passwords.sh" ]; then
    echo "loading passwords"
    source ${PWD}/passwords.sh
fi

# Download the github repository if needed
if [ ! -d "${BACKUP_DIR}" ]; then
    git clone https://github.com/torvalds/linux.git ${BACKUP_DIR}
fi

function duplicacy_backup()
{
    pushd ${BACKUP_DIR}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} backup -stats -hash ${THREADS} | grep -v Uploaded
    popd
}

function duplicacy_restore()
{
    pushd ${BACKUP_DIR}
    time env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} restore -r $1 -stats -overwrite ${THREADS} | grep -v Downloaded
    popd
}

echo =========================================== init ========================================
rm -rf ${BACKUP_DIR}/.duplicacy
mkdir -p ${BACKUP_DIR}/.duplicacy

pushd ${BACKUP_DIR}
env DUPLICACY_PASSWORD=${PASSWORD} ${DUPLICACY_PATH} init test ${STORAGE_URL} -e
echo "-.git/" > ${BACKUP_DIR}/.duplicacy/filters

git checkout -f 4f302921c1458d790ae21147f7043f4e6b6a1085 # commit on 07/02/2016
duplicacy_backup

git checkout -f 3481b68285238054be519ad0c8cad5cc2425e26c # commit on 08/03/2016
duplicacy_backup

git checkout -f 46e36683f433528bfb7e5754ca5c5c86c204c40a # commit on 09/02/2016
duplicacy_backup

git checkout -f 566c56a493ea17fd321abb60d59bfb274489bb18 # commit on 10/05/2016
duplicacy_backup

git checkout -f 1be81ea5860744520e06d0dfb9e3490b45902dbb # commit on 11/01/2016
duplicacy_backup

git checkout -f ef3d232245ab7a1bf361c52449e612e4c8b7c5ab # commit on 12/02/2016
duplicacy_backup

rm -rf ${BACKUP_DIR}/*

for i in `seq 1 6`; do
    duplicacy_restore $i
done

popd
--------------------------------------------------------------------------------
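Both test scripts optionally source a `passwords.sh` file from the working directory, so credentials do not have to be edited into the scripts themselves. A minimal sketch of such a file, using the variable names the scripts already define (all values are placeholders; set only the ones for the storages you actually test):

```bash
# passwords.sh -- sourced by vm-backup-cloud.sh and linux-backup-cloud.sh if present
export DUPLICACY_SSH_PASSWORD='your-sftp-password'
export DUPLICACY_S3_ID='your-s3-access-key-id'
export DUPLICACY_S3_SECRET='your-s3-secret-key'
export DUPLICACY_B2_ID='your-b2-account-id'
export DUPLICACY_B2_KEY='your-b2-application-key'
```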
/results.md:
--------------------------------------------------------------------------------
Here are the elapsed real times (in seconds) of the backup and restore operations, as reported by the `time` command. The first table shows the results for the Linux code base dataset with a single thread:

| Storage | initial backup | 2nd | 3rd | 4th | 5th | 6th | initial restore | 2nd | 3rd | 4th | 5th | 6th |
|:--------------------:|:------:|:----:|:-----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:----:|:----:|:----:|
| SFTP | 31.5 | 6.6 | 20.6 | 4.3 | 27.0 | 7.4 | 22.5 | 7.8 | 18.4 | 3.6 | 18.7 | 8.7 |
| Amazon S3 | 41.1 | 5.9 | 21.9 | 4.1 | 23.1 | 7.6 | 27.7 | 7.6 | 23.5 | 3.5 | 23.7 | 7.2 |
| Wasabi | 38.7 | 5.7 | 31.7 | 3.9 | 21.5 | 6.8 | 25.7 | 6.5 | 23.2 | 3.3 | 22.4 | 7.6 |
| DigitalOcean Spaces | 51.6 | 7.1 | 31.7 | 3.8 | 24.7 | 7.5 | 29.3 | 6.4 | 27.6 | 2.7 | 24.7 | 6.2 |
| Backblaze B2 | 106.7 | 24.0 | 88.2 | 13.5 | 46.3 | 14.8 | 67.9 | 14.4 | 39.1 | 6.2 | 38.0 | 11.2 |
| Google Cloud Storage | 76.9 | 11.9 | 33.1 | 6.7 | 32.1 | 12.7 | 39.5 | 9.9 | 26.2 | 4.8 | 25.5 | 10.4 |
| Google Drive | 139.3 | 14.7 | 45.2 | 9.8 | 60.5 | 19.8 | 129.4 | 17.8 | 54.4 | 8.4 | 67.3 | 17.4 |
| Microsoft Azure | 35.0 | 5.4 | 20.4 | 3.9 | 22.1 | 6.1 | 30.7 | 7.1 | 21.5 | 3.6 | 21.6 | 9.2 |
| Microsoft OneDrive | 250.0 | 31.6 | 80.2 | 16.9 | 82.7 | 36.4 | 333.4 | 26.2 | 82.0 | 12.9 | 71.1 | 24.4 |
| Dropbox | 267.2 | 35.8 | 113.7 | 19.5 | 109.0 | 38.3 | 164.0 | 31.6 | 80.3 | 14.3 | 73.4 | 22.9 |

The following table shows the results for the same dataset with 4 threads:

| Storage | initial backup | 2nd | 3rd | 4th | 5th | 6th | initial restore | 2nd | 3rd | 4th | 5th | 6th |
|:--------------------:|:------:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|:----:|
| SFTP | 45.0 | 7.7 | 25.4 | 3.5 | 24.7 | 5.3 | 20.5 | 8.7 | 19.8 | 3.8 | 21.1 | 8.5 |
| Amazon S3 | 30.2 | 5.4 | 17.6 | 4.1 | 21.3 | 6.8 | 17.1 | 7.7 | 17.3 | 3.6 | 19.5 | 6.7 |
| Wasabi | 30.7 | 5.4 | 18.3 | 3.8 | 19.5 | 7.4 | 16.8 | 7.1 | 17.3 | 3.5 | 16.4 | 6.2 |
| DigitalOcean Spaces | 30.9 | 6.7 | 18.1 | 3.6 | 20.9 | 6.9 | 15.9 | 6.6 | 17.9 | 2.6 | 15.7 | 6.0 |
| Backblaze B2 | 44.1 | 11.9 | 30.0 | 9.8 | 31.5 | 15.3 | 52.6 | 12.4 | 30.0 | 8.3 | 32.6 | 11.5 |
| Google Cloud Storage | 36.8 | 6.9 | 19.3 | 5.3 | 23.8 | 7.3 | 17.3 | 6.7 | 20.0 | 4.5 | 17.6 | 6.1 |
| Google Drive | 121.6 | 11.6 | 43.3 | 8.1 | 34.9 | 13.4 | 42.5 | 15.5 | 24.6 | 7.9 | 29.5 | 8.1 |
| Microsoft Azure | 31.1 | 5.0 | 21.2 | 4.0 | 21.0 | 6.2 | 22.2 | 6.6 | 19.3 | 4.0 | 17.3 | 6.2 |
| Microsoft OneDrive | 137.2 | 14.4 | 35.0 | 13.2 | 42.0 | 17.9 | 64.4 | 19.4 | 34.9 | 13.8 | 30.2 | 11.0 |

The following table lists the elapsed real times (in seconds) of the backup and restore operations for the VirtualBox virtual machine dataset, with a single thread:

| Storage | Initial backup | 2nd backup | 3rd backup | Initial restore | 2nd restore | 3rd restore |
|:--------------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| SFTP | 143.5 | 89.6 | 65.5 | 129.6 | 119.8 | 80.5 |
| Amazon S3 | 248.7 | 127.2 | 64.2 | 203.0 | 141.8 | 99.0 |
| Wasabi | 176.5 | 98.2 | 66.5 | 153.6 | 127.2 | 86.3 |
| DigitalOcean Spaces | 275.9 | 120.4 | 67.4 | 419.1 | 160.8 | 92.0 |
| Backblaze B2 | 1510.0 | 740.3 | 138.9 | 767.7 | 295.2 | 113.6 |
| Google Cloud Storage | 479.7 | 180.9 | 72.6 | 299.2 | 147.3 | 88.2 |
| Google Drive | 700.4 | 275.9 | 84.5 | 819.4 | 337.6 | 118.8 |
| Microsoft Azure | 188.9 | 96.7 | 64.6 | 202.4 | 171.1 | 103.4 |
| Microsoft OneDrive | 1267.2 | 449.3 | 104.7 | 895.5 | 564.9 | 147.6 |
| Dropbox | 1655.6 | 612.6 | 127.7 | 1034.5 | 386.3 | 135.6 |

Similar performance improvements can be observed with 4 threads:

| Storage | Initial backup | 2nd backup | 3rd backup | Initial restore | 2nd restore | 3rd restore |
|:--------------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:|
| SFTP | 143.5 | 96.5 | 69.3 | 136.3 | 121.5 | 87.5 |
| Amazon S3 | 125.3 | 80.3 | 70.2 | 123.5 | 125.5 | 83.1 |
| Wasabi | 114.4 | 80.0 | 64.0 | 173.7 | 158.4 | 87.5 |
| DigitalOcean Spaces | 115.6 | 89.7 | 63.2 | 105.0 | 95.7 | 77.9 |
| Backblaze B2 | 222.3 | 124.7 | 69.6 | 263.8 | 190.2 | 101.1 |
| Google Cloud Storage | 149.6 | 87.6 | 62.3 | 95.5 | 109.0 | 85.2 |
| Google Drive | 292.9 | 120.9 | 80.3 | 422.4 | 232.4 | 118.4 |
| Microsoft Azure | 120.2 | 88.8 | 72.2 | 143.7 | 112.6 | 138.1 |
| Microsoft OneDrive | 483.7 | 152.9 | 80.5 | 394.7 | 237.3 | 128.7 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## Objective

To compare the performance of major cloud services when used as backup storages for [Duplicacy](https://github.com/gilbertchen/duplicacy).

## Disclaimer

As an independent developer, I am not affiliated with any of the companies behind these cloud storage services, nor do I receive any financial support or incentive from any of them for publishing this report or building Duplicacy.

## Storages

The table below lists the storages to be tested and compares their pricing. The only storage supported by Duplicacy but not included in the comparison is Hubic. Hubic is considerably slower than the others, likely because its https servers do not allow connections to be reused, so every file transfer pays the overhead of re-establishing an https connection.

| Type | Storage (monthly) | Upload | Download | API Charge |
|:------------:|:-------------:|:------------------:|:--------------:|:-----------:|
| Amazon S3 | $0.023/GB | free | $0.09/GB | [yes](https://aws.amazon.com/s3/pricing/) |
| Wasabi | $3.99 first 1TB<br>$0.0039/GB additional | free | $0.04/GB | no |
| DigitalOcean Spaces | $5 first 250GB<br>$0.02/GB additional | free | first 1TB free<br>$0.01/GB additional | no |
| Backblaze B2 | $0.005/GB | free | $0.02/GB | [yes](https://www.backblaze.com/b2/b2-transactions-price.html) |
| Google Cloud Storage | $0.026/GB | free | $0.12/GB | [yes](https://cloud.google.com/storage/pricing) |
| Google Drive | 15GB free<br>$1.99/100GB<br>$9.99/TB | free | free | no |
| Microsoft Azure | $0.0184/GB | free | free | [yes](https://azure.microsoft.com/en-us/pricing/details/storage/blobs/) |
| Microsoft OneDrive | 5GB free<br>$1.99/50GB<br>$5.83/TB | free | free | no |
| Dropbox | 2GB free<br>$8.25/TB | free | free | no |

## Setup

All tests were performed on an Ubuntu 16.04.1 LTS virtual machine running on a dedicated ESXi server with an Intel Xeon D-1520 CPU (4 cores at 2.2 GHz) and 32 GB of memory. The server is located on the East Coast, so the results may be biased against services whose servers are on the West Coast. The network bandwidth is 200 Mbps.

The same 2 datasets as in https://github.com/gilbertchen/benchmarking are used to test backup and restore speeds. It should be noted that this is not a simple file upload and download test. Before uploading a chunk to the storage, Duplicacy always checks whether the chunk already exists on the storage, in order to take advantage of [cross-computer deduplication](https://github.com/gilbertchen/duplicacy/blob/master/DESIGN.md) when two computers happen to have identical or similar files. This existence check means that at least one extra API call is needed for each chunk to be uploaded.

A local SFTP storage is also included in the test to provide a baseline for the comparisons. The SFTP server runs on a different virtual machine on the same ESXi host.

All scripts to run the tests are available in this repo, so you can run your own tests; see the example below.
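For example, a run of the Linux code base test against an SFTP storage, first with one thread and then with four, could look like this (the Duplicacy path, test directory, and storage URL are placeholders):

```bash
export DUPLICACY_PATH=/usr/local/bin/duplicacy
./linux-backup-cloud.sh /tmp/test sftp://user@backup-host/duplicacy   &> linux-1.results
./linux-backup-cloud.sh /tmp/test sftp://user@backup-host/duplicacy 4 &> linux-4.results
python tabulate.py linux-1.results   # prints one markdown table row per captured run
```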
## Dataset 1: the Linux code base

The first dataset is the [Linux code base](https://github.com/torvalds/linux), with a total size of 1.76 GB and about 58K files. It is a relatively small repository consisting of small files, but it represents a popular use case where a backup tool runs alongside a version control program such as git to frequently save changes made between check-ins.

To test incremental backup, a random commit from July 2016 was selected, and the entire code base was rolled back to that commit. After the initial backup was finished, other commits were chosen such that they were about one month apart, and the code base was moved forward to these commits one by one to emulate incremental changes. Details can be found in linux-backup-cloud.sh.

Restore was tested the same way. The first restore is a full restore of the first backup into an empty repository, and each subsequent restore is an incremental one that only patches the files changed by each commit.

All running times of the backup and restore operations were measured by the `time` command as the real elapsed times:

![cloud_comparison_linux_1](https://github.com/gilbertchen/cloud-storage-comparison/blob/master/images/cloud_comparison_linux_1.png)

These results indicate that the performance of cloud storages varies a lot. While the S3-compatible ones (Amazon, Wasabi, and DigitalOcean) and Azure can back up and restore at speeds close to those of the local SFTP storage, others are much slower. However, one of the advantages of cloud storages is that most of them support simultaneous connections, so we can keep increasing the number of threads until the local processing or the network becomes the bottleneck.

The following chart shows new results with 4 threads:

![cloud_comparison_linux_4](https://github.com/gilbertchen/cloud-storage-comparison/blob/master/images/cloud_comparison_linux_4.png)

Dropbox doesn't seem to support simultaneous writes, so it is missing from the 4-thread results. Moreover, Google Drive was the only cloud storage that didn't benefit from the use of multiple threads, possibly due to strict per-user rate limiting. Amazon S3, Wasabi, DigitalOcean, and Azure all achieved performance comparable to, or even slightly better than, that of the SFTP storage.

## Dataset 2: a VirtualBox virtual machine

The second test targeted the other end of the spectrum: datasets with fewer but much larger files. Virtual machine files typically fall into this category. The particular dataset for this test is a VirtualBox virtual machine file. The base disk image is 64-bit CentOS 7, downloaded from http://www.osboxes.org/centos/. Its size is about 4 GB, still small compared to virtual machines in actual everyday use, but large enough to quantify performance differences.

The first backup was performed right after the virtual machine had been set up, without installing new software. The second backup was performed after installing common developer tools using the command `yum groupinstall 'Development Tools'`. The third backup was performed after a power-on immediately followed by a power-off. The first restore is a full restore of the first backup into an empty directory, while the second and third are incremental.

The following chart compares real times (in seconds) of the backup and restore operations:

![cloud_comparison_vm_1](https://github.com/gilbertchen/cloud-storage-comparison/blob/master/images/cloud_comparison_vm_1.png)

Similar performance improvements can be observed with 4 threads:

![cloud_comparison_vm_4](https://github.com/gilbertchen/cloud-storage-comparison/blob/master/images/cloud_comparison_vm_4.png)

## Conclusion

As far as I know, this is perhaps the first head-to-head performance comparison of popular cloud backup storages. Although the results presented here are neither comprehensive nor conclusive, I do hope that they will at least provide some guidance for users of Duplicacy or other cloud backup tools when deciding which cloud service to choose.

These results also suggest that storages designed to be accessed primarily via an API are generally faster than storages offered as cloud drives, since the latter are perhaps more optimized for their own clients, with API access merely being an add-on.

The more important message, however, is that cloud backup can be as fast as local backup with only modest network bandwidth, especially if you can use multiple threads. It may be worth a try to add cloud components to your backup strategies if you haven't already done so.
--------------------------------------------------------------------------------