├── .gitignore ├── README.md ├── compat.sh └── docker-compose.yml /.gitignore: -------------------------------------------------------------------------------- 1 | var 2 | etc 3 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Deploy CephFS in Docker Swarm 2 | 3 | Note: Due to lack of `privileged:true` containers in Swarm, there is no way to run Ceph Bluestore. 4 | 5 | ### Deployment 6 | Consider we have 5 nodes swarm cluster: 7 | ```bash 8 | $ docker node ls --format 'table {{.Hostname}}\t{{.ManagerStatus}}' 9 | HOSTNAME MANAGER STATUS 10 | node1.domain.local Leader 11 | node2.domain.local Reachable 12 | node3.domain.local Reachable 13 | node4.domain.local 14 | node5.domain.local 15 | ``` 16 | First 3 nodes are masters, and the rest are workers. Locations of corresponding Ceph roles: 17 | - `mon` to master nodes 18 | - `osd` to each node 19 | - `mds` two (active/standby) anywhere 20 | - `mgr` one anywhere 21 | 22 | As `osd` would work in directory mode, preparing disks on each swarm node manually: 23 | ```bash 24 | apt install xfsprogs 25 | mkfs.xfs -f -i size=2048 /dev/sdX 26 | echo '/dev/sdX /mnt/osd xfs rw,noatime,inode64 0 0' >> /etc/fstab 27 | mkdir -p /mnt/osd && mount /mnt/osd 28 | ``` 29 | 30 | Generate secrets and configs for uploading to swarm. This should be done on any swarm master node via throw-away container: 31 | ```bash 32 | docker run -d --rm --net=host \ 33 | --name ceph_mon \ 34 | -v `pwd`/etc:/etc/ceph \ 35 | -v `pwd`/var:/var/lib/ceph \ 36 | -e NETWORK_AUTO_DETECT=4 \ 37 | -e DEBUG=verbose \ 38 | ceph/daemon mon 39 | 40 | docker exec -it ceph_mon ceph mon getmap -o /etc/ceph/ceph.monmap 41 | 42 | docker stop ceph_mon 43 | ``` 44 | Need to fix main config and provide all `mon` hostnames (which are the same as swarm masters): 45 | ```ini 46 | # cat etc/ceph.conf 47 | [global] 48 | fsid = 1e4d9f52-314e-49f4-a2d3-5283da875e33 49 | mon initial members = node1, node2, node3 50 | mon host = node1.domain.local, node2.domain.local, node3.domain.local 51 | osd journal size = 100 52 | log file = /dev/null 53 | mon cluster log file = /var/lib/ceph/mon/$cluster-$id/$channel.log 54 | ``` 55 | Create secrets and configs in swarm: 56 | ```bash 57 | docker config create ceph.conf etc/ceph.conf 58 | docker config ls 59 | 60 | docker secret create ceph.monmap etc/ceph.monmap 61 | docker secret create ceph.client.admin.keyring etc/ceph.client.admin.keyring 62 | docker secret create ceph.mon.keyring etc/ceph.mon.keyring 63 | docker secret create ceph.bootstrap-osd.keyring var/bootstrap-osd/ceph.keyring 64 | docker secret ls 65 | 66 | # Cleanup 67 | rm -r ./var ./etc 68 | ``` 69 | Deploy the stack: 70 | ``` 71 | docker stack deploy -c docker-compose.yml ceph 72 | ``` 73 | After everything is up, login to any `mon` container: 74 | ```bash 75 | # docker exec -it `docker ps -qf name=ceph_mon` bash 76 | # ceph -s 77 | cluster: 78 | id: 1e4d9f52-314e-49f4-a2d3-5283da875e33 79 | health: HEALTH_OK 80 | 81 | services: 82 | mon: 3 daemons, quorum node3,node2,node1 83 | mgr: node1(active) 84 | osd: 5 osds: 5 up, 5 in 85 | 86 | data: 87 | pools: 0 pools, 0 pgs 88 | objects: 0 objects, 0 bytes 89 | usage: 209 MB used, 10020 MB / 10230 MB avail 90 | pgs: 91 | 92 | # Configure CephFS 93 | ceph osd pool create cephfs_data 64 94 | ceph osd pool create cephfs_metadata 64 95 | ceph fs new cephfs cephfs_metadata cephfs_data 96 | # User for mounting, save this key 97 | ceph fs authorize cephfs client.swarm / rw 98 | 99 | # Tweak for VMs 100 | ceph osd pool set cephfs_data nodeep-scrub 1 101 | ``` 102 | 103 | 104 | ### Client Mounting 105 | On each node specify at least 2 swarm master nodes, to mount from: 106 | ```bash 107 | # Save the key from previous step: 108 | echo 'AQDilPRa1BYKFxAanqbBx0JnutW4AdlYJmUehg==' > /root/.ceph 109 | apt install ceph-fs-common 110 | echo 'node1.domain.local,node2.domain.local:/ /mnt/ceph ceph _netdev,name=swarm,secretfile=/root/.ceph 0 0' >> /etc/fstab 111 | mkdir /mnt/ceph && mount /mnt/ceph 112 | ``` 113 | 114 | ### Basic comparison with GlusterFS 115 | Installing GlusterFS on same 3 swarm master nodes with one replica=3 volume mounted to `/mnt/gluster` on default settings: 116 | ``` 117 | gluster volume info 118 | 119 | Volume Name: data 120 | Type: Replicate 121 | Volume ID: 9a582ddc-b593-4694-921c-d5601787936d 122 | Status: Started 123 | Snapshot Count: 0 124 | Number of Bricks: 1 x 3 = 3 125 | Transport-type: tcp 126 | Bricks: 127 | Brick1: node1.domain.local:/var/lib/brick/data 128 | Brick2: node2.domain.local:/var/lib/brick/data 129 | Brick3: node3.domain.local:/var/lib/brick/data 130 | Options Reconfigured: 131 | transport.address-family: inet 132 | nfs.disable: on 133 | performance.client-io-threads: off 134 | ``` 135 | 136 | Write throughtput: 137 | ``` 138 | dd if=/dev/zero of=/mnt/gluster/test bs=1M count=100 139 | 100+0 records in 140 | 100+0 records out 141 | 104857600 bytes (105 MB, 100 MiB) copied, 2.28377 s, 45.9 MB/s 142 | 143 | dd if=/dev/zero of=/mnt/ceph/test bs=1M count=100 144 | 100+0 records in 145 | 100+0 records out 146 | 104857600 bytes (105 MB, 100 MiB) copied, 0.0868178 s, 1.2 GB/s 147 | ``` 148 | Metadata ops (same content ~125 dirs): 149 | ``` 150 | time ls -R /mnt/gluster >/dev/null 151 | real 0m0.101s 152 | user 0m0.000s 153 | sys 0m0.004s 154 | 155 | time ls -R /mnt/ceph >/dev/null 156 | real 0m0.004s 157 | user 0m0.000s 158 | sys 0m0.000s 159 | ``` 160 | -------------------------------------------------------------------------------- /compat.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | echo "Reading Swarm secrets" 4 | for f in ceph.mon.keyring ceph.client.admin.keyring; do 5 | [ -e /run/secrets/$f ] && cp /run/secrets/$f /etc/ceph/$f 6 | done 7 | for d in osd mds rbd rgw; do 8 | if [ -e /run/secrets/ceph.bootstrap-$d.keyring ]; then 9 | mkdir -p /var/lib/ceph/bootstrap-$d 10 | cp /run/secrets/ceph.bootstrap-$d.keyring /var/lib/ceph/bootstrap-$d/ceph.keyring 11 | fi 12 | done 13 | 14 | if [ "$1" == "mon" ]; then 15 | base="/var/lib/ceph/mon/ceph-`hostname`" 16 | if [ ! -e "$base/keyring" -a -e /run/secrets/monmap ]; then 17 | echo "Bootstrapping new mon from Swarm secrets" 18 | mkdir -p $base && chown ceph:ceph $base 19 | ceph-mon --setuser ceph --setgroup ceph --cluster ceph --mkfs -i `hostname` --monmap /run/secrets/monmap --keyring /etc/ceph/ceph.mon.keyring --mon-data $base 20 | fi 21 | elif [ "$1" == "mgr" -a "$ZABBIX" ]; then 22 | echo "Adding zabbix_sender to mgr" 23 | rpm -Uvh http://repo.zabbix.com/zabbix/3.4/rhel/7/x86_64/zabbix-release-3.4-2.el7.noarch.rpm 24 | yum install -y zabbix-sender 25 | fi 26 | 27 | exec /entrypoint.sh "$@" 28 | -------------------------------------------------------------------------------- /docker-compose.yml: -------------------------------------------------------------------------------- 1 | version: "3.6" 2 | 3 | volumes: 4 | etc: 5 | var: 6 | 7 | networks: 8 | hostnet: 9 | external: true 10 | name: host 11 | 12 | configs: 13 | compat.sh: 14 | file: ./compat.sh 15 | ceph.conf: 16 | external: true 17 | 18 | secrets: 19 | ceph.monmap: 20 | external: true 21 | ceph.mon.keyring: 22 | external: true 23 | ceph.client.admin.keyring: 24 | external: true 25 | ceph.bootstrap-osd.keyring: 26 | external: true 27 | 28 | services: 29 | mon: 30 | image: ceph/daemon 31 | entrypoint: /tmp/compat.sh 32 | command: mon 33 | networks: 34 | hostnet: {} 35 | volumes: 36 | - etc:/etc/ceph 37 | - var:/var/lib/ceph 38 | configs: 39 | - source: compat.sh 40 | target: /tmp/compat.sh 41 | mode: 0755 42 | - source: ceph.conf 43 | target: /etc/ceph/ceph.conf 44 | secrets: 45 | - ceph.monmap 46 | - ceph.mon.keyring 47 | - ceph.client.admin.keyring 48 | - ceph.bootstrap-osd.keyring 49 | environment: 50 | - "NETWORK_AUTO_DETECT=4" 51 | deploy: 52 | mode: global 53 | placement: 54 | constraints: 55 | - node.role == manager 56 | 57 | mgr: 58 | image: ceph/daemon 59 | entrypoint: /tmp/compat.sh 60 | command: mgr 61 | #hostname: "{{.Node.Hostname}}" 62 | networks: 63 | hostnet: {} 64 | volumes: 65 | - etc:/etc/ceph 66 | - var:/var/lib/ceph 67 | configs: 68 | - source: compat.sh 69 | target: /tmp/compat.sh 70 | mode: 0755 71 | - source: ceph.conf 72 | target: /etc/ceph/ceph.conf 73 | secrets: 74 | - ceph.client.admin.keyring 75 | # This will add zabbix_sender to mgr 76 | environment: 77 | - ZABBIX=1 78 | deploy: 79 | replicas: 1 80 | 81 | osd: 82 | image: ceph/daemon 83 | entrypoint: /tmp/compat.sh 84 | command: osd 85 | networks: 86 | hostnet: {} 87 | volumes: 88 | - etc:/etc/ceph 89 | - var:/var/lib/ceph 90 | - /mnt/osd:/var/lib/ceph/osd 91 | configs: 92 | - source: compat.sh 93 | target: /tmp/compat.sh 94 | mode: 0755 95 | - source: ceph.conf 96 | target: /etc/ceph/ceph.conf 97 | secrets: 98 | - ceph.bootstrap-osd.keyring 99 | deploy: 100 | mode: global 101 | 102 | mds: 103 | image: ceph/daemon 104 | entrypoint: /tmp/compat.sh 105 | command: mds 106 | networks: 107 | hostnet: {} 108 | volumes: 109 | - etc:/etc/ceph 110 | - var:/var/lib/ceph 111 | configs: 112 | - source: compat.sh 113 | target: /tmp/compat.sh 114 | mode: 0755 115 | - source: ceph.conf 116 | target: /etc/ceph/ceph.conf 117 | secrets: 118 | - ceph.client.admin.keyring 119 | deploy: 120 | replicas: 2 121 | --------------------------------------------------------------------------------