Docker series: the Btrfs storage driver


Btrfs is a next-generation copy-on-write filesystem that supports many advanced storage technologies, making it a good fit for Docker. Btrfs is included in the mainline Linux kernel.
Docker's btrfs storage driver leverages many Btrfs features for image and container management. These features include block-level operations, thin provisioning, copy-on-write snapshots, and ease of maintenance. You can easily combine multiple physical block devices into a single Btrfs filesystem.


  • Docker Engine - Community : For the community edition, btrfs is recommended only on Ubuntu or Debian.

  • Docker EE : For Docker EE and CS-Engine, btrfs is supported only on SLES.

  • Changing the storage driver makes any containers you have already created inaccessible on the local system. Use docker save to save images, and push existing images to Docker Hub or a private repository, so that you do not need to recreate them later.

  • Btrfs requires a dedicated block storage device, such as a physical disk. This block device must be formatted for Btrfs and mounted at /var/lib/docker/.

  • The kernel must support btrfs. Run the following command to check:

    cat /proc/filesystems | grep btrfs
  • To manage Btrfs filesystems at the operating system level, you need the btrfs command. If this command is not available, install the btrfsprogs package (SLES) or the btrfs-tools package (Ubuntu).

Configure Docker to use the btrfs storage driver

  1. Stop Docker.
  2. Copy the contents of /var/lib/docker/ to a backup location, then empty the contents of /var/lib/docker/:
    cp -au /var/lib/docker /var/lib/docker.bk
    rm -rf /var/lib/docker/*
  3. Format the dedicated block device or devices as a Btrfs filesystem. This example assumes two devices, /dev/xvdf and /dev/xvdg:
    mkfs.btrfs -f /dev/xvdf /dev/xvdg
  4. Mount the new Btrfs filesystem on the /var/lib/docker/ mount point. Don't forget to make the change permanent across reboots by adding an entry to /etc/fstab.
    mount -t btrfs /dev/xvdf /var/lib/docker
  5. Copy the contents of /var/lib/docker.bk to /var/lib/docker/.
    cp -au /var/lib/docker.bk/* /var/lib/docker/
  6. Configure Docker to use the btrfs storage driver. This is required even if /var/lib/docker/ is already using a Btrfs filesystem. Edit or create the file /etc/docker/daemon.json. If it is a new file, add the following. If it is an existing file, add only the key and value, taking care to end the line with a comma if it is not the last line before the closing brace (}).
    { "storage-driver": "btrfs" }
  7. Start Docker, then verify that it is using the btrfs storage driver by running docker info and checking the Storage Driver field.
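Step 4 above mentions adding an /etc/fstab entry so the mount persists across reboots. A minimal sketch of such an entry, assuming the same /dev/xvdf device used in the steps (in practice, using the filesystem UUID reported by blkid is more robust than a device name):

```
# /etc/fstab -- mount the Btrfs device at Docker's data root on boot
/dev/xvdf    /var/lib/docker    btrfs    defaults    0    0
```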

Manage Btrfs volumes

One of the benefits of Btrfs is the ease of managing Btrfs filesystems, without needing to unmount the filesystem or restart Docker.

When space runs low, Btrfs automatically expands the volume in chunks of approximately 1 GB.

To add a block device to a Btrfs volume, use the btrfs device add and btrfs filesystem balance commands.

btrfs device add /dev/svdh /var/lib/docker
btrfs filesystem balance /var/lib/docker

How the btrfs storage driver works

The btrfs storage driver differs from device mapper and other storage drivers in that the entire /var/lib/docker/ directory is stored on a Btrfs volume.

Image and container layers on disk

Information about image layers and writable container layers is stored in /var/lib/docker/btrfs/subvolumes/. This subdirectory contains one directory per image or container layer, with the unified filesystem built from a layer plus all of its parent layers. Subvolumes are natively copy-on-write and have space allocated to them on demand from an underlying storage pool. They can also be nested and snapshotted. The figure below shows 4 subvolumes. "Subvolume 2" and "Subvolume 3" are nested, while "Subvolume 4" shows its own internal directory tree.
[Figure: four subvolumes, with Subvolume 2 and Subvolume 3 nested]
Only the base layer of an image is stored as a true subvolume. All other layers are stored as snapshots, which contain only the differences introduced in that layer. You can create snapshots of snapshots, as shown in the figure below.
[Figure: snapshots of snapshots]
On disk, snapshots look and feel just like subvolumes, but in reality they are smaller and more space-efficient. Copy-on-write is used to maximize storage efficiency and minimize layer size, and writes in the container's writable layer are managed at the block level. The figure below shows a subvolume and its snapshot sharing data.
[Figure: a subvolume and its snapshot sharing data]
For maximum efficiency, when a container needs more space, it is allocated in chunks of roughly 1 GB.

Docker's btrfs storage driver stores every image layer and container in its own Btrfs subvolume or snapshot. The base layer of an image is stored as a subvolume, while child image layers and containers are stored as snapshots. This is shown in the figure below.
[Figure: image layers and containers stored as subvolumes and snapshots]
The high-level process for creating images and containers on a Docker host running the btrfs driver is as follows:

  1. The base layer of an image is stored in a Btrfs subvolume under /var/lib/docker/btrfs/subvolumes.
  2. Subsequent image layers are stored as Btrfs snapshots of the parent layer's subvolume or snapshot, containing only the changes introduced by that layer. These differences are stored at the block level.
  3. A container's writable layer is a Btrfs snapshot of the final image layer, containing the differences introduced by the running container. These differences are stored at the block level.

How the container uses btrfs to read and write data

Read file

  • A container is a space-efficient snapshot of an image. Metadata in the snapshot points to the actual data blocks in the storage pool, just as with a subvolume. Therefore, reads performed against a snapshot are essentially the same as reads performed against a subvolume.

Modify file

  • Write new files
    : Writing a new file to a container invokes an allocate-on-demand operation, which allocates new data blocks to the container's snapshot. The file is then written to this new space. The allocate-on-demand operation is native to all writes with Btrfs and is the same as writing new data to a subvolume. As a result, writing new files to a container's snapshot operates at native Btrfs speeds.
  • Modify existing files
    : Updating an existing file in a container is a copy-on-write operation (redirect-on-write in Btrfs terminology). The original data is read from the layer where the file currently exists, and only the modified blocks are written into the container's writable layer. Next, the Btrfs driver updates the filesystem metadata in the snapshot to point to this new data. This behavior incurs very little overhead.
  • Delete files or directories
    : If a container deletes a file or directory that exists in a lower layer, Btrfs masks its existence in the lower layer. If a container creates a file and then deletes it, the operation is performed in the Btrfs filesystem itself and the space is reclaimed.

With Btrfs, writing and updating a large number of small files can cause performance degradation.
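The copy-on-write behavior described above can be observed outside Docker with an ordinary file copy. A minimal stand-alone sketch using cp --reflink (a demonstration of the general technique, not part of the btrfs driver itself; on Btrfs, --reflink shares data blocks until one copy is modified, and --reflink=auto falls back to a regular copy on filesystems without copy-on-write support):

```shell
#!/bin/sh
# Demonstrate copy-on-write semantics with cp --reflink.
set -e

dir=$(mktemp -d)
echo "original data" > "$dir/base"

# Clone the file. On Btrfs this shares data blocks with the original,
# so the clone initially consumes almost no extra space.
cp --reflink=auto "$dir/base" "$dir/snapshot"

# Writing to the clone triggers copy-on-write: new blocks are allocated
# for the clone, and the original file is left untouched.
echo "modified data" > "$dir/snapshot"

cat "$dir/base"      # prints: original data
cat "$dir/snapshot"  # prints: modified data
```

On a real Btrfs volume, this same block sharing is what lets many container snapshots of a single image layer coexist cheaply until they diverge.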

Btrfs and Docker performance

Several factors influence Docker's performance under the btrfs storage driver.

Many of these factors can be mitigated by using Docker volumes for write-heavy workloads, rather than relying on storing data in the container's writable layer. However, in the case of Btrfs, Docker volumes still suffer from these drawbacks unless /var/lib/docker/volumes/ is not backed by Btrfs.

  • Page caching . Btrfs does not support page cache sharing. This means that each process accessing the same file copies the file into the Docker host's memory. As a result, the btrfs driver may not be the best choice for high-density use cases such as PaaS.

  • Small writes . Containers performing lots of small writes (a usage pattern that also matches what happens when you start and stop many containers in a short period of time) can lead to poor use of Btrfs chunks. This can prematurely fill the Btrfs filesystem and lead to out-of-space conditions on the Docker host. Use btrfs filesys show to closely monitor the amount of free space on the Btrfs device.

  • Sequential writes . Btrfs uses a journaling technique when writing to disk. This can impact the performance of sequential writes, reducing performance by up to 50%.

  • Fragmentation . Fragmentation is a natural byproduct of copy-on-write filesystems like Btrfs. Many small random writes compound this problem. Fragmentation can manifest as CPU spikes when using SSDs, or as head thrashing when using spinning disks. Either of these issues can harm performance.
    If your Linux kernel version is 3.9 or higher, you can enable the autodefrag feature when mounting a Btrfs volume. Test this feature on your workloads before deploying it to production, as some tests have shown a negative impact on performance.

  • SSD performance : Btrfs includes native optimizations for SSD media. To enable them, mount the Btrfs filesystem with the -o ssd mount option. These optimizations enhance SSD write performance by avoiding things like seek optimizations that do not apply to solid-state media.

  • Balance the Btrfs filesystem regularly : Use operating system utilities (such as a cron job) to balance the Btrfs filesystem regularly, during non-peak hours. This reclaims unallocated blocks and helps prevent the filesystem from filling up unnecessarily. You cannot rebalance a completely full Btrfs filesystem unless you add additional physical block devices to the filesystem.

  • Use fast storage: Solid state drives (SSD) provide faster read and write speeds than spinning disks.

  • Use volumes for write-heavy workloads: Volumes provide the best and most predictable performance for write-heavy workloads. This is because they bypass the storage driver and avoid the potential overhead introduced by thin provisioning and copy-on-write. Volumes have other benefits as well, such as allowing data to be shared among containers and persisting data even when no running container is using them.
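The periodic balancing recommended above can be scheduled with cron. A sketch of a crontab entry, assuming /var/lib/docker is the Btrfs mount point and Sunday at 03:00 is an off-peak time for your hosts (the -dusage and -musage thresholds here are illustrative values, not taken from this article):

```
# Every Sunday at 03:00, reclaim data and metadata chunks
# that are less than 10% used on the Docker Btrfs filesystem.
0 3 * * 0  /usr/bin/btrfs balance start -dusage=10 -musage=10 /var/lib/docker
```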
