What and Why
There are many ways to host data for containers. Earlier I covered containers using LVM, so each container would sit on a separate thinly provisioned Logical Volume managed by LVM. In this instance I’m going to try to do the same, but with each container sitting on a volume on a ZFS pool. Operationally this should be relatively transparent, but under the hood it does expose some more interesting options, specifically compression and encryption. Typically you would do this on a real machine, however for ease of documentation I’m running inside a KVM based Virtual Machine.
To Start
I have a machine installed with Ubuntu Server 23.04; it has 200G of storage, partitioned with a 25G root partition and two empty partitions of 25G and 150G. The larger partition will be a storage pool for container instances, and the smaller will be a compressed and encrypted pool for private data. So, the following will all be done as the root user;
$ fdisk -l /dev/vda
Device Start End Sectors Size Type
/dev/vda1 2048 4095 2048 1M BIOS boot
/dev/vda2 4096 52432895 52428800 25G Linux filesystem
/dev/vda3 52432896 104861695 52428800 25G Linux filesystem
/dev/vda4 104861696 419428351 314566656 150G Linux filesystem
The first thing we need to do is install ZFS which includes the kernel modules, tools, and associated libraries.
$ apt install zfs-dkms
This should install ZFS and associated dependencies, of which there will be quite a few. It needs to compile and generate kernel modules, so it may take a few minutes to complete.
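Before going any further it’s worth a quick sanity check that the module actually built and loads; something like the following should confirm it;
$ dkms status | grep zfs
$ modprobe zfs
$ zfs version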
Creating the ZFS Pools
Next we need to create the desired pools. The container pool is relatively straightforward, all we need is;
$ zpool create default /dev/vda4 -m legacy
$ zpool list
NAME SIZE ALLOC FREE FRAG CAP DEDUP HEALTH
default 149G 100K 149G 0% 0% 1.00x ONLINE
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
default 118K 144G 24K legacy
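The -m legacy option means ZFS won’t try to auto-mount the pool’s root dataset; mounting is left to us (and later to LXD). If you want to confirm the property took effect;
$ zfs get mountpoint default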
It gets a little more interesting as we add a pool with compression and encryption;
$ zpool create -O encryption=on -O keyformat=passphrase \
-O keylocation=prompt -o compatibility=off \
-o feature@encryption=enabled -m legacy \
private /dev/vda3
#
# It should then prompt for your passphrase, this should be
# secure and at least 14 characters.
#
Enter new passphrase: ....
Re-enter new passphrase: ....
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
default 118K 144G 24K legacy
private 198K 23.7G 98K legacy
#
# Now turn on compression for the pool called "private"
#
$ zfs set compression=gzip private
And we should be done. The unlocked state of private persists for the session, so once the pool has been unlocked it will remain so until it is either explicitly locked or the machine is shut down.
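If you want to double check the properties, or explicitly lock the pool again without rebooting, something along these lines should work (note the keys have to be unloaded before the data becomes inaccessible);
$ zfs get compression,encryption,keystatus private
$ zfs unmount -a        # unmount anything in the pool first
$ zfs unload-key -a     # the passphrase will be needed again before the data is readable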
Testing ZFS compression
Just to make sure we’re actually getting some compression on the private pool, I’m going to try a simple test, bearing in mind all pools will get de-duplication by default (so just copying a file full of zeros won’t actually show compression, because de-duplication will have already taken all the blanks out).
$ tar cf archive.tar /usr
$ ls -lh
total 2.6G
-rw-r--r-- 1 root root 2.6G Oct 16 12:16 archive.tar
#
# If we create a temporary volume in "private"
#
$ zfs create private/tmp -o mountpoint=/mnt/tmp
#
# Then move our test archive onto it ..
#
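$ mv archive.tar /mnt/tmp    # assuming archive.tar is still in the current directory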
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
default 118K 144G 24K legacy
private 975M 22.8G 98K legacy
private/tmp 975M 22.8G 975M /mnt/tmp
So the archive when stored on the normal ext4 root filesystem is 2.6G in size, however when moved to the encrypted / compressed filesystem on the private zfs pool, it’s only actually consuming 975M, which seems pretty reasonable. Just to see how much of this is compression and how much is de-duplication, I’ll move it over to the uncompressed default pool;
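Rather than eyeballing the numbers, ZFS will also report the ratio directly; the exact figures will obviously depend on the contents of /usr;
$ zfs get compressratio,used,logicalused private/tmp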
$ zfs create default/tmp -o mountpoint=/mnt/tmp2
$ mv /mnt/tmp/archive.tar /mnt/tmp2
$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
default 2.49G 142G 24K legacy
default/tmp 2.49G 142G 2.49G /mnt/tmp2
private 442K 23.7G 98K legacy
private/tmp 98K 23.7G 98K /mnt/tmp
So although we’re getting a little bit of de-duplication saving (2.49G vs 2.6G), the majority is coming from the compression we applied to the private volume. We could apply compression to default, however this would slow our containers down somewhat, and in this instance I’m not too worried about storage space.
Note that compression is typically applied to either an entire pool or an individual volume, whereas encryption is typically enabled for an entire pool.
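For example, to compress a single volume rather than a whole pool, you set the property on that dataset and any children inherit it; shown here against the default/tmp volume created earlier, purely as an illustration;
$ zfs set compression=gzip default/tmp
$ zfs get -r compression default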
Accessing our encrypted data
If you now reboot your machine, when it comes back up you should see that it has auto-mounted /mnt/tmp2, which is a filesystem we created on the default pool, but not the /mnt/tmp we created on the private pool (this is because the pool is locked by default). To get access to the private pool we can do;
$ zfs load-key -a
Enter passphrase for 'private': ...
1 / 1 key(s) successfully loaded
$ zfs mount -a
Now if you take a look at df you should see it has unlocked the private pool and automatically mounted the /mnt/tmp volume.
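You can also ask ZFS directly whether the keys are loaded, rather than inferring it from df;
$ zfs get keystatus private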
Adding containers into the mix
Adding containers was previously covered here; but to go over it again in a little less detail;
$ snap install lxd
$ lxd init
Would you like to use LXD clustering? (yes/no) no
Do you want to configure a new storage pool? yes
Name of the new storage pool: default
Name of the storage backend to use: zfs
Create a new ZFS pool? (yes/no): no
Name of the existing ZFS pool or dataset: default
Would you like to connect to a MAAS server? no
Would you like to create a new local network bridge? yes
What should the new bridge be called? lxdbr0
What IPv4 address should be used? auto
What IPv6 address should be used? auto
Would you like the LXD server available over the network? yes
Address to bind LXD to (not including port): all
Port to bind LXD to: 8443
Would you like cached images to be updated automatically? yes
Would you like a YAML "lxd init" preseed to be printed? no
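At this point it’s worth a quick check that LXD really has attached itself to the existing ZFS pool rather than creating a new one;
$ lxc storage list
$ lxc storage info default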
Now enable the LXD user interface;
$ snap set lxd ui.enable=true
$ systemctl reload snap.lxd.daemon
If you point your browser at the machine’s port 8443 (or, from the machine itself, https://localhost:8443) and follow the instructions, you should be able to install the appropriate client certificates to get the GUI working.
Note I have found the process of installing client certificates in the browser for LXD to, on occasion, be problematic. If you end up with a strange text (JSON) response instead of a web page, you might like to try the following fix which has worked for me;
$ mkdir lxd-api-access-cert-key-files
$ cd lxd-api-access-cert-key-files
$ openssl genrsa -out lxd-webui.key 4096
$ openssl req -new -key lxd-webui.key -out lxd-webui.csr
$ openssl x509 -req -days 3650 -in lxd-webui.csr -signkey lxd-webui.key -out lxd-webui.crt
$ openssl pkcs12 -keypbe PBE-SHA1-3DES -certpbe PBE-SHA1-3DES -export -in lxd-webui.crt -inkey lxd-webui.key -out lxd-webui.pfx -name "LXD WebUI"
$ lxc config trust add lxd-webui.crt
# Now download the lxd-webui.pfx file locally,
# then import it into the browser.
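# You can confirm the certificate was accepted server-side with;
$ lxc config trust list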
If the issue still persists (note, this will destroy any containers you’ve created), try;
$ snap remove --purge lxd
$ snap install lxd
# At this point you will need to remove all the ZFS volumes
# from **default** because "init" will try to recreate them
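# (a sketch; the exact dataset names depend on what LXD created,
#  "zfs list -r default" will show them)
$ zfs list -r default
$ zfs destroy -r default/containers
$ zfs destroy -r default/images
$ zfs destroy -r default/deleted
$ zfs destroy -r default/custom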
$ lxd init
Creating a ZFS based container
So, assuming we now have a working UI on https://localhost:8443, we should be seeing something like this. If we click on create instance and follow the yellow brick road, we end up with a running container. Then, if you click on storage, it should show you your (ZFS) storage pool with details of space used and space remaining. So our first container has consumed a total of 659MB of storage, which is probably what you might expect for a basic server installation. On further inspection however, what it’s actually done is to create an immutable base image for the version of Linux you’ve selected, and then a second (copy-on-write) volume containing the differences from the base image.
$ zfs list -r default/images
NAME USED AVAIL REFER MOUNTPOINT
default/images 617M 144G 24K legacy
default/images/5c0f660608... 617M 144G 617M legacy
$ zfs list -r default/containers
NAME USED AVAIL REFER MOUNTPOINT
default/containers 21.3M 144G 24K legacy
default/containers/zfs1 10.6M 144G 622M legacy
So the base image is consuming 617M, but the container itself is only using 10.6M. The useful thing to note is that if we create a second container using the same version of Linux, it can use the same base instance. So whereas the first container consumes 617M + 10.6M of space, the second (and subsequent) will only consume 10.6M of space. (which makes them incredibly space efficient, even before you start to look at de-duplication or compression) Just to prove the point, if I create a second instance;
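(If you’d rather use the command line than the UI for this, the equivalent is something like the following; the image alias here is an assumption, use whatever you picked for the first container.)
$ lxc launch ubuntu:23.04 zfs2 --storage default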
Then go back and look at storage consumption in the default pool;
$ zfs list -r default/images
NAME USED AVAIL REFER MOUNTPOINT
default/images 617M 144G 24K legacy
default/images/5c0f660608... 617M 144G 617M legacy
$ zfs list -r default/containers
NAME USED AVAIL REFER MOUNTPOINT
default/containers 21.3M 144G 24K legacy
default/containers/zfs1 10.6M 144G 622M legacy
default/containers/zfs2 10.6M 144G 622M legacy
Summary
An alternative pool based storage system for LXD based containers;
- One or more host managed storage pools
- Access to ZFS options such as compression, encryption, RAID etc
- Out of the box de-duplication
- Lazy space allocation / re-allocation
- Easy access to snapshots for backing up individual containers (see the example after this list)
- Fully integrated into LXD’s infrastructure and UI
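As a quick illustration of the snapshot point above (the names here are just examples, and LXD’s own lxc snapshot / lxc export commands sit on top of the same mechanism);
$ zfs snapshot default/containers/zfs1@backup-1
$ zfs send default/containers/zfs1@backup-1 | gzip > zfs1-backup-1.zfs.gz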