@Rudd-O
Contributor

@Rudd-O Rudd-O commented Mar 2, 2023

This software allows a user who has a ZFS pool in his Qubes OS dom0 to create Qubes VM volumes directly on his ZFS pool. In this sense, it is analogous to the LVM or the reflink driver that Qubes OS ships with. With this software, it's possible for users of Qubes OS on ZFS to continue using ZFS as their storage backing, without having to choose one or the other.

Usage

At pool creation time, the user specifies the container dataset for the Qubes volumes like so:

qvm-pool create -o container mytankpool/vms newzfspool zfs

From then on, Qubes OS stores VM volumes under the mytankpool/vms dataset, in the following convention:

mytankpool/vms/a-vm/volatile
mytankpool/vms/a-vm/root
mytankpool/vms/a-vm/private

These will have corresponding backing device files /dev/mytankpool/vms/<VM>/<volume name>.
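
The naming convention above can be sketched in a few lines of Python. This is only an illustration of the layout described here; the helper names `volume_dataset` and `volume_device` are hypothetical and are not taken from the driver.

```python
from pathlib import PurePosixPath


def volume_dataset(container: str, vm_name: str, volume_name: str) -> str:
    """Return the dataset name for one VM volume under the container dataset."""
    return f"{container}/{vm_name}/{volume_name}"


def volume_device(dataset: str) -> PurePosixPath:
    """Return the backing device path, following the /dev/<dataset> convention."""
    return PurePosixPath("/dev") / dataset


# Print the three volumes of the example VM "a-vm" and their device paths.
for name in ("volatile", "root", "private"):
    dataset = volume_dataset("mytankpool/vms", "a-vm", name)
    print(dataset, "->", volume_device(dataset))
```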

The driver automatically manages revisions of volumes as ZFS snapshots (but does not interfere with third party snapshots). A revision snapshot is taken of every persistent volume upon VM shutdown, which the user can later revert to. The standard revisions_to_keep parameter is honored (by default it is 1 and cannot be lower than 1).
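
A minimal sketch of how honoring revisions_to_keep could work, assuming revision snapshots carry a recognizable name prefix so that third-party snapshots are left alone. The prefix `qubes-rev-` and the function `split_revisions` are assumptions made for illustration, not the driver's actual naming or code.

```python
from typing import List, Tuple

# Hypothetical prefix marking snapshots created as revisions (an assumption).
REVISION_PREFIX = "qubes-rev-"


def split_revisions(
    snapshots: List[str], revisions_to_keep: int = 1
) -> Tuple[List[str], List[str]]:
    """Split snapshots (oldest first) into (kept, pruned) revision snapshots.

    Snapshots without the revision prefix are treated as third-party
    snapshots and appear in neither list.
    """
    if revisions_to_keep < 1:
        raise ValueError("revisions_to_keep cannot be lower than 1")
    ours = [s for s in snapshots if s.partition("@")[2].startswith(REVISION_PREFIX)]
    kept = ours[-revisions_to_keep:]
    pruned = ours[:-revisions_to_keep]
    return kept, pruned
```

With the default of 1, only the newest revision snapshot is kept and a snapshot such as `private@backup-daily` is never considered for pruning.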

Testing

A decently-comprehensive test suite has been included, and the integration tests have been amended to also exercise the ZFS driver. Tests will only run if the command zfs and the command zpool exist in the test system. Unit and integration tests create a ZFS pool backed by a sparse file in /rw or /var/tmp, depending on which one has more than 32 GiB free disk space. No preexisting ZFS pools are used by the tests, which avoids any potential data loss. The test step has been amended to install ZFS directly from the official project repository — thank you @marmarek for the necessary changes in the base image to be able to build kernel modules.
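
The two preconditions described above (both commands present, and a directory with enough free space for the sparse file) could be checked roughly as follows. This is a sketch using only the standard library; the function names are hypothetical and the real test suite may do this differently.

```python
import shutil
from typing import Optional, Sequence

MIN_FREE_BYTES = 32 * 1024**3  # 32 GiB, the threshold mentioned above


def zfs_tools_available() -> bool:
    """Return True only if both the zfs and the zpool commands are on PATH."""
    return shutil.which("zfs") is not None and shutil.which("zpool") is not None


def pick_backing_dir(
    candidates: Sequence[str] = ("/rw", "/var/tmp"),
    min_free: int = MIN_FREE_BYTES,
) -> Optional[str]:
    """Return the first candidate directory with more than min_free bytes free."""
    for path in candidates:
        try:
            free = shutil.disk_usage(path).free
        except OSError:
            # The directory does not exist or cannot be inspected; try the next one.
            continue
        if free > min_free:
            return path
    return None
```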

Every line of the driver is typehinted using typing; no type errors are present. It is suggested that the project adopt tox, configuring it to run mypy on every checkin (perhaps initially deliberately ignoring most of the source code until such time that it is ported to use types).

@Rudd-O Rudd-O force-pushed the zfs branch 5 times, most recently from d8badfa to a475cbf Compare March 2, 2023 15:53
@Rudd-O
Contributor Author

Rudd-O commented Mar 3, 2023

All review comments resolved. Thank you!

@codecov

codecov bot commented Mar 3, 2023

Codecov Report

Merging #522 (ed7fc59) into main (da6acdf) will increase coverage by 1.28%.
The diff coverage is 80.22%.

@@            Coverage Diff             @@
##             main     #522      +/-   ##
==========================================
+ Coverage   65.80%   67.09%   +1.28%     
==========================================
  Files          53       54       +1     
  Lines       10096    11072     +976     
==========================================
+ Hits         6644     7429     +785     
- Misses       3452     3643     +191     
Flag Coverage Δ
unittests 67.09% <80.22%> (+1.28%) ⬆️

Impacted Files Coverage Δ
qubes/vm/templatevm.py 92.59% <ø> (ø)
qubes/storage/zfs.py 80.22% <80.22%> (ø)

... and 2 files with indirect coverage changes

@Rudd-O Rudd-O force-pushed the zfs branch 7 times, most recently from f188362 to 4a99727 Compare March 4, 2023 04:17
@Rudd-O Rudd-O requested review from marmarek and xaki23 and removed request for marmarek and xaki23 March 4, 2023 13:11
@Rudd-O Rudd-O force-pushed the zfs branch 3 times, most recently from b24c9a9 to 89a1c66 Compare March 5, 2023 03:26
@Rudd-O
Contributor Author

Rudd-O commented Mar 5, 2023

OK. This is ready for final review. I've ironed out a bunch of bugs and this works now.

@qubesos-bot

qubesos-bot commented Mar 5, 2023

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.2&build=2023031403-4.2&flavor=pull-requests

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.2&build=2023021823-4.2&flavor=update

  • system_tests_basic_vm_qrexec_gui

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_debian-11: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

    • TC_42_PVHGrub_debian-11: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

    • TC_42_PVHGrub_debian-11: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

  • system_tests_usbproxy

  • system_tests_network_ipv6

  • system_tests_whonix

    • whonix_torbrowser: unnamed test (unknown)

    • whonix_torbrowser: Failed (test died)
      # Test died: no candidate needle with tag(s) 'anon-whonix-tor-brows...

    • whonix_torbrowser: unnamed test (unknown)

  • system_tests_basic_vm_qrexec_gui_ext4

    • TC_20_NonAudio_whonix-gw-16-pool: test_100_qrexec_filecopy (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...
  • system_tests_qwt_win7@hw1

    • windows_install: Failed (test died)
      # Test died: command './install.sh' failed at /usr/lib/os-autoinst/...
  • system_tests_basic_vm_qrexec_gui_zfs

  • system_tests_vm_qrexec_gui_pipewire

Failed tests

27 failures
  • system_tests_basic_vm_qrexec_gui

  • system_tests_pvgrub_salt_storage

    • TC_41_HVMGrub_debian-11: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

    • TC_42_PVHGrub_debian-11: test_000_standalone_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

    • TC_42_PVHGrub_debian-11: test_010_template_based_vm (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...

  • system_tests_usbproxy

  • system_tests_network_ipv6

  • system_tests_whonix

    • whonix_torbrowser: unnamed test (unknown)

    • whonix_torbrowser: Failed (test died)
      # Test died: no candidate needle with tag(s) 'anon-whonix-tor-brows...

    • whonix_torbrowser: unnamed test (unknown)

  • system_tests_basic_vm_qrexec_gui_ext4

    • TC_20_NonAudio_whonix-gw-16-pool: test_100_qrexec_filecopy (error)
      qubes.exc.QubesVMError: Cannot connect to qrexec agent for 90 secon...
  • system_tests_qwt_win10@hw1

    • windows_install: Failed (test died)
      # Test died: command './install.sh' failed at /usr/lib/os-autoinst/...
  • system_tests_qwt_win7@hw1

    • windows_install: Failed (test died)
      # Test died: command './install.sh' failed at /usr/lib/os-autoinst/...
  • system_tests_basic_vm_qrexec_gui_zfs

  • system_tests_vm_qrexec_gui_pipewire

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/60652#dependencies

9 fixed
  • system_tests_network

  • system_tests_pvgrub_salt_storage

    • StorageFile: test_001_non_volatile (error)
      subprocess.CalledProcessError: Command '/usr/lib/qubes/destroy-snap...
  • system_tests_splitgpg

  • system_tests_network_ipv6

  • system_tests_network_updates

    • TC_11_QvmTemplateMgmtVM_whonix-gw-16: test_000_template_list (failure)
      qvm-template: error: No matching templates to list
  • system_tests_dispvm

  • system_tests_qwt_win10@hw1

    • windows_install: wait_serial (wait serial expected)
      # wait_serial expected: qr/Rt7qO-\d+-/...
  • system_tests_basic_vm_qrexec_gui@hw1

Unstable tests

  • system_tests_update

    update/Failed (1/5 times with errors)
    • job 55329 # Test died: command '(set -o pipefail; qubesctl --show-output stat...
  • system_tests_gui_tools

    qubesmanager_vmsettings/ (1/3 times with errors)
    qubesmanager_vmsettings/ (1/3 times with errors)
    qubesmanager_vmsettings/Failed (1/3 times with errors)
    • job 60669 # Test died: no candidate needle with tag(s) 'vm-settings-devices-s...
    qubesmanager_vmsettings/Failed (1/3 times with errors)
    • job 60685 # Test died: no candidate needle with tag(s) 'vm-settings-applicati...
  • system_tests_basic_vm_qrexec_gui

    TC_00_AppVM_debian-11/test_223_audio_play_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36/test_223_audio_play_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16/test_223_audio_play_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_network

    VmNetworking_debian-11/test_112_reattach_after_provider_shutdown (1/2 times with errors)
    • job 60672 qubes.exc.QubesVMShutdownTimeoutError: Domain shutdown timed out: '...
  • system_tests_backupcompatibility

    TC_00_BackupCompatibility/test_220_r2_encrypted (1/2 times with errors)
    • job 55559 lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
    TC_01_BackupCompatibilityIntoLVM/test_220_r2_encrypted (1/2 times with errors)
    • job 55559 lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
  • system_tests_pvgrub_salt_storage

    StorageFile/test_001_non_volatile (1/2 times with errors)
    • job 60675 subprocess.CalledProcessError: Command '/usr/lib/qubes/destroy-snap...
  • system_tests_splitgpg

    TC_10_Thunderbird_debian-11/test_000_send_receive_default (1/2 times with errors)
    • job 55548 dogtail.tree.SearchError: descendent of [application | Thunderbird]...
    TC_10_Thunderbird_fedora-36/test_000_send_receive_default (1/2 times with errors)
    • job 55548 dogtail.tree.SearchError: descendent of [menu bar | Application]: c...
    TC_10_Thunderbird_fedora-36/test_010_send_receive_inline_signed_only (1/2 times with errors)
    • job 55548 dogtail.tree.SearchError: descendent of [menu bar | Application]: c...
    TC_10_Thunderbird_debian-11/test_020_send_receive_inline_with_attachment (1/2 times with errors)
    • job 55548 dogtail.tree.SearchError: descendent of [application | Thunderbird]...
    TC_10_Thunderbird_fedora-36/test_020_send_receive_inline_with_attachment (1/2 times with errors)
    • job 55548 dogtail.tree.SearchError: descendent of [menu bar | Application]: c...
  • system_tests_network_ipv6

    VmIPv6Networking_debian-11/test_020_simple_proxyvm_nm (1/2 times with errors)
    • job 60673 AssertionError: 1 != 0 : nm-applet window not found
  • system_tests_network_updates

    TC_10_QvmTemplate_debian-11/test_000_template_list (1/2 times with errors)
    • job 55545 qvm-template: error: No matching templates to list
    TC_10_QvmTemplate_fedora-36/test_000_template_list (1/2 times with errors)
    • job 55545 qvm-template: error: No matching templates to list
    TC_10_QvmTemplate_whonix-gw-16/test_000_template_list (1/2 times with errors)
    • job 55545 qvm-template: error: No matching templates to list
    TC_11_QvmTemplateMgmtVM_whonix-gw-16/test_000_template_list (1/2 times with errors)
    • job 60674 qvm-template: error: No matching templates to list
    TC_10_QvmTemplate_debian-11/test_010_template_install (1/2 times with errors)
    • job 55545 qvm-template: error: Template 'debian-11-minimal' not found.
    TC_10_QvmTemplate_fedora-36/test_010_template_install (1/2 times with errors)
    • job 55545 qvm-template: error: Template 'debian-11-minimal' not found.
    TC_10_QvmTemplate_whonix-gw-16/test_010_template_install (1/2 times with errors)
    • job 55545 qvm-template: error: Template 'debian-11-minimal' not found.
    TC_11_QvmTemplateMgmtVM_debian-11/test_010_template_install (1/2 times with errors)
    • job 55545 AssertionError: qvm-template failed: Downloading 'qubes-template-de...
    TC_11_QvmTemplateMgmtVM_fedora-36/test_010_template_install (1/2 times with errors)
    • job 55545 AssertionError: qvm-template failed: Downloading 'qubes-template-de...
    TC_11_QvmTemplateMgmtVM_whonix-gw-16/test_010_template_install (1/2 times with errors)
    • job 55545 AssertionError: qvm-template failed: Downloading 'qubes-template-de...
  • system_tests_dispvm

    TC_20_DispVM_fedora-36/test_100_open_in_dispvm (1/2 times with errors)
    • job 55539 self.assertEqual(test_txt_content.s... AssertionError: b'' != b'test1'
    TC_20_DispVM_whonix-ws-16/test_100_open_in_dispvm (1/2 times with errors)
    • job 60668 AssertionError: libvirt event impl drain timeout
  • system_tests_basic_vm_qrexec_gui_xfs

    TC_00_AppVM_debian-11-pool/test_223_audio_play_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_223_audio_play_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_223_audio_play_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11-pool/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_224_audio_rec_muted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11-pool/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_225_audio_rec_unmuted_hvm (1/2 times with errors)
    • job 55537 libvirt.libvirtError: internal error: libxenlight failed to create ...
  • system_tests_basic_vm_qrexec_gui_ext4

    TC_00_AppVM_debian-11-pool/test_220_audio_play (1/3 times with errors)
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-37-pool/test_220_audio_play (1/3 times with errors)
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16-pool/test_220_audio_play (1/3 times with errors)
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11-pool/test_223_audio_play_hvm (2/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-36-pool/test_223_audio_play_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-37-pool/test_223_audio_play_hvm (1/3 times with errors)
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16-pool/test_223_audio_play_hvm (2/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68388 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55536 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-gw-16-pool/test_300_bug_1028_gui_memory_pinning (1/3 times with errors)
    • job 68388 AssertionError: Dom0 window doesn't match VM window content
  • system_tests_update@hw1

    update/Failed (1/5 times with errors)
    • job 55329 # Test died: command '(set -o pipefail; qubesctl --show-output stat...
  • system_tests_basic_vm_qrexec_gui@hw1

    TC_00_AppVM_debian-11/test_220_audio_play (1/3 times with errors)
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-37/test_220_audio_play (1/3 times with errors)
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16/test_220_audio_play (1/3 times with errors)
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11/test_223_audio_play_hvm (2/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-36/test_223_audio_play_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-37/test_223_audio_play_hvm (1/3 times with errors)
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16/test_223_audio_play_hvm (2/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68408 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55534 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-gw-16/test_300_bug_1028_gui_memory_pinning (1/3 times with errors)
    • job 68408 AssertionError: Dom0 window doesn't match VM window content
  • system_tests_gui_tools@hw1

    qubesmanager_vmsettings/ (1/3 times with errors)
    qubesmanager_vmsettings/ (1/3 times with errors)
    qubesmanager_vmsettings/Failed (1/3 times with errors)
    • job 60669 # Test died: no candidate needle with tag(s) 'vm-settings-devices-s...
    qubesmanager_vmsettings/Failed (1/3 times with errors)
    • job 60685 # Test died: no candidate needle with tag(s) 'vm-settings-applicati...
  • system_tests_basic_vm_qrexec_gui_btrfs

    TC_00_AppVM_debian-11-pool/test_101_qrexec_filecopy_with_autostart (1/3 times with errors)
    • job 68387 AssertionError: qvm-copy-to-vm failed: b'Request refused\n'
    TC_00_AppVM_debian-11-pool/test_220_audio_play (1/3 times with errors)
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-37-pool/test_220_audio_play (1/3 times with errors)
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16-pool/test_220_audio_play (1/3 times with errors)
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11-pool/test_223_audio_play_hvm (2/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_fedora-36-pool/test_223_audio_play_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-37-pool/test_223_audio_play_hvm (1/3 times with errors)
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_whonix-ws-16-pool/test_223_audio_play_hvm (2/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    • job 68387 subprocess.CalledProcessError: Command '['pkill', 'parecord']' retu...
    TC_00_AppVM_debian-11-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_224_audio_rec_muted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_debian-11-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_fedora-36-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...
    TC_00_AppVM_whonix-ws-16-pool/test_225_audio_rec_unmuted_hvm (1/3 times with errors)
    • job 55535 libvirt.libvirtError: internal error: libxenlight failed to create ...

@Rudd-O
Contributor Author

Rudd-O commented Mar 5, 2023

Looks like I didn't add any OpenQA failures!

(They look a bit concerning tho.)

@Rudd-O
Contributor Author

Rudd-O commented Mar 6, 2023

And the pipeline failed again.

@marmarek
Member

marmarek commented Mar 6, 2023

See updated report...

  • system_tests_basic_vm_qrexec_gui_zfs

Focus on this section. There are basically three concerning things:

  1. A lot of Not enough memory to start domain. This test is running with 6GB RAM (as the minimum requirements). With other pools it doesn't fail this way, and definitely not this often.
  2. File "/usr/lib/python3.11/site-packages/qubes/storage/zfs.py", line 2051, in import_volume
       raise NotImplementedError(
     NotImplementedError: Cannot import from testpool/test-inst-vm1/volatile -- volumes where save_on_stop=False cannot be exported for import
     https://openqa.qubes-os.org/tests/68259#step/TC_05_StandaloneVM_fedora-37-pool/1 (and few others). This is creating a StandaloneVM by cloning a template. For other drivers, import to save_on_stop=False is no-op instead of a failure.
  3. https://openqa.qubes-os.org/tests/68259#step/TC_00_Basic/10 - udev in dom0 parsed VM's disk image, either device naming needs to be adjusted to fit current filtering, or udev rules need a change
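
The no-op behavior mentioned in item 2 can be pictured with a toy class. This is a sketch of the idea only: `ToyVolume` is a hypothetical stand-in and not the Qubes storage API, and it checks the destination volume's save_on_stop flag as the comment words it.

```python
from dataclasses import dataclass


@dataclass
class ToyVolume:
    """Toy stand-in for a storage volume (hypothetical, for illustration)."""

    name: str
    save_on_stop: bool
    imported_from: str = ""

    def import_volume(self, src: "ToyVolume") -> "ToyVolume":
        if not self.save_on_stop:
            # Nothing in this volume survives shutdown, so importing into it
            # is treated as a no-op rather than raising an error.
            return self
        self.imported_from = src.name
        return self
```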

@marmarek
Member

marmarek commented Mar 8, 2023

https://openqa.qubes-os.org/tests/68435#step/switch_pool/43
Now starting VM fails with cannot create 'testpool/sys-net/root': parent does not exist. I guess too much stuff hidden via udev rules?

@Rudd-O
Contributor Author

Rudd-O commented Mar 8, 2023

I donno man. I have the exact same udev rules on my system, yet I can create, clone and start VMs no problem.

Lemme see how to repro this.

EDIT: I repro'd. Figuring a fix out right now.

@Rudd-O
Contributor Author

Rudd-O commented Mar 8, 2023

The fix is in! It had nothing to do with udev rules. Basically it was a new patch I added to resolve @DemiMarie 's request that clone / import be atomic, whereby renaming the dataset from a temporary one to its final destination caused failure if the parent of the final destination did not exist. Resolved easily with a -p.

    Address bug introduced with the atomic creation patch.
    
    The atomic creation mechanism is as follows:
    
    * Clone (from ZFS source) or create (for `dd of=`) a temporary volume.
      * This volume is stored in `pool/.tmp/name-of-dataset`
    * Ensure clone is finished.
    * Swap the dataset into place.
      * Its final place will be `pool/dataset/{private,root,volatile...}`
    
    The problem with this algorithm is that `pool/dataset` must exist
    before `pool/.tmp/name-of-dataset` can be atomically renamed to
    `pool/dataset/X`.
    
    The solution is to add `-p` flag to the rename call.
    
    Some cache optimizations are also included here.
    
    Finally, detection of whether the source pool and the target pool
    differ has been added.  Without this, cross-pool cloning would just
    fail, because ZFS clone cannot clone across pools.

It bears noting that the reason the test https://openqa.qubes-os.org/tests/68435#step/switch_pool/43 fails with that particular VM and in that particular case is because the root volume from its template is ZFS, but the other volumes aren't, which means the parent of the root volume clone doesn't exist (up until copy-on-write creation) when the VM is about to start. The fix I just pushed solves that issue.
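
The clone-then-rename sequence from the commit message can be sketched as a function that only builds the `zfs` command lines, without running them. The temporary dataset naming and the function names are assumptions for illustration; passing `-p` to `zfs clone` is also an assumption made here so that `pool/.tmp` is created on demand, while the `-p` on `zfs rename` is the fix the commit message describes.

```python
from typing import List


def pool_of(dataset: str) -> str:
    """Return the pool name, i.e. the first component of a dataset name."""
    return dataset.split("/", 1)[0]


def atomic_clone_commands(source_snapshot: str, final_dataset: str) -> List[List[str]]:
    """Build the commands that clone into pool/.tmp and then rename into place."""
    src_pool = pool_of(source_snapshot.split("@", 1)[0])
    dst_pool = pool_of(final_dataset)
    if src_pool != dst_pool:
        # zfs clone cannot cross pools, so the caller must copy the data instead.
        raise ValueError("source and target are in different pools")
    # Hypothetical temporary name derived from the final dataset path.
    _, _, rest = final_dataset.partition("/")
    tmp = f"{dst_pool}/.tmp/{rest.replace('/', '-')}"
    return [
        ["zfs", "clone", "-p", source_snapshot, tmp],
        # -p creates the missing parent of the final dataset.
        ["zfs", "rename", "-p", tmp, final_dataset],
    ]
```

Returning argument lists rather than executing them keeps the sketch safe to run anywhere; a caller would pass each list to something like `subprocess.run` on a system that has ZFS.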

@Rudd-O
Contributor Author

Rudd-O commented Mar 8, 2023

It irks me enormously that errors do not dump tracebacks in qubesd's log. This is not right! Show me the traceback please!

@Rudd-O
Contributor Author

Rudd-O commented Mar 8, 2023

The test failure is because of the kernel version in the test VM. I can't rerun the CI.

@Rudd-O
Contributor Author

Rudd-O commented Mar 9, 2023

The manually-edited OpenQA run succeeded with the change newly-applied to PR QubesOS/qubes-core-admin-linux#118 .

https://openqa.qubes-os.org/tests/68567#step/system_tests/25

Woohoo!

@Rudd-O
Contributor Author

Rudd-O commented Mar 10, 2023

There is some weird failure with rpmcanon while installing the XFCE template here:

https://openqa.qubes-os.org/tests/68657/video?filename=video.ogv

(at the very end)

I cannot reproduce this locally.

@marmarek
Member

> There is some weird failure with rpmcanon while installing the XFCE template here:

Don't worry about it. It's because the template is not there at all.

@marmarek
Member

FYI before merging, I'd like you to squash this into a reasonable number of commits - especially fixes for things added in this PR should be merged into the commit that adds the code (in other words: don't leave known-buggy commits). If that means adding the driver in one big commit, I'm okay with that.

The same code works fine in release 4.1 and is therefore safe to backport.
@Rudd-O
Contributor Author

Rudd-O commented Mar 11, 2023

I've squashed it all.

@Rudd-O Rudd-O requested a review from marmarek March 11, 2023 03:27
@marmarek
Member

marmarek commented Mar 11, 2023

There are 3 not resolved comments. The one about snapshots on import you can decide to ignore - up to you. But other two need handling.

@Rudd-O
Contributor Author

Rudd-O commented Mar 11, 2023

> There are 3 not resolved comments. The one about snapshots on import you can decide to ignore - up to you. But other two need handling.

Thanks. I don't know how I missed these. They've been addressed.

@Rudd-O
Contributor Author

Rudd-O commented Mar 11, 2023

Yeah we can't do the trick of import data as we had speculated, because of this (in qubes/api/admin.py):

    async def vm_volume_clear(self):
        self.enforce(self.arg in self.dest.volumes.keys())

        self.fire_event_for_permission()

        volume = self.dest.volumes[self.arg]
        size = volume.size

        # Clear the volume by importing empty data into it
        path = await self.dest.storage.import_data(self.arg, size)
        self.dest.fire_event('domain-volume-import-begin',
            volume=self.arg, size=size)
        pathlib.Path(path).touch()
        try:
            await self.dest.storage.import_data_end(self.arg, True)
        except:
            self.dest.fire_event('domain-volume-import-end',
                volume=self.arg, success=False)
            raise
        self.dest.fire_event('domain-volume-import-end',
            volume=self.arg, success=True)
        self.app.save()

The clear mechanism relies on import_data to have wiped out everything after returning (which I am glad to report was my original implementation to begin with, even before reading this code). That's not something we can do by rewinding the pool to a prior snapshot at all.

Why would there be a touch there tho? That would only ever work with a file-based storage volume. Is it gonna touch the block device for what? :-D
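
The contract that the clear mechanism depends on can be shown with a toy in-memory volume: starting an import hands back an empty target, and ending it successfully makes that target the volume's content. `ToyVolume` and `clear` are hypothetical names for illustration and are synchronous, unlike the real async API quoted above.

```python
from typing import Optional


class ToyVolume:
    """In-memory stand-in for a storage volume; not the Qubes API."""

    def __init__(self, size: int, content: bytes) -> None:
        self.size = size
        self.content = content.ljust(size, b"\0")
        self._staging: Optional[bytearray] = None

    def import_data(self, size: int) -> bytearray:
        # Hand back a fresh, zero-filled staging area of the requested size.
        self._staging = bytearray(size)
        return self._staging

    def import_data_end(self, success: bool) -> None:
        # Commit the staging area only on success; otherwise keep the old content.
        if success and self._staging is not None:
            self.content = bytes(self._staging)
        self._staging = None


def clear(volume: ToyVolume) -> None:
    """Clear a volume by importing empty data into it."""
    volume.import_data(volume.size)  # nothing is written into the staging area
    volume.import_data_end(True)
```

Because nothing is written between the two calls, the volume ends up empty only if import_data itself produced an empty target, which is the property the comment above relies on.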

@Rudd-O
Contributor Author

Rudd-O commented Mar 11, 2023 via email

@Rudd-O
Contributor Author

Rudd-O commented Mar 12, 2023

The builder is failing:

lrwxrwxrwx. 1 root root 39 Nov 23 09:55 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
ping: gitlab.com: Temporary failure in name resolution

@DemiMarie
Contributor

PipelineRetryFailed

@Rudd-O
Contributor Author

Rudd-O commented Mar 13, 2023

Anything else missing?

@marmarek
Member

> Anything else missing?

I think it's okay now. I'll let the current test run to finish first and do a few manual tests, but I hope it will be okay.

@Rudd-O
Contributor Author

Rudd-O commented Mar 15, 2023

The qrexec tests failed, but they failed for all storage variants, so I guess that's okay since every other test did pass.

@marmarek
Member

FYI, I have checked manually that even if you have some pool and/or qubes on zfs but zfs itself is not supported (no tools and/or no kernel module), qubesd works just fine, and other qubes can be used normally.

@marmarek marmarek merged commit 45e3802 into QubesOS:main Mar 16, 2023
@marmarek marmarek mentioned this pull request Jul 19, 2024