@albinsun
Contributor

@albinsun albinsun commented May 8, 2024

Changes

  1. Fix live migration failure caused by CPU compatibility of the emulated VM
    • Set libvirt.cpu_mode to the compatibility-oriented host-model (the default value) instead of the performance-oriented host-passthrough
    • See https://libvirt.org/formatdomain.html#cpu-model-and-topology

      ... However, for backward compatibility host-model may be implemented even for domains running on emulated CPUs in which case the best CPU the hypervisor is able to emulate may be used rather then trying to mimic the host CPU model.

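
For readers unfamiliar with the libvirt side of this setting, here is a minimal sketch of the `<cpu>` element that the linked documentation describes. It assumes that libvirt.cpu_mode ends up as the `mode` attribute of that element; the helper name `build_cpu_element` is hypothetical and not part of this change.

```python
import xml.etree.ElementTree as ET

# CPU modes listed in the libvirt domain XML documentation for the <cpu> element.
VALID_CPU_MODES = {"custom", "host-model", "host-passthrough", "maximum"}


def build_cpu_element(mode: str = "host-model") -> str:
    """Return a libvirt <cpu> element (as a string) for the given mode.

    Hypothetical helper for illustration only; the default mirrors the
    value this change switches to.
    """
    if mode not in VALID_CPU_MODES:
        raise ValueError(f"unsupported cpu mode: {mode!r}")
    cpu = ET.Element("cpu", {"mode": mode})
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Compare the element for the new value with the one for the old value.
    print(build_cpu_element("host-model"))
    print(build_cpu_element("host-passthrough"))
```

With host-model, libvirt picks a named CPU model close to the host, whereas host-passthrough exposes the host CPU as-is, which is why the latter is more sensitive to flag differences between nodes.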

Issue

Ref. [BUG] Live migration fail when upgrade v1.2.1 to v1.2.2-rc2 due to virError

Guest VM live migration fails with a libvirt error: the guest CPU doesn't match the specification because the feature flag waitpkg is missing.

VirtualMachineInstance migration uid 5de2134c-25e2-404e-88b2-9307f54866c8 failed. reason:
Live migration failed error encountered during MigrateToURI3 libvirt api call: 
virError(Code=9, Domain=31, Message='operation failed: guest CPU doesn't match specification: missing features: waitpkg')
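
When triaging similar failures, it can help to pull the missing feature names out of the error text. The following is a rough sketch, assuming the message keeps the `missing features: ...` wording shown above and that multiple features are separated by commas or whitespace (an assumption, since only one feature appears here). The function name `missing_cpu_features` is hypothetical.

```python
import re

# Matches the "missing features" part of the libvirt error message shown above.
# The capture stops at the closing quote or parenthesis of the virError text.
_MISSING_RE = re.compile(r"missing features:\s*([^')]+)")


def missing_cpu_features(message: str) -> list[str]:
    """Extract the CPU feature names listed after 'missing features:'."""
    match = _MISSING_RE.search(message)
    if match is None:
        return []
    # Assumption: features are separated by commas and/or whitespace.
    return [f for f in re.split(r"[,\s]+", match.group(1).strip()) if f]


error = (
    "virError(Code=9, Domain=31, Message='operation failed: "
    "guest CPU doesn't match specification: missing features: waitpkg')"
)
print(missing_cpu_features(error))  # prints ['waitpkg']
```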

Cause

Ref. harvester/harvester#5755 (comment)

QEMU's behavior changed between SLES SP4 and SP5. The issue occurs when Harvester nodes run in VMs and the guests are therefore nested VMs. Here is what the virtualization team said:

but the bug is rather that you see the waitpkg flag in SP4, more than the fact that you don't see it in SP5

yes, SP5's QEMU behavior is correct, i.e., on your particular hardware, it's ok to not advertise that flag in a nested VM. It's actually SP4's QEMU that is at fault, i.e., it shouldn't advertise it in the first place, while instead it did. As I said, I can backport the fix to SP's QEMU, but this won't probably help you for that particular VM (or it would break it in even worse way, when/if the updated QEMU would reach SP4's KubeVirt)
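
To see whether two nodes actually advertise different flags, as the quote above describes for waitpkg, one option is to compare the `flags` lines of their /proc/cpuinfo output. This is a sketch under the assumption of x86 Linux nodes; the function names and the shortened sample data are hypothetical.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def flags_missing_on_target(source_cpuinfo: str, target_cpuinfo: str) -> set[str]:
    """Flags the source node advertises that the target node does not."""
    return cpu_flags(source_cpuinfo) - cpu_flags(target_cpuinfo)


# Hypothetical, shortened cpuinfo excerpts for illustration only.
source = "processor\t: 0\nflags\t\t: fpu vme waitpkg\n"
target = "processor\t: 0\nflags\t\t: fpu vme\n"
print(flags_missing_on_target(source, target))  # prints {'waitpkg'}
```

A non-empty result for a migration source and target pair points at the kind of mismatch reported in the error above.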

@albinsun albinsun changed the title from "Change cpu_mode to default value (currently host-model) for reliability." to "Fix CPU compatibility problem by set cpu_mode to host-model" May 8, 2024
@albinsun albinsun force-pushed the use_host-model_cpu_mode branch from 44815b9 to 1c36c07 Compare May 8, 2024 14:42
@albinsun albinsun force-pushed the use_host-model_cpu_mode branch from 1c36c07 to 3a91c35 Compare May 8, 2024 14:44
@bk201

bk201 commented May 10, 2024

How about putting this to a setting and default it to host-passthrough?
So for machines with the issue, we can edit the setting.

@votdev
Member

votdev commented May 13, 2024

How about putting this to a setting and default it to host-passthrough? So for machines with the issue, we can edit the setting.

In this case, an FAQ entry is necessary to help users identify the problem and its root cause. Otherwise, we end up with a setting that nobody knows the purpose of or when to use.
