[BUG] longhorn manager crash in installation #11743

@chriscchien

Describe the Bug

After deploying Longhorn master with kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml, the longhorn-manager pods never become ready:

# kl get pods
NAME                                       READY   STATUS             RESTARTS      AGE
longhorn-driver-deployer-6fbcc5b4d-hgcw5   0/1     Init:0/1           0             3m31s
longhorn-manager-9vkz6                     1/2     CrashLoopBackOff   3 (16s ago)   3m31s
longhorn-manager-n8rz4                     1/2     Running            4 (63s ago)   3m31s
longhorn-manager-qfshl                     1/2     CrashLoopBackOff   4 (42s ago)   3m31s
longhorn-ui-f7ff9c74-ssq7h                 1/1     Running            0             3m31s
longhorn-ui-f7ff9c74-wvh67                 1/1     Running            0             3m31s
# kl logs longhorn-manager-9vkz6
Defaulted container "longhorn-manager" out of: longhorn-manager, pre-pull-share-manager-image
warning: GOCOVERDIR not set, no coverage data emitted
I0911 07:46:46.059575       1 leaderelection.go:257] attempting to acquire leader lease longhorn-system/longhorn-manager-webhook-lock...
time="2025-09-11T07:46:46.066717645Z" level=info msg="Webhook leader elected: ip-172-31-33-173" func=app.startWebhooksByLeaderElection.func4 file="daemon.go:234"
time="2025-09-11T07:47:07.469004658Z" level=info msg="Webhook leader elected: ip-172-31-34-193" func=app.startWebhooksByLeaderElection.func4 file="daemon.go:234"
time="2025-09-11T07:48:08.817754772Z" level=info msg="Webhook leader elected: ip-172-31-33-173" func=app.startWebhooksByLeaderElection.func4 file="daemon.go:234"
I0911 07:48:31.738395       1 leaderelection.go:271] successfully acquired lease longhorn-system/longhorn-manager-webhook-lock
W0911 07:48:31.738575       1 client_config.go:667] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0911 07:48:31.739771       1 shared_informer.go:349] "Waiting for caches to sync" controller="longhorn datastore"
I0911 07:48:31.748921       1 warnings.go:110] "Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice"
I0911 07:48:31.754985       1 warnings.go:110] "Warning: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice"
I0911 07:48:31.839977       1 shared_informer.go:356] "Caches are synced" controller="longhorn datastore"
time="2025-09-11T07:48:31.840014774Z" level=info msg="Starting longhorn admission webhook server" func=webhook.StartWebhook file="webhook.go:27"
time="2025-09-11T07:48:31.840063089Z" level=info msg="Waiting for admission webhook to become ready" func=webhook.StartWebhook file="webhook.go:46"
time="2025-09-11T07:48:31.840228921Z" level=info msg="Add validation handler for nodes.longhorn.io (Node)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840291743Z" level=info msg="Add validation handler for settings.longhorn.io (Setting)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840315517Z" level=info msg="Add validation handler for recurringjobs.longhorn.io (RecurringJob)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840343853Z" level=info msg="Add validation handler for backingimages.longhorn.io (BackingImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840363521Z" level=info msg="Add validation handler for backupbackingimages.longhorn.io (BackupBackingImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840392763Z" level=info msg="Add validation handler for backups.longhorn.io (Backup)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840412485Z" level=info msg="Add validation handler for backupvolumes.longhorn.io (BackupVolume)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840444182Z" level=info msg="Add validation handler for backuptargets.longhorn.io (BackupTarget)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840475172Z" level=info msg="Add validation handler for volumes.longhorn.io (Volume)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840503991Z" level=info msg="Add validation handler for orphans.longhorn.io (Orphan)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.84056088Z" level=info msg="Add validation handler for snapshots.longhorn.io (Snapshot)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840587362Z" level=info msg="Add validation handler for supportbundles.longhorn.io (SupportBundle)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840605626Z" level=info msg="Add validation handler for systembackups.longhorn.io (SystemBackup)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840624322Z" level=info msg="Add validation handler for systemrestores.longhorn.io (SystemRestore)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840646089Z" level=info msg="Add validation handler for volumeattachments.longhorn.io (VolumeAttachment)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840673637Z" level=info msg="Add validation handler for engines.longhorn.io (Engine)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840709194Z" level=info msg="Add validation handler for replicas.longhorn.io (Replica)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840729716Z" level=info msg="Add validation handler for instancemanagers.longhorn.io (InstanceManager)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840763074Z" level=info msg="Add validation handler for persistentvolumeclaims. (PersistentVolumeClaim)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840781663Z" level=info msg="Add validation handler for engineimages.longhorn.io (EngineImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840813802Z" level=info msg="Add mutation handler for backups.longhorn.io (Backup)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840834269Z" level=info msg="Add mutation handler for backingimages.longhorn.io (BackingImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840856281Z" level=info msg="Add mutation handler for backingimagemanagers.longhorn.io (BackingImageManager)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840891311Z" level=info msg="Add mutation handler for backingimagedatasources.longhorn.io (BackingImageDataSource)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840911834Z" level=info msg="Add mutation handler for nodes.longhorn.io (Node)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840947742Z" level=info msg="Add mutation handler for volumes.longhorn.io (Volume)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.840969902Z" level=info msg="Add mutation handler for engines.longhorn.io (Engine)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.8409941Z" level=info msg="Add mutation handler for recurringjobs.longhorn.io (RecurringJob)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841013201Z" level=info msg="Add mutation handler for engineimages.longhorn.io (EngineImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841049821Z" level=info msg="Add mutation handler for orphans.longhorn.io (Orphan)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841069868Z" level=info msg="Add mutation handler for sharemanagers.longhorn.io (ShareManager)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841103245Z" level=info msg="Add mutation handler for backuptargets.longhorn.io (BackupTarget)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841122053Z" level=info msg="Add mutation handler for backupvolumes.longhorn.io (BackupVolume)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.84114032Z" level=info msg="Add mutation handler for snapshots.longhorn.io (Snapshot)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841160048Z" level=info msg="Add mutation handler for replicas.longhorn.io (Replica)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841177917Z" level=info msg="Add mutation handler for supportbundles.longhorn.io (SupportBundle)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841215958Z" level=info msg="Add mutation handler for systembackups.longhorn.io (SystemBackup)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841240403Z" level=info msg="Add mutation handler for volumeattachments.longhorn.io (VolumeAttachment)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841273207Z" level=info msg="Add mutation handler for instancemanagers.longhorn.io (InstanceManager)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841298309Z" level=info msg="Add mutation handler for backupbackingimages.longhorn.io (BackupBackingImage)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.841332995Z" level=info msg="Add mutation handler for settings.longhorn.io (Setting)" func=server.addHandler file="handler.go:17"
time="2025-09-11T07:48:31.84394671Z" level=warning msg="Validating webhook configuration longhorn-webhook-validator is not ready" func=webhook.isAdmissionConfigurationReady file="webhook.go:76" error="validatingwebhookconfigurations.admissionregistration.k8s.io \"longhorn-webhook-validator\" not found"
E0911 07:48:31.846100       1 panic.go:262] "Observed a panic" panic="runtime error: invalid memory address or nil pointer dereference" panicGoValue="\"invalid memory address or nil pointer dereference\"" stacktrace=<
	goroutine 612 [running]:
	k8s.io/apimachinery/pkg/util/runtime.logPanic({0x32e36a8, 0xc000b76a50}, {0x296a280, 0x4d50340})
		/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:132 +0xbc
	k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x32e3788, 0x4dc46c0}, {0x296a280, 0x4d50340}, {0xc000c9ba58, 0x0, 0xc000b72540?})
		/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:107 +0x116
	k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000bba8c0?})
		/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:64 +0x17b
	panic({0x296a280?, 0x4d50340?})
		/usr/lib64/go/1.25/src/runtime/panic.go:783 +0x132
	github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).saveInK8s(0xc000274d20, 0x0)
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:215 +0x348
	github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).update.func1()
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:247 +0x2d
	k8s.io/client-go/util/retry.OnError.func1()
		/app/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
	k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x0?)
		/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:150 +0x3e
	k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0xc000c9be50)
		/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:477 +0x5a
	k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x14?, 0xc06f00?)
		/app/vendor/k8s.io/client-go/util/retry/util.go:50 +0x96
	github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).update(0xc000274d20)
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:246 +0x85
	github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).processQueue(0xc000274d20)
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:152 +0xaa
	github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).runQueue.func1()
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:138 +0x25
	created by github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).runQueue in goroutine 596
		/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:137 +0x4f
 >
panic: runtime error: invalid memory address or nil pointer dereference [recovered, repanicked]
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x2539ba8]

goroutine 612 [running]:
k8s.io/apimachinery/pkg/util/runtime.handleCrash({0x32e3788, 0x4dc46c0}, {0x296a280, 0x4d50340}, {0xc000c9ba58, 0x0, 0xc000b72540?})
	/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:114 +0x1a9
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000bba8c0?})
	/app/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:64 +0x17b
panic({0x296a280?, 0x4d50340?})
	/usr/lib64/go/1.25/src/runtime/panic.go:783 +0x132
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).saveInK8s(0xc000274d20, 0x0)
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:215 +0x348
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).update.func1()
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:247 +0x2d
k8s.io/client-go/util/retry.OnError.func1()
	/app/vendor/k8s.io/client-go/util/retry/util.go:51 +0x30
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x0?)
	/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:150 +0x3e
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0xc000c9be50)
	/app/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:477 +0x5a
k8s.io/client-go/util/retry.OnError({0x989680, 0x3ff0000000000000, 0x3fb999999999999a, 0x5, 0x0}, 0x14?, 0xc06f00?)
	/app/vendor/k8s.io/client-go/util/retry/util.go:50 +0x96
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).update(0xc000274d20)
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:246 +0x85
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).processQueue(0xc000274d20)
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:152 +0xaa
github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).runQueue.func1()
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:138 +0x25
created by github.com/rancher/dynamiclistener/storage/kubernetes.(*storage).runQueue in goroutine 596
	/app/vendor/github.com/rancher/dynamiclistener/storage/kubernetes/controller.go:137 +0x4f
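
In the trace above, the frame for saveInK8s is printed with 0x0 as its second argument, and the fault address is a small offset (0x40). One plausible reading, not confirmed in this report, is that a nil pointer was passed in and a field was read through it. The sketch below illustrates only that general failure mode and a defensive guard; the secret type and the saveUnguarded/saveGuarded functions are hypothetical stand-ins, not the dynamiclistener code.

```go
package main

import (
	"errors"
	"fmt"
)

// secret is a hypothetical stand-in for a Kubernetes Secret; it is not the real type.
type secret struct {
	Name string
	Data map[string][]byte
}

// saveUnguarded reads a field through the pointer without checking it,
// so a nil argument triggers a nil pointer dereference panic.
func saveUnguarded(s *secret) int {
	return len(s.Data)
}

// saveGuarded returns an error instead of dereferencing a nil pointer.
func saveGuarded(s *secret) (int, error) {
	if s == nil {
		return 0, errors.New("secret is nil; nothing to save")
	}
	return len(s.Data), nil
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() {
		if r := recover(); r != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	fmt.Println("unguarded panics on nil:", panics(func() { saveUnguarded(nil) }))
	_, err := saveGuarded(nil)
	fmt.Println("guarded error on nil:", err)
}
```

Whether a guard like this is the right fix here, or whether the caller should never pass nil in the first place, would need to be decided against the actual dynamiclistener source.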

The preflight environment check completed without errors (warnings only):

# ./longhornctl check preflight
INFO[2025-09-11T07:49:59Z] Initializing preflight checker               
INFO[2025-09-11T07:49:59Z] Cleaning up preflight checker                
INFO[2025-09-11T07:49:59Z] Running preflight checker                    
INFO[2025-09-11T07:50:03Z] Retrieved preflight checker result:
ip-172-31-33-173:
  info:
  - Service iscsid is running
  - NFS4 is supported
  - Package nfs-common is installed
  - Package open-iscsi is installed
  - Package cryptsetup is installed
  - Package dmsetup is installed
  - Module dm_crypt is loaded
  warn:
  - Kube DNS "coredns" is set with fewer than 2 replicas; consider increasing replica count for high availability
  - multipathd.service is running. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath/ for more information.
ip-172-31-34-193:
  info:
  - Service iscsid is running
  - NFS4 is supported
  - Package nfs-common is installed
  - Package open-iscsi is installed
  - Package cryptsetup is installed
  - Package dmsetup is installed
  - Module dm_crypt is loaded
  warn:
  - Kube DNS "coredns" is set with fewer than 2 replicas; consider increasing replica count for high availability
  - multipathd.service is running. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath/ for more information.
ip-172-31-42-63:
  info:
  - Service iscsid is running
  - NFS4 is supported
  - Package nfs-common is installed
  - Package open-iscsi is installed
  - Package cryptsetup is installed
  - Package dmsetup is installed
  - Module dm_crypt is loaded
  warn:
  - Kube DNS "coredns" is set with fewer than 2 replicas; consider increasing replica count for high availability
  - multipathd.service is running. Please refer to https://longhorn.io/kb/troubleshooting-volume-with-multipath/ for more information. 
INFO[2025-09-11T07:50:03Z] Cleaning up preflight checker                
INFO[2025-09-11T07:50:03Z] Completed preflight checker  

To Reproduce

  1. Create a new cluster.
  2. Install Longhorn with kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

Expected Behavior

Longhorn deploys successfully and the longhorn-manager pods become ready.

Support Bundle for Troubleshooting

Longhorn is not ready, so a support bundle cannot be generated.

Environment

  • Longhorn version: master
  • Impacted volume (PV):
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): Kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of control plane nodes in the cluster:
    • Number of worker nodes in the cluster:
  • Node config
    • OS type and version:
    • Kernel version:
    • CPU per node:
    • Memory per node:
    • Disk type (e.g. SSD/NVMe/HDD):
    • Network bandwidth between the nodes (Gbps):
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:

Additional context

No response

Workaround and Mitigation

No response

Metadata

Labels

  • area/install-uninstall-upgrade: Install, Uninstall or Upgrade related
  • kind/bug
  • reproduce/always: 100% reproducible
  • severity/1: Function broken (a critical incident with very high impact, e.g. data corruption, failed upgrade)

Status

Closed
