[WIP] High: fenced: Registration of a STONITH device without placement constraints fails. #3849
Conversation
@HideoYamauchi What do you mean by placement constraint? A location constraint (in the CIB's constraints section)?
Sorry, yes. It seems that the problem occurs if you do not specify a location rule (placement score). Since there are no issues up to 2.1.9, I think this is some kind of problem with the state transition calculation.

Thinking about it a bit more: with just this fix, I don't think it will work well when -INFINITY constraints are mixed in. ---> I checked a bit, and it seems that including a -INFINITY location constraint doesn't cause any problems.

Best Regards,
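For readers unfamiliar with the terminology, here is a minimal sketch of the kind of placement constraint being discussed, assuming a hypothetical STONITH resource named scsi-fence and node name node1 (both illustrative, not taken from the attached configuration):

```xml
<!-- Node-scoped location constraint with a positive placement score -->
<rsc_location id="loc-scsi-fence" rsc="scsi-fence" node="node1" score="100"/>

<!-- Equivalent rule-based form; a -INFINITY score would instead
     forbid the device from running on the matched node -->
<rsc_location id="loc-scsi-fence-rule" rsc="scsi-fence">
  <rule id="loc-scsi-fence-rule-1" score="100">
    <expression id="loc-scsi-fence-rule-1-expr"
                attribute="#uname" operation="eq" value="node1"/>
  </rule>
</rsc_location>
```

The report is that device registration fails only when no such constraint exists for the STONITH resource.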
When you don't see the fence resource started, would a fencing action like off or reboot work?
Thank you for your comment. The problem here is the registration of STONITH devices with fenced. I know I'm repeating myself, but this problem is probably a regression. Best Regards,
Yep, that answers my question. It seems to be an issue when fenced is calling the scheduler code, not when it is called in the context of the scheduler process (maybe there as well, but it may be worth checking whether the context is somehow the difference).
@HideoYamauchi Do you know if it would be possible to write a scheduler regression test to reproduce the failure? If so, we could use that to …
Thanks for your comment. If you're referring to CTS, I'm not familiar with the details of CTS. However, I think the problem can be reproduced by following the steps below.

Step 1) Assuming that this is the initial cluster startup, clear everything in /var/lib/pacemaker/.
Step 2) Configure the initial cluster with pcs cluster start --all.

fence_scsi.xml

Best Regards,
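For illustration only (this is not the attached fence_scsi.xml; the resource ID, device path, and host list below are placeholders), a fence_scsi STONITH device configured without any location constraint might look like:

```xml
<primitive id="scsi-fence" class="stonith" type="fence_scsi">
  <instance_attributes id="scsi-fence-params">
    <!-- Shared SCSI device used for persistent reservations -->
    <nvpair id="scsi-fence-devices" name="devices"
            value="/dev/disk/by-id/example-shared-disk"/>
    <nvpair id="scsi-fence-hosts" name="pcmk_host_list"
            value="node1 node2"/>
  </instance_attributes>
  <!-- fence_scsi is normally used with unfencing -->
  <meta_attributes id="scsi-fence-meta">
    <nvpair id="scsi-fence-provides" name="provides" value="unfencing"/>
  </meta_attributes>
</primitive>
```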
By the way, this problem does not seem to occur if cib.xml already exists in /var/lib/pacemaker. Best Regards, |
Maybe that's a pointer in the direction that it is the context the scheduler code is run in that matters.
Could be the empty nodes section.

I've been busy with other things and I have not personally looked at this in any detail yet. I just skimmed over the CIB that Hideo provided, along with the comment that the problem doesn't occur when cib.xml already exists.

Are you able to fill in the nodes section in the cib.xml that you import, and check whether the problem still occurs?
I agree with you that the cause is the timing and the presence or absence of content in the node section of cib.xml. Best Regards, |
A CIB change will immediately trigger fenced to react, while a scheduler run may not be triggered right away ...
If the cib.xml you import contains a nodes section with the nodes filled in, the cluster will be configured successfully. Best Regards,
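To make the two cases concrete, here is a sketch of the difference being described (node IDs and names are illustrative):

```xml
<!-- Imported boilerplate CIB: empty nodes section; the problem reproduces -->
<nodes/>

<!-- Imported CIB whose nodes section is already filled in;
     the cluster configures successfully -->
<nodes>
  <node id="1" uname="node1"/>
  <node id="2" uname="node2"/>
</nodes>
```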
Hi All,
This is a significant issue for our users.
I will close this PR for now and submit another one. I understand the difference in the 3.0 series that is causing the problem. Many thanks,
This PR will be closed and re-submitted as a new PR.
This fixes a regression introduced by bf7ffcd. As of that commit, the fake local node is created after all resources have been unpacked, so it doesn't get added to resources' allowed_nodes tables. This prevents registration of fencing devices when the fencer receives a CIB diff that doesn't contain the local node. For example, the user may have replaced the CIB with a boilerplate configuration that has an empty nodes section.

See the following pull requests from Hideo Yamauchi and their discussions:

ClusterLabs#3849
ClusterLabs#3852

Thanks to Hideo for the report and finding the cause.

Signed-off-by: Reid Wahl <[email protected]>
This fixes a regression introduced by bf7ffcd. As of that commit, the fake local node is created after all resources have been unpacked, so it doesn't get added to resources' allowed_nodes tables. This prevents registration of fencing devices when the fencer receives a CIB diff that removes the local node. For example, the user may have replaced the CIB with a boilerplate configuration that has an empty nodes section. Registering a fencing device requires that the local node be in the resource's allowed nodes table.

One option would be to add the fake local node to all resources' allowed nodes tables immediately after creating it. However, it shouldn't necessarily be an allowed node for all resources. For example, if symmetric-cluster=false, then a node should not be placed in a resource's allowed nodes table by default. It's difficult to ensure correctness of the allowed nodes tables when a fake local node is involved. It may involve repeated code or a fragile and confusing dependence on the order of unpacking. Since the fake local node is a hack in the first place, it seems better to avoid using it where possible. Currently the only code that even sets the local_node_name member of scheduler->priv is in:
* the fencer
* crm_mon when showing the "times" section

This commit works as follows. If the fencer receives a CIB diff notification that contains the nodes section, it triggers a full refresh of fencing device registrations. In our example above, where a user has replaced the CIB with a configuration that has an empty nodes section, this means all fencing device registrations will be removed.

However, the controller also has a CIB diff notification callback: do_cib_updated(). The controller's callback repopulates the nodes section with up-to-date information from the cluster layer (or its node cache) if it finds that an untrusted client like cibadmin has modified the nodes section. Then it updates the CIB accordingly. The fencer will receive this updated CIB and refresh fencing device registrations again, re-registering the fencing devices that were just removed.

Note that in the common case, we're not doing all this wasted work. The "remove and then re-register" sequence should happen only when a user has modified the CIB in a sloppy way (for example, by deleting nodes from the CIB's nodes section that have not been removed from the cluster). In short, it seems better to rely on the controller's maintenance of the CIB's node list than to rely on a "fake local node" hack in the scheduler.

See the following pull requests from Hideo Yamauchi and their discussions:

ClusterLabs#3849
ClusterLabs#3852

Thanks to Hideo for the report and finding the cause.

Signed-off-by: Reid Wahl <[email protected]>
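For context, a boilerplate replacement CIB of the kind described in the commit message (for example, one pushed with cibadmin --replace) might look roughly like the following; the exact attributes, such as the validate-with schema version, are illustrative:

```xml
<cib validate-with="pacemaker-3.0" admin_epoch="0" epoch="1" num_updates="0">
  <configuration>
    <crm_config/>
    <!-- Empty nodes section: the diff the fencer sees no longer
         contains the local node, which is what exposed the regression -->
    <nodes/>
    <resources/>
    <constraints/>
  </configuration>
  <status/>
</cib>
```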
Hi All,
Starting with Pacemaker 3.0.0, a STONITH device that has no placement constraints is not registered with fenced, so unfencing of the STONITH device fails.
Pacemaker 2.1.9 and earlier did not have this problem, so it seems that some change in the Pacemaker 3.0.0 series is responsible.
I have not tracked down the offending change, so I am sending a tentative fix. (It is probably a regression in some part of the processing.)
I will leave it to the community to come up with a fix that takes the regression into account.
Best Regards,
Hideo Yamauchi.