
After a uRaft election, the new leader node logged "sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)" today and SaunaFS went down #554

@hradec

Description


Following the information in #533, I set up the timeouts on the four sfsmaster nodes to create a priority system.
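A rough sketch of why per-node timeouts can act as a priority order, assuming uRaft follows standard Raft behaviour: when followers lose the leader, each waits a randomized election timeout, and the first one whose timer expires becomes a candidate. If every node draws from its own non-overlapping range, the node with the lowest range always times out first. The node names and millisecond values are hypothetical, not actual uRaft settings, and the first candidate still needs a majority of votes (and an up-to-date log) to actually win.

```python
import random

# Hypothetical, non-overlapping election timeout ranges in milliseconds.
# A lower range means the node times out sooner, i.e. a higher failover priority.
TIMEOUTS_MS = {
    "node-a": (400, 600),
    "node-b": (800, 1000),
    "node-c": (1200, 1400),
    "node-d": (1600, 1800),
}


def first_candidate(timeouts, alive, rng):
    """Return the live node whose randomized election timer fires first."""
    draws = {
        name: rng.uniform(lo, hi)
        for name, (lo, hi) in timeouts.items()
        if name in alive
    }
    return min(draws, key=draws.get)


rng = random.Random(0)
print(first_candidate(TIMEOUTS_MS, {"node-a", "node-b", "node-c", "node-d"}, rng))  # node-a
print(first_candidate(TIMEOUTS_MS, {"node-b", "node-c", "node-d"}, rng))  # node-b
```

Taking the top-priority node out of the live set shows the failover order this setup relies on: the second-priority node becomes the next candidate.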

When one of them was elected leader, its sfsmaster started and got stuck in a loop, printing this:

Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: connecting to Master
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: connected to Master
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:58 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:59 sfo-storage-server sfsmaster[33351]: [33351] info: connecting to Master
Aug 29 15:10:59 sfo-storage-server sfsmaster[33351]: [33351] info: connected to Master
Aug 29 15:10:59 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:10:59 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: connecting to Master
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: connected to Master
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:00 sfo-storage-server sfsmaster[33351]: [33351] info: main master server module: got invalid message in shadow state (type:400)
Aug 29 15:11:01 sfo-storage-server sfsmaster[33351]: [33351] info: connecting to Master
Aug 29 15:11:01 sfo-storage-server sfsmaster[33351]: [33351] info: connected to Master

All mount points froze: the floating IP was released, but it was never assigned to the new leader because sfsmaster was stuck in this loop.
After I killed sfsmaster manually, it came back up correctly, the node with the second-highest priority became the leader, and SaunaFS was back online on all clients.

After that, however, the node that had logged the error messages no longer showed up in the SaunaFS web UI.


I had to restart uraft before it finally showed up in the web UI again.

I've created a job that runs every minute on all four nodes and watches for this message. If it appears again, the job restarts uraft, which will hopefully prevent another outage.
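A minimal sketch of such a per-minute check, in Python. It assumes sfsmaster logs to the systemd journal under the identifier `sfsmaster` (as the `sfsmaster[33351]:` prefix in the log above suggests) and that uRaft runs as a systemd unit named `saunafs-uraft`. The unit name, the match threshold, and the `--run` flag are hypothetical, so adjust them to your setup.

```python
import subprocess
import sys

# Message that marks the stuck-in-shadow-state loop (taken from the log above).
PATTERN = "got invalid message in shadow state"
# Hypothetical threshold: require several hits so one stray line does not trigger a restart.
THRESHOLD = 3
# Assumption: name of the systemd unit running uRaft; adjust for your installation.
URAFT_UNIT = "saunafs-uraft"


def count_matches(log_text: str, pattern: str = PATTERN) -> int:
    """Count log lines that contain the pattern."""
    return sum(1 for line in log_text.splitlines() if pattern in line)


def should_restart(log_text: str, threshold: int = THRESHOLD) -> bool:
    """Return True if the log window looks like the stuck shadow-state loop."""
    return count_matches(log_text) >= threshold


def recent_sfsmaster_log(since: str = "-1min") -> str:
    """Fetch recent sfsmaster messages (message text only) from the systemd journal."""
    result = subprocess.run(
        ["journalctl", "-t", "sfsmaster", "--since", since, "--no-pager", "-o", "cat"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def main() -> None:
    if should_restart(recent_sfsmaster_log()):
        subprocess.run(["systemctl", "restart", URAFT_UNIT], check=True)


# Only act when explicitly asked (hypothetical --run flag), so importing the file is harmless.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

Scheduled from cron with a line such as `* * * * * /usr/bin/python3 /usr/local/sbin/sfs_shadow_watchdog.py --run` (path hypothetical), each run inspects only the last minute of sfsmaster output. Restarting uraft this way only works around the loop; it does not address its cause.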

I'm not sure what happened there.
I've attached the full sfsmaster log from the moment the process started.

sfsmaster.log

Metadata

Assignees: No one assigned

Labels: Priority: High (Heavy performance degradation, major bug); bug (Something isn't working); needs investigation (Needs more information/reproduction)

Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
