Skip to content

Intermittent bootstrap failure on AWS Nitro instances #428

@Ivaylogi98

Description

@Ivaylogi98

Summary

On AWS Nitro-based instances with an encrypted ephemeral EBS disk, the bosh-agent intermittently fails to bootstrap with timed out pinging VM or timed out sending get_task to instance.

The agent attempts to partition the root disk instead of the ephemeral disk, which causes the bootstrap to fail.

Environment

  • bosh v282.1.2
  • bosh-aws-cpi v107.0.1
  • Instance type: t3a.nano (Nitro, no local instance storage)
  • Cloud config:
    cloud_properties:
      ephemeral_disk:
        encrypted: true
        size: 20000
      instance_type: t3a.nano
    

Observed behaviour

  • Intermittent — not consistently reproducible, timing-dependent
  • Agent attempts to partition the root disk
  • VM becomes unresponsive, deployment times out

Analysis

On Nitro instances, all disks appear as NVMe devices. The agent resolves /dev/sdb (the ephemeral disk name in settings) to an NVMe device via /dev/disk/by-id/ symlinks using IDDevicePathResolver. It is unclear whether encrypted EBS disks produce a different symlink name in /dev/disk/by-id/ than what the agent's glob pattern expects, or whether the disk ID passed from the CPI does not match the symlink.

Related

Co-authored with Claude Code

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Waiting for Changes | Open for Contribution

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions