Skip to content

Comments

fix: tsh label based commands always running serially#63555

Merged
rosstimothy merged 1 commit intomasterfrom
tross/ssh_current_user_roles
Feb 6, 2026
Merged

fix: tsh label based commands always running serially#63555
rosstimothy merged 1 commit intomasterfrom
tross/ssh_current_user_roles

Conversation

@rosstimothy
Copy link
Contributor

@rosstimothy rosstimothy commented Feb 5, 2026

When using tsh ssh user@foo=bar uptime, the command is supposed to be run in parallel on all matching hosts unless Per-Session MFA is required, or the user has a restriction on the number of concurrent sessions.

To determine if limits are enforced, tsh would list roles and take the maximum from all of them and add with a small buffer to prevent exceeding the limit. It turns out that this only works for users that have permissions to read Role resources. ListRoles does not return the roles of the current user if they do not have permissions to read roles like GetRole does. The result is that users that cannot read roles have all of the command run serially instead of in parallel even if they have no connection limits in their roles.

It looks like this has been the case for GetRoles and ListRoles since their inception. This has likely gone unnoticed because this is either a very uncommon workflow, or the users performing this regularly have read permissions for Role resources.

There does exist an RPC that provides a user with all of the roles they are assigned regardless of their permissions: GetCurrentUserRoles. This RPC is now used in place of ListRoles when determining connection limits in tsh label based ssh.

Changelog

changelog: Fixed tsh ssh user@foo=bar uptime from running serially if users did not have role:read permissions.

Manual Test Plan

  • Label based tsh ssh runs in parallel for users with permissions to read roles
  • Label based tsh ssh runs in parallel for users without permissions to read roles
  • Label based tsh ssh obeys users with max_connections specified in their roles

@rosstimothy
Copy link
Contributor Author

The below traces were captured from a user with only the access preset role attempting to run a command on multiple host matching a single label.

tsh 18.3.0

image

tsh from this branch

image

@rosstimothy rosstimothy force-pushed the tross/ssh_current_user_roles branch 2 times, most recently from 65d2611 to aac3cb0 Compare February 6, 2026 14:21
@rosstimothy rosstimothy marked this pull request as ready for review February 6, 2026 14:58
@github-actions github-actions bot requested a review from danielashare February 6, 2026 14:59
Copy link
Contributor

@okraport okraport left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@rosstimothy rosstimothy added this pull request to the merge queue Feb 6, 2026
github-merge-queue bot pushed a commit that referenced this pull request Feb 6, 2026
When using tsh ssh user@foo=bar uptime, the command is
supposed to be run in parallel on all matching hosts unless
Per-Session MFA is required, or the user has a restriction on
the number of concurrent sessions.

To determine if limits are enforced, tsh would list roles and
take the maximum from all of them and add with a small buffer
to prevent exceeding the limit. It turns out that this only
works for users that have permissions to read Role resources.
ListRoles does **not** return the roles of the current user
if they do not have permissions to read roles like GetRole
does. The result is that users that cannot read roles have
all of the command run serially instead of in parallel even
if they have _no_ connection limits in their roles.

It looks like this has been the case for GetRoles and ListRoles
since their inception. This has likely gone unnoticed because
this is either a very uncommon workflow, or the users performing
this regularly have read permissions for Role resources.

There does exist an RPC that provides a user with all of
the roles they are assigned regardless of their permissions:
GetCurrentUserRoles. This RPC is now used in place of ListRoles
when determining connection limits in tsh label based ssh.
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 6, 2026
@rosstimothy
Copy link
Contributor Author

It looks like this bug was also masking a data race in TestSSHOnMultipleNodes.

==================
WARNING: DATA RACE
Write at 0x00c007d43250 by goroutine 392369:
  bytes.(*Buffer).WriteTo()
      /opt/go/src/bytes/buffer.go:265 +0x4b
  io.copyBuffer()
      /opt/go/src/io/io.go:411 +0xd3
  io.Copy()
      /opt/go/src/io/io.go:388 +0x5d
  golang.org/x/crypto/ssh.(*Session).stdin.func1()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:489 +0x2a

Previous write at 0x00c007d43250 by goroutine 392365:
  bytes.(*Buffer).WriteTo()
      /opt/go/src/bytes/buffer.go:265 +0x4b
  io.copyBuffer()
      /opt/go/src/io/io.go:411 +0xd3
  io.Copy()
      /opt/go/src/io/io.go:388 +0x5d
  golang.org/x/crypto/ssh.(*Session).stdin.func1()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:489 +0x2a

Goroutine 392369 (running) created at:
  golang.org/x/crypto/ssh.(*Session).stdin()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:488 +0x228
  golang.org/x/crypto/ssh.(*Session).start()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:377 +0xbb
  golang.org/x/crypto/ssh.(*Session).Start()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:293 +0x264
  golang.org/x/crypto/ssh.(*Session).Run()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:310 +0x3e
  github.com/gravitational/teleport/api/observability/tracing/ssh.(*Session).Run()
      /__w/teleport/teleport/api/observability/tracing/ssh/session.go:301 +0x791
  github.com/gravitational/teleport/lib/client.(*NodeSession).runCommand.func2.1()
      /__w/teleport/teleport/lib/client/session.go:605 +0x94

Goroutine 392365 (finished) created at:
  golang.org/x/crypto/ssh.(*Session).stdin()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:488 +0x228
  golang.org/x/crypto/ssh.(*Session).start()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:377 +0xbb
  golang.org/x/crypto/ssh.(*Session).Start()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:293 +0x264
  golang.org/x/crypto/ssh.(*Session).Run()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:310 +0x3e
  github.com/gravitational/teleport/api/observability/tracing/ssh.(*Session).Run()
      /__w/teleport/teleport/api/observability/tracing/ssh/session.go:301 +0x791
  github.com/gravitational/teleport/lib/client.(*NodeSession).runCommand.func2.1()
      /__w/teleport/teleport/lib/client/session.go:605 +0x94
==================
==================
WARNING: DATA RACE
Read at 0x00c007d43230 by goroutine 392369:
  bytes.(*Buffer).Len()
      /opt/go/src/bytes/buffer.go:85 +0x65
  bytes.(*Buffer).WriteTo()
      /opt/go/src/bytes/buffer.go:266 +0x12
  io.copyBuffer()
      /opt/go/src/io/io.go:411 +0xd3
  io.Copy()
      /opt/go/src/io/io.go:388 +0x5d
  golang.org/x/crypto/ssh.(*Session).stdin.func1()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:489 +0x2a

Previous write at 0x00c007d43230 by goroutine 392365:
  bytes.(*Buffer).Reset()
      /opt/go/src/bytes/buffer.go:113 +0x1b1
  bytes.(*Buffer).WriteTo()
      /opt/go/src/bytes/buffer.go:283 +0x19b
  io.copyBuffer()
      /opt/go/src/io/io.go:411 +0xd3
  io.Copy()
      /opt/go/src/io/io.go:388 +0x5d
  golang.org/x/crypto/ssh.(*Session).stdin.func1()
      /go/pkg/mod/golang.org/x/crypto@v0.46.0/ssh/session.go:489 +0x2a

@rosstimothy rosstimothy force-pushed the tross/ssh_current_user_roles branch from aac3cb0 to 6469ecf Compare February 6, 2026 18:26
When using tsh ssh user@foo=bar uptime, the command is
supposed to be run in parallel on all matching hosts unless
Per-Session MFA is required, or the user has a restriction on
the number of concurrent sessions.

To determine if limits are enforced, tsh would list roles and
take the maximum from all of them and add with a small buffer
to prevent exceeding the limit. It turns out that this only
works for users that have permissions to read Role resources.
ListRoles does **not** return the roles of the current user
if they do not have permissions to read roles like GetRole
does. The result is that users that cannot read roles have
all of the command run serially instead of in parallel even
if they have _no_ connection limits in their roles.

It looks like this has been the case for GetRoles and ListRoles
since their inception. This has likely gone unnoticed because
this is either a very uncommon workflow, or the users performing
this regularly have read permissions for Role resources.

There does exist an RPC that provides a user with all of
the roles they are assigned regardless of their permissions:
GetCurrentUserRoles. This RPC is now used in place of ListRoles
when determining connection limits in tsh label based ssh.
@rosstimothy rosstimothy force-pushed the tross/ssh_current_user_roles branch from 6469ecf to 99f8275 Compare February 6, 2026 21:14
@rosstimothy rosstimothy added this pull request to the merge queue Feb 6, 2026
Merged via the queue into master with commit d3072a7 Feb 6, 2026
42 checks passed
@rosstimothy rosstimothy deleted the tross/ssh_current_user_roles branch February 6, 2026 21:56
@backport-bot-workflows
Copy link
Contributor

@rosstimothy See the table below for backport results.

Branch Result
branch/v17 Create PR
branch/v18 Create PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants