Skip to content

Conversation

@chaptersix
Copy link
Contributor

@chaptersix chaptersix commented Nov 19, 2025

What changed?

Describe what has changed in this PR.

Why?

Tell your future self why have you made these changes.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

Any change is risky. Identify all risks you are aware of. If none, remove this section.


Note

Cursor Bugbot is generating a summary for commit d8f1834. Configure here.

chaptersix and others added 30 commits October 30, 2025 14:45
The config loader was not properly validating templates before use.
Add validation logic and tests for template coverage.
Use go:embed to include the default config template directly in the
binary, eliminating the need for external template files at runtime.
The embedded template in common/config is now the single source of
truth for the default configuration template.
Change the default behavior of 'temporal server start' to use the
embedded config template with environment variables. This simplifies
setup for users who don't need custom configuration files.

Add --config-file flag for explicit file-based configuration.
Add comprehensive test to validate all fields in the embedded template
can be properly rendered. Refactor main.go to use new config loading
pattern.
The embedded template must match the template used in the Helm chart
to ensure consistent behavior across deployment methods.
Replace the multiple LoadConfig variants with a single Load function
that accepts functional options. This provides a cleaner API and makes
it easier to add new configuration sources.
Rename TEMPORAL_CONFIG_FILE to TEMPORAL_SERVER_CONFIG_FILE_PATH to make
the variable's purpose more explicit. Update Make targets to use the
new --config-file flag instead of the deprecated --env flag.
Reorganize the config loader code for better maintainability and
simplify the test suite by removing redundant test cases.
Update config tests to properly handle the new loading mechanism.
Fix sqlite file config to use correct path.
Ensure that existing configuration files and environment variables
continue to work with the new config loader implementation.
Expose config file path as an option in the FX dependency injection
provider to support programmatic configuration.
## What changed?
Fixing data race by extracting out close transfer task information prior
to release lock. Added more context to comment.


## Why?
Bug

## How did you test it?
- [x] built
- [x] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
## What changed?
- Revert the removal of slice count check in multi-cursor slice count
action.
- The check was incorrectly removed in [#8416
L117](https://github.com/temporalio/temporal/pull/8416/files#diff-deecce1e374d8c4db074d9c923c2b80d1b8e28cef778202aa01397e8d56e1bafL117)

## Why?
- Without the check, the code will panic later in
`pickCompactCandidates` when currentSliceCount < targetSliceCount.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
- [x] will follow up with a test PR
## What changed?
- Wire up chasm workflow library

## Why?
- #8485 makes workflow start to use CHASM as well and 
we need to register workflow as a chasm library.

## How did you test it?
- [x] built
- [ ] run locally and tested manually
- [ ] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)

## Potential risks
- Logic is only used when chasm feature flag is turned on. The change is
mainly for functional tests.
## What changed?
Add locking in newRateLimitManager and Stop.

## Why?
Unlikely but potential data race if we get a dynamic config subscription
callback after Subscribe returns before assigning to the field in the
constructor.

## How did you test it?
- [x] built and ran existing tests
## What changed?
- Fix last change tracking for pri/fairness tasks.
- Don't allow migration for sticky queues.
- Improve fairness migration tests:
  - Set config only on test task queue
  - Use only ApproximateBacklogCount

## Why?
This fixes some situations where we wouldn't update
ApproximateBacklogCount on partition unload. It should also make the
test less flaky.

## How did you test it?
- [x] covered by existing tests
## What changed?
No-oping close transfer tasks for SyncWorkflowState tasks


## Why?
We need non state based replication to be eligible for this
optimization.


## How did you test it?
- [x] built
- [x] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [x] added new functional test(s)

`go test -v -tags test_dep ./tests/xdc -run
TestStreamBasedReplicationTestSuite/DisableTransitionHistory/TestCloseTransferTaskAckedReplication
-timeout 10m -count=1`

shows

`2025-10-20T08:15:08.290-0700 info Skipping close transfer task
generation - already acked on active cluster {"cluster-name":
"standby_aadnd", "host": "127.0.0.1:57179", "shard-id": 1, "address":
"127.0.0.1:57179", "wf-namespace-id":
"e550305e-0b43-4bcd-a490-8e3223f51ce1", "wf-id":
"test-replication-e2c094d3-c34f-42d9-a166-a967d4e7f602", "wf-run-id":
"019a0230-299a-74ba-a31f-ac247c05f2b9", "logging-call-at":
"/Users/michaely520/projects/temporal/service/history/workflow/task_generator.go:206"}
stream_based_replication_test.go:975: Verified IsCloseTransferTaskAcked
and IsForceReplication flags in SyncWorkflowStateTask`
Change help text to specify path is relative to current working
directory rather than root directory, improving user clarity.
Update test assertions to match new config loading behavior.
@chaptersix
Copy link
Contributor Author

cursor review

COPY --chmod=755 ./scripts/start-server.sh /etc/temporal/start-server.sh

USER temporal
ENTRYPOINT ["/etc/temporal/entrypoint.sh"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing runtime dependencies in server Dockerfile

The server.Dockerfile doesn't install any runtime dependencies, but entrypoint-server.sh uses getent which isn't available in Alpine's base image. Alpine uses musl libc instead of glibc, so getent needs to be installed or replaced with an alternative. The legacy-server.Dockerfile installs packages like bash and curl, but this Dockerfile is missing those installations entirely.

Fix in Cursor Fix in Web

# Setup Temporal Server in development mode if "develop" is passed as an argument.
for arg in "$@"; do
if [ "${arg}" = "develop" ]; then
/etc/temporal/setup-develop.sh
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Missing setup scripts referenced by entrypoint

The entrypoint script references /etc/temporal/auto-setup.sh and /etc/temporal/setup-develop.sh when autosetup or develop arguments are passed, but these scripts are never copied into the Docker image in server.Dockerfile. This causes the container to fail when these arguments are used.

Fix in Cursor Fix in Web

tags = compact([
"${IMAGE_REPO}/server:${IMAGE_SHA_TAG}",
"${IMAGE_REPO}/server:${SAFE_IMAGE_BRANCH_TAG}",
TAG_LATEST ? "${IMAGE_REPO}/server:latest" : "",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Docker image tag collision between targets

The admin-tools and legacy-admin-tools targets use identical tags, and server and legacy-server targets also use identical tags. When building the default group that includes all four targets, the later builds overwrite the earlier ones, making it impossible to have both variants available simultaneously. Each target needs distinct tags to avoid conflicts.

Fix in Cursor Fix in Web

mkdir -p "${BUILD_DIR}"

# Copy schema directory from repo root
cp -r "${REPO_ROOT}/schema" "${BUILD_DIR}/temporal"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Incorrect schema directory copy creates wrong path

The cp -r command copies schema to ${BUILD_DIR}/temporal, which creates different results depending on whether the destination exists. On first run when temporal doesn't exist, it creates ${BUILD_DIR}/temporal containing schema files directly. The Dockerfiles expect ./build/temporal/schema to exist, causing the build to fail. The command should use cp -r "${REPO_ROOT}/schema" "${BUILD_DIR}/temporal/" with a trailing slash or create the directory first.

Fix in Cursor Fix in Web

# Clone the temporal repo at the specific version
if [ ! -d "temporal" ]; then
git clone --depth 1 --branch "v${SERVER_VERSION}" https://github.com/temporalio/temporal.git
fi
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Stale temporal repo causes wrong tdbg version

The script only clones the temporal repository if the directory doesn't exist. When running with different SERVER_VERSION values, a cached directory from a previous version remains, causing tdbg to be built from the wrong source version. The directory check prevents fetching the correct version-specific code. The script should either remove the directory before cloning or fetch and checkout the correct version if it exists.

Fix in Cursor Fix in Web

@chaptersix chaptersix closed this Nov 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants