Skip to content

Conversation

@shanicky
Copy link
Contributor

@shanicky shanicky commented Dec 5, 2025

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Introduce a new configuration and mechanism to specify separate parallelism settings for backfill operations in streaming jobs. This enables more efficient resource utilization by allowing different parallelism for backfill phases versus normal streaming.

  • Add streaming_parallelism_for_backfill session config parameter
  • Extend stream fragment graph and protobuf to carry backfill parallelism
  • Modify stream fragmenter and DDL controller to apply backfill parallelism
  • Implement global stream manager logic to restore normal parallelism after backfill
  • Add integration tests verifying parallelism switching before and after backfill
dev=> set STREAMING_PARALLELISM = 3;
SET_VARIABLE
dev=> set STREAMING_PARALLELISM_FOR_BACKFILL = 2;
SET_VARIABLE
dev=> create table t(v int);
CREATE_TABLE
dev=> insert into t select * from generate_series(1, 10);
INSERT 0 10
dev=> set BACKFILL_RATE_LIMIT = 1;
SET_VARIABLE
dev=> set BACKGROUND_DDL = true;
SET_VARIABLE
dev=> create materialized view m as select * from t;
CREATE_MATERIALIZED_VIEW
dev=> show jobs; select name, parallelism from rw_fragment_parallelism where name = 'm';
 Id |                   Statement                   | Create Type |   Progress
----+-----------------------------------------------+-------------+---------------
  7 | CREATE MATERIALIZED VIEW m AS SELECT * FROM t | BACKGROUND  | 40.00% (4/10)
(1 row)

 name | parallelism
------+-------------
 m    |           2
(1 rows)

dev=> show jobs; select name, parallelism from rw_fragment_parallelism;
 Id | Statement | Create Type | Progress
----+-----------+-------------+----------
(0 rows)

 name | parallelism
------+-------------
 m    |           3
(1 rows)

Checklist

  • I have written necessary rustdoc comments.

Documentation

  • My PR needs documentation updates.
Release note

Introduce a new configuration and mechanism to specify separate parallelism
settings for backfill operations in streaming jobs. This enables more
efficient resource utilization by allowing different parallelism for
backfill phases versus normal streaming.

- Add `streaming_parallelism_for_backfill` session config parameter
- Extend stream fragment graph and protobuf to carry backfill parallelism
- Modify stream fragmenter and DDL controller to apply backfill parallelism
- Implement global stream manager logic to restore normal parallelism after backfill
- Add integration tests verifying parallelism switching before and after backfill

Signed-off-by: Peng Chen <[email protected]>
Simplify and optimize the computation of backfill parallelism by removing
redundant `.clone()` calls on `normal_parallelism` and replacing
`.or_else(|| normal_parallelism.clone())` with `.or(normal_parallelism)`.

- Eliminate unnecessary cloning for a likely cheap-to-copy `Option<Parallelism>`
- Use `.or()` instead of `.or_else()` for more idiomatic and efficient fallback
- Preserve existing logic and fallback behavior without semantic changes
- Improve code readability and maintainability in stream fragmenter module

Signed-off-by: Peng Chen <[email protected]>
Introduce an optional backfill_parallelism field to the streaming_job model
to support independent parallelism configuration for backfill operations.

- Add nullable JSON binary column backfill_parallelism via migration
- Extend catalog, streaming job, DDL controllers to handle the new field
- Update stream scaling logic to skip rescheduling if backfill parallelism
  matches target parallelism
- Ensure backward compatibility by making the field optional and updating tests

Signed-off-by: Peng Chen <[email protected]>
@github-actions github-actions bot added the ci/run-backwards-compat-tests Run backwards compatibility tests in your PR. label Dec 6, 2025
Clarify and correct default backfill parallelism behavior to treat it as
optional rather than defaulting to normal parallelism. Fix a variable naming
bug in the DDL controller that affected parallelism selection. Enhance the
stream scaling rescheduling logic to avoid redundant rescheduling when the
current or backfill parallelism matches target or is unspecified.

- Change default backfill parallelism from normal_parallelism to None when
  config is Default
- Fix variable name mismatch in DdlController parallelism assignment
- Add early return conditions in GlobalStreamManager rescheduling logic
- Add integration test to verify optional and adaptive backfill parallelism behavior

Signed-off-by: Peng Chen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants