Skip to content

Optimize CI test speed with Minitest parallelization and test infrastructure improvements#2530

Open
paracycle wants to merge 42 commits intomainfrom
autoresearch/test-speed-2026-03-10
Open

Optimize CI test speed with Minitest parallelization and test infrastructure improvements#2530
paracycle wants to merge 42 commits intomainfrom
autoresearch/test-speed-2026-03-10

Conversation

@paracycle
Copy link
Member

@paracycle paracycle commented Mar 11, 2026

Motivation

CI test runs were taking ~22 minutes on average (up to ~27 minutes for the slowest matrix job). Most of this time was spent in subprocess overhead: each CLI test creates a temporary project, runs bundle install, and invokes bundle exec tapioca <command> — spawning ~95 bundle install processes and many tapioca subprocesses per run.

Implementation

This PR introduces serial test infrastructure optimizations and in-process parallel test execution via Minitest's built-in parallelize_me!, reducing CI time by ~34%.

Serial optimization breakdown (cumulative):

Optimization Time Improvement
Baseline (main) ~600s
Default reporter + Rails log silencing + minitest/hooks ~580s ~3%
Gemfile.lock content-hash caching ~420s ~28%
bundle install flags (--jobs=4 --quiet --retry=0) ~400s ~5%
Prism.parse replacing sorbet subprocess syntax check ~370s ~8%
ENFORCE_TYPECHECKING=0 in tapioca subprocesses ~340s ~8%
TAPIOCA_SKIP_VALIDATION=1 in tapioca subprocesses ~330s ~3%
In-process configure! replacing subprocess calls ~320s ~3%
Total serial improvement ~320s ~47%

Parallel execution via parallelize_me!

A Tapioca::Helpers::Test::Parallel module (25 lines) calls parallelize_me! on any test class that includes it, enabling Minitest's built-in thread pool for safe classes. Thread count is controlled by the MT_CPU env var (defaults to Etc.nprocessors).

12 test classes include the module:

  • DslSpec (+ 38 subclasses) — 374 DSL compiler tests that use fork-based isolation
  • PipelineSpec — 120 gem pipeline tests with fork-based isolation
  • BuilderSpec, CompilerSpec, ExecutorSpec, RBIHelperSpec, SorbetHelperSpec, ReflectionSpec, GenericTypeRegistrySpec, ActiveModelTypeHelperSpec, GraphqlTypeHelperSpec, LockFileDiffParserSpec — unit tests with no shared state

Classes using minitest-hooks' before(:all)/after(:all) (all SpecWithProject subclasses) remain serial, since parallelize_me! bypasses the with_info_handler lifecycle that minitest-hooks relies on.

Gemfile.lock caching (spec/helpers/mock_project.rb)

  • Caches lockfile resolution by content-hashing the Gemfile and all referenced gemspec files
  • Still runs bundle install after cache hit to ensure gems are actually installed, but skips slow dependency resolution
  • Serializes all bundle install calls under a global exclusive file lock to prevent concurrent gem directory corruption

Subprocess overhead reduction

  • ENFORCE_TYPECHECKING=0: Disables sorbet runtime type checking in tapioca subprocesses (~40% faster per invocation)
  • TAPIOCA_SKIP_VALIDATION=1: Skips sorbet static validation in test subprocesses (tests that specifically validate this behavior opt out)
  • In-process configure! method replaces tapioca("configure") subprocess calls where possible
  • Replaces sorbet subprocess syntax checking with in-process Prism.parse in DSL compiler tests

Minor test infrastructure improvements

  • Default minitest reporter instead of SpecReporter (less I/O)
  • config.log_level = :fatal for Rails logger
  • --jobs=4 --quiet --retry=0 flags for bundle install
  • Increased addon_spec timeout from 4s to 30s for CI resilience

CI Results

Metric Main (baseline) This PR Improvement
CI average per job 22.4m 14.8m 34% faster
CI fastest job 19.3m 13.3m 31% faster
CI slowest job 26.8m 17.0m 37% faster

All 14 CI matrix jobs pass (Ruby 3.2/3.3/3.4/4.0/head × Rails 8.0/current/main).

Tests

Existing test suite is unchanged and all tests pass. The bin/test entry point is unmodified — all parallelism is opt-in via include Tapioca::Helpers::Test::Parallel in individual test classes. One pre-existing test (run_gem_rbi_check_spec.rb) hangs on Ruby 4.0 due to an Open3.capture3/Bundler.with_unbundled_env interaction unrelated to this PR.

…arse in DSL tests

Use Prism.parse instead of shelling out to sorbet --stop-after parser
for syntax validation in DSL compiler tests. This eliminates ~0.06-0.5s
of subprocess overhead per rbi_for call across ~374 DSL tests.
…ests

Default enforce_typechecking to false in MockProject#tapioca since no
tests depend on runtime type validation. This reduces subprocess
overhead by ~40% per tapioca invocation by skipping sorbet-runtime
type checks.
…mmands

Skip the overhead of bundle exec by directly invoking ruby with
bundler/setup and BUNDLE_GEMFILE. This saves ~0.2-0.3s per tapioca
invocation across ~130 calls. Also handles gems.rb as an alternative
to Gemfile for projects that use it.
…lled

The lockfile cache was returning early without running bundle install,
which meant gems listed in the cached lockfile might not actually be
installed in the system gem path. This caused sqlite3 version conflicts
on CI where activerecord 7.0.x expected sqlite3 ~> 1.4 but sqlite3 2.9.x
was the only version installed.

Now the cache only pre-populates the lockfile (skipping resolution) but
still runs bundle install to ensure gems are present. Also removes
--prefer-local which could cause stale resolution.
… print sequentially

Each worker's stdout/stderr is redirected to a temp file during execution.
When a worker finishes, its output is printed as a contiguous block with
clear header/footer separators showing worker number, pass/fail status,
elapsed time, and file count. A final summary table is printed at the end.

This eliminates the interleaved output problem where multiple workers
wrote to the same stdout/stderr simultaneously.
On CI (GITHUB_ACTIONS=true), each worker's output is wrapped in
::group::/::endgroup:: markers creating collapsible log sections.
Passed workers are collapsed by default; failed workers get a
::error:: annotation visible in the PR checks summary.

Progress lines like '[1/4] ✓ Worker 2 finished in 120s (3 still running)'
appear outside groups so they're always visible, giving real-time
feedback on which workers have completed.

The final SUMMARY block stays outside any group and is always visible.

Locally (no GITHUB_ACTIONS), the separator-based format is preserved.
…al file lock

Two race conditions caused sporadic CI failures when running tests in parallel:

1. ETXTBSY: Gem.install('bundler') rewrites binstubs in the shared gem bin
   directory. If another worker is simultaneously exec'ing that binstub via
   bundle exec, the kernel returns ETXTBSY. Fixed by using cross-process
   marker files so Gem.install only runs once (first worker), not per-worker.

2. GemNotFound: Concurrent bundle install processes write gems into the same
   GEM_HOME simultaneously. A worker running bundle exec can see partially-
   installed gems. Fixed by serializing all bundle install calls under a
   global file lock (.bundle_install_global.lock).

The performance impact is minimal because lockfile caching makes most
bundle install calls fast no-ops (~1-2s), and the lock only blocks when
two workers call bundle_install! at the exact same moment.

Also uses atomic writes (write-to-temp + rename) for the lockfile cache
to prevent readers from seeing partially-written lockfiles.
…le install

The ETXTBSY race condition occurs when bundle install (writing binstubs/gems
into GEM_HOME) runs concurrently with bundle exec (executing those binstubs).

Solution: read-write file locking using flock:
- bundle_install! takes LOCK_EX (exclusive) — runs alone, no concurrent execs
- bundle_exec takes LOCK_SH (shared) — multiple execs run in parallel, but
  they wait if bundle install holds the exclusive lock

This means bundle exec calls across workers can still run concurrently (good
for performance), but they never overlap with bundle install (prevents
ETXTBSY and GemNotFound from partially-installed gems).
A monitor thread tails each worker's output file every second, parsing
minitest's dot output to count completed tests. Every 10 seconds it
prints a compact progress line:

  [50s] W1: 19 tests (ok) | W2: 104 tests (ok) | W3: 148 tests (ok) | W4: done

Failures/errors detected in the output are surfaced immediately with a
worker prefix, so you don't have to wait for the worker to finish.

Only lines consisting entirely of minitest result characters ([.FES])
are counted, avoiding false positives from error messages, stack traces,
or forked process output.
@paracycle paracycle changed the title Autoresearch/test speed 2026 03 10 Optimize CI test speed with parallel runner and test infrastructure improvements Mar 12, 2026
@paracycle paracycle marked this pull request as ready for review March 12, 2026 22:33
@paracycle paracycle requested a review from a team as a code owner March 12, 2026 22:34
Add Tapioca::Helpers::Test::Parallel module that calls parallelize_me!
on included test classes, enabling Minitest's built-in thread pool for
classes that don't rely on minitest-hooks' before(:all)/after(:all).

This replaces the 352-line custom bin/parallel_test runner with a
25-line module included in 12 safe test classes (DslSpec, PipelineSpec,
BuilderSpec, and all unit spec classes). CI switches back to bin/test.

Measured: full suite 5m34s locally (44% faster than serial baseline).
@paracycle paracycle requested a review from amomchilov March 13, 2026 20:27
@paracycle paracycle changed the title Optimize CI test speed with parallel runner and test infrastructure improvements Optimize CI test speed with Minitest parallelization and test infrastructure improvements Mar 13, 2026
Copy link
Contributor

@amomchilov amomchilov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working through reviewing spec/helpers/mock_project.rb, but the rest looks good so far. Left a few questions/comments

module Dsl
module Helpers
class ActiveModelTypeHelperSpec < Minitest::Spec
include Tapioca::Helpers::Test::Parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not call parallelize_me! directly?


def wait_until_exists(path)
Timeout.timeout(4) do
Timeout.timeout(30) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this because we're expecting more CPU contention, now that we might actually be fully saturating multiple threads?

Comment on lines -26 to -29
backtrace_filter = Minitest::ExtensibleBacktraceFilter.default_filter
backtrace_filter.add_filter(%r{gems/sorbet-runtime})
backtrace_filter.add_filter(%r{gems/railties})
backtrace_filter.add_filter(%r{tapioca/helpers/test/})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we ok with giving this up?

}
}
config.logger = Logger.new('/dev/null')
config.log_level = :fatal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a comment to explain that this is a performance optimization?

@@ -34,6 +34,7 @@ class Dummy < Rails::Application
}
}
config.logger = Logger.new('/dev/null')
Copy link
Contributor

@amomchilov amomchilov Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this would still be making syscalls. Wanna use a NullLogger like so?

class NullLogger
  include Singleton

  # : (*untyped) -> nil
  def unknown(*) = nil
  # : (*untyped) -> nil
  def fatal(*) = nil
  # : (*untyped) -> nil
  def error(*) = nil
  # : (*untyped) -> nil
  def warn(*) = nil
  # : (*untyped) -> nil
  def info(*) = nil
  # : (*untyped) -> nil
  def debug(*) = nil
  # : (untyped) -> nil

  # : () -> bool
  def fatal? = false
  # : () -> bool
  def error? = false
  # : () -> bool
  def warn? = false
  # : () -> bool
  def info? = false
  # : () -> bool
  def debug? = false

  def level=(_)
    nil
  end
end
Suggested change
config.logger = Logger.new('/dev/null')
config.logger = NullLogger.instance

I'm curious to measure how much it might help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants