test(profiling): regression test for #16661 #16668
1. malloc -> malloc
2. malloc -> free
3. free -> free
4. free -> malloc
Replace memalloc_op_t enum with a simple bool _MEMALLOC_ON_THREAD for reentrancy detection.

The enum distinguished malloc vs free operations, but this distinction is unnecessary: we only need to know whether we're already inside the allocator hook. In free, replace the RAII guard with a direct #ifdef MEMALLOC_ASSERT_ON_REENTRY check since we can't skip the untrack+free operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
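The guard described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `memalloc_on_thread` stands in for `_MEMALLOC_ON_THREAD`, and `ReentryGuard`, `hooked_malloc`, `hooked_free`, and the two counters are hypothetical names. It assumes the malloc side may skip sampling on reentry, while the free side must always untrack and free, so it only checks the flag when `MEMALLOC_ASSERT_ON_REENTRY` is defined.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for the per-thread flag: true while this thread is inside the hook.
thread_local bool memalloc_on_thread = false;

// RAII guard (hypothetical): takes the flag if it is free, releases it on scope exit.
class ReentryGuard {
public:
    ReentryGuard() : acquired_(!memalloc_on_thread) {
        if (acquired_) {
            memalloc_on_thread = true;
        } else {
#ifdef MEMALLOC_ASSERT_ON_REENTRY
            std::abort();  // test/debug builds: fail loudly on reentry
#endif
        }
    }
    ~ReentryGuard() {
        if (acquired_) {
            memalloc_on_thread = false;
        }
    }
    ReentryGuard(const ReentryGuard&) = delete;
    ReentryGuard& operator=(const ReentryGuard&) = delete;
    bool acquired() const { return acquired_; }

private:
    bool acquired_;
};

int g_samples = 0;    // allocations the hook sampled
int g_untracked = 0;  // frees the hook processed

// malloc side: sampling is optional, so it is skipped when already in the hook.
void* hooked_malloc(std::size_t size) {
    void* ptr = std::malloc(size);
    ReentryGuard guard;
    if (guard.acquired()) {
        ++g_samples;  // placeholder for "record a sample"
    }
    return ptr;
}

// free side: untrack + free cannot be skipped, so there is no guard,
// only an optional compile-time assertion.
void hooked_free(void* ptr) {
#ifdef MEMALLOC_ASSERT_ON_REENTRY
    if (memalloc_on_thread) {
        std::abort();
    }
#endif
    ++g_untracked;  // placeholder for "untrack the pointer"
    std::free(ptr);
}
```

With the macro undefined, a nested `hooked_malloc` call made while a guard is held returns memory without sampling, and `hooked_free` still runs to completion.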
## Performance SLOs

Comparing candidate taegyunkim/regression-16661 (c516b9d) with baseline main (5865898)

### 📈 Performance Regressions (3 suites)

#### 📈 iastaspects - 118/118 ✅

| Scenario | Time | Time SLO | SLO margin | vs baseline | Memory | Memory SLO | SLO margin | vs baseline |
|---|---|---|---|---|---|---|---|---|
| add_aspect | 103.749µs | <130.000µs | -20.2% | +1.9% | 43.594MB | <46.000MB | -5.2% | +5.6% |
| add_inplace_aspect | 100.998µs | <130.000µs | -22.3% | -2.8% | 43.642MB | <46.000MB | -5.1% | +5.6% |
| add_inplace_noaspect | 28.244µs | <40.000µs | -29.4% | +0.3% | 43.650MB | <46.000MB | -5.1% | +5.7% |
| add_noaspect | 48.672µs | <70.000µs | -30.5% | -1.6% | 43.603MB | <46.000MB | -5.2% | +5.3% |
| bytearray_aspect | 254.151µs | <400.000µs | -36.5% | +2.1% | 43.590MB | <46.000MB | -5.2% | +5.3% |
| bytearray_extend_aspect | 642.201µs | <800.000µs | -19.7% | -0.9% | 43.660MB | <46.000MB | -5.1% | +5.6% |
| bytearray_extend_noaspect | 265.561µs | <400.000µs | -33.6% | -1.1% | 43.605MB | <46.000MB | -5.2% | +5.5% |
| bytearray_noaspect | 140.933µs | <300.000µs | -53.0% | -1.0% | 43.660MB | <46.000MB | -5.1% | +5.6% |
| bytes_aspect | 219.322µs | <300.000µs | -26.9% | -0.5% | 43.613MB | <46.000MB | -5.2% | +5.4% |
| bytes_noaspect | 133.453µs | <200.000µs | -33.3% | -0.3% | 43.624MB | <46.000MB | -5.2% | +5.3% |
| bytesio_aspect | 3.784ms | <5.000ms | -24.3% | +0.7% | 43.510MB | <46.000MB | -5.4% | +5.0% |
| bytesio_noaspect | 314.371µs | <420.000µs | -25.1% | -0.5% | 43.678MB | <46.000MB | -5.0% | +5.5% |
| capitalize_aspect | 88.735µs | <300.000µs | -70.4% | -0.6% | 43.656MB | <46.000MB | -5.1% | +5.1% |
| capitalize_noaspect | 253.301µs | <300.000µs | -15.6% | +1.8% | 43.661MB | <46.000MB | -5.1% | +5.8% |
| casefold_aspect | 92.313µs | <500.000µs | -81.5% | +3.8% | 43.598MB | <46.000MB | -5.2% | +5.5% |
| casefold_noaspect | 307.132µs | <500.000µs | -38.6% | +0.7% | 43.747MB | <46.000MB | -4.9% | +5.7% |
| decode_aspect | 86.890µs | <100.000µs | -13.1% | +0.6% | 43.600MB | <46.000MB | -5.2% | +5.5% |
| decode_noaspect | 153.827µs | <210.000µs | -26.7% | ~same | 43.657MB | <46.000MB | -5.1% | +5.7% |
| encode_aspect | 85.200µs | <200.000µs | -57.4% | +1.3% | 43.689MB | <46.000MB | -5.0% | +5.7% |
| encode_noaspect | 141.009µs | <200.000µs | -29.5% | +0.3% | 43.586MB | <46.000MB | -5.2% | +5.3% |
| format_aspect | 14.583ms | <19.200ms | -24.0% | +0.1% | 43.856MB | <46.000MB | -4.7% | +5.4% |
| format_map_aspect | 16.420ms | <21.500ms | -23.6% | ~same | 43.680MB | <46.000MB | -5.0% | +4.7% |
| format_map_noaspect | 369.530µs | <500.000µs | -26.1% | -1.1% | 43.608MB | <46.000MB | -5.2% | +5.5% |
| format_noaspect | 304.544µs | <500.000µs | -39.1% | -3.3% | 43.565MB | <46.000MB | -5.3% | +5.2% |
| index_aspect | 123.432µs | <300.000µs | -58.9% | -2.1% | 43.775MB | <46.000MB | -4.8% | +5.9% |
| index_noaspect | 40.542µs | <300.000µs | -86.5% | -0.2% | 43.636MB | <46.000MB | -5.1% | +5.2% |
| join_aspect | 212.761µs | <300.000µs | -29.1% | -0.5% | 43.673MB | <46.000MB | -5.1% | +5.7% |
| join_noaspect | 145.881µs | <300.000µs | -51.4% | -1.6% | 43.633MB | <46.000MB | -5.1% | +5.2% |
| ljust_aspect | 497.882µs | <700.000µs | -28.9% | +0.8% | 43.652MB | <46.000MB | -5.1% | +5.5% |
| ljust_noaspect | 258.800µs | <300.000µs | -13.7% | +0.3% | 43.524MB | <46.000MB | -5.4% | +5.1% |
| lower_aspect | 294.765µs | <500.000µs | -41.0% | -1.2% | 43.693MB | <46.000MB | -5.0% | +5.6% |
| lower_noaspect | 235.594µs | <300.000µs | -21.5% | +0.3% | 43.632MB | <46.000MB | -5.1% | +5.5% |
| lstrip_aspect | 0.339ms | <3.000ms | -88.7% | 📈 +22.6% | 43.646MB | <46.000MB | -5.1% | +5.7% |
| lstrip_noaspect | 0.177ms | <3.000ms | -94.1% | +0.6% | 43.632MB | <46.000MB | -5.1% | +5.1% |
| modulo_aspect | 14.250ms | <18.750ms | -24.0% | -0.3% | 43.752MB | <46.000MB | -4.9% | +5.3% |
| modulo_aspect_for_bytearray_bytearray | 14.802ms | <19.350ms | -23.5% | -0.1% | 43.760MB | <46.000MB | -4.9% | +5.3% |
| modulo_aspect_for_bytes | 14.407ms | <18.900ms | -23.8% | ~same | 44.019MB | <46.000MB | -4.3% | +6.0% |
| modulo_aspect_for_bytes_bytearray | 14.578ms | <19.150ms | -23.9% | -0.9% | 43.680MB | <46.000MB | -5.0% | +5.2% |
| modulo_noaspect | 0.366ms | <3.000ms | -87.8% | +0.3% | 43.641MB | <46.000MB | -5.1% | +5.5% |
| replace_aspect | 18.352ms | <24.000ms | -23.5% | -0.5% | 43.621MB | <46.000MB | -5.2% | +5.1% |
| replace_noaspect | 291.668µs | <300.000µs | -2.8% | +3.2% | 43.629MB | <46.000MB | -5.2% | +5.1% |
| repr_aspect | 316.514µs | <420.000µs | -24.6% | -1.5% | 43.713MB | <46.000MB | -5.0% | +5.7% |
| repr_noaspect | 47.133µs | <90.000µs | -47.6% | +0.6% | 43.560MB | <46.000MB | -5.3% | +5.3% |
| rstrip_aspect | 382.536µs | <500.000µs | -23.5% | -2.8% | 43.818MB | <46.000MB | -4.7% | +5.9% |
| rstrip_noaspect | 183.848µs | <300.000µs | -38.7% | +0.3% | 43.615MB | <46.000MB | -5.2% | +5.5% |
| slice_aspect | 181.753µs | <300.000µs | -39.4% | -0.5% | 43.754MB | <46.000MB | -4.9% | +5.9% |
| slice_noaspect | 54.103µs | <90.000µs | -39.9% | +0.3% | 43.634MB | <46.000MB | -5.1% | +5.4% |
| stringio_aspect | 3.821ms | <5.000ms | -23.6% | +0.2% | 43.617MB | <46.000MB | -5.2% | +5.7% |
| stringio_noaspect | 348.706µs | <500.000µs | -30.3% | +0.7% | 43.748MB | <46.000MB | -4.9% | +5.9% |
| strip_aspect | 270.279µs | <350.000µs | -22.8% | -1.7% | 43.632MB | <46.000MB | -5.1% | +5.7% |
| strip_noaspect | 176.744µs | <240.000µs | -26.4% | ~same | 43.634MB | <46.000MB | -5.1% | +5.4% |
| swapcase_aspect | 332.845µs | <500.000µs | -33.4% | ~same | 43.570MB | <46.000MB | -5.3% | +5.6% |
| swapcase_noaspect | 272.074µs | <400.000µs | -32.0% | -0.3% | 43.567MB | <46.000MB | -5.3% | +5.0% |
| title_aspect | 320.888µs | <500.000µs | -35.8% | +0.3% | 43.653MB | <46.000MB | -5.1% | +5.5% |
| title_noaspect | 259.581µs | <400.000µs | -35.1% | -0.5% | 43.589MB | <46.000MB | -5.2% | +5.2% |
| translate_aspect | 557.579µs | <700.000µs | -20.3% | 📈 +11.6% | 43.517MB | <46.000MB | -5.4% | +5.3% |
| translate_noaspect | 420.852µs | <500.000µs | -15.8% | -2.3% | 43.654MB | <46.000MB | -5.1% | +5.7% |
| upper_aspect | 295.138µs | <500.000µs | -41.0% | -0.5% | 43.673MB | <46.000MB | -5.1% | +5.7% |
| upper_noaspect | 235.428µs | <400.000µs | -41.1% | -0.6% | 43.594MB | <46.000MB | -5.2% | +5.3% |

#### 📈 iastaspectsospath - 24/24 ✅

| Scenario | Time | Time SLO | SLO margin | vs baseline | Memory | Memory SLO | SLO margin | vs baseline |
|---|---|---|---|---|---|---|---|---|
| ospathbasename_aspect | 519.437µs | <700.000µs | -25.8% | 📈 +24.3% | 43.746MB | <46.000MB | -4.9% | +5.3% |
| ospathbasename_noaspect | 429.002µs | <700.000µs | -38.7% | +0.8% | 43.647MB | <46.000MB | -5.1% | +5.0% |
| ospathjoin_aspect | 628.913µs | <700.000µs | -10.2% | +0.9% | 43.692MB | <46.000MB | -5.0% | +5.3% |
| ospathjoin_noaspect | 637.610µs | <700.000µs | -8.9% | +1.3% | 43.688MB | <46.000MB | -5.0% | +5.4% |
| ospathnormcase_aspect | 350.141µs | <700.000µs | -50.0% | +0.5% | 43.686MB | <46.000MB | -5.0% | +5.4% |
| ospathnormcase_noaspect | 361.337µs | <700.000µs | -48.4% | +1.3% | 43.726MB | <46.000MB | -4.9% | +5.3% |
| ospathsplit_aspect | 488.053µs | <700.000µs | -30.3% | +0.6% | 43.686MB | <46.000MB | -5.0% | +5.5% |
| ospathsplit_noaspect | 499.679µs | <700.000µs | -28.6% | +0.8% | 43.667MB | <46.000MB | -5.1% | +5.3% |
| ospathsplitdrive_aspect | 376.496µs | <700.000µs | -46.2% | +2.0% | 43.728MB | <46.000MB | -4.9% | +5.8% |
| ospathsplitdrive_noaspect | 72.936µs | <700.000µs | -89.6% | -0.5% | 43.693MB | <46.000MB | -5.0% | +5.4% |
| ospathsplitext_aspect | 455.357µs | <700.000µs | -34.9% | -0.9% | 43.720MB | <46.000MB | -5.0% | +5.5% |
| ospathsplitext_noaspect | 466.420µs | <700.000µs | -33.4% | +1.5% | 43.667MB | <46.000MB | -5.1% | +5.1% |

#### 📈 telemetryaddmetric - 30/30 ✅

| Scenario | Time | Time SLO | SLO margin | vs baseline | Memory | Memory SLO | SLO margin | vs baseline |
|---|---|---|---|---|---|---|---|---|
| 1-count-metric-1-times | 2.427µs | <20.000µs | -87.9% | 📈 +10.2% | 36.628MB | <38.000MB | -3.6% | +6.3% |
| 1-count-metrics-100-times | 157.072µs | <220.000µs | -28.6% | +0.3% | 36.549MB | <38.000MB | -3.8% | +6.1% |
| 1-distribution-metric-1-times | 2.501µs | <20.000µs | -87.5% | -0.7% | 36.353MB | <38.000MB | -4.3% | +5.4% |
| 1-distribution-metrics-100-times | 163.679µs | <230.000µs | -28.8% | -3.0% | 36.294MB | <38.000MB | -4.5% | +5.5% |
| 1-gauge-metric-1-times | 2.022µs | <20.000µs | -89.9% | -0.1% | 36.648MB | <38.000MB | -3.6% | +6.5% |
| 1-gauge-metrics-100-times | 136.834µs | <150.000µs | -8.8% | -3.4% | 36.392MB | <38.000MB | -4.2% | +5.5% |
| 1-rate-metric-1-times | 2.287µs | <20.000µs | -88.6% | -1.8% | 36.274MB | <38.000MB | -4.5% | +5.4% |
| 1-rate-metrics-100-times | 167.993µs | <250.000µs | -32.8% | -2.0% | 36.726MB | <38.000MB | -3.4% | +6.7% |
| 100-count-metrics-100-times | 15.833ms | <22.000ms | -28.0% | +0.2% | 36.431MB | <38.000MB | -4.1% | +5.8% |
| 100-distribution-metrics-100-times | 1.746ms | <2.550ms | -31.5% | -2.6% | 36.707MB | <38.000MB | -3.4% | +6.6% |
| 100-gauge-metrics-100-times | 1.408ms | <1.550ms | -9.2% | -4.4% | 36.510MB | <38.000MB | -3.9% | +5.9% |
| 100-rate-metrics-100-times | 1.759ms | <2.550ms | -31.0% | -1.6% | 36.569MB | <38.000MB | -3.8% | +6.0% |
| flush-1-metric | 3.617µs | <20.000µs | -81.9% | +0.4% | 36.687MB | <38.000MB | -3.5% | +5.5% |
| flush-100-metrics | 176.718µs | <250.000µs | -29.3% | +0.5% | 36.785MB | <38.000MB | -3.2% | +5.6% |
| flush-1000-metrics | 2.208ms | <2.500ms | -11.7% | +0.8% | 37.493MB | <38.750MB | -3.2% | +6.1% |

### 🟡 Near SLO Breach (2 suites)

#### 🟡 djangosimple - 29/29 ✅

| Scenario | Time | Time SLO | SLO margin | vs baseline | Memory | Memory SLO | SLO margin | vs baseline |
|---|---|---|---|---|---|---|---|---|
| appsec | 19.588ms | <22.300ms | -12.2% | +0.1% | 68.813MB | <73.500MB | -6.4% | +5.1% |
| exception-replay-enabled | 1.384ms | <1.450ms | -4.5% | +0.5% | 66.667MB | <71.500MB | -6.8% | +5.3% |
| iast | 19.688ms | <22.250ms | -11.5% | +0.3% | 68.773MB | <75.000MB | -8.3% | +5.1% |
| profiler | 15.217ms | <16.550ms | -8.1% | -0.4% | 59.881MB | <61.000MB | 🟡 -1.8% | +5.4% |
| resource-renaming | 19.539ms | <21.750ms | -10.2% | +0.1% | | | | |
| span-code-origin | 19.890ms | <28.200ms | -29.5% | +0.5% | 68.405MB | <75.000MB | -8.8% | +5.1% |
| tracer | 19.645ms | <21.750ms | -9.7% | ~same | 68.734MB | <75.000MB | -8.4% | +5.0% |
| tracer-and-profiler | 21.090ms | <23.500ms | -10.3% | -0.1% | 70.346MB | <75.000MB | -6.2% | +5.0% |
| tracer-dont-create-db-spans | 19.687ms | <21.500ms | -8.4% | +0.5% | 68.754MB | <75.000MB | -8.3% | +5.1% |
| tracer-minimal | 16.870ms | <17.500ms | -3.6% | +0.5% | 68.479MB | <75.000MB | -8.7% | +5.2% |
| tracer-native | 19.526ms | <21.750ms | -10.2% | +0.3% | 68.734MB | <72.500MB | -5.2% | +5.0% |
| tracer-no-caches | 17.541ms | <19.650ms | -10.7% | ~same | 68.656MB | <75.000MB | -8.5% | +5.5% |
| tracer-no-databases | 19.194ms | <20.100ms | -4.5% | ~same | 68.400MB | <75.000MB | -8.8% | +5.2% |
| tracer-no-middleware | 19.308ms | <21.500ms | -10.2% | +0.1% | 68.773MB | <75.000MB | -8.3% | +5.7% |
| tracer-no-templates | 19.730ms | <22.000ms | -10.3% | +1.9% | 68.793MB | <73.500MB | -6.4% | +5.1% |

#### 🟡 flasksimple - 18/18 ✅

| Scenario | Time | Time SLO | SLO margin | vs baseline | Memory | Memory SLO | SLO margin | vs baseline |
|---|---|---|---|---|---|---|---|---|
| appsec-get | 3.426ms | <4.750ms | -27.9% | ~same | 55.876MB | <66.500MB | -16.0% | +4.9% |
| appsec-post | 2.897ms | <6.750ms | -57.1% | +0.3% | 55.896MB | <66.500MB | -15.9% | +4.9% |
| appsec-telemetry | 3.447ms | <4.750ms | -27.4% | +1.0% | 55.876MB | <66.500MB | -16.0% | +4.8% |
| debugger | 1.873ms | <2.000ms | -6.4% | ~same | 49.427MB | <51.500MB | -4.0% | +5.1% |
| iast-get | 1.858ms | <2.000ms | -7.1% | ~same | 46.183MB | <49.000MB | -5.7% | +5.4% |
| profiler | 1.898ms | <2.100ms | -9.6% | ~same | 52.298MB | <52.500MB | 🟡 -0.4% | +5.5% |
| resource-renaming | 3.394ms | <3.650ms | -7.0% | ~same | 55.935MB | <60.000MB | -6.8% | +4.9% |
| tracer | 3.414ms | <3.650ms | -6.5% | +0.2% | 55.856MB | <60.000MB | -6.9% | +4.9% |
| tracer-native | 3.417ms | <3.650ms | -6.4% | +0.4% | 55.876MB | <60.000MB | -6.9% | +4.9% |
11e2fda to db80a44
…ime flag

Replace DD_PROFILING_ENABLE_ASSERTS with DD_PROFILING_MEMALLOC_ASSERT_ON_REENTRY for the memalloc reentry assertion build flag, matching the original taegyunkim/avoid-decref-memalloc branch. DD_PROFILING_ENABLE_ASSERTS is a runtime flag, while DD_PROFILING_MEMALLOC_ASSERT_ON_REENTRY is the build-time flag that compiles in an abort on reentrant allocator hook calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ack unwinding (#16661)

## Description

The memory profiler's allocator hook previously used CPython public frame APIs (`PyThreadState_GetFrame`, `PyFrame_GetBack`, `PyFrame_GetCode`) to walk the stack. These APIs return new references, requiring `Py_INCREF`/`Py_DECREF` calls that can trigger reentrant allocations or frees from within the allocator hook, leading to undefined behavior.

This PR replaces those APIs with direct struct field access (`_memalloc_frame.h`), using only borrowed references with zero refcount overhead. It also adds a compile-time reentry detection flag (`MEMALLOC_ASSERT_ON_REENTRY`) for test and debug builds.

## Testing

- Existing profiling test suites pass
- Compile-time assert (`MEMALLOC_ASSERT_ON_REENTRY`) enables early detection in CI/debug builds
- Regression test in #16668 reproduces the reentrant crash; tests fail without this fix

## Risks

Low. The new frame-walking code reads the same internal structs that CPython's own public APIs wrap, but skips the refcount bookkeeping. The GIL is held during the allocator hook, so the struct reads are safe.

## Additional Notes

- `_memalloc_frame.h` has version-specific paths for Python 3.9 through 3.14+
- Non-ASCII function/file names fall back to `<non-ascii>` to avoid `PyUnicode_AsUTF8AndSize`, which can allocate a UTF-8 cache
- A follow-up would refactor `ddtrace/profiling/collector/_memalloc_frame.h` so that some of this code can be shared between the memory and stack profilers

Co-authored-by: taegyun.kim <[email protected]>
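To make the borrowed-reference idea concrete, here is a small sketch that does not use CPython at all. `MockFrame` and `MockCode` are hypothetical stand-ins, not the interpreter's real struct layouts, and every function name is invented for illustration. It contrasts a getter that hands out a new reference (and so touches a refcount) with a walk that only follows pointer fields, and it shows an allocation-free ASCII check that falls back to `<non-ascii>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical mock types; the real code reads CPython-internal structs instead.
struct MockCode {
    const char* name;      // raw bytes of the function name
    std::size_t name_len;  // length in bytes
};

struct MockFrame {
    int refcount;
    const MockCode* code;
    MockFrame* back;  // caller's frame, or nullptr at the bottom of the stack
};

int g_refcount_ops = 0;  // counts every refcount change, for illustration

// Models a public-API style getter: the caller receives a new reference
// and must release it later. In a real interpreter the release may free memory.
MockFrame* get_back_new_ref(MockFrame* frame) {
    MockFrame* back = frame->back;
    if (back != nullptr) {
        ++back->refcount;
        ++g_refcount_ops;
    }
    return back;
}

void release_ref(MockFrame* frame) {
    --frame->refcount;
    ++g_refcount_ops;
}

// Returns the name if it is pure ASCII, otherwise a static placeholder.
// No allocation and no conversion cache is involved.
const char* name_or_placeholder(const MockCode* code) {
    for (std::size_t i = 0; i < code->name_len; ++i) {
        if (static_cast<unsigned char>(code->name[i]) > 0x7F) {
            return "<non-ascii>";
        }
    }
    return code->name;
}

// Borrowed walk: follows `back` pointers directly and writes names into a
// caller-provided buffer, so it changes no refcounts and allocates nothing.
std::size_t walk_borrowed(const MockFrame* top, const char** out, std::size_t max_frames) {
    std::size_t n = 0;
    for (const MockFrame* f = top; f != nullptr && n < max_frames; f = f->back) {
        out[n++] = name_or_placeholder(f->code);
    }
    return n;
}
```

The borrowed walk is only sound while something else keeps the frames alive for its duration; the description above relies on the GIL being held inside the hook for that.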
…ack unwinding (#16661) (cherry picked from commit 0952bf8) Co-authored-by: Taegyun Kim <[email protected]>