-
Introduce OAL V2 engine:
- Immutable AST models for thread safety and predictable behavior
- Type-safe enums replacing string-based filter operators
- Precise error location reporting with file, line, and column numbers
- Clean separation between parsing and code generation phases
- Enhanced testability with models that can be constructed without parsing
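
The points above about type-safe operators, immutable models, and construction without parsing can be illustrated with a minimal sketch. All class, field, and file names here (`FilterOperator`, `SourceLocation`, `FilterExpression`, `core.oal`) are hypothetical and are not taken from the OAL V2 source.

```java
// Hypothetical names throughout; this is not the actual OAL V2 model.
enum FilterOperator {
    EQUAL("=="), NOT_EQUAL("!="), GREATER(">"), LESS("<");

    private final String symbol;

    FilterOperator(String symbol) {
        this.symbol = symbol;
    }

    String symbol() {
        return symbol;
    }

    // Resolve a source token once; unknown tokens are rejected immediately
    // instead of travelling through the engine as raw strings.
    static FilterOperator fromSymbol(String symbol) {
        for (FilterOperator op : values()) {
            if (op.symbol.equals(symbol)) {
                return op;
            }
        }
        throw new IllegalArgumentException("Unknown filter operator: " + symbol);
    }
}

// Source position carried by a model so errors can report file, line, and column.
final class SourceLocation {
    final String file;
    final int line;
    final int column;

    SourceLocation(String file, int line, int column) {
        this.file = file;
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return file + ":" + line + ":" + column;
    }
}

// Immutable AST node: all fields are final and there are no setters,
// so instances can be shared across threads without synchronization.
final class FilterExpression {
    final String field;
    final FilterOperator operator;
    final String value;
    final SourceLocation location;

    FilterExpression(String field, FilterOperator operator, String value, SourceLocation location) {
        this.field = field;
        this.operator = operator;
        this.value = value;
        this.location = location;
    }
}
```

Because the node is a plain immutable object, a test can build one directly with `new FilterExpression(...)` rather than parsing a script first.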
-
Introduce MAL/LAL/Hierarchy V2 engine — replace Groovy-based DSL runtime with ANTLR4 parser + Javassist bytecode generation:
- Remove Groovy runtime dependency from OAP backend
- Fail-fast compilation at startup — syntax and type errors are caught immediately instead of at first execution
- Thread-safe generated classes with no ThreadLocal or shared mutable state
- Immutable AST models for all three DSLs (MAL, LAL, Hierarchy rules)
- Explicit context passing replaces Groovy binding/closure capture
- v1 (Groovy) and v2 (ANTLR4+Javassist) cross-version checker validates behavioral equivalence across 1,290+ expressions
- JMH benchmarks confirm v2 runtime speedups: MAL execute ~6.8x, LAL compile ~39x / execute ~2.8x, Hierarchy execute ~2.6x faster than Groovy v1
- Generated class names follow the `{yamlFileName}_L{lineNo}_{ruleName}` pattern for all DSLs (MAL/LAL/Hierarchy) for stack trace traceability
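
As an illustration of the class naming pattern, the sketch below builds a name from a YAML file name, a line number, and a rule name. The helper `GeneratedClassNames`, the extension stripping, and the character replacement rule are assumptions made for this example; the changelog only states the pattern itself.

```java
// Hypothetical helper; only the {yamlFileName}_L{lineNo}_{ruleName} pattern comes from the changelog.
final class GeneratedClassNames {

    static String of(String yamlFileName, int lineNo, String ruleName) {
        return sanitize(stripExtension(yamlFileName)) + "_L" + lineNo + "_" + sanitize(ruleName);
    }

    // Assumption: the file extension is dropped before building the name.
    private static String stripExtension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Assumption: characters that cannot appear inside a Java identifier become '_'.
    private static String sanitize(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            sb.append(Character.isJavaIdentifierPart(c) ? c : '_');
        }
        return sb.toString();
    }
}
```

With these assumptions, `GeneratedClassNames.of("istio.yaml", 42, "service_cpm")` yields `istio_L42_service_cpm`, which points a stack trace frame back to the rule file and line.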
-
Fix E2E test metrics verification: fail the check if all metric values are null.
-
Support building, testing, and publishing with Java 25.
-
Add `CLAUDE.md` as an AI assistant guide for the project.
-
Upgrade Byte Buddy to 1.18.7 and configure explicit `-javaagent` for Mockito/Byte Buddy in Surefire to avoid JDK 25 dynamic agent loading warnings.
-
Upgrade Groovy to 5.0.3 in OAP backend.
-
Bump up Node.js to v24.13.0 to compile the latest UI (booster-ui).
-
Drop Elasticsearch 7.x (EOL) and OpenSearch 1.x from E2E tests, upgrade all ES tests to 8.18.8, and update skywalking-helm to use ECK 8.18.8.
-
Add `library-batch-queue` module: a partitioned, self-draining queue with type-based dispatch, adaptive partitioning, idle backoff, and throughput-weighted drain rebalancing (`DrainBalancer`). Designed to replace DataCarrier in high-fan-out scenarios.
-
Replace DataCarrier with BatchQueue for L1 metrics aggregation, L2 metrics persistence, TopN persistence, all three exporters (gRPC metrics, Kafka trace, Kafka log), and gRPC remote client. All metric types (OAL + MAL) now share unified queues instead of separate OAL/MAL pools. Each exporter keeps its own dedicated queue with 1 thread, preserving original buffer strategies. Thread count comparison on an 8-core machine (gRPC remote client excluded — unchanged 1 thread per peer):

| Queue | Old threads | Old channels | Old buffer slots | New threads | New partitions | New buffer slots | New policy |
|---|---|---|---|---|---|---|---|
| L1 Aggregation (OAL) | 24 | ~1,240 | ~12.4M | 8 (unified) | ~330 adaptive | ~6.6M | `cpuCores(1.0)` |
| L1 Aggregation (MAL) | 2 | ~100 | ~100K | (unified above) | | | |
| L2 Persistence (OAL) | 2 | ~620 | ~1.24M | 3 (unified) | ~330 adaptive | ~660K | `cpuCoresWithBase(1, 0.25)` |
| L2 Persistence (MAL) | 1 | ~100 | ~100K | (unified above) | | | |
| TopN Persistence | 4 | 4 | 4K | 1 | 4 adaptive | 4K | `fixed(1)` |
| Exporters (gRPC/Kafka) | 3 | 6 | 120K | 3 (1 per exporter) | - | 60K | `fixed(1)` each |
| Total | 36 | ~2,070 | ~13.9M | 15 | ~664 | ~7.3M | |

-
Remove `library-datacarrier-queue` module. All usages have been replaced by `library-batch-queue`.
-
Enable throughput-weighted drain rebalancing for L1 aggregation and L2 persistence queues (10s interval). Periodically reassigns partitions across drain threads to equalize load when metric types have skewed throughput.
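
One way to picture throughput-weighted rebalancing is a greedy assignment: sort partitions by observed throughput and hand each one to the currently least-loaded drain thread. The changelog does not describe the algorithm `DrainBalancer` actually uses, so the sketch below (class `DrainRebalanceSketch` and its methods are hypothetical) only illustrates the general idea.

```java
import java.util.Arrays;

// Hypothetical sketch of throughput-weighted partition assignment;
// not the DrainBalancer implementation.
final class DrainRebalanceSketch {

    // Returns assignment[p] = index of the drain thread that owns partition p.
    static int[] rebalance(long[] throughput, int threads) {
        Integer[] order = new Integer[throughput.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Visit the heaviest partitions first so they are spread out early.
        Arrays.sort(order, (a, b) -> Long.compare(throughput[b], throughput[a]));

        long[] load = new long[threads];
        int[] assignment = new int[throughput.length];
        for (int p : order) {
            int lightest = 0;
            for (int t = 1; t < threads; t++) {
                if (load[t] < load[lightest]) {
                    lightest = t;
                }
            }
            assignment[p] = lightest;
            load[lightest] += throughput[p];
        }
        return assignment;
    }

    // Sums the throughput each thread ends up draining under an assignment.
    static long[] loads(long[] throughput, int[] assignment, int threads) {
        long[] load = new long[threads];
        for (int p = 0; p < throughput.length; p++) {
            load[assignment[p]] += throughput[p];
        }
        return load;
    }
}
```

For six partitions with throughputs `{100, 10, 10, 10, 70, 50}` and two drain threads, this assignment gives per-thread loads of 130 and 120, whereas a naive split by index (first three, last three) would give 120 and 130 only by coincidence of ordering and can be far more skewed for other inputs.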
-
Add benchmark framework under `benchmarks/` with Kind-based Kubernetes environments, automated thread dump collection and analysis. First case: `thread-analysis` on `istio-cluster_oap-banyandb` environment.
-
Add virtual thread support (JDK 25+) for gRPC and Armeria HTTP server handler threads. Set `SW_VIRTUAL_THREADS_ENABLED=false` to disable.

| Pool | Threads (JDK < 25) | Threads (JDK 25+) |
|---|---|---|
| gRPC server handler (`core-grpc`, `receiver-grpc`, `als-grpc`, `ebpf-grpc`) | Cached platform (unbounded) | Virtual threads |
| HTTP blocking (`core-http`, `receiver-http`, `promql-http`, `logql-http`, `zipkin-query-http`, `zipkin-http`, `firehose-http`) | Cached platform (max 200) | Virtual threads |
| VT carrier threads (ForkJoinPool) | N/A | ~9 shared |

On JDK 25+, all 11 thread pools above share ~9 carrier threads instead of up to 1,400+ platform threads.
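
The selection between virtual and platform handler threads could look roughly like the sketch below. The class `HandlerExecutors` is hypothetical, and how OAP actually detects the JDK version or reads the switch is not stated in the changelog; only the JDK 25+ threshold and the `SW_VIRTUAL_THREADS_ENABLED` variable come from it. `Executors.newVirtualThreadPerTaskExecutor()` is looked up reflectively here purely so the sketch also compiles on JDKs older than 21.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; not the OAP implementation.
final class HandlerExecutors {

    // Virtual threads are used on JDK 25+ unless explicitly switched off.
    static boolean virtualThreadsEnabled(int jdkFeatureVersion, String envValue) {
        boolean optedOut = "false".equalsIgnoreCase(envValue);
        return jdkFeatureVersion >= 25 && !optedOut;
    }

    static ExecutorService create(int jdkFeatureVersion, String envValue) {
        if (virtualThreadsEnabled(jdkFeatureVersion, envValue)) {
            try {
                Method factory = Executors.class.getMethod("newVirtualThreadPerTaskExecutor");
                return (ExecutorService) factory.invoke(null);
            } catch (ReflectiveOperationException e) {
                // The factory method is unavailable; fall back to platform threads below.
            }
        }
        // Assumption: a cached platform thread pool stands in for the pre-25 behavior.
        return Executors.newCachedThreadPool();
    }

    static ExecutorService createForCurrentJvm() {
        return create(Runtime.version().feature(), System.getenv("SW_VIRTUAL_THREADS_ENABLED"));
    }
}
```

Keeping the decision in a pure function of the JDK version and the environment value makes the opt-out behavior easy to test without changing the running JVM.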
-
Change default Docker base image to JDK 25 (`eclipse-temurin:25-jre`). JDK 11 kept as `-java11` variant.
-
Thread count benchmark comparison — 2-node OAP cluster on JDK 25 with BanyanDB, Istio bookinfo traffic (10-core machine, JVM-internal threads excluded):

| Pool | v10.3.0 threads | v10.4.0 threads | Notes |
|---|---|---|---|
| L1 Aggregation (OAL + MAL) | 26 (DataCarrier) | 10 (BatchQueue) | Unified OAL + MAL |
| L2 Persistence (OAL + MAL) | 3 (DataCarrier) | 4 (BatchQueue) | Unified OAL + MAL |
| TopN Persistence | 4 (DataCarrier) | 1 (BatchQueue) | |
| gRPC Remote Client | 1 (DataCarrier) | 1 (BatchQueue) | Per peer |
| Armeria HTTP event loop | 20 | 5 | `min(5, cores)` shared group |
| Armeria HTTP handler | on-demand platform (increasing with payload) | - | Virtual threads on JDK 25+ |
| gRPC event loop | 10 | 10 | Unchanged |
| gRPC handler | on-demand platform (increasing with payload) | - | Virtual threads on JDK 25+ |
| ForkJoinPool (Virtual Thread carrier) | 0 | ~10 | JDK 25+ virtual thread scheduler |
| HttpClient-SelectorManager | 4 | 2 | SharedKubernetesClient |
| Schedulers + others | ~24 | ~24 | Mostly unchanged |
| Total (OAP threads) | 150+ | ~72 | ~50% reduction, stable in high payload. |

-
Replace PowerMock Whitebox with standard Java Reflection in `server-library`, `server-core`, and `server-configuration` to support JDK 25+.
-
Fix `/debugging/config/dump`, which may leak sensitive information if there are second-level properties in the configuration.
-
`KubernetesCoordinator`: make self instance return real pod IP address instead of `127.0.0.1`.
-
Enhance the alarm kernel with recovered status notification capability.
-
Fix BrowserWebVitalsPerfData `clsTime` to `cls` and make it double type.
-
Init `log-mal-rules` at module provider start stage to avoid re-init for every LAL.
-
Fail fast if SampleFamily is empty after MAL filter expression.
-
Fix range matrix and scalar binary operation in PromQL.
-
Add `LatestLabeledFunction` for meter.
-
MAL Labeled metrics support additional attributes.
-
Bump up netty to 4.2.9.Final.
-
Add support for OpenSearch/ElasticSearch client certificate authentication.
-
Fix BanyanDB logs paging query.
-
Replace BanyanDB Java client with native implementation.
-
Remove `bydb.dependencies.properties` and set the compatible BanyanDB API version number in `${SW_STORAGE_BANYANDB_COMPATIBLE_SERVER_API_VERSIONS}`.
-
Fix trace profiling query time range condition.
-
Add named ThreadFactory to all `Executors.newXxx()` calls to replace anonymous `pool-N-thread-M` thread names with meaningful names for easier thread dump analysis. Complete OAP server thread inventory (counts on an 8-core machine, exporters and JDBC are optional):

| Catalog | Thread Name | Count | Policy | Partitions |
|---|---|---|---|---|
| Data Pipeline | `BatchQueue-METRICS_L1_AGGREGATION-N` | 8 | `cpuCores(1.0)` | ~330 adaptive |
| Data Pipeline | `BatchQueue-METRICS_L2_PERSISTENCE-N` | 3 | `cpuCoresWithBase(1, 0.25)` | ~330 adaptive |
| Data Pipeline | `BatchQueue-TOPN_PERSISTENCE-N` | 1 | `fixed(1)` | ~4 adaptive |
| Data Pipeline | `BatchQueue-GRPC_REMOTE_{host}_{port}-N` | 1 per peer | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_GRPC_METRICS-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_KAFKA_TRACE-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-EXPORTER_KAFKA_LOG-N` | 1 | `fixed(1)` | `fixed(1)` |
| Data Pipeline | `BatchQueue-JDBC_ASYNC_BATCH_PERSISTENT-N` | 4 (configurable) | `fixed(N)` | `fixed(N)` |
| Scheduler | `RemoteClientManager` | 1 | scheduled | - |
| Scheduler | `PersistenceTimer` | 1 | scheduled | - |
| Scheduler | `PersistenceTimer-prepare-N` | 2 (configurable) | fixed pool | - |
| Scheduler | `DataTTLKeeper` | 1 | scheduled | - |
| Scheduler | `CacheUpdateTimer` | 1 | scheduled | - |
| Scheduler | `HierarchyAutoMatching` | 1 | scheduled | - |
| Scheduler | `WatermarkWatcher` | 1 | scheduled | - |
| Scheduler | `AlarmCore` | 1 | scheduled | - |
| Scheduler | `HealthChecker` | 1 | scheduled | - |
| Scheduler | `EndpointUriRecognition` | 1 (conditional) | scheduled | - |
| Scheduler | `FileChangeMonitor` | 1 | scheduled | - |
| Scheduler | `BanyanDB-ChannelManager` | 1 | scheduled | - |
| Scheduler | `GRPCClient-HealthCheck-{host}:{port}` | 1 per client | scheduled | - |
| Scheduler | `EBPFProfiling-N` | configurable | fixed pool | - |

-
Fix BanyanDB time range overflow in profile thread snapshot query.
-
`BrowserErrorLog`: the OAP server now generates a UUID to replace the original client-side ID, because browser scripts can't guarantee that generated IDs are globally unique.
-
MQE: fix multiple labeled metric query and ensure no results are returned if no label value combinations match.
-
Fix `BrowserErrorLog` BanyanDB storage query order.
-
`BanyanDB Client`: Property query supports `Order By`.
-
MQE: trim the label values condition for the labeled metrics query to enhance the readability.
-
PromQL service: fix time parse issue when using RFC3339 time format for querying.
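
For context, the Prometheus HTTP API accepts timestamps either as RFC 3339 strings or as Unix seconds. The changelog does not say what the original parsing bug was; the sketch below (class `PromQLTimeSketch` is hypothetical) only shows one way to accept both forms with the standard `java.time` API.

```java
import java.math.BigDecimal;
import java.time.OffsetDateTime;
import java.time.format.DateTimeParseException;

// Hypothetical sketch; not the OAP PromQL service code.
final class PromQLTimeSketch {

    // Accepts an RFC 3339 timestamp such as 2024-01-02T03:04:05Z,
    // or Unix seconds (optionally fractional) such as 1704164645.5.
    static long toEpochMillis(String raw) {
        String s = raw.trim();
        try {
            return OffsetDateTime.parse(s).toInstant().toEpochMilli();
        } catch (DateTimeParseException notRfc3339) {
            // Fall back to Unix seconds; BigDecimal avoids floating point rounding.
            return new BigDecimal(s).movePointRight(3).longValue();
        }
    }
}
```

An input that is neither form fails with a `NumberFormatException` from the fallback branch, so malformed timestamps are rejected rather than silently defaulted.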
-
Envoy metrics service receiver: support adapter listener metrics.
-
Envoy metrics service receiver: support configuring MAL rule files.
-
Fix `HttpAlarmCallback` creating a new `HttpClient` on every alarm `post()` call, leaking NIO selector threads. Replace with a shared static singleton.
-
Add `SharedKubernetesClient` singleton in `library-kubernetes-support` to replace 9 separate `KubernetesClientBuilder().build()` calls across 7 files. Fixes `KubernetesCoordinator` client leak (never closed, NIO selector thread persisted). Uses `KubernetesHttpClientFactory` with virtual threads on JDK 25+ or a single fixed executor thread on JDK <25.
-
Reduce Armeria HTTP server event loop threads. All 7 HTTP servers now share one event loop group instead of each creating their own (Armeria default: `cores * 2` per server = 140 on 10-core). Event loop: `min(5, cores)` shared, since non-blocking I/O multiplexing needs few threads. Blocking executor: JDK 25+ uses virtual threads; JDK <25 keeps Armeria's default cached pool (up to 200 on-demand threads) because HTTP handlers block on long storage/DB queries.
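
The event loop numbers in this entry follow from simple arithmetic, shown here as a small sketch. The class `EventLoopSizing` is hypothetical; the formulas `cores * 2` per server and `min(5, cores)` are the ones quoted in the entry.

```java
// Hypothetical helper that reproduces the arithmetic quoted in the entry.
final class EventLoopSizing {

    // Old behavior: every server creates its own group of cores * 2 event loop threads.
    static int perServerDefaultTotal(int servers, int cores) {
        return servers * cores * 2;
    }

    // New behavior: one shared group capped at 5 threads.
    static int sharedGroupSize(int cores) {
        return Math.min(5, cores);
    }
}
```

On a 10-core machine with 7 HTTP servers, `perServerDefaultTotal(7, 10)` is 140 while `sharedGroupSize(10)` is 5; on a 2-core machine the shared group shrinks to 2.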
-
Add the spring-ai components and the GenAI layer.
-
Bump up netty to 4.2.10.Final.
-
Bump up log4j to 2.25.3 and jackson to 2.18.5.
-
Remove PowerMock dependency. Replace `Whitebox` with `ReflectUtil` (standard Java reflection + `sun.misc.Unsafe` for final fields) across all modules to support JDK 25+.
-
Support TraceQL and Tempo API for Zipkin trace query.
-
Remove `initExp` from MAL configuration. It was an internal Groovy startup validation mechanism, not an end-user feature. The v2 ANTLR4 compiler performs fail-fast validation at startup natively.
-
Update hierarchy rule documentation: `auto-matching-rules` in `hierarchy-definition.yml` no longer use Groovy scripts. Rules now use a dedicated expression grammar supporting property access, String methods, if/else, comparisons, and logical operators. All shipped rules are fully compatible.
-
Activate `otlp-traces` handler in `receiver-otel` by default.
- Fix the missing icon in new native trace view.
- Enhance the alert page to show the recovery time of resolved alerts.
- Implement a common pagination component.
- Fix validation guard for router.
- Add the `coldStage` to the `Duration` for queries.
- Optimize the pages theme.
- Fix incorrect virtual service names.
- Add benchmark selection into banyanDB storage documentation.
- Fix progressive TTL doc for banyanDB.
- Restructure `docs/README.md` for better navigation with high-level documentation overview.
- Move Marketplace as a top-level menu section with Overview introduction in `menu.yml`.
- Polish `marketplace.md` as the overview page for all out-of-box monitoring features.
- Add "What's Next" section to Quick Start docs guiding users to Marketplace.
- Restructure agent compatibility page with OAP 10.x focus and clearer format for legacy versions.
- Remove outdated FAQ docs (v3, v6 upgrade guides and 7.x metrics issue).
- Remove "since 7/8/9.x" version statements from documentation as features are standard in 10.x.
All issues and pull requests are here