docs: Document all OPA metrics definitions #7929
@@ -20,10 +20,17 @@ for all OpenTelemetry-related configurables.
## Prometheus
OPA exposes an HTTP endpoint at `/metrics` that can be used to collect performance metrics
for all API calls. The Prometheus endpoint is enabled by default when you run
OPA as a server.
OPA provides two ways to access performance metrics:

1. **System-wide metrics** via the `/metrics` Prometheus endpoint: instance-level metrics across all OPA operations
2. **Per-query metrics** via API responses with `?metrics=true`: metrics for individual query executions

These serve different purposes: system-wide metrics are for monitoring and alerting on an OPA instance, while per-query metrics are for debugging and optimizing individual queries.
You can enable metric collection from OPA with the following `prometheus.yml` config:
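A minimal sketch of such a scrape config, assuming OPA is serving on its default `localhost:8181` (the job name and scrape interval here are illustrative, not prescribed):

```yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "opa"            # illustrative job name
    metrics_path: "/metrics"   # OPA's Prometheus endpoint
    static_configs:
      - targets:
          - "localhost:8181"   # OPA's default listen address
```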
@@ -86,6 +93,24 @@ When Prometheus is enabled in the status plugin (see [Configuration](./configura
| last_success_bundle_request | gauge | Last successful bundle request in UNIX nanoseconds. | STABLE |
| bundle_loading_duration_ns | histogram | A histogram of duration for bundle loading. | STABLE |
## Available Metrics

The Prometheus `/metrics` endpoint exposes the following instance-level metrics:
> **Contributor:** This no longer really introduces the content in this section.
- **URL**: `http://localhost:8181/metrics` (default configuration)
- **Method**: HTTP GET
> **Contributor** (suggested change): This is not needed as it's the default.
- **Format**: Prometheus text format
> **Contributor** (suggested change): This is not needed as it's expected to be in that format.
- **Contents**: Instance-level counters, timers, histograms, Go runtime metrics
- **Use case**: Monitoring dashboards, alerting, performance trends
> **Contributor** (suggested change): Not really needed, as I think users understand how to use the metrics if they're looking for which are available.
> **Contributor:** Please list
### Additional Resources
- **Per-query metrics**: See [REST API Performance Metrics](./rest-api#performance-metrics) for debugging individual queries
- **Policy performance**: See [Policy Performance](./policy-performance#performance-metrics) for optimization guidance
- **Status API**: See [Status API](./management-status) for metrics reporting via status updates
- **Decision logs**: See [Decision Logs](./management-decision-logs) for including metrics in decision logs
- **CLI tools**: See [opa eval](./cli#eval) and [opa bench](./cli#bench) for command-line metric collection
## Health Checks
OPA exposes a `/health` API endpoint that can be used to perform health checks.
@@ -977,6 +977,66 @@ This feature can be enabled for `opa run`, `opa eval`, and `opa bench` by settin
Users are encouraged to do performance testing to determine the optimal configuration for their use case.
## Performance Metrics
> **Contributor:** We still seem to have the per-builtin metrics here in this doc as well as in the builtin docs themselves; I think they're better in the builtin docs only.
OPA exposes metrics for policy evaluation performance. These are available through:
- **System-wide metrics** at the `/metrics` Prometheus endpoint
- **Per-query metrics** with individual API responses when `?metrics=true` is specified
See [Monitoring](./monitoring#metrics-overview) for more details.
### Common Built-in Function Metrics
#### HTTP Built-ins
`http.send` metrics help identify I/O bottlenecks:
- `timer_rego_builtin_http_send_ns` - Total time spent in http.send calls
- `counter_rego_builtin_http_send_interquery_cache_hits` - Inter-query cache hits
- `counter_rego_builtin_http_send_network_requests` - Actual network requests made
High cache hit ratios indicate effective caching and reduced network overhead.
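As a sketch of how such a ratio can be computed from per-query metrics — the metric names come from the list above, but the values below are invented:

```python
# Hypothetical per-query metrics map, e.g. from a response with ?metrics=true.
metrics = {
    "timer_rego_builtin_http_send_ns": 1_250_000,
    "counter_rego_builtin_http_send_interquery_cache_hits": 90,
    "counter_rego_builtin_http_send_network_requests": 10,
}

hits = metrics["counter_rego_builtin_http_send_interquery_cache_hits"]
network = metrics["counter_rego_builtin_http_send_network_requests"]
total = hits + network

# Fraction of http.send lookups served from the inter-query cache.
hit_ratio = hits / total if total else 0.0
print(f"cache hit ratio: {hit_ratio:.0%}")  # -> cache hit ratio: 90%
```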
#### Regex Built-ins
Regex operation metrics help optimize pattern matching:
- `timer_rego_builtin_regex_interquery_ns` - Time spent in regex operations
- `counter_rego_builtin_regex_interquery_cache_hits` - Regex pattern cache hits
- `counter_rego_builtin_regex_interquery_value_cache_hits` - Regex value cache hits
Effective regex caching improves performance when the same patterns are used repeatedly.
### Core Query Metrics
Basic query evaluation phases:
- `timer_rego_query_parse_ns` - Time parsing the query string
- `timer_rego_query_compile_ns` - Time compiling the query
- `timer_rego_query_eval_ns` - Time executing the compiled query
Compilation time often dominates in complex policies.
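A small sketch of that comparison, using the timer names above with invented values:

```python
# Hypothetical per-query timers (nanoseconds); only the key names are real.
metrics = {
    "timer_rego_query_parse_ns": 40_000,
    "timer_rego_query_compile_ns": 1_800_000,
    "timer_rego_query_eval_ns": 600_000,
}

# Identify the dominant phase and its share of the total query time.
dominant = max(metrics, key=metrics.get)
share = metrics[dominant] / sum(metrics.values())
print(dominant, f"{share:.0%}")  # -> timer_rego_query_compile_ns 74%
```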
### High-Level Metrics
Server-level metrics for overall performance:
- `timer_server_handler_ns` - Total request handler execution time
- `counter_server_query_cache_hit` - Server-level query cache hits
### Using Metrics for Optimization
1. **Query phases**: Compare parse, compile, and eval times to identify bottlenecks
2. **Cache effectiveness**: Low cache hit rates suggest tuning opportunities
3. **I/O bottlenecks**: High `http.send` network request counts indicate caching issues
4. **Pattern matching**: Monitor regex cache hits for frequently used patterns
Access metrics via:

- REST API: Add `?metrics=true` to policy evaluation requests
- CLI: Use the `--metrics` flag with `opa eval` or `opa bench`
- Prometheus: See [Monitoring](./monitoring#prometheus) for system-wide metrics
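For the REST API route, a sketch of reading the per-query numbers out of a response. The `{"result": ..., "metrics": {...}}` shape follows OPA's REST API docs for `?metrics=true`, but the response below is invented:

```python
# Hypothetical Data API response when ?metrics=true is set; values are made up.
response = {
    "result": True,
    "metrics": {
        "timer_rego_query_parse_ns": 30_000,
        "timer_rego_query_compile_ns": 200_000,
        "timer_rego_query_eval_ns": 90_000,
        "timer_server_handler_ns": 350_000,
    },
}

query_ns = sum(
    v for k, v in response["metrics"].items() if k.startswith("timer_rego_query_")
)
# Handler time not accounted for by parse/compile/eval (routing, serialization, ...).
overhead_ns = response["metrics"]["timer_server_handler_ns"] - query_ns
print(overhead_ns)  # -> 30000
```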
## Key Takeaways
For high-performance use cases:
@@ -987,3 +1047,4 @@ For high-performance use cases:
- Write your policies with indexed statements so that [rule-indexing](https://blog.openpolicyagent.org/optimizing-opa-rule-indexing-59f03f17caf3) is effective.
- Use the profiler to help identify portions of the policy that would benefit the most from improved performance.
- Use the benchmark tools to help get real world timing data and detect policy performance changes.
- Monitor performance metrics to track optimization impact and identify bottlenecks.
@@ -27,3 +27,13 @@ The following table shows examples of how `glob.match` works:
| `output := glob.match("{cat,bat,[fr]at}", [], "bat")` | `true` | A glob with pattern-alternatives matchers. |
| `output := glob.match("{cat,bat,[fr]at}", [], "rat")` | `true` | A glob with pattern-alternatives matchers. |
| `output := glob.match("{cat,bat,[fr]at}", [], "at")` | `false` | A glob with pattern-alternatives matchers. |
## Performance Metrics
When OPA is configured with metrics enabled, `glob.match` operations expose the following per-query metrics (available when `?metrics=true` is specified in API requests):
| Metric | Description |
| ------ | ----------- |
| `counter_rego_builtin_glob_interquery_value_cache_hits` | Number of inter-query cache hits for compiled glob patterns |
Effective glob pattern caching improves performance when the same patterns are used repeatedly across queries. High cache hit ratios indicate that glob compilation overhead is being minimized through caching.
@@ -110,3 +110,13 @@ overlap. This can be useful when using patterns to define permissions or access
rules. The function returns `true` if the two patterns overlap and `false` otherwise.
<PlaygroundExample dir={require.context('../_examples/regex/globs_match/role_patterns')} />
## Performance Metrics
When OPA is configured with metrics enabled, regex operations expose the following per-query metrics (available when `?metrics=true` is specified in API requests):
| Metric | Description |
| ------ | ----------- |
| `counter_rego_builtin_regex_interquery_value_cache_hits` | Number of regex cache hits for compiled patterns |
Effective regex caching improves performance when the same patterns are used repeatedly. High cache hit ratios indicate that regex compilation overhead is being minimized through caching.
@@ -2333,9 +2333,12 @@ Query instrumentation can help diagnose performance problems, however, it can
add significant overhead to query evaluation. We recommend leaving query
instrumentation off unless you are debugging a performance problem.
When query instrumentation is enabled (`instrument=true`), the following additional detailed evaluation metrics are included:
> **Contributor:** Here are some examples of how to learn what the different metrics are: query `v1/data`. This is not an exhaustive list; if you can, it'd be best to run some example queries for each endpoint so you can learn what the different metrics are. Also note that the different metrics will depend on the data you post, builtin functions, etc. If you want to document this section, some significant research will be needed in order to gather what is available and what the metrics mean.
- **timer_eval_op_***: Various evaluation operation timers (e.g., `timer_eval_op_plug_ns`, `timer_eval_op_resolve_ns`)
> **Contributor:** Would be good to explain what these are rather than just 'various operation timers'.
- **histogram_eval_op_***: Histograms tracking evaluation operation time distributions
> **Contributor:** `histogram_eval_op_builtin_call` is one; it'd be good to give examples of them all.
- **timer_rego_builtin_***: Built-in function execution times
- **counter_rego_builtin_***: Built-in function call counts and cache hits
- **timer_compile_stage_*_ns**: Compilation stage timers for the query and module compilation stages
## Provenance
> **Contributor:** I would put this in an admonition instead, since it's related but not 100% on topic for the Prometheus section; this section is just about `/metrics`, but `?metrics=true` is important further reading.