Conversation
Codeowners resolved as |
09ce55b to
0c722ba
Compare
This comment has been minimized.
This comment has been minimized.
Performance SLOsComparing candidate dubloom/mlflow-integration (e6f19d9) with baseline main (c48139e) 📈 Performance Regressions (2 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 103.714µs (SLO: <130.000µs 📉 -20.2%) vs baseline: +2.5% Memory: ✅ 43.470MB (SLO: <46.000MB -5.5%) vs baseline: +4.7% ✅ add_inplace_aspectTime: ✅ 102.031µs (SLO: <130.000µs 📉 -21.5%) vs baseline: +0.4% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.9% ✅ add_inplace_noaspectTime: ✅ 28.366µs (SLO: <40.000µs 📉 -29.1%) vs baseline: +0.7% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.8% ✅ add_noaspectTime: ✅ 49.093µs (SLO: <70.000µs 📉 -29.9%) vs baseline: +0.5% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +4.9% ✅ bytearray_aspectTime: ✅ 254.477µs (SLO: <400.000µs 📉 -36.4%) vs baseline: +1.1% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +5.0% ✅ bytearray_extend_aspectTime: ✅ 634.075µs (SLO: <800.000µs 📉 -20.7%) vs baseline: -0.5% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +4.7% ✅ bytearray_extend_noaspectTime: ✅ 264.277µs (SLO: <400.000µs 📉 -33.9%) vs baseline: -0.3% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +4.7% ✅ bytearray_noaspectTime: ✅ 138.321µs (SLO: <300.000µs 📉 -53.9%) vs baseline: +1.6% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +4.8% ✅ bytes_aspectTime: ✅ 217.847µs (SLO: <300.000µs 📉 -27.4%) vs baseline: -0.7% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.8% ✅ bytes_noaspectTime: ✅ 134.360µs (SLO: <200.000µs 📉 -32.8%) vs baseline: +0.6% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +4.6% ✅ bytesio_aspectTime: ✅ 3.752ms (SLO: <5.000ms 📉 -25.0%) vs baseline: -0.4% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.9% ✅ bytesio_noaspectTime: ✅ 316.577µs (SLO: <420.000µs 📉 -24.6%) vs baseline: +0.7% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +5.0% ✅ capitalize_aspectTime: ✅ 89.225µs (SLO: <300.000µs 📉 -70.3%) vs baseline: +0.5% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) ✅ capitalize_noaspectTime: ✅ 255.524µs (SLO: <300.000µs 📉 -14.8%) vs baseline: +1.1% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +4.8% ✅ casefold_aspectTime: ✅ 88.837µs (SLO: <500.000µs 📉 -82.2%) vs baseline: -0.1% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ casefold_noaspectTime: ✅ 308.621µs (SLO: <500.000µs 📉 -38.3%) vs baseline: +0.4% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.0% ✅ decode_aspectTime: ✅ 86.753µs (SLO: <100.000µs 📉 -13.2%) vs baseline: ~same Memory: ✅ 43.529MB (SLO: <46.000MB -5.4%) vs baseline: +5.2% ✅ decode_noaspectTime: ✅ 153.102µs (SLO: <210.000µs 📉 -27.1%) vs baseline: ~same Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +4.9% ✅ encode_aspectTime: ✅ 84.362µs (SLO: <200.000µs 📉 -57.8%) vs baseline: +0.3% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +4.7% ✅ encode_noaspectTime: ✅ 140.389µs (SLO: <200.000µs 📉 -29.8%) vs baseline: +0.4% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +4.9% ✅ format_aspectTime: ✅ 14.728ms (SLO: <19.200ms 📉 -23.3%) vs baseline: +0.7% Memory: ✅ 43.620MB (SLO: <46.000MB -5.2%) vs baseline: +4.9% ✅ format_map_aspectTime: ✅ 16.483ms (SLO: <21.500ms 📉 -23.3%) vs baseline: -0.3% Memory: ✅ 43.660MB (SLO: <46.000MB -5.1%) vs baseline: +5.1% ✅ format_map_noaspectTime: ✅ 367.904µs (SLO: <500.000µs 📉 -26.4%) vs baseline: -0.8% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.7% ✅ format_noaspectTime: ✅ 304.160µs (SLO: <500.000µs 📉 -39.2%) vs baseline: -0.5% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +4.7% ✅ index_aspectTime: ✅ 128.934µs (SLO: <300.000µs 📉 -57.0%) vs baseline: +5.7% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +4.8% ✅ index_noaspectTime: ✅ 40.229µs (SLO: <300.000µs 📉 -86.6%) vs baseline: -0.1% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) ✅ join_aspectTime: ✅ 210.671µs (SLO: <300.000µs 📉 -29.8%) vs baseline: -0.9% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ join_noaspectTime: ✅ 141.618µs (SLO: <300.000µs 📉 -52.8%) vs baseline: -0.4% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +4.9% ✅ ljust_aspectTime: ✅ 587.468µs (SLO: <700.000µs 📉 -16.1%) vs baseline: 📈 +17.0% Memory: ✅ 43.600MB (SLO: <46.000MB -5.2%) vs baseline: +5.0% ✅ ljust_noaspectTime: ✅ 261.286µs (SLO: <300.000µs 📉 -12.9%) vs baseline: +1.2% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.5% ✅ lower_aspectTime: ✅ 301.320µs (SLO: <500.000µs 📉 -39.7%) vs baseline: +1.7% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +4.8% ✅ lower_noaspectTime: ✅ 237.691µs (SLO: <300.000µs 📉 -20.8%) vs baseline: +0.5% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ lstrip_aspectTime: ✅ 0.269ms (SLO: <3.000ms 📉 -91.0%) vs baseline: -1.2% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +4.3% ✅ lstrip_noaspectTime: ✅ 0.177ms (SLO: <3.000ms 📉 -94.1%) vs baseline: -0.9% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ modulo_aspectTime: ✅ 14.371ms (SLO: <18.750ms 📉 -23.4%) vs baseline: ~same Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +4.4% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 14.800ms (SLO: <19.350ms 📉 -23.5%) vs baseline: -1.0% Memory: ✅ 43.521MB (SLO: <46.000MB -5.4%) vs baseline: +4.7% ✅ modulo_aspect_for_bytesTime: ✅ 14.488ms (SLO: <18.900ms 📉 -23.3%) vs baseline: ~same Memory: ✅ 43.542MB (SLO: <46.000MB -5.3%) vs baseline: +4.8% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.688ms (SLO: <19.150ms 📉 -23.3%) vs baseline: -0.4% Memory: ✅ 43.549MB (SLO: <46.000MB -5.3%) vs baseline: +4.9% ✅ modulo_noaspectTime: ✅ 0.367ms (SLO: <3.000ms 📉 -87.8%) vs baseline: ~same Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +5.0% ✅ replace_aspectTime: ✅ 18.428ms (SLO: <24.000ms 📉 -23.2%) vs baseline: -0.8% Memory: ✅ 43.546MB (SLO: <46.000MB -5.3%) vs baseline: +4.9% ✅ replace_noaspectTime: ✅ 280.120µs (SLO: <300.000µs -6.6%) vs baseline: +0.4% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +5.0% ✅ repr_aspectTime: ✅ 309.891µs (SLO: <420.000µs 📉 -26.2%) vs baseline: -0.3% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +4.6% ✅ repr_noaspectTime: ✅ 46.744µs (SLO: <90.000µs 📉 -48.1%) vs baseline: ~same Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ rstrip_aspectTime: ✅ 383.632µs (SLO: <500.000µs 📉 -23.3%) vs baseline: +0.2% Memory: ✅ 43.403MB (SLO: <46.000MB -5.6%) vs baseline: +4.7% ✅ rstrip_noaspectTime: ✅ 186.009µs (SLO: <300.000µs 📉 -38.0%) vs baseline: +0.6% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.1% ✅ slice_aspectTime: ✅ 185.297µs (SLO: <300.000µs 📉 -38.2%) vs baseline: +1.2% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.0% ✅ slice_noaspectTime: ✅ 54.308µs (SLO: <90.000µs 📉 -39.7%) vs baseline: +0.3% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.0% ✅ stringio_aspectTime: ✅ 4.410ms (SLO: <5.000ms 📉 -11.8%) vs baseline: 📈 +14.9% Memory: ✅ 43.568MB (SLO: <46.000MB -5.3%) vs baseline: +5.4% ✅ stringio_noaspectTime: ✅ 344.631µs (SLO: <500.000µs 📉 -31.1%) vs baseline: +0.2% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +4.5% ✅ strip_aspectTime: ✅ 270.280µs (SLO: <350.000µs 📉 -22.8%) vs baseline: +0.3% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +4.6% ✅ strip_noaspectTime: ✅ 176.221µs (SLO: <240.000µs 📉 -26.6%) vs baseline: +0.3% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +4.9% ✅ swapcase_aspectTime: ✅ 333.406µs (SLO: <500.000µs 📉 -33.3%) vs baseline: -0.3% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +4.7% ✅ swapcase_noaspectTime: ✅ 273.542µs (SLO: <400.000µs 📉 -31.6%) vs baseline: -0.2% Memory: ✅ 43.470MB (SLO: <46.000MB -5.5%) vs baseline: +5.2% ✅ title_aspectTime: ✅ 323.657µs (SLO: <500.000µs 📉 -35.3%) vs baseline: ~same Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +4.9% ✅ title_noaspectTime: ✅ 260.490µs (SLO: <400.000µs 📉 -34.9%) vs baseline: -0.9% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +4.5% ✅ translate_aspectTime: ✅ 498.961µs (SLO: <700.000µs 📉 -28.7%) vs baseline: +2.2% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +5.0% ✅ translate_noaspectTime: ✅ 423.877µs (SLO: <500.000µs 📉 -15.2%) vs baseline: -1.4% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.1% ✅ upper_aspectTime: ✅ 294.970µs (SLO: <500.000µs 📉 -41.0%) vs baseline: -0.6% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +4.5% ✅ upper_noaspectTime: ✅ 236.635µs (SLO: <400.000µs 📉 -40.8%) vs baseline: -1.1% Memory: ✅ 43.509MB (SLO: <46.000MB -5.4%) vs baseline: +5.1% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 509.030µs (SLO: <700.000µs 📉 -27.3%) vs baseline: 📈 +21.6% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +4.9% ✅ ospathbasename_noaspectTime: ✅ 429.673µs (SLO: <700.000µs 📉 -38.6%) vs baseline: -0.3% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +4.9% ✅ ospathjoin_aspectTime: ✅ 625.053µs (SLO: <700.000µs 📉 -10.7%) vs baseline: -0.5% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.1% ✅ ospathjoin_noaspectTime: ✅ 631.082µs (SLO: <700.000µs -9.8%) vs baseline: -0.7% Memory: ✅ 43.273MB (SLO: <46.000MB -5.9%) vs baseline: +4.8% ✅ ospathnormcase_aspectTime: ✅ 346.105µs (SLO: <700.000µs 📉 -50.6%) vs baseline: -1.0% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.0% ✅ ospathnormcase_noaspectTime: ✅ 357.209µs (SLO: <700.000µs 📉 -49.0%) vs baseline: -0.6% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +4.9% ✅ ospathsplit_aspectTime: ✅ 488.122µs (SLO: <700.000µs 📉 -30.3%) vs baseline: ~same Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +4.8% ✅ ospathsplit_noaspectTime: ✅ 497.526µs (SLO: <700.000µs 📉 -28.9%) vs baseline: -0.2% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.0% ✅ ospathsplitdrive_aspectTime: ✅ 374.413µs (SLO: <700.000µs 📉 -46.5%) vs baseline: ~same Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.8% ✅ ospathsplitdrive_noaspectTime: ✅ 72.906µs (SLO: <700.000µs 📉 -89.6%) vs baseline: -1.1% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +4.9% ✅ ospathsplitext_aspectTime: ✅ 456.025µs (SLO: <700.000µs 📉 -34.9%) vs baseline: +0.2% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +4.9% ✅ ospathsplitext_noaspectTime: ✅ 463.656µs (SLO: <700.000µs 📉 -33.8%) vs baseline: ~same Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.1%
|
7e46b0e to
d2981c5
Compare
660b1f9 to
e8f165f
Compare
yshapiro-57
left a comment
There was a problem hiding this comment.
@dubloom I think this looks great, and I would be fine with merging this and addressing the following comments in separate PRs later:
For logging metrics, MLFlow also has a log_batch function that allows the user to log multiple metrics at once, oftentimes for one and the same step. It would be nice to update the step during log_batch as well as log_metric.
Also, I think that metrics are usually logged at the end of the step, not at the beginning, so it would be nice to expose some function along the lines of
from ddtrace.contrib.mlflow import set_step
set_step(19)
to give the users a way to explicitly start a step and let us know that they are starting it.
Can you run it with a script that does mlflow.set_tracking_uri("https://dd.datad0g.com/api/unstable/mlflow-tracking-server") , so that we can connect the run IDs on spans to a run ID that we store in our Postgres DB? I tried doing that today but ran into dev environment issues trying to get it to work
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e9b16db30
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| "trace_id": 0, | ||
| "span_id": 1, | ||
| "parent_id": 0, | ||
| "type": "", |
| "type": "", | ||
| "error": 1, | ||
| "meta": { | ||
| "_dd.p.dm": "-0", |
There was a problem hiding this comment.
all of these spans are missing the component :)
| { | ||
| "name": "mlflow.step", | ||
| "service": "tests.contrib.mlflow", | ||
| "resource": "mlflow.step", |
There was a problem hiding this comment.
Is there anything specific about this mlflow.step action that we can split off into some kind of resource?
| mlflow: Adds instrumentation support for ``mlflow>=2.11.0`. See the | ||
| ``mlflow<https://ddtrace.readthedocs.io/en/stable/integrations.html#mlflow>`` | ||
| documentation for more information. |
There was a problem hiding this comment.
nit: I think it would be clearer to add tracing to this release note to differentiate it from adding support to something like MLObs:
| mlflow: Adds instrumentation support for ``mlflow>=2.11.0`. See the | |
| ``mlflow<https://ddtrace.readthedocs.io/en/stable/integrations.html#mlflow>`` | |
| documentation for more information. | |
| tracing: Adds instrumentation support for ``mlflow>=2.11.0`. See the | |
| ``mlflow<https://ddtrace.readthedocs.io/en/stable/integrations.html#mlflow>`` | |
| documentation for more information. |
| try: | ||
| return wrapped(*args, **kwargs) | ||
| except Exception: | ||
| # error happening when trying to end thte run |
There was a problem hiding this comment.
nit:
| # error happening when trying to end thte run | |
| # error happening when trying to end the run |
| @@ -0,0 +1,20 @@ | |||
| """ | |||
| The MLflow integration instruments the ``mlflow`` package. | |||
There was a problem hiding this comment.
I think it would be useful to state what parts of the MLflow the instrumentation covers since the spans in this PR are focused on "runs" and "steps", when their docs covers a wide range of functions: https://github.com/mlflow/mlflow
| { | ||
| "name": "mflow.run", | ||
| "service": "tests.contrib.mlflow", | ||
| "resource": "mflow.run", |
There was a problem hiding this comment.
I wonder if run_name is low cardinality enough to use as a resource name here, https://github.com/mlflow/mlflow/blob/b61eb59893f9ae3814f7f69fb2c0b2fb647fd371/mlflow/tracking/fluent.py#L360?
The usefulness is that if you are querying and trying to alert on this trace metric, you can get access to how long a specific run took.
Right now the trace metrics (and subsequently the Service Page) will only give high level information and the user will need to rely on the Trace Explorer to derive meaning from their runs.
There was a problem hiding this comment.
What are you calling low cardinality ?
Because it is optional and otherwise automatically generated by mlflow.
Also, it if helps, the end product of this integration is not APM
| finally: | ||
| unpatch() | ||
|
|
||
|
|
There was a problem hiding this comment.
What happens when the user creates a nested run?
Based on their code it looks like this is one of the usage patterns: https://github.com/mlflow/mlflow/blob/b61eb59893f9ae3814f7f69fb2c0b2fb647fd371/mlflow/tracking/fluent.py#L360 .
I think this would be a good test.
Important
This PR introduces an
mlflowintegration, changes you can also find in #16685.This is expected as the goal, as the two PRs worked on something different for mlflow but targeting the same product. We'll do the work to properly rebase based on the which one of the two PRs will be merged first
Description
Add
mlflowintegration todd-trace-py.This integration enables:
Tracing MLflow runs
A span is created when a run starts and finished when the run ends.
Tracing MLflow steps
In MLflow, steps are defined when calling
mlflow.log_metric("key", step=1). Since step increments depend on user calls, we cannot determine step boundaries in advance.To handle this:
log_metricwith astepparameter, the current step span is finished and a new one is created.This ensures that spans created during a step have the correct parent.
If an user defines steps, a final step span will represent the period between the last explicit step and the end of the run (because of auto creation). This span does not correspond to a real MLflow step, so it is named
mlflow.tailto indicate it can be ignored.Attaching metrics and parameters to step spans
Calls to
log_metric()andlog_param()automatically attach the correspondingkey:valuepair to the current step span (as a metric or tag, respectively).Adding
run_idto all spansThe
run_idis added to every span created during a run.Log correlation
dd.mlflow.run_id"is automatically injected into logs to enable correlation.Testing
Tests were added under
tests/contrib/mlflowRisks
None
Additional Notes
mlflowinstrumentation is not designed to be primarily used by APM.