feat(mlflow): create an auth plugin for MLFlow#16685
feat(mlflow): create an auth plugin for MLFlow#16685yshapiro-57 wants to merge 14 commits intomainfrom
Conversation
Codeowners resolved as |
Performance SLOsComparing candidate yakov.shapiro/MLOB-5384/auth-plugin (7e7960f) with baseline main (6749b36) 📈 Performance Regressions (2 suites)📈 iastaspects - 118/118✅ add_aspectTime: ✅ 105.523µs (SLO: <130.000µs 📉 -18.8%) vs baseline: +1.5% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.6% ✅ add_inplace_aspectTime: ✅ 101.596µs (SLO: <130.000µs 📉 -21.8%) vs baseline: -2.0% Memory: ✅ 43.568MB (SLO: <46.000MB -5.3%) vs baseline: +5.7% ✅ add_inplace_noaspectTime: ✅ 28.291µs (SLO: <40.000µs 📉 -29.3%) vs baseline: +0.1% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% ✅ add_noaspectTime: ✅ 48.786µs (SLO: <70.000µs 📉 -30.3%) vs baseline: -0.2% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.4% ✅ bytearray_aspectTime: ✅ 251.712µs (SLO: <400.000µs 📉 -37.1%) vs baseline: ~same Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.3% ✅ bytearray_extend_aspectTime: ✅ 637.924µs (SLO: <800.000µs 📉 -20.3%) vs baseline: -3.2% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +4.8% ✅ bytearray_extend_noaspectTime: ✅ 264.975µs (SLO: <400.000µs 📉 -33.8%) vs baseline: -2.6% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.2% ✅ bytearray_noaspectTime: ✅ 135.605µs (SLO: <300.000µs 📉 -54.8%) vs baseline: -3.9% Memory: ✅ 43.509MB (SLO: <46.000MB -5.4%) vs baseline: +5.7% ✅ bytes_aspectTime: ✅ 217.078µs (SLO: <300.000µs 📉 -27.6%) vs baseline: -1.7% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.7% ✅ bytes_noaspectTime: ✅ 134.051µs (SLO: <200.000µs 📉 -33.0%) vs baseline: -2.4% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.4% ✅ bytesio_aspectTime: ✅ 3.767ms (SLO: <5.000ms 📉 -24.7%) vs baseline: -1.9% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +5.7% ✅ bytesio_noaspectTime: ✅ 315.943µs (SLO: <420.000µs 📉 -24.8%) vs baseline: -1.7% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.5% ✅ capitalize_aspectTime: ✅ 89.015µs (SLO: <300.000µs 📉 -70.3%) vs baseline: +0.5% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.4% ✅ capitalize_noaspectTime: ✅ 254.122µs (SLO: <300.000µs 📉 -15.3%) vs baseline: -2.8% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.6% ✅ casefold_aspectTime: ✅ 89.113µs (SLO: <500.000µs 📉 -82.2%) vs baseline: +0.3% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.8% ✅ casefold_noaspectTime: ✅ 313.265µs (SLO: <500.000µs 📉 -37.3%) vs baseline: -5.2% Memory: ✅ 43.509MB (SLO: <46.000MB -5.4%) vs baseline: +5.8% ✅ decode_aspectTime: ✅ 86.906µs (SLO: <100.000µs 📉 -13.1%) vs baseline: +0.2% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +5.6% ✅ decode_noaspectTime: ✅ 152.021µs (SLO: <210.000µs 📉 -27.6%) vs baseline: -1.3% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.4% ✅ encode_aspectTime: ✅ 84.543µs (SLO: <200.000µs 📉 -57.7%) vs baseline: +0.1% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.4% ✅ encode_noaspectTime: ✅ 140.469µs (SLO: <200.000µs 📉 -29.8%) vs baseline: -2.8% Memory: ✅ 43.490MB (SLO: <46.000MB -5.5%) vs baseline: +6.1% ✅ format_aspectTime: ✅ 14.680ms (SLO: <19.200ms 📉 -23.5%) vs baseline: +0.2% Memory: ✅ 43.560MB (SLO: <46.000MB -5.3%) vs baseline: +5.7% ✅ format_map_aspectTime: ✅ 16.721ms (SLO: <21.500ms 📉 -22.2%) vs baseline: +1.7% Memory: ✅ 43.591MB (SLO: <46.000MB -5.2%) vs baseline: +5.7% ✅ format_map_noaspectTime: ✅ 372.897µs (SLO: <500.000µs 📉 -25.4%) vs baseline: -1.7% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% ✅ format_noaspectTime: ✅ 306.226µs (SLO: <500.000µs 📉 -38.8%) vs baseline: ~same Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.6% ✅ index_aspectTime: ✅ 128.690µs (SLO: <300.000µs 📉 -57.1%) vs baseline: +8.0% Memory: ✅ 43.470MB (SLO: <46.000MB -5.5%) vs baseline: +5.7% ✅ index_noaspectTime: ✅ 40.195µs (SLO: <300.000µs 📉 -86.6%) vs baseline: ~same Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.5% ✅ join_aspectTime: ✅ 212.129µs (SLO: <300.000µs 📉 -29.3%) vs baseline: -1.6% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.2% ✅ join_noaspectTime: ✅ 142.242µs (SLO: <300.000µs 📉 -52.6%) vs baseline: -0.6% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% ✅ ljust_aspectTime: ✅ 585.774µs (SLO: <700.000µs 📉 -16.3%) vs baseline: 📈 +13.7% Memory: ✅ 43.482MB (SLO: <46.000MB -5.5%) vs baseline: +5.6% ✅ ljust_noaspectTime: ✅ 260.427µs (SLO: <300.000µs 📉 -13.2%) vs baseline: -1.1% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.8% ✅ lower_aspectTime: ✅ 294.283µs (SLO: <500.000µs 📉 -41.1%) vs baseline: -4.4% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.8% ✅ lower_noaspectTime: ✅ 236.263µs (SLO: <300.000µs 📉 -21.2%) vs baseline: -2.1% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% ✅ lstrip_aspectTime: ✅ 0.269ms (SLO: <3.000ms 📉 -91.0%) vs baseline: -4.1% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.5% ✅ lstrip_noaspectTime: ✅ 0.178ms (SLO: <3.000ms 📉 -94.1%) vs baseline: -3.1% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.2% ✅ modulo_aspectTime: ✅ 14.448ms (SLO: <18.750ms 📉 -22.9%) vs baseline: +0.5% Memory: ✅ 43.638MB (SLO: <46.000MB -5.1%) vs baseline: +6.0% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 14.785ms (SLO: <19.350ms 📉 -23.6%) vs baseline: +0.2% Memory: ✅ 43.546MB (SLO: <46.000MB -5.3%) vs baseline: +5.5% ✅ modulo_aspect_for_bytesTime: ✅ 14.438ms (SLO: <18.900ms 📉 -23.6%) vs baseline: +0.6% Memory: ✅ 43.442MB (SLO: <46.000MB -5.6%) vs baseline: +5.0% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 14.677ms (SLO: <19.150ms 📉 -23.4%) vs baseline: +0.6% Memory: ✅ 43.639MB (SLO: <46.000MB -5.1%) vs baseline: +5.7% ✅ modulo_noaspectTime: ✅ 0.363ms (SLO: <3.000ms 📉 -87.9%) vs baseline: -1.1% Memory: ✅ 43.470MB (SLO: <46.000MB -5.5%) vs baseline: +5.6% ✅ replace_aspectTime: ✅ 18.509ms (SLO: <24.000ms 📉 -22.9%) vs baseline: ~same Memory: ✅ 43.509MB (SLO: <46.000MB -5.4%) vs baseline: +5.6% ✅ replace_noaspectTime: ✅ 280.021µs (SLO: <300.000µs -6.7%) vs baseline: -3.8% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% ✅ repr_aspectTime: ✅ 311.266µs (SLO: <420.000µs 📉 -25.9%) vs baseline: -4.4% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.7% ✅ repr_noaspectTime: ✅ 46.870µs (SLO: <90.000µs 📉 -47.9%) vs baseline: +1.0% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.5% ✅ rstrip_aspectTime: ✅ 382.239µs (SLO: <500.000µs 📉 -23.6%) vs baseline: -1.5% Memory: ✅ 43.394MB (SLO: <46.000MB -5.7%) vs baseline: +5.2% ✅ rstrip_noaspectTime: ✅ 185.984µs (SLO: <300.000µs 📉 -38.0%) vs baseline: +0.7% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.4% ✅ slice_aspectTime: ✅ 183.454µs (SLO: <300.000µs 📉 -38.8%) vs baseline: +0.9% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.2% ✅ slice_noaspectTime: ✅ 54.083µs (SLO: <90.000µs 📉 -39.9%) vs baseline: +0.5% Memory: ✅ 43.431MB (SLO: <46.000MB -5.6%) vs baseline: +5.7% ✅ stringio_aspectTime: ✅ 4.405ms (SLO: <5.000ms 📉 -11.9%) vs baseline: 📈 +13.6% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.1% ✅ stringio_noaspectTime: ✅ 346.010µs (SLO: <500.000µs 📉 -30.8%) vs baseline: -3.0% Memory: ✅ 43.470MB (SLO: <46.000MB -5.5%) vs baseline: +5.4% ✅ strip_aspectTime: ✅ 268.063µs (SLO: <350.000µs 📉 -23.4%) vs baseline: -4.3% Memory: ✅ 43.372MB (SLO: <46.000MB -5.7%) vs baseline: +5.6% ✅ strip_noaspectTime: ✅ 175.711µs (SLO: <240.000µs 📉 -26.8%) vs baseline: -4.1% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.2% ✅ swapcase_aspectTime: ✅ 332.703µs (SLO: <500.000µs 📉 -33.5%) vs baseline: -3.6% Memory: ✅ 43.391MB (SLO: <46.000MB -5.7%) vs baseline: +5.5% ✅ swapcase_noaspectTime: ✅ 273.059µs (SLO: <400.000µs 📉 -31.7%) vs baseline: -2.6% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.2% ✅ title_aspectTime: ✅ 322.488µs (SLO: <500.000µs 📉 -35.5%) vs baseline: -5.3% Memory: ✅ 43.450MB (SLO: <46.000MB -5.5%) vs baseline: +5.7% ✅ title_noaspectTime: ✅ 258.249µs (SLO: <400.000µs 📉 -35.4%) vs baseline: -2.5% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.6% ✅ translate_aspectTime: ✅ 489.931µs (SLO: <700.000µs 📉 -30.0%) vs baseline: -3.6% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +5.0% ✅ translate_noaspectTime: ✅ 427.882µs (SLO: <500.000µs 📉 -14.4%) vs baseline: -1.5% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.5% ✅ upper_aspectTime: ✅ 293.333µs (SLO: <500.000µs 📉 -41.3%) vs baseline: -3.9% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.4% ✅ upper_noaspectTime: ✅ 235.203µs (SLO: <400.000µs 📉 -41.2%) vs baseline: -4.0% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.5% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 508.390µs (SLO: <700.000µs 📉 -27.4%) vs baseline: 📈 +15.1% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +5.5% ✅ ospathbasename_noaspectTime: ✅ 429.941µs (SLO: <700.000µs 📉 -38.6%) vs baseline: -2.7% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.4% ✅ ospathjoin_aspectTime: ✅ 624.329µs (SLO: <700.000µs 📉 -10.8%) vs baseline: -0.6% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.5% ✅ ospathjoin_noaspectTime: ✅ 634.774µs (SLO: <700.000µs -9.3%) vs baseline: +1.2% Memory: ✅ 43.273MB (SLO: <46.000MB -5.9%) vs baseline: +5.1% ✅ ospathnormcase_aspectTime: ✅ 347.168µs (SLO: <700.000µs 📉 -50.4%) vs baseline: -1.6% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.8% ✅ ospathnormcase_noaspectTime: ✅ 357.261µs (SLO: <700.000µs 📉 -49.0%) vs baseline: -2.6% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.6% ✅ ospathsplit_aspectTime: ✅ 487.009µs (SLO: <700.000µs 📉 -30.4%) vs baseline: -1.3% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +5.3% ✅ ospathsplit_noaspectTime: ✅ 495.541µs (SLO: <700.000µs 📉 -29.2%) vs baseline: -1.3% Memory: ✅ 43.352MB (SLO: <46.000MB -5.8%) vs baseline: +5.4% ✅ ospathsplitdrive_aspectTime: ✅ 372.801µs (SLO: <700.000µs 📉 -46.7%) vs baseline: -1.8% Memory: ✅ 43.313MB (SLO: <46.000MB -5.8%) vs baseline: +5.3% ✅ ospathsplitdrive_noaspectTime: ✅ 73.066µs (SLO: <700.000µs 📉 -89.6%) vs baseline: +0.4% Memory: ✅ 43.411MB (SLO: <46.000MB -5.6%) vs baseline: +5.4% ✅ ospathsplitext_aspectTime: ✅ 455.352µs (SLO: <700.000µs 📉 -34.9%) vs baseline: -2.4% Memory: ✅ 43.293MB (SLO: <46.000MB -5.9%) vs baseline: +5.5% ✅ ospathsplitext_noaspectTime: ✅ 464.057µs (SLO: <700.000µs 📉 -33.7%) vs baseline: -1.3% Memory: ✅ 43.332MB (SLO: <46.000MB -5.8%) vs baseline: +5.3% 🟡 Near SLO Breach (1 suite)🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 3.407ms (SLO: <4.750ms 📉 -28.3%) vs baseline: -0.5% Memory: ✅ 55.502MB (SLO: <66.500MB 📉 -16.5%) vs baseline: +5.0% ✅ appsec-postTime: ✅ 2.897ms (SLO: <6.750ms 📉 -57.1%) vs baseline: ~same Memory: ✅ 55.483MB (SLO: <66.500MB 📉 -16.6%) vs baseline: +5.0% ✅ appsec-telemetryTime: ✅ 3.460ms (SLO: <4.750ms 📉 -27.1%) vs baseline: +1.2% Memory: ✅ 55.542MB (SLO: <66.500MB 📉 -16.5%) vs baseline: +5.1% ✅ debuggerTime: ✅ 1.869ms (SLO: <2.000ms -6.5%) vs baseline: ~same Memory: ✅ 48.975MB (SLO: <51.500MB -4.9%) vs baseline: +5.1% ✅ iast-getTime: ✅ 1.856ms (SLO: <2.000ms -7.2%) vs baseline: -0.3% Memory: ✅ 45.711MB (SLO: <49.000MB -6.7%) vs baseline: +5.3% ✅ profilerTime: ✅ 1.902ms (SLO: <2.100ms -9.4%) vs baseline: -0.2% Memory: ✅ 51.511MB (SLO: <52.500MB 🟡 -1.9%) vs baseline: +4.9% ✅ resource-renamingTime: ✅ 3.404ms (SLO: <3.650ms -6.8%) vs baseline: -0.3% Memory: ✅ 55.463MB (SLO: <60.000MB -7.6%) vs baseline: +4.9% ✅ tracerTime: ✅ 3.419ms (SLO: <3.650ms -6.3%) vs baseline: ~same Memory: ✅ 55.483MB (SLO: <60.000MB -7.5%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 3.410ms (SLO: <3.650ms -6.6%) vs baseline: +0.3% Memory: ✅ 55.522MB (SLO: <60.000MB -7.5%) vs baseline: +5.2%
|
|
✨ Fix all issues with BitsAI or with Cursor
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 797a790a26
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| Returns: | ||
| bool: True if DD_API_KEY and DD_APP_KEY environment variables are set, False otherwise. | ||
| """ | ||
| return "DD_API_KEY" in os.environ and "DD_APP_KEY" in os.environ |
There was a problem hiding this comment.
Restrict Datadog key injection to Datadog tracking hosts
The provider is enabled solely by environment variables (DD_API_KEY/DD_APP_KEY) and does not check the MLflow tracking destination, so request_headers() will attach Datadog credentials to any MLflow server URL when those vars are set. This means users running MLflow against self-hosted or third-party tracking servers while also exporting Datadog keys can leak both secrets over outbound requests; please gate this provider to Datadog tracking URIs (or an explicit opt-in) before injecting headers.
Useful? React with 👍 / 👎.
releasenotes/notes/add-auth-plugin-for-mlflow-4aa9aaa6e783078e.yaml
Outdated
Show resolved
Hide resolved
jboolean
left a comment
There was a problem hiding this comment.
Seems like it should likely be gated to datadog servers. codex had the same comment.
| pass | ||
|
|
||
|
|
||
| class DatadogHeaderProvider(RequestHeaderProvider): |
There was a problem hiding this comment.
Is there any opt-in or does it send these keys to any server when the tracer is enabled?
If users have their own tracking server and are using dd-trace-py for general instrumentation might we start sending keys to other servers? An unlikely but slightly concerning scenario.
There was a problem hiding this comment.
Yes, I've addressed Codex's comment by requiring the user to set the DD_MODEL_LAB_ENABLED environment variable to a truthy value. If they do not set it, we will not inject headers in requests even if the DD_API_KEY and DD_APP_KEY environment variables are set.
|
@emmettbutler thank you for the review! I have addressed both of your comments, this is ready for another look! |
|
I have created https://github.com/DataDog/dd-go/pull/225728 to make the system tests pass, I will reflect it in the system tests repo once I merge it into dd-go |
Description
This PR sounds serious, but the actual thing it does is dead simple.
MLFlow is an open source framework for monitoring ML training jobs. The MLFlow SDK uses HTTP to send telemetry messages to a backend. MLFlow has an open source implementation of a backend service, the MLFlow tracking server, that receives and stores these telemetry messages. We are building our own internal backend service to receive these messages from MLFlow's SDK.
MLFlow allows you to write custom plugins to modify its behavior, as described here. One of the simplest types of plugin is a request header provider: it tells MLFlow to include additional headers in the HTTP requests that it sends. That's what I am adding in this PR. The plugin tells MLFlow to check if the environment variables
DD_API_KEYandDD_APP_KEYare set, and if so, to include two additional HTTP headers,DD-API-KEY:andDD-APPLICATION-KEY, in its HTTP requests to the backend.MLFlow uses Python entry points to detect and register available plugins, so I am adding an entry point for the new plugin to our pyproject.toml.
The changes to the riot requirements files in this PR are generated automatically with this command:
Local Testing
With Docker Desktop running, I was able to run the unit tests locally with this command:
where the hash comes from this other command:
End-to-end Testing
Install dd-trace-py from the local directory:
Set the environment variables
DD_API_KEYandDD_APP_KEYin your terminal to valid Datadog API and application keys:Run the attached script. Note how it sets the MLFlow tracking server URL to
https://dd.datad0g.com/api/v1/mlflow-tracking-serveras opposed tohttps://dd.datad0g.com/api/unstable/mlflow-tracking-server. The latter does not enforce authentication, but the former does.simulate_run.py
Verify that the script does not produce errors from sending MLFlow telemetry to Datadog's backend, and that
projects.ownerandruns.ownerin the new rows in our Postgres database are set to the user that owns the API key (yourself).Risks
None
Additional Notes
None