Skip to content

Conversation

@srprash
Copy link
Contributor

@srprash srprash commented Oct 28, 2025

Issue

This PR first fixed the Gevent monkey patching issue by calling the patch operation when the ADOT Python starts up and before the modules are loaded.
Later on in v0.10.0, this commit broke the above fix by adding
the line from amazon.opentelemetry.distro.aws_opentelemetry_configurator which transitively imports the requests library before the gevent patch operation is called, causing inconsistencies in the patched libraries resulting in seeing the RecursionError: maximum recursion depth exceeded error reappearing.

Changes

  1. Adding back the earlier logic to run the gevent's monkey patch (only if gevent is installed) which was removed in this commit to unblock development.
  2. Refactoring the patch into its own file and adding comprehensive comments to avoid the same issue in future.
  3. Updated unit tests to mock the gevent module and validate only the ADOT logic without running the gevent monkey patch. The actual behavior of the patch should be tested in a higher level test such as a contract test.

Testing

Did a manual E2E testing using a sample flask application. Confirmed the following:

  • Saw RecursionError in the application with ADOT Python v0.10.0
  • No such error when using ADOT with this fix.

Setup

Sample app
from flask import Flask, request, jsonify
import logging
import sys
from datetime import datetime

import requests

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

app = Flask(__name__)
logger = logging.getLogger(__name__)

@app.route('/requests')
def make_request():
    res = requests.get('https://httpbin.org/get', timeout=5)
    return jsonify({
        'status': 'success',
        'status_code': res.status_code,
        'url': res.url,
        'timestamp': datetime.now().isoformat()
    })

if __name__ == '__main__':
    app.run(debug=False, host='0.0.0.0', port=5001)
Running with problematic version (see the MonkeyPatchWarning)
((venv) ) ➜  flask git:(fix_gevent_regression) ✗ opentelemetry-instrument gunicorn -w 1 -k gevent --bind 0.0.0.0:5001 app:app
/Users/srprash/aws-otel-python-instrumentation/venv/lib/python3.12/site-packages/amazon/opentelemetry/distro/patches/_instrumentation_patch.py:36: MonkeyPatchWarning: Monkey-patching ssl after ssl has already been imported may lead to errors, including RecursionError on Python 3.6. It may also silently lead to incorrect behaviour on Python 3.7. Please monkey-patch earlier. See https://github.com/gevent/gevent/issues/1016. Modules that had direct imports (NOT patched): ['urllib3.util.ssl_ (/Users/srprash/aws-otel-python-instrumentation/venv/lib/python3.12/site-packages/urllib3/util/ssl_.py)', 'urllib3.util (/Users/srprash/aws-otel-python-instrumentation/venv/lib/python3.12/site-packages/urllib3/util/__init__.py)']. 
  monkey.patch_all()
Failed to get k8s token: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEksResourceDetector failed: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEcsResourceDetector failed: Missing ECS_CONTAINER_METADATA_URI therefore process is not on ECS.
AwsEc2ResourceDetector failed: <urlopen error [Errno 65] No route to host>
[2025-10-29 11:00:55 -0700] [54080] [INFO] Starting gunicorn 23.0.0
[2025-10-29 11:00:55 -0700] [54080] [INFO] Listening at: http://0.0.0.0:5001 (54080)
[2025-10-29 11:00:55 -0700] [54080] [INFO] Using worker: gevent
[2025-10-29 11:00:55 -0700] [54290] [INFO] Booting worker with pid: 54290
2025-10-29 11:02:12,464 - app - ERROR - Exception on /requests [GET]
Running with the fix
((venv) ) ➜  flask git:(fix_gevent_regression) ✗ opentelemetry-instrument gunicorn -w 1 -k gevent --bind 0.0.0.0:5001 app:app
Failed to get k8s token: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEksResourceDetector failed: [Errno 2] No such file or directory: '/var/run/secrets/kubernetes.io/serviceaccount/token'
AwsEcsResourceDetector failed: Missing ECS_CONTAINER_METADATA_URI therefore process is not on ECS.
AwsEc2ResourceDetector failed: <urlopen error [Errno 65] No route to host>
[2025-10-29 11:04:45 -0700] [60707] [INFO] Starting gunicorn 23.0.0
[2025-10-29 11:04:45 -0700] [60707] [INFO] Listening at: http://0.0.0.0:5001 (60707)
[2025-10-29 11:04:45 -0700] [60707] [INFO] Using worker: gevent
[2025-10-29 11:04:45 -0700] [60734] [INFO] Booting worker with pid: 60734

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@srprash srprash force-pushed the fix_gevent_regression branch 2 times, most recently from c55bbd8 to 54c1437 Compare October 29, 2025 04:55
@srprash srprash force-pushed the fix_gevent_regression branch from 54c1437 to 5810288 Compare October 29, 2025 05:04
@srprash srprash force-pushed the fix_gevent_regression branch from 41d8ccd to b7b4ec1 Compare October 29, 2025 17:02
@srprash srprash force-pushed the fix_gevent_regression branch from b7b4ec1 to a306fd4 Compare October 29, 2025 17:15
@srprash srprash changed the title adding back the gevent patch with correct import order + minor refactor Fix Gevent patch regression with correct import order Oct 29, 2025
@srprash srprash marked this pull request as ready for review October 29, 2025 18:09
@srprash srprash requested a review from a team as a code owner October 29, 2025 18:09
@srprash srprash merged commit 387441e into main Oct 29, 2025
16 checks passed
@srprash srprash deleted the fix_gevent_regression branch October 29, 2025 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants