[OSDEV-2304] Upload MIT Living Wage data to the platform #844
protsack-stephan merged 66 commits into main from OSDEV-2304-upload-mit-living-wage-data-to-platform
Conversation
- …t table. Moved partner fields return to the GET os_id endpoint
- …orm' into OSDEV-2304-upload-mit-living-wage-data-to-platform (conflicts: src/django/api/admin.py, src/django/api/models/__init__.py)
- …contributor data retrieval; introduced editing constraints for system fields
- …orm' into OSDEV-2304-upload-mit-living-wage-data-to-platform (conflicts: src/django/api/admin.py)
- …tor ID to the base class; fixed linter errors
- …orm' into OSDEV-2304-upload-mit-living-wage-data-to-platform
- …OSDEV-2305-upload-the-wageindicator-data-into-the-platform
📝 Walkthrough

Adds US county tigerline GIS support: a new USCountyTigerline model, migrations to create and populate it from an S3/MinIO CSV, a mit_living_wage system partner-field provider and registry entry, fixtures, admin UI, tests, MinIO local-dev wiring, and docs.

Sequence Diagram(s): mermaid diagram not reproduced here.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks and finishing touches: ✅ 3 checks passed.
Actionable comments posted: 3
🧹 Nitpick comments (6)
.env.sample (1)
10-18: Optional: reorder new env keys to satisfy dotenv-linter

The new AWS/Stripe/Google env keys are fine functionally, but dotenv-linter is flagging them as unordered. If you want a clean lint run, consider reordering:

- Group and alphabetize the `AWS_*` keys (`AWS_ACCESS_KEY_ID`, `AWS_REGION`, `AWS_S3_ENDPOINT_URL`, `AWS_SECRET_ACCESS_KEY`, `AWS_STORAGE_BUCKET_NAME`) and place them before `EXTERNAL_DOMAIN`.
- Move `GOOGLE_SERVICE_ACCOUNT_CREDS_BASE64` next to the other Google keys.
- Swap `STRIPE_PRICE_ID` before `STRIPE_SECRET_KEY`.

Purely a style/lint clean-up; no behavior change.
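As a quick illustration of what dotenv-linter's UnorderedKey rule checks, a small hypothetical helper (not part of this PR) can detect out-of-order `AWS_*` keys:

```python
# Minimal sketch (illustrative only): flag AWS_* keys in an .env-style
# file that are not in alphabetical order, mirroring the idea behind
# dotenv-linter's UnorderedKey warning.
def unordered_aws_keys(env_text: str) -> list:
    keys = [
        line.split("=", 1)[0].strip()
        for line in env_text.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    ]
    aws = [k for k in keys if k.startswith("AWS_")]
    # Keys that differ from their position in the sorted order.
    return [k for k, s in zip(aws, sorted(aws)) if k != s]

sample = "AWS_S3_ENDPOINT_URL=x\nAWS_REGION=y\nAWS_ACCESS_KEY_ID=z\n"
print(unordered_aws_keys(sample))
```

Running it on the sample above reports the out-of-order keys; an alphabetized file returns an empty list.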
README.md (1)
113-116: Add language to fenced shell snippet for markdownlint

Markdownlint is flagging this block; to satisfy MD040 and improve rendering, consider:

````diff
-```
+```bash
 ./scripts/manage load_fixtures
 ```
````

src/django/api/partner_fields/mit_living_wage_provider.py (2)
29-31: Replace `print` with a logger

The codebase uses a logger pattern (visible in base_provider.py). Using `print` bypasses the project's logging infrastructure.

🔎 Proposed fix

```diff
+import logging
 from typing import Dict, Any, Optional
 from django.contrib.gis.geos import Point
 from api.models.us_county_tigerline import USCountyTigerline
 from api.partner_fields.base_provider import SystemPartnerFieldProvider
+
+logger = logging.getLogger(__name__)

     if not facility.location:
-        print(f'No location found for facility {facility.id}')
+        logger.warning(f'No location found for facility {facility.id}')
         return None
```
45-47: Replace `print` in the exception handler

Similar to the above, use logger.error or logger.warning for consistency with the project's logging infrastructure.

🔎 Proposed fix

```diff
     try:
         return USCountyTigerline.objects.filter(
             geometry__contains=point_5070
         ).first()
     except Exception as e:
-        print(f'Error fetching geoid from database: {e}')
+        logger.error(f'Error fetching geoid from database: {e}')
         return None
```

src/django/api/migrations/0195_add_mit_livingwage_partner_field.py (1)
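The `geometry__contains` lookup here is a database-side point-in-polygon test. As a rough plain-Python illustration of the underlying idea (not the PR's GeoDjango/PostGIS code), a ray-casting check looks like:

```python
# Illustrative only: a ray-casting point-in-polygon test, approximating
# what the PostGIS-backed `geometry__contains` lookup computes
# server-side. `polygon` is a list of (x, y) vertices.
def point_in_polygon(x, y, polygon):
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross edge (x1,y1)-(x2,y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square))   # point inside → True
print(point_in_polygon(15, 5, square))  # point outside → False
```

In the real provider the county polygons live in the database and the test runs inside PostGIS, which also handles multipolygons and edge cases this sketch ignores.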
17-28: Update placeholder title in JSON schema

The title "Some Data" appears to be a placeholder. Consider updating it to something more descriptive like "MIT Living Wage County Data" for clarity in API documentation and schema introspection.

🔎 Proposed fix

```diff
 json_schema = {
     "type": "object",
-    "title": "Some Data",
+    "title": "MIT Living Wage County Data",
     "$schema": "https://json-schema.org/draft/2020-12/schema",
     "properties": {
         "county_id": {
             "type": "string",
             "title": "County Id",
             "format": "uri-reference"
         }
     }
 }
```

src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (1)
92-92: Minor: unused loop index from enumerate

The loop index `_` from `enumerate(reader, start=1)` is unused. Consider using a plain `for row in reader:` instead, unless you plan to use the index for logging/error reporting.

🔎 Proposed fix

```diff
-        for _, row in enumerate(reader, start=1):
+        for row in reader:
             geoid = row['geoid'].strip()
```
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- .env.sample
- README.md
- doc/release/RELEASE-NOTES.md
- docker-compose.yml
- src/django/api/admin.py
- src/django/api/fixtures/us_county_tigerline.json
- src/django/api/management/commands/load_fixtures.py
- src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
- src/django/api/migrations/0195_add_mit_livingwage_partner_field.py
- src/django/api/models/__init__.py
- src/django/api/models/us_county_tigerline.py
- src/django/api/partner_fields/mit_living_wage_provider.py
- src/django/api/partner_fields/registry.py
- src/django/api/tests/test_mit_living_wage_provider.py
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-10-07T10:55:42.752Z
Learnt from: roman-stolar
Repo: opensupplyhub/open-supply-hub PR: 765
File: src/django/api/models/extended_field.py:38-38
Timestamp: 2025-10-07T10:55:42.752Z
Learning: In the Open Supply Hub codebase, the `ESTIMATED_EMISSIONS_ACTIVITY` and `ESTIMATED_ANNUAL_ENERGY_CONSUMPTION` constants were defined in `src/django/api/models/extended_field.py` but were never actually implemented in database migrations or used to store data. They were removed as unused code cleanup.
Applied to files:
doc/release/RELEASE-NOTES.md
📚 Learning: 2025-11-18T09:54:27.458Z
Learnt from: VadimKovalenkoSNF
Repo: opensupplyhub/open-supply-hub PR: 796
File: src/django/api/management/commands/backfill_isic_4_extended_fields.py:115-175
Timestamp: 2025-11-18T09:54:27.458Z
Learning: In the Open Supply Hub codebase, the facility.id field (the database primary key) is the same as the OS ID (the public identifier exposed via the API).
Applied to files:
src/django/api/partner_fields/mit_living_wage_provider.py
📚 Learning: 2025-01-23T11:09:14.873Z
Learnt from: VadimKovalenkoSNF
Repo: opensupplyhub/open-supply-hub PR: 470
File: src/react/src/util/util.js:1435-1466
Timestamp: 2025-01-23T11:09:14.873Z
Learning: Error validation and throwing errors should be handled at higher levels of the application rather than in utility functions, as per team's preference.
Applied to files:
src/django/api/partner_fields/mit_living_wage_provider.py
📚 Learning: 2024-12-13T13:20:48.143Z
Learnt from: mazursasha1990
Repo: opensupplyhub/open-supply-hub PR: 438
File: src/django/api/views/v1/moderation_events.py:140-142
Timestamp: 2024-12-13T13:20:48.143Z
Learning: In the Open Supply Hub codebase, it's acceptable to catch broad exceptions in methods like `add_production_location` and `update_production_location` in `src/django/api/views/v1/moderation_events.py` instead of catching specific exceptions.
Applied to files:
src/django/api/partner_fields/mit_living_wage_provider.py
📚 Learning: 2024-12-12T14:59:19.694Z
Learnt from: mazursasha1990
Repo: opensupplyhub/open-supply-hub PR: 438
File: src/django/api/tests/test_moderation_events_add_production_location.py:21-65
Timestamp: 2024-12-12T14:59:19.694Z
Learning: In `src/django/api/tests/test_moderation_events_add_production_location.py`, the tests have been refactored, and the use of `POST` methods is intentional. Future suggestions to change HTTP methods in these tests may not be necessary.
Applied to files:
src/django/api/tests/test_mit_living_wage_provider.py
📚 Learning: 2024-11-28T11:29:28.139Z
Learnt from: Innavin369
Repo: opensupplyhub/open-supply-hub PR: 431
File: src/django/api/serializers/v1/opensearch_common_validators/countries_validator.py:23-24
Timestamp: 2024-11-28T11:29:28.139Z
Learning: The files `src/django/api/tests/test_facility_claim_view_set.py`, `src/django/api/views/facility/facilities_view_set.py`, and `src/django/api/facility_actions/processing_facility_api.py` are not part of the V1 codebase in the Open Supply Hub project.
Applied to files:
src/django/api/tests/test_mit_living_wage_provider.py
🧬 Code graph analysis (5)
src/django/api/models/__init__.py (1)
src/django/api/models/us_county_tigerline.py (1)
- USCountyTigerline (5-31)

src/django/api/partner_fields/mit_living_wage_provider.py (2)
src/django/api/models/us_county_tigerline.py (1)
- USCountyTigerline (5-31)
src/django/api/partner_fields/base_provider.py (1)
- SystemPartnerFieldProvider (12-107)

src/django/api/partner_fields/registry.py (1)
src/django/api/partner_fields/mit_living_wage_provider.py (1)
- MITLivingWageProvider (7-73)

src/django/api/admin.py (1)
src/django/api/models/us_county_tigerline.py (1)
- USCountyTigerline (5-31)

src/django/api/tests/test_mit_living_wage_provider.py (5)
src/django/api/models/partner_field.py (1)
- PartnerField (14-143)
src/django/api/models/us_county_tigerline.py (1)
- USCountyTigerline (5-31)
src/django/api/partner_fields/mit_living_wage_provider.py (4)
- MITLivingWageProvider (7-73), _get_field_name (15-17), _fetch_raw_data (19-47), _format_data (49-73)
src/django/api/models/partner_field_manager.py (1)
- get_all_including_inactive (11-13)
src/django/api/partner_fields/base_provider.py (1)
- fetch_data (20-50)
🪛 dotenv-linter (4.0.0)
.env.sample
[warning] 10-10: [UnorderedKey] The AWS_S3_ENDPOINT_URL key should go before the EXTERNAL_DOMAIN key
(UnorderedKey)
[warning] 11-11: [UnorderedKey] The AWS_REGION key should go before the AWS_S3_ENDPOINT_URL key
(UnorderedKey)
[warning] 12-12: [UnorderedKey] The AWS_STORAGE_BUCKET_NAME key should go before the EXTERNAL_DOMAIN key
(UnorderedKey)
[warning] 13-13: [UnorderedKey] The AWS_ACCESS_KEY_ID key should go before the AWS_REGION key
(UnorderedKey)
[warning] 14-14: [UnorderedKey] The AWS_SECRET_ACCESS_KEY key should go before the AWS_STORAGE_BUCKET_NAME key
(UnorderedKey)
[warning] 16-16: [UnorderedKey] The STRIPE_PRICE_ID key should go before the STRIPE_SECRET_KEY key
(UnorderedKey)
[warning] 18-18: [UnorderedKey] The GOOGLE_SERVICE_ACCOUNT_CREDS_BASE64 key should go before the HUBSPOT_API_KEY key
(UnorderedKey)
🪛 markdownlint-cli2 (0.18.1)
README.md
114-114: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.14.10)
src/django/api/partner_fields/mit_living_wage_provider.py
45-45: Do not catch blind exception: Exception
(BLE001)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
48-51: Avoid specifying long messages outside the exception class
(TRY003)
67-67: Unused function argument: schema_editor
(ARG001)
84-87: Create your own exception
(TRY002)
84-87: Avoid specifying long messages outside the exception class
(TRY003)
103-105: Avoid specifying long messages outside the exception class
(TRY003)
132-132: Unused function argument: schema_editor
(ARG001)
142-144: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
146-175: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
src/django/api/migrations/0195_add_mit_livingwage_partner_field.py
6-6: Unused function argument: schema_editor
(ARG001)
13-15: Avoid specifying long messages outside the exception class
(TRY003)
41-41: Unused function argument: schema_editor
(ARG001)
51-53: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
55-60: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: run-eslint-linter-and-prettier-formatter
- GitHub Check: run-dd-code-quality
- GitHub Check: run-flake8-linter
- GitHub Check: get-base-branch-contricleaner-cov
- GitHub Check: run-countries-code-quality
- GitHub Check: get-base-branch-fe-cov
- GitHub Check: get-base-branch-django-cov
- GitHub Check: run-integration-test-code-quality
- GitHub Check: get-base-branch-countries-cov
- GitHub Check: run-contricleaner-code-quality
- GitHub Check: run-django-code-quality
- GitHub Check: run-fe-code-quality
- GitHub Check: get-base-branch-dd-cov
🔇 Additional comments (15)
src/django/api/management/commands/load_fixtures.py (1)
20-21: Fixture addition looks good

Including us_county_tigerline.json in the load_fixtures command is consistent with the new fixture file and error handling; no issues spotted.

src/django/api/models/__init__.py (1)

85-85: USCountyTigerline export is consistent

Re-exporting USCountyTigerline from api.models matches the package pattern and will simplify imports (e.g., for providers/admin).

src/django/api/fixtures/us_county_tigerline.json (1)

1-52: USCountyTigerline fixture structure matches the model

PKs, field names, geometry WKT (with SRID=5070;MULTIPOLYGON(...)), and timestamps all align with USCountyTigerline; this should load cleanly for local/dev/test usage.

src/django/api/partner_fields/registry.py (1)

4-4: MITLivingWageProvider registration is straightforward

Registering MITLivingWageProvider() alongside WageIndicatorProvider() cleanly extends the system partner field set; registry behavior remains unchanged.

Also applies to: 23-26

src/django/api/admin.py (1)

20-20: USCountyTigerline admin integration looks solid

Read-only admin with geoid/name in list display and search is a sensible default for this reference data, and registration with admin_site is wired correctly.

Also applies to: 402-406, 436-436
doc/release/RELEASE-NOTES.md (1)
19-20: Release notes accurately describe MIT Living Wage & Tigerline changes

The new entries for 0194/0195, OSDEV-2304 schema and API updates, and the "What's new" bullet all line up with the new model, migrations, provider, and facility endpoint behavior introduced in this PR. No changes needed.
Also applies to: 24-25, 29-29, 40-40
README.md (1)
103-111: CSV filename matches consistently across README, migration, and docker-compose configuration; no action needed.

Verification confirms the filename us_county_tigerline_2025.csv referenced in the README aligns exactly with the migration code (S3_CSV_KEY = 'data/us_county_tigerline_2025.csv') and the docker-compose init-minio service, which handles the file upload to the same path. The migration also has proper error handling to gracefully skip data population if the CSV is missing in local development.

src/django/api/partner_fields/mit_living_wage_provider.py (1)

49-73: LGTM!

The _format_data method correctly structures the MIT Living Wage data into the standard partner field format with appropriate keys and values.

src/django/api/migrations/0195_add_mit_livingwage_partner_field.py (1)

30-60: LGTM!

The migration correctly creates the mit_living_wage partner field with proper attributes (system_field=True, active=True) and includes a reverse migration for rollback support.

docker-compose.yml (2)

115-143: LGTM!

The init-minio service correctly initializes the MinIO bucket and handles the optional CSV upload gracefully. The dependency chain ensures MinIO is healthy before initialization.

54-57: LGTM!

The Django service is correctly configured to use MinIO as an S3 endpoint for local development, with proper dependency ordering to ensure MinIO initialization completes before Django starts.
Also applies to: 81-82
src/django/api/tests/test_mit_living_wage_provider.py (1)
127-277: LGTM!

Comprehensive test coverage including:
- Field name validation
- Raw data fetch with valid/invalid locations and country codes
- US territory support (US, PR, VI)
- Data formatting
- End-to-end flow
- Edge cases (no county data, no contributor, inactive field)
src/django/api/models/us_county_tigerline.py (1)
5-31: LGTM!

Well-designed model for storing US county boundary data:

- geoid as primary key provides a natural, stable identifier
- SRID 5070 (NAD83 / Conus Albers) is appropriate for US continental geometry operations
- Timestamps for audit trail
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (2)
21-64: LGTM!

S3/MinIO integration is well-structured:

- Environment-aware client creation
- Proper validation of bucket configuration
- Clean abstraction with get_csv_reader
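The get_csv_reader implementation itself is not shown in this review; a minimal sketch of the shape such a wrapper usually takes (hypothetical, assuming the S3 object body arrives as bytes) is:

```python
# Hypothetical sketch of a get_csv_reader-style helper (the PR's actual
# implementation is not reproduced here): wrap raw CSV bytes, such as an
# S3/MinIO object body, in a csv.DictReader.
import csv
import io

def get_csv_reader(raw_bytes: bytes) -> csv.DictReader:
    return csv.DictReader(io.StringIO(raw_bytes.decode("utf-8")))

rows = list(get_csv_reader(b"geoid,name\n01001,Autauga\n"))
print(rows[0]["geoid"])  # → 01001
```

Keeping the reader construction behind one function is what makes the migration easy to test against an in-memory payload instead of a live bucket.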
93-129: LGTM!

The data processing logic is solid:

- Proper validation of required fields
- Geometry type validation with Polygon → MultiPolygon conversion
- Efficient batch processing with bulk_create
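The batch-processing pattern noted above can be sketched outside Django (illustrative only; the batch size and flush callback are hypothetical stand-ins for collecting model instances and calling bulk_create per chunk):

```python
# Illustrative batching helper (not the migration's actual code):
# accumulate rows and flush them in fixed-size batches, the same shape
# as building model instances and calling bulk_create(batch) per chunk.
def process_in_batches(rows, flush, batch_size=500):
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            flush(batch)   # in the migration this would be bulk_create
            batch = []
    if batch:
        flush(batch)       # flush the final partial batch

flushed = []
process_in_batches(range(7), flushed.append, batch_size=3)
print([len(b) for b in flushed])  # → [3, 3, 1]
```

Flushing in chunks keeps memory bounded and avoids one giant INSERT for the full county dataset.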
Actionable comments posted: 2
♻️ Duplicate comments (2)
docker-compose.yml (1)
90-105: MinIO service configuration looks correct for local development.

The service setup with healthcheck and port mappings is appropriate. The image version pinning concern has already been addressed in previous review comments.
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (1)
164-167: Verbose name plural mismatch already flagged in previous review.

The verbose_name_plural should be 'US Counties Tigerline' (plural) to match the model definition, as noted in the previous review comments.
🧹 Nitpick comments (1)
src/django/oar/settings.py (1)
589-604: Redundant AWS_STORAGE_BUCKET_NAME assignment creates confusion.

The code reads AWS_STORAGE_BUCKET_NAME from the environment twice:

- Line 594: inside the if DEBUG: block
- Line 600: outside the block

The second assignment (line 600) overwrites the first one, making the assignment at line 594 redundant. This creates confusion about which value is actually used.

🔎 Proposed refactor to eliminate redundancy

```diff
 # Local development with MinIO
 if DEBUG:
     AWS_S3_ENDPOINT_URL = os.getenv('AWS_S3_ENDPOINT_URL')
     AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
     AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
-    AWS_STORAGE_BUCKET_NAME = os.getenv('AWS_STORAGE_BUCKET_NAME')
     AWS_S3_USE_SSL = False
     AWS_S3_FILE_OVERWRITE = False
     DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'

 AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
 if AWS_STORAGE_BUCKET_NAME is None and not DEBUG:
     raise ImproperlyConfigured(
         'Invalid AWS_STORAGE_BUCKET_NAME provided, '
         'must be set in the environment'
     )
```

The bucket name will be read once at line 600 and will be available in both DEBUG and production modes.
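The read-once idea behind this refactor generalizes to a small helper (illustrative only; the real settings module uses Django's ImproperlyConfigured, stubbed here with a local class):

```python
# Sketch of a read-once required-setting helper. ImproperlyConfigured is
# stubbed locally; in settings.py it would come from
# django.core.exceptions.
import os

class ImproperlyConfigured(Exception):
    pass

def require_env(name: str, required: bool = True):
    value = os.environ.get(name)
    if value is None and required:
        raise ImproperlyConfigured(
            f'{name} must be set in the environment'
        )
    return value

os.environ['AWS_STORAGE_BUCKET_NAME'] = 'local-bucket'
print(require_env('AWS_STORAGE_BUCKET_NAME'))  # → local-bucket
```

Reading each setting exactly once through a helper like this removes the double-assignment ambiguity the comment describes.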
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docker-compose.yml
- scripts/start_local_dev
- scripts/update
- src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
- src/django/oar/settings.py
🧰 Additional context used
🪛 Ruff (0.14.10)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
46-49: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Unused function argument: schema_editor
(ARG001)
82-85: Create your own exception
(TRY002)
82-85: Avoid specifying long messages outside the exception class
(TRY003)
101-103: Avoid specifying long messages outside the exception class
(TRY003)
130-130: Unused function argument: schema_editor
(ARG001)
140-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
144-173: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: get-base-branch-django-cov
- GitHub Check: run-flake8-linter
- GitHub Check: run-contricleaner-code-quality
- GitHub Check: run-dd-code-quality
- GitHub Check: get-base-branch-countries-cov
- GitHub Check: run-eslint-linter-and-prettier-formatter
- GitHub Check: get-base-branch-contricleaner-cov
- GitHub Check: run-integration-test-code-quality
- GitHub Check: run-django-code-quality
- GitHub Check: get-base-branch-dd-cov
- GitHub Check: get-base-branch-fe-cov
- GitHub Check: run-countries-code-quality
- GitHub Check: run-fe-code-quality
🔇 Additional comments (8)
docker-compose.yml (1)
47-51: LGTM! Local development MinIO configuration is appropriate.

The AWS/S3 environment variables correctly configure Django to use the local MinIO service for development. Hardcoded credentials are acceptable for local development environments.
scripts/update (1)
25-25: LGTM! MinIO service correctly added to initialization sequence.

Adding MinIO to the startup ensures the S3-compatible storage is available before migrations run, which is necessary for the US County Tigerline data population.
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (6)
19-36: LGTM! S3 client configuration correctly handles MinIO and AWS.

The function appropriately distinguishes between local development (MinIO with explicit credentials) and production (AWS S3 with IAM roles), ensuring compatibility across environments.

39-53: LGTM! CSV download logic is correct.

The function properly validates the bucket configuration and downloads the CSV from S3. The exception message is clear and informative. Static analysis warnings about exception message placement can be safely ignored in migration files.
56-62: LGTM! CSV reader wrapper is clean and simple.
65-127: LGTM! Population logic correctly implements past review feedback.

The function properly handles:
- Environment-specific error handling (graceful skip in Local, strict in production)
- Geometry validation and type conversion (Polygon → MultiPolygon)
- Batch processing with appropriate size
- CSV parsing with field validation
The implementation addresses all concerns raised in previous reviews. Static analysis warnings about unused arguments and exception classes are standard for Django migrations and can be ignored.
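The environment-specific handling in the first bullet (graceful skip locally, strict failure when deployed) can be sketched as follows (illustrative only; the 'Local' environment name and the loader callback are stand-ins, not the migration's actual code):

```python
# Illustrative graceful-skip-vs-strict error handling, mirroring the
# migration's behavior: skip data population in local dev when the CSV
# is missing, but fail the migration loudly in deployed environments.
def populate(load_data, environment: str):
    try:
        return load_data()
    except Exception as exc:
        if environment == 'Local':
            # Local dev: the CSV may be absent; log and continue.
            print(f'Skipping population: {exc}')
            return None
        # Deployed environments: re-raise and fail the migration.
        raise

def missing_csv():
    raise FileNotFoundError('us_county_tigerline_2025.csv not found')

print(populate(missing_csv, 'Local'))  # → None (after a warning line)
```

The same call with a deployed environment name re-raises the FileNotFoundError instead of swallowing it.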
130-135: LGTM! Reverse migration logic is appropriate.
169-172: LGTM! RunPython operation correctly wired with forward and reverse.

The migration properly connects the population and cleanup functions for bidirectional migration support.
Actionable comments posted: 1
♻️ Duplicate comments (2)
scripts/start_local_dev (1)
13-47: LGTM! MinIO setup is well-structured and handles edge cases.

The setup_minio function properly:

- Configures the MinIO alias and creates the bucket with --ignore-existing (addressing the previous review comment)
- Copies fixtures to MinIO systematically with proper cleanup
- Gracefully handles the optional CSV file with a conditional check
The implementation is clear and maintainable.
src/django/oar/settings.py (1)
589-595: Add validation for DEBUG mode S3/MinIO credentials.

When AWS_S3_ENDPOINT_URL is set in DEBUG mode, the code retrieves AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_STORAGE_BUCKET_NAME from environment variables but does not validate that they are actually present. If any of these are missing, the application will fail at runtime (when storage is accessed) with an unclear error message.

The validation at line 604 only applies to non-DEBUG environments.
🔎 Proposed fix to add DEBUG mode validation
```diff
 AWS_S3_ENDPOINT_URL = os.getenv('AWS_S3_ENDPOINT_URL')
 if DEBUG and AWS_S3_ENDPOINT_URL:
     AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
     AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
     AWS_STORAGE_BUCKET_NAME = os.getenv('AWS_STORAGE_BUCKET_NAME')
     AWS_S3_USE_SSL = False
+
+    if not all([AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
+                AWS_STORAGE_BUCKET_NAME]):
+        raise ImproperlyConfigured(
+            'When AWS_S3_ENDPOINT_URL is set in DEBUG mode, all of '
+            'AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and '
+            'AWS_STORAGE_BUCKET_NAME must be provided in the environment'
+        )
```
🧹 Nitpick comments (3)
README.md (2)
114-116: Add language specification to the code block.

The shell code block should specify the language for proper syntax highlighting.
🔎 Proposed fix
````diff
-```
+```shell
 ./scripts/manage load_fixtures
 ```
````

104-106: Clarify CSV download and placement workflow.

The instructions mention downloading from S3 (s3://opensupplyhub-development-files-eu-west-1/...) but then state "The migration will download it from MinIO". This may confuse developers about whether they need to download the file manually or if it's automatically downloaded from MinIO.

Consider clarifying that:

- For local development, developers should download the CSV from S3 and place it in src/django/
- The setup_minio script will then copy it to the local MinIO instance
- The migration reads from the local MinIO instance

src/django/oar/settings.py (1)

594-602: Consider removing redundant AWS_STORAGE_BUCKET_NAME assignment.

AWS_STORAGE_BUCKET_NAME is assigned from the environment at line 594 (when DEBUG and endpoint URL are present) and then retrieved again at line 602. The second retrieval will overwrite the first assignment for DEBUG mode with endpoint URL configured. While not breaking functionality, this creates confusion. Consider:

- Removing line 594 and relying on line 602 for all cases, or
- Removing line 602 and adjusting the validation condition at line 604

📜 Review details

Configuration used: Repository UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6b59f806c90a373b4140052c5ee1281ec4b0b9a and 09a36f48e35983f2f591f5cd0468da87cc17426e.

📒 Files selected for processing (3)

- README.md
- scripts/start_local_dev
- src/django/oar/settings.py

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-01-21T13:37:17.243Z
Learnt from: roninzp
Repo: opensupplyhub/open-supply-hub PR: 449
File: .github/workflows/deploy_to_aws.yml:165-180
Timestamp: 2025-01-21T13:37:17.243Z
Learning: In the Open Supply Hub project, using the --acl public-read flag with the aws s3 sync command breaks the synchronization process. The flag should be omitted for proper functionality.
Applied to files:
- scripts/start_local_dev

🪛 markdownlint-cli2 (0.18.1)

README.md
114-114: Fenced code blocks should have a language specified
(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. (13)

- GitHub Check: run-fe-code-quality
- GitHub Check: run-flake8-linter
- GitHub Check: run-contricleaner-code-quality
- GitHub Check: get-base-branch-contricleaner-cov
- GitHub Check: run-integration-test-code-quality
- GitHub Check: run-dd-code-quality
- GitHub Check: get-base-branch-countries-cov
- GitHub Check: run-eslint-linter-and-prettier-formatter
- GitHub Check: run-countries-code-quality
- GitHub Check: get-base-branch-fe-cov
- GitHub Check: run-django-code-quality
- GitHub Check: get-base-branch-django-cov
- GitHub Check: get-base-branch-dd-cov

🔇 Additional comments (1)

scripts/start_local_dev (1)

70-71: LGTM! MinIO setup is appropriately placed in the initialization flow.

Calling setup_minio after static file collection and before database population ensures that S3/MinIO-backed data sources are available when migrations run.
Actionable comments posted: 1
🧹 Nitpick comments (3)
src/django/oar/settings.py (2)
600-600: Consider more robust test detection.

The current approach (`'test' in sys.argv`) will work for standard Django test commands but may have edge cases:
- False positives if 'test' appears in other command arguments
- May not detect all test runner variations
Consider adding an environment variable fallback for explicit test mode control.
🔎 Alternative approach

```diff
-TESTING = 'test' in sys.argv
+TESTING = 'test' in sys.argv or os.getenv('TESTING', '').lower() == 'true'
```
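The suggested check can be exercised in isolation (the TESTING environment variable is the suggestion's own hypothetical override, not an existing project setting):

```python
# Sketch of the combined test-mode check: treat the process as a test
# run if Django's `test` subcommand appears in argv, or if an explicit
# (hypothetical) TESTING environment override is set.
import os
import sys

def is_test_run(argv=None, env=None) -> bool:
    argv = sys.argv if argv is None else argv
    env = os.environ if env is None else env
    return 'test' in argv or env.get('TESTING', '').lower() == 'true'

print(is_test_run(['manage.py', 'test'], {}))                       # → True
print(is_test_run(['manage.py', 'runserver'], {'TESTING': 'True'}))  # → True
print(is_test_run(['manage.py', 'runserver'], {}))                   # → False
```

Injecting argv and env makes the check unit-testable, which is harder with a bare module-level `'test' in sys.argv`.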
605-605: Consolidate duplicate AWS_STORAGE_BUCKET_NAME assignments.

AWS_STORAGE_BUCKET_NAME is assigned at line 595 (when DEBUG and AWS_S3_ENDPOINT_URL are set) and then unconditionally retrieved again here at line 605, overwriting the earlier assignment. This makes line 595's assignment ineffective and creates confusion.

Additionally, line 595 uses os.getenv() while line 605 uses os.environ.get(); these are equivalent but inconsistent.

🔎 Proposed consolidation

Option 1: Remove the assignment from line 595 and keep only line 605 (simplest):

```diff
 if DEBUG and AWS_S3_ENDPOINT_URL:
     AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
     AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
-    AWS_STORAGE_BUCKET_NAME = os.getenv('AWS_STORAGE_BUCKET_NAME')
     AWS_S3_USE_SSL = False
```

Option 2: Make line 605 conditional to preserve line 595's value if already set:

```diff
-AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
+if 'AWS_STORAGE_BUCKET_NAME' not in locals():
+    AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
```

Option 1 is recommended for clarity.
README.md (1)
114-116: Add language identifier to the code block.The code block should specify
bashorshellas the language identifier for consistency with other code blocks in this README and proper syntax highlighting.🔎 Proposed fix
-``` +```bash ./scripts/manage load_fixtures</details> </blockquote></details> </blockquote></details> <details> <summary>📜 Review details</summary> **Configuration used**: Repository UI **Review profile**: CHILL **Plan**: Pro <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 09a36f48e35983f2f591f5cd0468da87cc17426e and a83cab2b76d510f60648c21f4312a5a9ba386a8d. </details> <details> <summary>📒 Files selected for processing (6)</summary> * `README.md` * `src/django/api/admin.py` * `src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py` * `src/django/api/models/us_county_tigerline.py` * `src/django/api/partner_fields/mit_living_wage_provider.py` * `src/django/oar/settings.py` </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary> * src/django/api/admin.py </details> <details> <summary>🧰 Additional context used</summary> <details> <summary>🧠 Learnings (3)</summary> <details> <summary>📚 Learning: 2025-11-18T09:54:27.458Z</summary>Learnt from: VadimKovalenkoSNF
Repo: opensupplyhub/open-supply-hub PR: 796
File: src/django/api/management/commands/backfill_isic_4_extended_fields.py:115-175
Timestamp: 2025-11-18T09:54:27.458Z
Learning: In the Open Supply Hub codebase, the facility.id field (the database primary key) is the same as the OS ID (the public identifier exposed via the API).**Applied to files:** - `src/django/api/partner_fields/mit_living_wage_provider.py` </details> <details> <summary>📚 Learning: 2025-01-23T11:09:14.873Z</summary>Learnt from: VadimKovalenkoSNF
Repo: opensupplyhub/open-supply-hub PR: 470
File: src/react/src/util/util.js:1435-1466
Timestamp: 2025-01-23T11:09:14.873Z
Learning: Error validation and throwing errors should be handled at higher levels of the application rather than in utility functions, as per team's preference.**Applied to files:** - `src/django/api/partner_fields/mit_living_wage_provider.py` </details> <details> <summary>📚 Learning: 2024-12-13T13:20:48.143Z</summary>Learnt from: mazursasha1990
Repo: opensupplyhub/open-supply-hub PR: 438
File: src/django/api/views/v1/moderation_events.py:140-142
Timestamp: 2024-12-13T13:20:48.143Z
Learning: In the Open Supply Hub codebase, it's acceptable to catch broad exceptions in methods like `add_production_location` and `update_production_location` in `src/django/api/views/v1/moderation_events.py` instead of catching specific exceptions.
Applied to files:
- src/django/api/partner_fields/mit_living_wage_provider.py
🧬 Code graph analysis (2)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
- src/django/api/migrations/0195_add_mit_livingwage_partner_field.py: `Migration` (49-60)
src/django/api/partner_fields/mit_living_wage_provider.py
- src/django/api/models/us_county_tigerline.py: `USCountyTigerline` (5-31)
- src/django/api/partner_fields/base_provider.py: `SystemPartnerFieldProvider` (12-107)
🪛 markdownlint-cli2 (0.18.1)
README.md
114-114: Fenced code blocks should have a language specified (MD040, fenced-code-language)
🪛 Ruff (0.14.10)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
46-49: Avoid specifying long messages outside the exception class (TRY003)
65-65: Unused function argument: `schema_editor` (ARG001)
82-85: Create your own exception (TRY002); avoid specifying long messages outside the exception class (TRY003)
101-103: Avoid specifying long messages outside the exception class (TRY003)
130-130: Unused function argument: `schema_editor` (ARG001)
140-142: Mutable class attributes should be annotated with `typing.ClassVar` (RUF012)
144-173: Mutable class attributes should be annotated with `typing.ClassVar` (RUF012)
src/django/api/partner_fields/mit_living_wage_provider.py
50-50: Do not catch blind exception: `Exception` (BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: get-base-branch-countries-cov
- GitHub Check: run-integration-test-code-quality
- GitHub Check: run-flake8-linter
- GitHub Check: run-eslint-linter-and-prettier-formatter
- GitHub Check: get-base-branch-fe-cov
- GitHub Check: run-countries-code-quality
- GitHub Check: run-django-code-quality
- GitHub Check: get-base-branch-dd-cov
- GitHub Check: run-dd-code-quality
- GitHub Check: run-fe-code-quality
- GitHub Check: get-base-branch-django-cov
- GitHub Check: run-contricleaner-code-quality
- GitHub Check: get-base-branch-contricleaner-cov
🔇 Additional comments (6)
src/django/oar/settings.py (2)
15-15: LGTM: Essential imports for MinIO support. The `sys` import and `AWS_S3_ENDPOINT_URL` environment variable retrieval are necessary for the new MinIO/S3 endpoint logic. Also applies to: 590-590
602-603: LGTM: Conditional S3 storage enables MinIO in local development. The modified condition correctly activates S3 storage for production environments and for local development when MinIO is configured (via `AWS_S3_ENDPOINT_URL`), while excluding test runs. This aligns with the PR's objective to support MinIO for local S3-compatible storage.
src/django/api/models/us_county_tigerline.py (1)
1-31: LGTM! Well-structured GIS model. The model definition is clean and appropriate for storing US county boundary data. The choice of SRID 5070 (Albers Equal Area Conic) is correct for US-focused geospatial analysis, and the field definitions follow Django conventions.
src/django/api/partner_fields/mit_living_wage_provider.py (1)
1-80: LGTM! Solid provider implementation with proper GIS handling. The provider correctly implements the system partner field pattern with appropriate country validation, GIS transformations (WGS84 → Albers), and error handling. The broad exception catch at line 50 is acceptable given that GIS operations can fail in various ways, and the error is properly logged with context.
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (2)
65-127: Well-implemented data migration with robust error handling. The migration correctly handles environment-aware behavior (graceful failure in local, strict in deployed environments), batch processing for performance, geometry validation and type normalization (Polygon → MultiPolygon), and MinIO support for local development. The unused `schema_editor` parameter (flagged by static analysis) is standard for Django migration functions and can be ignored.
164-167: No action required — the migration's `verbose_name_plural = 'US Counties Boundaries'` correctly matches the model definition.
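The county lookup at the heart of the provider (find the county polygon containing a facility's projected point) can be illustrated with a minimal, library-free point-in-polygon check. This is a conceptual stand-in for the PostGIS containment query the platform actually uses; the polygon coordinates below are invented:

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a ray going right from (x, y).

    `polygon` is a list of (x, y) vertices; an odd crossing count means
    the point is inside. PostGIS does this (and much more) natively via
    ST_Contains; this is only a conceptual stand-in.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # X coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Invented "county" square in projected (Albers-like) coordinates
county = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, county))    # True
print(point_in_polygon(15, 5, county))   # False
```

The real implementation first transforms the facility's WGS84 coordinates into SRID 5070 so that the comparison happens in the same projection as the stored boundaries.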
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (2)
14-14: Move the CSV field size limit setting into the function that needs it. Setting `csv.field_size_limit(sys.maxsize)` at module level creates a global side effect that persists beyond this migration and could affect other parts of the application that use the `csv` module.
🔎 Proposed fix
Move the field size limit configuration into the `get_csv_reader()` function:

```diff
-csv.field_size_limit(sys.maxsize)
-
 S3_CSV_KEY = 'data/us_county_tigerline_2025.csv'

 def get_s3_client():
```

And in `get_csv_reader()`:

```diff
 def get_csv_reader():
     '''
     Get CSV reader from S3 (or MinIO for local development).
     Returns CSV DictReader.
     '''
+    csv.field_size_limit(sys.maxsize)
     csv_content = download_csv_from_s3()
     return csv.DictReader(io.StringIO(csv_content))
```
65-86: DJANGO_ENV is consistently configured in Terraform deployments; the review comment's primary concern is already addressed. The `DJANGO_ENV` environment variable is properly set across deployment stages via Terraform variables (with a default of "Staging" in variables.tf) and is explicitly set in local development (docker-compose.yml). The migration function's fallback to "Local" provides a safety mechanism for local environments, but silent failures are prevented in deployed environments because the variable is injected through task definitions. However, the underlying concern remains valid: if the variable is ever unset outside the standard Terraform deployment process, the code will default to "Local" and silently skip the migration. Consider documenting this dependency or validating the variable is set before executing the migration function.
scripts/reset_database (1)
16-50: Consider optimizing file transfer to MinIO. The current approach copies files to an intermediate directory inside the MinIO container (`/fixtures` and `/temp`) before using `mc cp` to transfer them to the bucket. This works but adds unnecessary steps.
💡 Alternative approach using mc cp with host paths
If the MinIO client (`mc`) supports copying from mounted volumes or if you can mount the source directories, you could eliminate the intermediate copy steps. However, the current approach is functional and may be intentionally designed for container isolation, so this is just an optional optimization.
Example alternative (if applicable):

```shell
# Instead of copying to /fixtures first, mount source and copy directly
docker compose exec minio bash \
  -c "mc cp --recursive /path/to/mounted/fixtures $MINIO_ALIAS/$MINIO_BUCKET/api/"
```

Only implement if it simplifies the script without compromising reliability.
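The `DJANGO_ENV` fallback discussed two comments above reduces to a small guard. A minimal sketch of the decision logic; the helper name `should_skip_on_missing_data` is invented, not the migration's actual function:

```python
import os


def should_skip_on_missing_data():
    """Decide whether a data migration may silently skip loading.

    In local development a missing source file is tolerated; in any
    deployed environment (Staging, Production, ...) it should be a
    hard failure so the gap is noticed.
    """
    env = os.environ.get("DJANGO_ENV", "Local")  # unset -> "Local"
    return env == "Local"


os.environ.pop("DJANGO_ENV", None)
print(should_skip_on_missing_data())  # True: unset defaults to Local

os.environ["DJANGO_ENV"] = "Production"
print(should_skip_on_missing_data())  # False: deployed env must not skip
```

The risk the reviewer names is visible here: any environment that forgets to set the variable is silently treated as "Local", which is why validating or documenting the dependency is suggested.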
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- scripts/reset_database
- scripts/start_local_dev
- src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
- src/django/api/models/us_county_tigerline.py
🚧 Files skipped from review as they are similar to previous changes (2)
- src/django/api/models/us_county_tigerline.py
- scripts/start_local_dev
🧰 Additional context used
🪛 Ruff (0.14.10)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py
46-49: Avoid specifying long messages outside the exception class
(TRY003)
65-65: Unused function argument: schema_editor
(ARG001)
82-85: Create your own exception
(TRY002)
82-85: Avoid specifying long messages outside the exception class
(TRY003)
101-103: Avoid specifying long messages outside the exception class
(TRY003)
130-130: Unused function argument: schema_editor
(ARG001)
140-142: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
144-173: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🔇 Additional comments (7)
src/django/api/migrations/0194_create_and_populate_us_county_tigerline.py (5)
19-36: LGTM: S3 client configuration handles both MinIO and AWS correctly. The function properly differentiates between local MinIO (with endpoint URL) and production AWS S3, and correctly passes credentials for MinIO while relying on IAM roles/default credentials for AWS.
39-53: LGTM: CSV download logic is clear and includes proper error handling. The function validates the bucket name configuration and provides a clear error message if missing.
88-127: LGTM: Data processing logic is robust and efficient. The batch processing (2000 records), geometry validation, Polygon-to-MultiPolygon conversion, and bulk insertion logic are all correctly implemented.
130-135: LGTM: Reverse migration properly cleans up data. The cleanup function correctly deletes all records from the table.
164-167: No action needed — migration and model verbose names are already consistent. Both the migration and the model define `verbose_name = 'County (US)'` and `verbose_name_plural = 'Counties (US)'`. The verbose names match exactly between the two files, so no consistency issues exist.
Likely an incorrect or invalid review comment.
scripts/reset_database (2)
40-49: LGTM: Conditional CSV file handling is appropriate. The conditional check ensures the script doesn't fail if the CSV file is not present locally, which aligns with the migration's graceful handling of missing data in local environments.
56-56: Verify setup_minio is idempotent for repeated script execution. The `setup_minio` function is called every time the database is reset. Confirm that repeated executions don't cause issues, such as duplicate uploads or stale data in MinIO. The function uses `--ignore-existing` for bucket creation (line 26), which is good. However, the `mc cp --recursive` commands (lines 34, 46) might overwrite existing files. Verify this behavior is intentional and doesn't cause issues during development workflows where the script might be run multiple times.
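The batch insertion pattern praised in the comments above can be sketched in plain Python. `batched` and the fake `bulk_create` sink are illustrative stand-ins for the migration's actual loop and Django's `Model.objects.bulk_create`:

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, so rows stream through in
    fixed-size chunks instead of being materialized all at once."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


inserted_batches = []


def bulk_create(rows):
    # Stand-in for Model.objects.bulk_create(rows)
    inserted_batches.append(len(rows))


# 4500 fake CSV rows, processed in batches of 2000 as in the migration
rows = ({"geoid": str(i)} for i in range(4500))
for batch in batched(rows, 2000):
    bulk_create(batch)

print(inserted_batches)  # [2000, 2000, 500]
```

Chunking keeps memory bounded for a large county CSV and turns thousands of single-row inserts into a handful of bulk statements.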
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/django/oar/settings.py (1)
604-609: Bucket name validation doesn't cover DEBUG mode with MinIO. Line 601 now enables S3 storage in DEBUG mode when `AWS_S3_ENDPOINT_URL` is set, but the validation at lines 606-609 only checks the bucket name when `not DEBUG`. If `DEBUG=True`, `AWS_S3_ENDPOINT_URL` is set, and `AWS_STORAGE_BUCKET_NAME` is missing, S3 operations will fail at runtime with unclear errors.
🔎 Proposed fix to validate bucket name in all S3 scenarios

```diff
 AWS_STORAGE_BUCKET_NAME = os.environ.get('AWS_STORAGE_BUCKET_NAME')
-if AWS_STORAGE_BUCKET_NAME is None and not DEBUG:
+# Validate bucket name whenever S3 storage is actually being used
+using_s3_storage = not DEBUG or (AWS_S3_ENDPOINT_URL and not TESTING)
+if AWS_STORAGE_BUCKET_NAME is None and using_s3_storage:
     raise ImproperlyConfigured(
-        'Invalid AWS_STORAGE_BUCKET_NAME provided, must be set in the environment'
+        'AWS_STORAGE_BUCKET_NAME must be set when S3 storage is enabled'
     )
```
♻️ Duplicate comments (1)
src/django/oar/settings.py (1)
592-594: Validate MinIO/S3 credentials when AWS_S3_ENDPOINT_URL is set. The conditional block loads `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from environment variables but performs no validation. If either credential is missing, the application will fail later during S3 operations with cryptic boto3 errors rather than a clear startup error.
This was flagged in previous reviews and remains unresolved.
🔎 Proposed fix to validate credentials

```diff
 if DEBUG and AWS_S3_ENDPOINT_URL:
     AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID')
     AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY')
+    if not AWS_ACCESS_KEY_ID or not AWS_SECRET_ACCESS_KEY:
+        raise ImproperlyConfigured(
+            'AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set when '
+            'AWS_S3_ENDPOINT_URL is configured in DEBUG mode'
+        )
     AWS_S3_USE_SSL = False
```
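The two settings fixes above reduce to one predicate. A minimal sketch of the decision logic as a pure function; the name `s3_config_error` and its signature are invented for illustration, since the real checks live inline in `settings.py`:

```python
def s3_config_error(debug, endpoint_url, testing, bucket, key_id, secret):
    """Return an error message if the S3/MinIO settings are inconsistent,
    or None when the configuration is usable.

    S3 storage is active outside DEBUG, or in DEBUG when a MinIO
    endpoint is configured (but never during test runs).
    """
    using_s3 = (not debug) or (bool(endpoint_url) and not testing)
    if not using_s3:
        return None
    if bucket is None:
        return "AWS_STORAGE_BUCKET_NAME must be set when S3 storage is enabled"
    if debug and endpoint_url and not (key_id and secret):
        return "AWS credentials must be set when AWS_S3_ENDPOINT_URL is configured"
    return None


# DEBUG + MinIO endpoint but no bucket -> caught at startup, not at runtime
print(s3_config_error(True, "http://minio:9000", False, None, "k", "s"))
# Test run: S3 storage disabled, nothing to validate
print(s3_config_error(True, "http://minio:9000", True, None, None, None))  # None
```

Expressing the check this way makes the gap obvious: the original code only validated the bucket in the `not debug` branch, leaving the DEBUG-with-MinIO branch unguarded.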
🧹 Nitpick comments (1)
src/django/oar/settings.py (1)
599-599: Consider more robust test detection. Checking `'test' in sys.argv` works for standard Django test commands but may not detect tests run via pytest, IDE test runners, or custom invocations. Consider using an environment variable for more reliable detection:

```python
TESTING = os.getenv('TESTING', 'false').lower() == 'true' or 'test' in sys.argv
```

Then set `TESTING=true` in your test environment configuration.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/django/oar/settings.py
🔇 Additional comments (2)
src/django/oar/settings.py (2)
15-15: LGTM! The `sys` import is necessary for detecting test mode at line 599.
601-602: LGTM with a validation gap noted separately. The condition correctly enables S3 storage for MinIO in DEBUG mode when `AWS_S3_ENDPOINT_URL` is set (excluding test runs). However, the bucket name validation at lines 606-609 doesn't account for this case — see the outside-diff comment above.



OSDEV-2304 Upload MIT Living Wage data to the platform
- `0194_create_and_populate_us_county_tigerline.py` - This migration creates the `USCountyTigerline` model to store US county boundary data with geometry in EPSG:5070 (Albers Equal Area projection). The model includes fields for geoid (primary key), county name, and MultiPolygon geometry. The migration attempts to populate the table with data from a CSV file stored in S3 (or MinIO for local development). If the CSV file is not available, the migration completes successfully with an empty table, allowing data to be populated later. The migration supports downloading the CSV file from S3 buckets with MinIO endpoint configuration for local development environments.
- `0195_add_mit_livingwage_partner_field.py` - This data migration creates the `mit_living_wage` system partner field with type `object` and a JSON schema defining a single property for the county ID with uri-reference format. The field is marked as `system_field=True` and `active=True`, and includes a base URL pointing to the MIT Living Wage website and display text for the link. This partner field enables automatic display of MIT Living Wage data links on US production location profiles based on the facility's geographic location.
- A `USCountyTigerline` model was added to store US county boundary geometries for geographic lookups, and a `mit_living_wage` system partner field was created to display MIT Living Wage data links for US production locations based on their county location. The migration supports downloading CSV files from S3 buckets (with MinIO support for local development) and gracefully handles missing files by skipping data population without failing the migration.
- The `GET /api/facilities/{os_id}` endpoint was updated to support MIT Living Wage system-generated partner fields. For US production locations, the `mit_living_wage` field is included when a contributor is assigned, automatically providing links to MIT Living Wage data based on the facility's geographic location (county).
The field is populated dynamically through a provider registry pattern and follows the same structure as user-contributed partner fields.
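The provider registry pattern mentioned above can be sketched in a few lines. Every name here (`register_provider`, `get_partner_fields`, the provider body, the URL shape) is illustrative only, not the platform's actual API:

```python
# Registry mapping partner field names to provider callables.
_PROVIDERS = {}


def register_provider(field_name):
    """Decorator registering a system partner field provider."""
    def wrap(fn):
        _PROVIDERS[field_name] = fn
        return fn
    return wrap


@register_provider("mit_living_wage")
def mit_living_wage(location):
    # Only US locations with a resolved county get the field.
    if location.get("country") != "US" or not location.get("county_geoid"):
        return None
    return {
        "value": {
            "county_id": "https://livingwage.mit.edu/counties/"
                         + location["county_geoid"]
        },
        "is_system": True,
    }


def get_partner_fields(location):
    """Collect all applicable system partner fields for a location."""
    fields = {}
    for name, provider in _PROVIDERS.items():
        result = provider(location)
        if result is not None:
            fields[name] = result
    return fields


us_loc = {"country": "US", "county_geoid": "01001"}
print(sorted(get_partner_fields(us_loc)))     # ['mit_living_wage']
print(get_partner_fields({"country": "DE"}))  # {}
```

The payoff of the registry is that the `GET` endpoint only iterates over registered providers; adding a new system field is a new provider module plus a registration, with no endpoint changes.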