Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds Bedrock-backed streaming NLP extraction and a Python geocoder Lambda, local feature flags and docs, API proxy streaming support, CDK infra and IAM for Bedrock, build/deploy updates, tests, and dependency/script changes. ChangesNLP Search & Geocoding Feature
Sequence Diagram(s)sequenceDiagram
participant Client
participant APIGateway as API Proxy
participant NlpLambda as nlpSearch Lambda
participant Bedrock
participant GeocoderLambda
Client->>APIGateway: GET /nlp?query=...
APIGateway->>NlpLambda: invoke (streaming)
NlpLambda->>Bedrock: streamText (tools: reportFound, convertTemporal, lookupSpatial)
Bedrock-->>NlpLambda: stream progress & tool outputs
NlpLambda->>GeocoderLambda: Invoke (lookupSpatial) or HTTP dev service
GeocoderLambda-->>NlpLambda: polygon result
NlpLambda-->>APIGateway: stream response chunks + final JSON
APIGateway-->>Client: streamed HTTP response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Bundle Size ComparisonFull build detailsThe full bundle is smaller than main by -849.75 kB. 🎉 The index.js is smaller than main by -103.37 kB. 🎉 Run
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2053 +/- ##
==========================================
- Coverage 95.82% 95.77% -0.05%
==========================================
Files 753 754 +1
Lines 17431 17517 +86
Branches 4901 4912 +11
==========================================
+ Hits 16703 16777 +74
- Misses 677 688 +11
- Partials 51 52 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
bin/start.js (1)
70-75:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep
defaultInputTargetaligned with the optional command count.Now that
optionalCommandscan include geocoder, branching only onsetupSqsno longer points stdin at the same default process whenUSE_GEOCODER=true.Suggested fix
- defaultInputTarget: setupSqs ? 3 : 1 + defaultInputTarget: optionalCommands.length + 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bin/start.js` around lines 70 - 75, The hardcoded defaultInputTarget (setupSqs ? 3 : 1) is no longer correct when optionalCommands can include the geocoder; change it to compute the target from optionalCommands.length so stdin points to the same logical process regardless of optional commands. Replace the ternary with an expression like optionalCommands.length + (setupSqs ? 3 : 1) (adjust the numeric offsets to match your process ordering) so defaultInputTarget is derived from optionalCommands.length, referencing the optionalCommands, defaultInputTarget and setupSqs identifiers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.env.example:
- Around line 16-18: Remove the duplicate AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY entries so only a single pair of credential definitions
remains in the example (keeping the keys named AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY), and convert the note about enabling geocoder/NLP into
commented lines above them so the example does not silently override real values
when copied.
In `@bin/api.mjs`:
- Around line 133-149: The current flow calls reply.send(responseStream) before
applying streaming metadata, so headers/status may be committed too late; change
the sequence in the handler logic to invoke the streaming handler
(handler(event, responseStream, {})) and wait for its returned metadata (or
consult streamMetadataByStream) and apply reply.code(metadata.statusCode) and
reply.headers(metadata.headers) before calling reply.send(responseStream); keep
the Promise.resolve(...).then(...) pattern but move the
reply.send(responseStream) to after the metadata application (and still handle
errors by logging and destroying responseStream in the .catch), referencing the
existing symbols handler, responseStream, streamMetadataByStream, reply.send and
reply.code/reply.headers to locate the code to reorder.
In `@bin/build-python.sh`:
- Around line 3-4: Guard the directory change before running pip: ensure the `cd
serverless/src/geocoder` command cannot fail silently by checking its exit and
aborting if it fails (so `pip install -r requirements.txt -t .` only runs when
the directory change succeeds). Update the script around the `cd
serverless/src/geocoder` and `pip install -r requirements.txt -t .` lines to
either use an explicit conditional check (fail with a clear message and non-zero
exit) or chain with a safe operator so the `pip` invocation never runs on the
wrong working directory.
In `@cdk/earthdata-search/lib/earthdata-search-functions.ts`:
- Line 359: The Lambda request timeout is set to 30s via the property timeout:
cdk.Duration.seconds(30) which is likely too short for cold-start work (S3
download, GeocodeIndexPlaceLookup build, Bedrock extraction); update the timeout
value on the Lambda construct to a higher value (e.g., cdk.Duration.seconds(60)
or cdk.Duration.seconds(120)) or align it with the upstream NLP lambda timeout,
and redeploy so cold-starts don’t cause end-to-end failures.
In `@package.json`:
- Line 30: The "start:geocoder" npm script doesn't pass the AWS region into the
geocoder container; update the docker run for the "start:geocoder" script to
forward the host's region environment variable (e.g., AWS_REGION or
AWS_DEFAULT_REGION) into the container by adding the appropriate -e flag(s) so
the geocoder's Bedrock client can pick up the region (reference the
"start:geocoder" script string in package.json and update the docker run
invocation that runs the geocoder-lambda image).
In `@serverless/src/geocoder/handler.py`:
- Around line 98-99: The temporary INFO-level logging of raw user queries is a
privacy/noise concern; update the logger call in handler.py (the
logger.info(...) that logs the variable query) to either remove it or demote it
to DEBUG (e.g., replace logger.info(f"Received geocoding query: {query}") with
logger.debug(...) or delete the line), and if keeping DEBUG-level logging,
optionally guard with logger.isEnabledFor(logging.DEBUG) to avoid unnecessary
interpolation when debug is off; ensure no raw queries are logged at INFO.
In `@serverless/src/geocoder/requirements.txt`:
- Around line 1-3: Update serverless/src/geocoder/requirements.txt to add a
urllib3 constraint that addresses the CVE by requiring the fixed release line;
specifically add or replace any urllib3 pin with "urllib3>=2.6.3,<3" so
transitive pulls from boto3/botocore (via boto3==1.36.1) will use a patched
urllib3 version and mitigate GHSA-38jv-5279-wg99 / CVE-2026-21441.
In `@serverless/src/nlpSearch/handler.js`:
- Around line 149-160: Remove or redact logging of raw user input and extracted
values: stop writing the raw query to logs (remove or sanitize the
console.log(`Received query: ${query}`) and the responseStream.write('Analyzing
your query...\n') if it exposes the query), and avoid printing the
extractedResults object or its fields (keyword, spatial, spatialArea, temporal);
instead log only non-sensitive status messages or hashed/filtered indicators.
Also remove or change the console.log that prints process.env.USE_NLP_SEARCH
along with any sensitive variables; if you need feature flags keep them but
ensure no user-derived content is included in logs from handler,
responseStream.write, or extractedResults.
- Around line 126-130: lookupSpatialToolExecute currently throws when
process.env.USE_GEOCODER !== 'true', which aborts extraction; change it to
degrade gracefully: remove the throw in lookupSpatialToolExecute and instead
log/warn, skip calling the geocoder, and return without setting spatialArea
(i.e., preserve the incoming spatial value or call setResults with spatialArea
omitted/null) so the rest of the extraction can continue; alternatively ensure
callers omit this tool from the tools list when USE_GEOCODER is false. Reference
symbols: lookupSpatialToolExecute, spatial, setResults,
process.env.USE_GEOCODER.
- Around line 145-148: The handler currently destructures query from
queryStringParameters without checking for its existence; update the handler
function to first verify event.queryStringParameters and that it contains a
non-empty query before destructuring. If missing, write a predictable error
response to responseStream (e.g., a 400 Bad Request JSON message) and return
early; otherwise proceed to extract query and continue normal processing. Ensure
you reference the handler entry point and use responseStream to emit the error
so the function never throws on malformed requests.
- Around line 69-72: The current code assumes every invoke succeeds and directly
returns responsePayload.body; instead validate the Lambda invoke before using
body by checking the invoke result (the response object from
lambdaClient.send(lambdaCommand)) for response.FunctionError or a non-200/202
StatusCode, ensure response.Payload exists, parse it safely into
responsePayload, and verify responsePayload.body is present and valid (e.g.,
non-null and correct type) before returning; if any check fails, throw or return
a descriptive error so the caller (e.g., spatialArea consumer) doesn't receive a
bad value.
- Around line 94-118: The code currently destructures only text from
generateText and then JSON.parse(text); instead, destructure the structured
output from generateText (e.g., const { text, output } = await
generateText(...)) and pass the schema-validated value to setResults('temporal',
output.value) (or output if your generateText returns the value directly) and
update the console.log to use that structured value; remove the JSON.parse(text)
call and rely on Output.object validation instead.
---
Outside diff comments:
In `@bin/start.js`:
- Around line 70-75: The hardcoded defaultInputTarget (setupSqs ? 3 : 1) is no
longer correct when optionalCommands can include the geocoder; change it to
compute the target from optionalCommands.length so stdin points to the same
logical process regardless of optional commands. Replace the ternary with an
expression like optionalCommands.length + (setupSqs ? 3 : 1) (adjust the numeric
offsets to match your process ordering) so defaultInputTarget is derived from
optionalCommands.length, referencing the optionalCommands, defaultInputTarget
and setupSqs identifiers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 48c9c4c7-d73e-45a2-a3df-5ec65d631c27
⛔ Files ignored due to path filters (4)
cdk/earthdata-search-infrastructure/package-lock.jsonis excluded by!**/package-lock.jsoncdk/earthdata-search-static/package-lock.jsonis excluded by!**/package-lock.jsoncdk/earthdata-search/package-lock.jsonis excluded by!**/package-lock.jsonpackage-lock.jsonis excluded by!**/package-lock.json
📒 Files selected for processing (18)
.env.example.gitignorebin/api.mjsbin/build-python.shbin/deploy-bamboo.shbin/start.jscdk/earthdata-search-infrastructure/lib/earthdata-search-infrastructure-stack.tscdk/earthdata-search-infrastructure/package.jsoncdk/earthdata-search-static/package.jsoncdk/earthdata-search/lib/earthdata-search-functions.tscdk/earthdata-search/lib/earthdata-search-stack.tscdk/earthdata-search/package.jsongeocoder.Dockerfilepackage.jsonserverless/src/geocoder/handler.pyserverless/src/geocoder/requirements.txtserverless/src/nlpSearch/__tests__/handler.test.jsserverless/src/nlpSearch/handler.js
| Promise.resolve(handler(event, responseStream, {})) | ||
| .then((streamResponseMetadata) => { | ||
| const metadata = streamMetadataByStream.get(responseStream) || streamResponseMetadata || {} | ||
|
|
||
| if (metadata?.statusCode) reply.code(metadata.statusCode) | ||
| if (metadata?.headers) reply.headers(metadata.headers) | ||
|
|
||
| if (metadata?.isBase64Encoded) { | ||
| request.log.warn('Streaming response marked as base64 encoded; proxy forwards raw stream chunks.') | ||
| } | ||
| }) | ||
| .catch((error) => { | ||
| request.log.error({ error }, 'Streaming lambda handler failed') | ||
| responseStream.destroy(error) | ||
| }) | ||
|
|
||
| return reply.send(responseStream) |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In Fastify v5, after calling reply.send(stream), can reply.code()orreply.headers() still change the HTTP status code or headers once the stream has started flowing?
💡 Result:
No. Once you call reply.send(stream) and the response has started streaming (i.e., headers have been sent and the payload is flowing), calling reply.code() or reply.headers()/reply.header() will not be able to reliably change the HTTP status code or headers. Fastify’s Reply API is designed such that .code() and .header(s) set status/headers as part of preparing the reply; .send(payload) is the operation that sends the payload. For stream payloads, Fastify treats the stream as “pre-serialized” and “will be sent unmodified”, and .send() is the function that actually sends the payload to the user [1]. Once the stream is being written to the underlying Node http.ServerResponse, Node won’t allow changing status/headers after they’ve already been sent. Practical guidance: set .code() and any .header(s) before reply.send(stream) (or via onSend/other hooks prior to the actual send). Use reply.sent to detect whether reply.send() has already happened [1].
Citations:
Ensure streaming metadata is applied before the response is sent (headers/status must be set pre-stream)
Fastify/Node won’t reliably allow setting HTTP status or headers after the streamed response has started flowing; reply.send(stream) commits headers, and calling reply.code() / reply.headers() later is too late for effect. Since bin/api.mjs applies statusCode/headers only in the .then(...) after reply.send(responseStream), the metadata can fail to take effect if the handler writes the first chunk before the promise resolves.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bin/api.mjs` around lines 133 - 149, The current flow calls
reply.send(responseStream) before applying streaming metadata, so headers/status
may be committed too late; change the sequence in the handler logic to invoke
the streaming handler (handler(event, responseStream, {})) and wait for its
returned metadata (or consult streamMetadataByStream) and apply
reply.code(metadata.statusCode) and reply.headers(metadata.headers) before
calling reply.send(responseStream); keep the Promise.resolve(...).then(...)
pattern but move the reply.send(responseStream) to after the metadata
application (and still handle errors by logging and destroying responseStream in
the .catch), referencing the existing symbols handler, responseStream,
streamMetadataByStream, reply.send and reply.code/reply.headers to locate the
code to reorder.
| role: defaultLambdaConfig.role, | ||
| runtime: lambda.Runtime.PYTHON_3_13, | ||
| securityGroups: defaultLambdaConfig.securityGroups, | ||
| timeout: cdk.Duration.seconds(30), |
There was a problem hiding this comment.
30s timeout may be too short for cold-start initialization.
On a cold invocation the handler downloads the place cache from S3, builds GeocodeIndexPlaceLookup (the handler comment notes this is ~15s against OpenSearch when the cache is absent), and then runs the Bedrock extraction — all counted against this 30s request timeout. Since the NLP lambda invokes the geocoder synchronously (5-min timeout), a geocoder timeout fails the whole flow. Consider raising this (e.g. 60–120s) or confirm cold-start latency stays well under 30s.
♻️ Suggested adjustment
- timeout: cdk.Duration.seconds(30),
+ timeout: cdk.Duration.seconds(60),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| timeout: cdk.Duration.seconds(30), | |
| timeout: cdk.Duration.seconds(60), |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cdk/earthdata-search/lib/earthdata-search-functions.ts` at line 359, The
Lambda request timeout is set to 30s via the property timeout:
cdk.Duration.seconds(30) which is likely too short for cold-start work (S3
download, GeocodeIndexPlaceLookup build, Bedrock extraction); update the timeout
value on the Lambda construct to a higher value (e.g., cdk.Duration.seconds(60)
or cdk.Duration.seconds(120)) or align it with the upstream NLP lambda timeout,
and redeploy so cold-starts don’t cause end-to-end failures.
There was a problem hiding this comment.
The nlp lambda is invoked through api gateway, so it does have a 30s timeout.
And I've placed the cache file where it needs to be so we don't have that ~15s build time
There was a problem hiding this comment.
@macrouch, thanks for the clarification — both points make sense. API Gateway's ~29s max integration timeout caps the effective execution window anyway, so 30s is exactly right for the geocoder. And with the cache file pre-placed, the expensive GeocodeIndexPlaceLookup build path is bypassed on cold starts. My concern doesn't apply here.
| "start:app": "vite dev --port 8080", | ||
| "start": "dotenvx run node bin/start.js", | ||
| "start:optionals": "USE_IMAGE_CACHE=true SKIP_SQS=false ORDER_DELAY_SECONDS=1 npm start", | ||
| "start:geocoder": "docker build -f geocoder.Dockerfile -t geocoder-lambda . && docker run -p 4001:8080 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY geocoder-lambda", |
There was a problem hiding this comment.
Pass the AWS region into the geocoder container.
The local geocoder container only receives access keys right now. That leaves the Bedrock client without a region unless the Dockerfile hardcodes one, which can break the documented local NLP/geocoder flow even when .env is configured correctly.
Suggested fix
- "start:geocoder": "docker build -f geocoder.Dockerfile -t geocoder-lambda . && docker run -p 4001:8080 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY geocoder-lambda",
+ "start:geocoder": "docker build -f geocoder.Dockerfile -t geocoder-lambda . && docker run -p 4001:8080 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_REGION=$AWS_REGION -e AWS_DEFAULT_REGION=$AWS_REGION geocoder-lambda",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "start:geocoder": "docker build -f geocoder.Dockerfile -t geocoder-lambda . && docker run -p 4001:8080 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY geocoder-lambda", | |
| "start:geocoder": "docker build -f geocoder.Dockerfile -t geocoder-lambda . && docker run -p 4001:8080 -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY -e AWS_REGION=$AWS_REGION -e AWS_DEFAULT_REGION=$AWS_REGION geocoder-lambda", |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@package.json` at line 30, The "start:geocoder" npm script doesn't pass the
AWS region into the geocoder container; update the docker run for the
"start:geocoder" script to forward the host's region environment variable (e.g.,
AWS_REGION or AWS_DEFAULT_REGION) into the container by adding the appropriate
-e flag(s) so the geocoder's Bedrock client can pick up the region (reference
the "start:geocoder" script string in package.json and update the docker run
invocation that runs the geocoder-lambda image).
| responseStream.write('Analyzing your query...\n') | ||
| console.log(`Received query: ${query}`) | ||
|
|
||
| const extractedResults = { | ||
| keyword: null, | ||
| query, | ||
| spatial: null, | ||
| spatialArea: null, | ||
| temporal: null | ||
| } | ||
|
|
||
| console.log('🚀 ~ handler.js:156 ~ handler ~ process.env.USE_NLP_SEARCH:', process.env.USE_NLP_SEARCH) |
There was a problem hiding this comment.
Avoid logging raw user queries and extracted values.
Line 150 starts a pattern of writing unredacted search text into logs; the same file also logs raw temporal/spatial inputs and final extracted results. These queries can contain PII or sensitive research intent, so this should be redacted or removed before release.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@serverless/src/nlpSearch/handler.js` around lines 149 - 160, Remove or redact
logging of raw user input and extracted values: stop writing the raw query to
logs (remove or sanitize the console.log(`Received query: ${query}`) and the
responseStream.write('Analyzing your query...\n') if it exposes the query), and
avoid printing the extractedResults object or its fields (keyword, spatial,
spatialArea, temporal); instead log only non-sensitive status messages or
hashed/filtered indicators. Also remove or change the console.log that prints
process.env.USE_NLP_SEARCH along with any sensitive variables; if you need
feature flags keep them but ensure no user-derived content is included in logs
from handler, responseStream.write, or extractedResults.
| functionName: `${functionNamePrefix}-geocoder${defaultLambdaConfig.logGroupSuffix}`, | ||
| handler: 'handler.main', | ||
| logGroup: geocoderLogGroup, | ||
| memorySize: 512, |
| role: defaultLambdaConfig.role, | ||
| runtime: lambda.Runtime.PYTHON_3_13, | ||
| securityGroups: defaultLambdaConfig.securityGroups, | ||
| timeout: cdk.Duration.seconds(30), |
There was a problem hiding this comment.
The nlp lambda is invoked through api gateway, so it does have a 30s timeout.
And I've placed the cache file where it needs to be so we don't have that ~15s build time
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cdk/earthdata-search-infrastructure/lib/earthdata-search-infrastructure-stack.ts (1)
126-132:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRestore least-privilege scoping for Bedrock access.
Changing this statement to
Resource: '*'broadens every lambda usingEDSCLambdaBasefrom specific Bedrock model/profile ARNs to all Bedrock resources. That is a real permissions regression for a shared execution role.Suggested fix
{ Effect: 'Allow', Action: [ 'bedrock:InvokeModelWithResponseStream', 'bedrock:InvokeModel' ], - Resource: '*' + Resource: [ + 'arn:aws:bedrock:*::foundation-model/*', + 'arn:aws:bedrock:*::inference-profile/*' + ] }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cdk/earthdata-search-infrastructure/lib/earthdata-search-infrastructure-stack.ts` around lines 126 - 132, The IAM statement that grants bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream is currently using Resource: '*' which overbroadens every Lambda derived from EDSCLambdaBase; replace the wildcard with explicit Bedrock model and profile ARNs (or a supplied list of ARNs) so each role only permits the specific model/profile resources it needs. Locate the policy attachment in the EDSCLambdaBase-related construct (the statement containing Action: ['bedrock:InvokeModelWithResponseStream','bedrock:InvokeModel']) and change Resource from '*' to a scoped array of ARN strings (or hook into a construct/property that injects allowedModelArns) so the permission is least-privilege rather than global.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bin/deploy-bamboo.sh`:
- Line 136: The deploy script currently expands bamboo_BEDROCK_MODEL_ID
unguarded (the -e "BEDROCK_MODEL_ID=$bamboo_BEDROCK_MODEL_ID" environment
argument) which fails under set -u when the Bamboo variable is unset; change the
deploy-bamboo.sh logic to make BEDROCK_MODEL_ID optional by either defaulting
the expansion to an empty string (e.g., use a parameter expansion fallback) or
only append the -e "BEDROCK_MODEL_ID=..." argument when bamboo_BEDROCK_MODEL_ID
is set; update the section that builds the docker run/env list where
BEDROCK_MODEL_ID is added so deployments don't abort when NLP search/Bedrock is
disabled.
In `@serverless/src/nlpSearch/handler.js`:
- Around line 18-19: There is an extra blank line in the nlpSearch handler file
causing the ESLint no-multiple-empty-lines error; open the file and remove the
superfluous empty line so there are no consecutive blank lines (leave at most
one), then save and re-run lint; look around the exported handler
(module.exports or the function named handler) to find and delete the extra
blank line.
- Around line 136-140: The handler currently unconditionally synthesizes a
Washington DC polygon by calling setResults('spatialArea', ...) when
process.env.USE_GEOCODER !== 'true'; remove that synthetic bounding box and
instead do not set spatialArea at all and return/emit a clear response
indicating geocoding is disabled (for example return { ok: false, reason:
'geocoding_disabled' } or set a flag via the existing result API and/or a log
entry). Locate the check using process.env.USE_GEOCODER and the call to
setResults('spatialArea', 'POLYGON(...)') in handler.js and replace the
synthetic setResults call with an explicit non-setting plus a clear disabled
response/log so downstream NLP logic knows geocoding was not performed.
---
Outside diff comments:
In
`@cdk/earthdata-search-infrastructure/lib/earthdata-search-infrastructure-stack.ts`:
- Around line 126-132: The IAM statement that grants bedrock:InvokeModel and
bedrock:InvokeModelWithResponseStream is currently using Resource: '*' which
overbroadens every Lambda derived from EDSCLambdaBase; replace the wildcard with
explicit Bedrock model and profile ARNs (or a supplied list of ARNs) so each
role only permits the specific model/profile resources it needs. Locate the
policy attachment in the EDSCLambdaBase-related construct (the statement
containing Action:
['bedrock:InvokeModelWithResponseStream','bedrock:InvokeModel']) and change
Resource from '*' to a scoped array of ARN strings (or hook into a
construct/property that injects allowedModelArns) so the permission is
least-privilege rather than global.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e7ebc76-774b-4ab7-9cd4-270217040d1c
📒 Files selected for processing (12)
.env.exampleREADME.mdbin/build-python.shbin/deploy-bamboo.shcdk/earthdata-search-infrastructure/lib/earthdata-search-infrastructure-stack.tscdk/earthdata-search/lib/earthdata-search-functions.tscdk/earthdata-search/lib/earthdata-search-stack.tsgeocoder.Dockerfilepackage.jsonserverless/src/geocoder/requirements.txtserverless/src/nlpSearch/__tests__/handler.test.jsserverless/src/nlpSearch/handler.js
💤 Files with no reviewable changes (2)
- .env.example
- geocoder.Dockerfile
🚧 Files skipped from review as they are similar to previous changes (5)
- bin/build-python.sh
- cdk/earthdata-search/lib/earthdata-search-stack.ts
- package.json
- serverless/src/nlpSearch/tests/handler.test.js
- cdk/earthdata-search/lib/earthdata-search-functions.ts
Overview
What is the feature?
Adds NLP and geocoder lambda functions
What is the Solution?
The
nlpSearchlambda calls Bedrock to parse out the query, then calls various tools to determine the final return value. If spatial was included it will call the geocoder lambda function which will return a spatial bounding box in WKT polygon formatWhat areas of the application does this impact?
New lambda functions
Testing
In order to get the spatial values you will have to tunnel into CMR OpenSearch
Checklist
Summary by CodeRabbit
New Features
Chores
Documentation
Tests