@haiqi96 haiqi96 commented Nov 19, 2024

Description

The existing implementation of webui only supports viewing unstructured logs. This PR supports viewing JSON logs in the webui.

With the refactor in #584, JSON and IR log viewing share most of their code, with two differences:

  1. The streamId differs between JSON and IR: IR's streamId is orig_file_id, whereas JSON's streamId is archive_id.
  2. The job_config and job type stored in the database are different.

This PR replaces the original IR extraction methods with stream extraction methods that support both JSON and IR viewing. In addition to the original parameters, they take an extra "JobType" parameter to distinguish whether the job targets JSON or IR.

Also fixes #608.

Validation performed

Tested the package locally and verified that both the CLP and CLP-S flows work.

Summary by CodeRabbit

Release Notes

  • New Features

    • Updated job submission process to support extraction of both CLP IR and JSON line files.
    • Enhanced QueryStatus component with new job type handling and improved error messages.
    • Introduced new constants and functions in SearchResultsTable for better handling of storage types and URL generation.
    • Added configuration options for stream output size in the log viewer web UI.
    • New parameter for stream target uncompressed size added to settings.
  • Bug Fixes

    • Improved error handling for unsupported job types in query routes.
  • Documentation

    • Updated metadata handling and response structures to reflect new extraction types.
    • Clarified configuration comments related to stream file sizes.

@junhaoliao junhaoliao left a comment

Changes in DbManager look good.

@junhaoliao junhaoliao left a comment

All good except one suggestion in a method signature.

Comment on lines 24 to 27
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}

I agree, except that we do not need to check !streamId because typeof streamId !== "string" should already handle that.

Comment on lines 121 to 126
async submitAndWaitForExtractStreamJob (
jobType,
logEventIdx,
streamId,
targetUncompressedSize
) {

When a single function takes multiple parameters, it's better to use the object destructuring syntax:

Suggested change
- async submitAndWaitForExtractStreamJob (
-     jobType,
-     logEventIdx,
-     streamId,
-     targetUncompressedSize
- ) {
+ async submitAndWaitForExtractStreamJob ({
+     jobType,
+     logEventIdx,
+     streamId,
+     targetUncompressedSize
+ }) {

The JsDoc can be updated as:

     * @param {object} props
     * @param {number} props.jobType
     * @param {number} props.logEventIdx
     * @param {string} props.streamId
     * @param {number} props.targetUncompressedSize

Then the method call can be modified as:

            const extractResult = await fastify.dbManager.submitAndWaitForExtractStreamJob({
                jobType: extractJobType,
                logEventIdx: sanitizedLogEventIdx,
                streamId: streamId,
                targetUncompressedSize: settings.StreamTargetUncompressedSize,
            });

which looks more readable. What do you think?
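
To illustrate the point about readability, here is a minimal sketch. The function describeExtractStreamJob is a hypothetical stand-in (it is not part of DbManager) that only echoes its arguments. Since the JSDoc above types both jobType and logEventIdx as numbers, swapping them in a positional call would go unnoticed, whereas named properties can be passed in any order:

```javascript
// Hypothetical stand-in for the real method: it only echoes the job it would submit.
function describeExtractStreamJob ({
    jobType,
    logEventIdx,
    streamId,
    targetUncompressedSize,
}) {
    return `jobType=${jobType} logEventIdx=${logEventIdx} ` +
        `streamId=${streamId} targetUncompressedSize=${targetUncompressedSize}`;
}

// The same properties in two different orders produce the same result.
const inDeclaredOrder = describeExtractStreamJob({
    jobType: 1,
    logEventIdx: 42,
    streamId: "abc",
    targetUncompressedSize: 1024,
});
const inReverseOrder = describeExtractStreamJob({
    targetUncompressedSize: 1024,
    streamId: "abc",
    logEventIdx: 42,
    jobType: 1,
});

console.log(inDeclaredOrder === inReverseOrder);
```

The argument values (1, 42, "abc", 1024) are made up for the example.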

Comment on lines 271 to 272
export {EXTRACT_JOB_TYPES};
export {QUERY_JOB_TYPE};

Suggested change
- export {EXTRACT_JOB_TYPES};
- export {QUERY_JOB_TYPE};
+ export {
+     EXTRACT_JOB_TYPES,
+     QUERY_JOB_TYPE,
+ };

Comment on lines 24 to 27
if (null === streamId) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("streamId must not be null");
}

Suggested change
- if (null === streamId) {
-     resp.code(StatusCodes.BAD_REQUEST);
-     throw new Error("streamId must not be null");
- }
+ if (typeof streamId !== "string" || 0 === streamId.trim().length) {
+     resp.code(StatusCodes.BAD_REQUEST);
+     throw new Error(`"streamId" must be a non-empty string.`);
+ }
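
As a rough sketch of what the suggested condition accepts and rejects, the helper below (isInvalidStreamId is a hypothetical name, not something in the PR) wraps the same expression. The typeof check covers null, undefined, and non-string values; the trim().length check covers empty and whitespace-only strings, which typeof alone would let through:

```javascript
// Mirrors the suggested condition: returns true when the value should be rejected.
function isInvalidStreamId (streamId) {
    return "string" !== typeof streamId || 0 === streamId.trim().length;
}

// Sample inputs chosen for illustration only.
const samples = [null, undefined, 123, "", "   ", "abc"];
for (const sample of samples) {
    console.log(JSON.stringify(sample), isInvalidStreamId(sample));
}
```

Only the last sample, "abc", passes the check.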

Comment on lines 19 to 22
if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
resp.code(StatusCodes.BAD_REQUEST);
throw new Error("Invalid extractJobType");
}

Suggested change
- if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
-     resp.code(StatusCodes.BAD_REQUEST);
-     throw new Error("Invalid extractJobType");
- }
+ if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
+     resp.code(StatusCodes.BAD_REQUEST);
+     throw new Error(`Invalid extractJobType="${extractJobType}".`);
+ }

- fastify.post("/query/extract-ir", async (req, resp) => {
-     const {origFileId, logEventIdx} = req.body;
+ fastify.post("/query/extract-stream", async (req, resp) => {
+     const {extractJobType, logEventIdx, streamId} = req.body;

Let's move the parameter validation checks directly below this line.
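
One way to picture the "validate first" ordering is to gather the checks into a single step that runs right after the body is destructured. This is only a sketch: validateExtractStreamBody is a hypothetical helper, and the values in EXTRACT_JOB_TYPES are assumed here rather than taken from DbManager.js.

```javascript
// Assumed values for illustration; the real list is exported from DbManager.js.
const EXTRACT_JOB_TYPES = Object.freeze([1, 2]);

// Returns an error message for the first failing check, or null if the body is valid.
function validateExtractStreamBody ({extractJobType, streamId} = {}) {
    if (false === EXTRACT_JOB_TYPES.includes(extractJobType)) {
        return `Invalid extractJobType="${extractJobType}".`;
    }
    if ("string" !== typeof streamId || 0 === streamId.trim().length) {
        return "\"streamId\" must be a non-empty string.";
    }

    return null;
}

console.log(validateExtractStreamBody({extractJobType: 1, streamId: "abc"}));
console.log(validateExtractStreamBody({extractJobType: 9, streamId: "abc"}));
console.log(validateExtractStreamBody({extractJobType: 2, streamId: " "}));
```

In the route handler, a non-null result would translate to resp.code(StatusCodes.BAD_REQUEST) followed by a throw, before any database call is made.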

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
components/log-viewer-webui/server/src/routes/query.js (3)

Line range hint 6-14: Enhance route documentation

The JSDoc comment should include details about request parameters and possible error responses.

 /**
  * Creates query routes.
  *
+ * @param {Object} req.body
+ * @param {string} req.body.extractJobType - The type of extraction job (must be in EXTRACT_JOB_TYPES)
+ * @param {string} req.body.streamId - The ID of the stream to extract
+ * @param {number} req.body.logEventIdx - The index of the log event
+ * @throws {Error} When parameters are invalid or extraction fails
  * @param {import("fastify").FastifyInstance | {dbManager: DbManager}} fastify
  * @param {import("fastify").FastifyPluginOptions} options
  * @return {Promise<void>}
  */

42-44: Add logging for extraction failures

Consider logging extraction failures to help with debugging and monitoring.

+                fastify.log.error(`Stream extraction failed for streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);
                 resp.code(StatusCodes.BAD_REQUEST);
                 throw new Error("Unable to extract stream with " +
                     `streamId=${streamId} at logEventIdx=${sanitizedLogEventIdx}`);

59-59: Consider using a consistent response structure

For consistency across endpoints, consider wrapping the response in a standard structure.

-        return streamMetadata;
+        return {
+            success: true,
+            data: streamMetadata
+        };
components/log-viewer-webui/server/src/DbManager.js (2)

69-75: Enhance constant documentation

Consider adding more descriptive JSDoc comments to explain the purpose and usage of EXTRACT_JOB_TYPES. For example:

 /**
- * List of valid extract job types.
+ * An immutable array containing valid job types for stream extraction operations.
+ * Used for validation and type checking in stream extraction related operations.
+ * @type {ReadonlyArray<number>}
  */
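
If the array is meant to be immutable at runtime and not only in its JSDoc type, Object.freeze is one option. The sketch below assumes numeric enum values (0, 1, 2) for QUERY_JOB_TYPE; the actual values live in DbManager.js and may differ.

```javascript
// Assumed enum values for illustration only.
const QUERY_JOB_TYPE = Object.freeze({
    SEARCH_OR_AGGREGATION: 0,
    EXTRACT_IR: 1,
    EXTRACT_JSON: 2,
});

/**
 * Valid job types for stream extraction.
 *
 * @type {ReadonlyArray<number>}
 */
const EXTRACT_JOB_TYPES = Object.freeze([
    QUERY_JOB_TYPE.EXTRACT_IR,
    QUERY_JOB_TYPE.EXTRACT_JSON,
]);

// Appending to a frozen array throws a TypeError.
let pushRejected = false;
try {
    EXTRACT_JOB_TYPES.push(3);
} catch (e) {
    pushRejected = e instanceof TypeError;
}

console.log(pushRejected, EXTRACT_JOB_TYPES.length);
```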

172-177: Track technical debt: field naming in MongoDB

As discussed in previous reviews, orig_file_id is being used for both IR and JSON streams. Consider creating a technical debt ticket to standardize this field to stream_id in a future release.

Would you like me to create a GitHub issue to track this technical debt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 66e3466 and a897682.

📒 Files selected for processing (2)
  • components/log-viewer-webui/server/src/DbManager.js (4 hunks)
  • components/log-viewer-webui/server/src/routes/query.js (2 hunks)
🔇 Additional comments (4)
components/log-viewer-webui/server/src/routes/query.js (2)

15-60: Implementation aligns well with requirements

The route handler successfully implements the stream extraction functionality with proper validation and error handling. The code is well-structured and maintainable.


3-3: Verify JSON import assertion compatibility

The JSON import assertion syntax (with {type: "json"}) is a newer feature. Ensure your deployment environment supports this syntax or consider providing a fallback.

✅ Verification successful

JSON import assertions are properly supported

The project's configuration in log-viewer-webui-server/package.json explicitly supports JSON import assertions through the @babel/plugin-syntax-import-attributes plugin, and the project is configured as an ES module with "type": "module". The syntax is safe to use in this environment.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check package.json for Node.js version requirements
rg -A 5 '"engines"' package.json

Length of output: 87


Script:

#!/bin/bash
# Search for package.json files in the repository
fd -t f "package.json" -x cat {}

Length of output: 5779

components/log-viewer-webui/server/src/DbManager.js (2)

65-65: LGTM: Enum value addition follows established pattern

The addition of EXTRACT_JSON to QUERY_JOB_TYPE maintains consistency with the existing enum structure.


271-274: LGTM: Exports follow best practices

The addition of named exports for EXTRACT_JOB_TYPES and QUERY_JOB_TYPE is appropriate and follows JavaScript best practices.


- let irMetadata = await fastify.dbManager.getExtractedIrFileMetadata(
-     origFileId,
+ const sanitizedLogEventIdx = Number(logEventIdx);

🛠️ Refactor suggestion

Add validation for logEventIdx parameter

The logEventIdx parameter should be validated before conversion to ensure it's defined.

+        if (undefined === logEventIdx) {
+            resp.code(StatusCodes.BAD_REQUEST);
+            throw new Error("\"logEventIdx\" must be provided.");
+        }
         const sanitizedLogEventIdx = Number(logEventIdx);
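
For context on why a check before Number() matters: Number(undefined) is NaN, while Number(null) and Number("") are both 0, so a missing or empty value can silently become index 0. A stricter sketch follows; parseLogEventIdx is a hypothetical helper, and treating only non-negative integers as valid is an assumption, not something the PR states.

```javascript
// Returns the index as a number, or null if the input is missing or malformed.
function parseLogEventIdx (logEventIdx) {
    const isNumber = "number" === typeof logEventIdx;
    const isNonEmptyString = "string" === typeof logEventIdx &&
        0 < logEventIdx.trim().length;
    if (false === isNumber && false === isNonEmptyString) {
        return null;
    }

    const parsed = Number(logEventIdx);

    // Assumption: a valid index is a non-negative integer.
    return (Number.isInteger(parsed) && 0 <= parsed) ?
        parsed :
        null;
}

console.log(Number(undefined), Number(null), Number(""));
console.log(parseLogEventIdx("7"), parseLogEventIdx(""), parseLogEventIdx("abc"));
```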

Comment on lines +121 to +140
async submitAndWaitForExtractStreamJob ({
jobType,
logEventIdx,
streamId,
targetUncompressedSize,
}) {
let jobConfig;
if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {
jobConfig = {
file_split_id: null,
msg_ix: logEventIdx,
orig_file_id: streamId,
target_uncompressed_size: targetUncompressedSize,
};
} else if (QUERY_JOB_TYPE.EXTRACT_JSON === jobType) {
jobConfig = {
archive_id: streamId,
target_chunk_size: targetUncompressedSize,
};
}

🛠️ Refactor suggestion

Add validation for job type parameter

The method should validate that the provided jobType is one of the valid extract job types defined in EXTRACT_JOB_TYPES.

 async submitAndWaitForExtractStreamJob ({
     jobType,
     logEventIdx,
     streamId,
     targetUncompressedSize,
 }) {
+    if (!EXTRACT_JOB_TYPES.includes(jobType)) {
+        throw new Error(`Invalid job type: ${jobType}. Expected one of: ${EXTRACT_JOB_TYPES}`);
+    }
     let jobConfig;
     if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {

Committable suggestion skipped: line range outside the PR's diff.
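
To make the concern concrete: in the method quoted above, jobConfig is declared with let and assigned only inside the if / else if branches, so an unrecognized jobType leaves it undefined. The sketch below isolates the config-building step into a hypothetical helper, buildExtractJobConfig, with the check up front. The config field names are copied from the quoted code; the enum values are assumed.

```javascript
// Assumed enum values for illustration only.
const QUERY_JOB_TYPE = Object.freeze({EXTRACT_IR: 1, EXTRACT_JSON: 2});
const EXTRACT_JOB_TYPES = Object.freeze([
    QUERY_JOB_TYPE.EXTRACT_IR,
    QUERY_JOB_TYPE.EXTRACT_JSON,
]);

// Builds the job config for a stream extraction job, rejecting unknown job types.
function buildExtractJobConfig ({
    jobType,
    logEventIdx,
    streamId,
    targetUncompressedSize,
}) {
    if (false === EXTRACT_JOB_TYPES.includes(jobType)) {
        throw new Error(
            `Invalid job type: ${jobType}. Expected one of: ${EXTRACT_JOB_TYPES}`
        );
    }
    if (QUERY_JOB_TYPE.EXTRACT_IR === jobType) {
        return {
            file_split_id: null,
            msg_ix: logEventIdx,
            orig_file_id: streamId,
            target_uncompressed_size: targetUncompressedSize,
        };
    }

    // The only remaining valid type is EXTRACT_JSON.
    return {
        archive_id: streamId,
        target_chunk_size: targetUncompressedSize,
    };
}

console.log(buildExtractJobConfig({
    jobType: QUERY_JOB_TYPE.EXTRACT_JSON,
    logEventIdx: 5,
    streamId: "a1",
    targetUncompressedSize: 100,
}));
```

Note that the JSON branch does not use logEventIdx, which matches the quoted code.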

@haiqi96 haiqi96 requested a review from junhaoliao November 21, 2024 21:36

@junhaoliao junhaoliao left a comment

For the PR title, how about:

feat(webui): Support viewing search results in context for JSON logs (clp-json).

@haiqi96 haiqi96 changed the title from "feat(clp-package): Support JSON log viewing in webui" to "feat(webui): Support viewing search results in context for JSON logs (clp-json)." on Nov 21, 2024
@haiqi96 haiqi96 merged commit 7d85ad3 into y-scope:main Nov 21, 2024
6 checks passed
jackluo923 pushed a commit to jackluo923/clp that referenced this pull request Dec 4, 2024
@haiqi96 haiqi96 deleted the webuiLogEventIdx branch December 6, 2024 20:29

Development

Successfully merging this pull request may close these issues.

clp-config: The default value of stream_output.directory in clp_config.py does not match the one in clp-config.yml.

3 participants