[Check Point] Process the packets field in SecureXL format #16235
base: main
Conversation
🚀 Benchmarks report
💚 Build Succeeded
cc @ilyannn
P1llus left a comment
Do we have any sample data (anonymized) we can add to the pipeline test?
        field: checkpoint.subs_exp
        ignore_missing: true
    - script:
        tag: script_parse_checkpoint_packets_dropped
It would be preferred if we could use a common processor to parse this out, as script regex is quite heavy. Was this the only way? Looking at the data, I guess it does seem like that, but it's quite troublesome for performance reasons.
I welcome any ideas 😄. I tried split + grok but couldn't make it work.
Yeah, once I noticed it was meant to produce an array, any other processor mostly goes out the window, with the possible exception of foreach, but that just introduces many other issues, so script was the way to go. I was just hoping we could have managed to skip the regex part, as in theory the data is structured in some ways.
| description: | | ||
| Amount of packets dropped. | ||
| - name: packets_dropped | ||
| type: nested |
Just leaving this as a note rather than something that needs to be resolved: nested types are something we try to avoid as much as possible, but at this point I am unsure about any better alternative.
Yeah, it's a tradeoff. With nested types we accurately represent the underlying structure, so users can run queries like "find packets where source is X and destination is Y", but it does come at a cost. The alternative would be to use a group, but then the event would only carry a list of all sources and a list of all destinations, and the information about which source goes to which destination would be lost.
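For context, a minimal fields.yml-style sketch of what the nested definition could look like — the sub-field names below are illustrative, not necessarily what this PR uses:

```yaml
# Hypothetical sketch only; the real PR may define different sub-fields.
# `nested` keeps each tuple's members paired, so a query matching both a
# source and a destination matches within one tuple, not across the array.
- name: packets_dropped
  type: nested
  fields:
    - name: source.ip
      type: ip
    - name: destination.ip
      type: ip
    - name: interface
      type: keyword
```

With a `group` (flattened object) instead, `source.ip` and `destination.ip` would each become independent arrays, losing the per-tuple pairing described above.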
I don't know the details, but my assumption was that there is no performance penalty as long as nobody is searching on that field.
Yeah, nested fields don't incur a performance penalty, but you cannot use aggregations on them, so any dashboard visualization etc. won't work. The only thing you can do is query them, for example on the interface name of any of the objects.
    - script:
        tag: script_parse_checkpoint_packets_dropped
        description: Parse packets field containing connection tuples into structured packets_dropped array.
        if: ctx.checkpoint?.packets instanceof String && ctx.checkpoint.packets.contains('<')
I am a bit unsure, but startsWith could be better performance-wise, though if the only time this field is a string is when it is in the SecureXL format, it might be okay.
If we had sample data, we could check whether an already parsed field indicates that it's SecureXL-type data, so we wouldn't need either.
Good point. I don't know if their tuple format is well documented, for example whether it's guaranteed not to start with whitespace. The cleanest way is probably to try a conversion to a number first and only run the script if that fails – I'll do that.
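A sketch of that convert-first approach (the `convert` and `script` processors and per-processor `on_failure` are standard ingest-pipeline features; the field paths and tag are taken from the snippets in this PR, and the script body is elided):

```yaml
# Sketch only: interpret checkpoint.packets as an integer first, and fall
# back to the SecureXL string parsing only when the conversion fails.
- convert:
    field: checkpoint.packets
    type: long
    ignore_missing: true
    on_failure:
      - script:
          tag: script_parse_checkpoint_packets_dropped
          description: Parse packets field containing connection tuples into structured packets_dropped array.
          # source: <the Painless script from this PR>
```

This avoids guessing at the string format in an `if` condition: the Firewall-format integer case takes the fast path, and only genuine SecureXL strings reach the script.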
Proposed commit message
The specification at https://support.checkpoint.com/results/sk/sk144192 describes the `packets` field in the Security Gateway - SecureXL Fields section as a tuple of values. The current pipeline, however, parses the `packets` field as an integer, handling only the format described in the Security Gateway - Firewall Fields section. When this field comes as defined in the SecureXL section – a string describing the dropped packets – the current pipeline fails. In practice the tuple can also contain an additional member denoting the network interface.
This PR adds the parsing to the pipeline. The result is an array of JSON objects, each structured according to ECS.
We handle both the case when the interface name is present (as in practice) and when it's not (as per the specification). The added group is defined as `nested` to preserve the relationships within each tuple. The parsing is implemented as a Painless script; alternative approaches were tried but were not ultimately successful.
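As an illustration of the described result — the values and exact sub-field names here are hypothetical, not taken from real data — a parsed event could carry an ECS-shaped array such as:

```yaml
# Hypothetical example output; values and sub-field names are illustrative.
checkpoint:
  packets_dropped:
    - source:
        ip: 10.0.0.1
        port: 514
      destination:
        ip: 10.0.0.2
        port: 443
      interface: eth0
```

Each array element corresponds to one tuple from the SecureXL `packets` string, with the interface member present only when the input contains it.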
Screenshot
From the specification:
Checklist
- [ ] I have added an entry to my package's `changelog.yml` file.
- [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices
Author's Checklist
How to test this PR locally
GenAI Note
Cursor with the Claude Opus 4.5 model was used in creating this PR's contents, especially the Painless script.