Conversation

Contributor

@ilyannn ilyannn commented Dec 3, 2025

Proposed commit message

The specification at https://support.checkpoint.com/results/sk/sk144192 describes the packets field in the Security Gateway - SecureXL Fields section as a tuple of

  • Source IP address
  • Source Port
  • Destination IP address
  • Destination Port
  • Protocol Number

The current pipeline, however, parses the packets field as an integer, handling only the format described in the Security Gateway - Firewall Fields section. When the field arrives in the SecureXL form, as a string describing the dropped packets, the current pipeline fails.

In practice the tuple can also contain an additional member denoting the network interface.

This PR adds the parsing to the pipeline. The result is an array of JSON objects, each structured according to ECS, that is:

"packets_dropped" => [{
    "source": {
        "ip": ...
        "port": ...
    },
    "destination": {
        "ip": ...
        "port": ...
    },
    "network": {
        "iana_number": ...
    },
    "interface": {
        "name": ...
    }
}, ...]

We handle both the case where the interface name is present (as observed in practice) and where it is absent (as per the specification). The added group is defined as nested to preserve the relationships within each tuple.
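
For reference, one possible shape of the field declarations, with sub-field names and types following ECS conventions (illustrative only; the actual fields file in this PR may differ):

- name: packets_dropped
  # Nested keeps each tuple's source/destination/interface correlated in one object.
  type: nested
- name: packets_dropped.source.ip
  type: ip
- name: packets_dropped.source.port
  type: long
- name: packets_dropped.destination.ip
  type: ip
- name: packets_dropped.destination.port
  type: long
- name: packets_dropped.network.iana_number
  type: keyword
- name: packets_dropped.interface.name
  type: keyword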

The parsing is implemented as a Painless script; alternative approaches were tried but were ultimately unsuccessful.
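
For illustration, here is a minimal sketch of the approach, assuming the tuples are separated by semicolons and each tuple is wrapped in angle brackets with comma-separated members; the delimiters, error handling, and type conversions in the actual script may differ:

// Illustrative input shape (an assumption, not the exact syntax from the specification):
//   "<10.1.2.3,44321,10.9.8.7,443,6,eth1>; <10.1.2.4,44322,10.9.8.7,443,6>"
def raw = ctx.checkpoint.packets;
def dropped = [];
for (def chunk : raw.splitOnToken(';')) {
  def s = chunk.trim();
  int open = s.indexOf('<');
  int close = s.lastIndexOf('>');
  if (open < 0 || close <= open) {
    continue;
  }
  // Tuple members: source IP, source port, destination IP, destination port, protocol[, interface].
  def parts = s.substring(open + 1, close).splitOnToken(',');
  if (parts.length < 5) {
    continue;
  }
  // Ports and protocol are kept as strings here; the real script or mapping may coerce them.
  def entry = [
    'source': ['ip': parts[0].trim(), 'port': parts[1].trim()],
    'destination': ['ip': parts[2].trim(), 'port': parts[3].trim()],
    'network': ['iana_number': parts[4].trim()]
  ];
  if (parts.length > 5) {
    entry['interface'] = ['name': parts[5].trim()];
  }
  dropped.add(entry);
}
if (dropped.size() > 0) {
  ctx.checkpoint.packets_dropped = dropped;
}

In the pipeline this sits behind the processor's if condition (quoted in the review below), so it only runs when the field is the SecureXL string rather than a plain packet count.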

Screenshot

From the specification:

(screenshot omitted)

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • [ ] I have verified that any added dashboard complies with Kibana's Dashboard good practices

Author's Checklist

  • Add test cases for the new functionality
  • Use the hashes on the processor's tag

How to test this PR locally

GenAI Note

Cursor with the Claude Opus 4.5 model was used in creating this PR's contents, especially the Painless script.

@ilyannn ilyannn self-assigned this Dec 3, 2025
@ilyannn ilyannn added enhancement New feature or request Integration:checkpoint Check Point Team:Integration-Experience Security Integrations Integration Experience [elastic/integration-experience] labels Dec 3, 2025
@ilyannn ilyannn changed the title [checkpoint] Add the processor for SecureXL fields [checkpoint] Add the processor for packets field in SecureXL format Dec 3, 2025
@ilyannn ilyannn changed the title [checkpoint] Add the processor for packets field in SecureXL format [checkpoint] Process the packets field in SecureXL format Dec 3, 2025
@ilyannn ilyannn changed the title [checkpoint] Process the packets field in SecureXL format [Check Point] Process the packets field in SecureXL format Dec 3, 2025
@elastic-vault-github-plugin-prod

🚀 Benchmarks report

To see the full report, comment with /test benchmark fullreport

@elasticmachine

💚 Build Succeeded

cc @ilyannn

Member

@P1llus P1llus left a comment

Do we have any sample data (anonymized) we can add to the pipeline test?

      field: checkpoint.subs_exp
      ignore_missing: true
  - script:
      tag: script_parse_checkpoint_packets_dropped
Member

@P1llus P1llus Dec 3, 2025

It would be preferred if we could use a common processor to parse this out, as script regex is quite heavy. Was this the only way? Looking at the data I guess it does seem like that, but it's quite troublesome for performance reasons.

Contributor Author

@ilyannn ilyannn Dec 4, 2025

I welcome any ideas 😄. I tried split + grok but couldn't make it work.

Member

Yeah, once I noticed it was meant to produce an array, any other processor mostly goes out the window, with the possible exception of foreach, but that just introduces many other issues, so script was the way to go. I was just hoping we could have managed to skip the regex part, as in theory the data is structured in some ways.

  description: |
    Amount of packets dropped.
- name: packets_dropped
  type: nested
Member

Just leaving this as a note rather than something that needs to be resolved: nested types are something we try to avoid as much as possible, but at this point I am unsure of any better alternative.

Contributor Author

Yeah, it's a tradeoff. With nested types we accurately represent the underlying structure, so users can ask "find packets where the source is X and the destination is Y", but it does come at a cost. The alternative would be to use a group, but then we'd only have a list of all sources and a list of all destinations in the event, and the information about which source goes to which destination would be lost.
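
For illustration, that kind of correlated lookup uses a nested query along these lines; the index pattern and IP values are placeholders, and the field path assumes the objects land under checkpoint.packets_dropped:

GET logs-checkpoint.firewall-*/_search
{
  "query": {
    "nested": {
      "path": "checkpoint.packets_dropped",
      "query": {
        "bool": {
          "must": [
            { "term": { "checkpoint.packets_dropped.source.ip": "10.1.2.3" } },
            { "term": { "checkpoint.packets_dropped.destination.ip": "10.9.8.7" } }
          ]
        }
      }
    }
  }
}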

Contributor Author

I don't know the details, but my assumption was that there is no performance penalty as long as nobody is searching on that field.

Member

Yeah, nested fields don't induce a performance penalty, but you cannot use aggregations on them, so any dashboard visualization etc. won't work; the only thing you can do is query, for example on the interface name of any of the objects.

  - script:
      tag: script_parse_checkpoint_packets_dropped
      description: Parse packets field containing connection tuples into structured packets_dropped array.
      if: ctx.checkpoint?.packets instanceof String && ctx.checkpoint.packets.contains('<')
Member

I am a bit unsure, but it could be that startsWith is better performance-wise, though if the only time this field is a string is when it is in the SecureXL format, it might be okay.
If we had sample data, we could check whether an already-parsed field indicates that the data is SecureXL type, so we wouldn't need either.

Contributor Author

Good point. I don't know whether their tuple format is well documented, for example whether it's guaranteed not to start with whitespace. The cleanest way is probably to try a conversion to a number first and only run the script if that fails; I'll do that.
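
As a sketch, that could look something like this (using ignore_failure on the convert; the actual change might use an on_failure block instead):

  - convert:
      field: checkpoint.packets
      type: long
      ignore_missing: true
      # When the value is the SecureXL tuple string, the conversion fails and the
      # field remains a string, which the script's condition can then key off.
      ignore_failure: true

The script processor's if could then be reduced to ctx.checkpoint?.packets instanceof String, since a successfully converted value is no longer a string.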

@andrewkroh andrewkroh added the documentation Improvements or additions to documentation. Applied to PRs that modify *.md files. label Dec 3, 2025