28 changes: 28 additions & 0 deletions docs/field_tests.md
@@ -39,6 +39,12 @@ To generate test cases only for knowledge objects, append the following marker t

- Plugin gets the list of defined sourcetypes by parsing props.conf
- For each sourcetype, the plugin generates an SPL search query and asserts event_count > 0.

**Note:** Sourcetypes can be defined in two ways:
- **Direct stanzas**: `[sourcetype_name]` in props.conf
- **TRANSFORMS-defined**: Sourcetypes dynamically set via TRANSFORMS directives that reference transforms.conf entries with `FORMAT = sourcetype::<sourcetype_name>`

Both types of sourcetypes are automatically discovered and tested for event coverage.
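As an illustration (the stanza names and sourcetype value here are hypothetical), a TRANSFORMS-defined sourcetype ties a props.conf stanza to a transforms.conf entry that rewrites the sourcetype at index time:

```conf
# props.conf -- hypothetical stanza
[source::...my_sample.log]
TRANSFORMS-set_sourcetype = set_custom_sourcetype

# transforms.conf -- index-time sourcetype rewrite
[set_custom_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::custom:error
```

The plugin discovers `custom:error` from the `FORMAT` value even though it never appears as a direct `[sourcetype]` stanza in props.conf.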

**2. Fields mentioned under source/sourcetype should be extracted**

@@ -155,3 +161,25 @@ For every test case failure, there is a defined structure for the stack trace.
```

Get the search query from the stack trace, execute it on the Splunk instance, and verify which specific type of events is causing the failure.

## Performance Optimization

### Caching for pytest-xdist

When running tests with pytest-xdist (multiple workers), the plugin automatically caches parsed configuration files and generated test parameters to avoid redundant work across workers.

**What is cached:**
- Parsed configuration: props.conf, transforms.conf, tags.conf, eventtypes.conf, savedsearches.conf
- Generated test parameters for all fixtures

**How it works:**
- The first worker to request a cache key parses the data and saves it
- Other workers load from the shared cache instead of re-parsing
- Per-key locking prevents deadlocks when nested cache lookups occur
- Atomic writes with integrity hashing prevent cache corruption

**Cache files:**
- Location: `{temp_dir}/pytest-splunk-addon/{testrunuid}_parser_cache`
- Cleaned up at process exit by the first worker (gw0)

**Note:** Caching only activates when running under pytest-xdist. Single-worker execution parses files directly without caching overhead.
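The caching behavior described above can be sketched roughly as follows. This is a simplified illustration, not the plugin's actual `ParserCache` implementation: the file layout, JSON serialization, and hashing scheme here are assumptions, and per-key locking is omitted for brevity.

```python
import hashlib
import json
import os
import tempfile


class ParserCache:
    """Simplified per-key file cache with atomic writes and integrity hashing."""

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="parser_cache_")

    def _path(self, key):
        return os.path.join(self.cache_dir, f"{key}.json")

    def get_or_parse(self, parse_fn, key):
        path = self._path(key)
        if os.path.exists(path):
            with open(path) as f:
                payload = json.load(f)
            # Verify the integrity hash before trusting cached data
            body = json.dumps(payload["data"], sort_keys=True)
            if hashlib.sha256(body.encode()).hexdigest() == payload["sha256"]:
                return payload["data"]
        # Cache miss (or corrupted entry): parse and persist the result
        data = parse_fn()
        body = json.dumps(data, sort_keys=True)
        payload = {"data": data, "sha256": hashlib.sha256(body.encode()).hexdigest()}
        # Atomic write: write to a temp file, then rename into place
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)
        return data
```

The atomic `os.replace` ensures a concurrent reader never observes a half-written cache file, and the SHA-256 check discards a corrupted entry instead of returning bad data.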
33 changes: 32 additions & 1 deletion docs/requirement_tests.md
@@ -3,15 +3,46 @@
## Overview

- The tests are written to verify that the field extractions of the add-on function properly.
- Requirement tests use XML sample files with embedded field expectations (`cim_fields`, `other_mappings`)

______________________________________________________________________

To generate only requirement tests, append the folowing marker to pytest command:
To generate only requirement tests, append the following marker to pytest command:

```console
-m splunk_requirements
```

## XML Sample File Structure for Requirement Tests

Requirement tests require XML format sample files with `requirement_test_sample = 1` in the conf file.

### Transport Node Usage

The `<transport>` node in XML samples has specific meaning for requirement tests:

```xml
<transport type="modinput" host="sample_host" source="test_source" sourcetype="test:sourcetype" />
```

| Attribute | Behavior in Requirement Tests |
|-----------|------------------------------|
| `type` | Used for syslog header stripping (if `type="syslog"`). NOT used for ingestion. |
| `host` | **Overrides** the host value that is recorded in both the ingested event metadata and the search query generated for that event. |
| `source` | **Overrides** the source value that is recorded in the ingested event metadata and the search query generated for that event. |
| `sourcetype` | Used in field extraction test searches (not ingestion) |

**Important:** The actual ingestion is still driven by the conf file's `input_type`, `sourcetype`, and other metadata settings, but any `<transport host>` or `<transport source>` values found in the XML are merged into the `SampleEvent` metadata before ingestion and before the search query is generated (see `pytest_splunk_addon/sample_generation/sample_stanza.py`). That means those XML overrides affect the payload submitted to Splunk and the constraints that the requirement tests evaluate, not just the search query.
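The merge described above can be sketched as follows (a hypothetical helper; the real logic lives in `sample_stanza.py` and its attribute names may differ):

```python
def merge_transport_overrides(metadata, transport_attrs):
    """Merge XML <transport> host/source overrides into event metadata.

    Only host and source override ingestion metadata; the transport's
    sourcetype and type influence searches and syslog header stripping,
    not ingestion.
    """
    merged = dict(metadata)
    for key in ("host", "source"):
        value = transport_attrs.get(key)
        if value:
            merged[key] = value
    return merged
```

With this shape, a conf-derived metadata dict keeps its sourcetype while the XML's host wins, so the ingested event and the generated search stay in sync.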

### Scenarios

- **Requirement test sample with `<transport host/source>` overrides:**
The XML parser injects these override values into `SampleEvent.metadata`, the ingestor emits them in the indexed event, and the search generator reuses them so the test looks for the same host/source.
- **Field test sample with `<transport type="syslog">`:**
Only the syslog header stripping behavior is affected; the host/source/sourcetype are still driven by the conf stanza (`host_type`, `host`, `source`, `sourcetype_to_search`) and tokens.
- **Events without XML transport overrides:**
Ingestors rely entirely on stanza metadata; searches use `host`/`source`/`sourcetype_to_search` from the conf file or tokens.

## Test Scenarios

**1. Fields should be extracted as defined in the cim_fields and other_mappings.**
Expand Down
134 changes: 94 additions & 40 deletions docs/sample_generator.md
@@ -89,6 +89,28 @@ Example live event:
```
</details>

### XML Transport Node vs. Conf File Settings

The XML `<transport>` node and `pytest-splunk-addon-data.conf` settings serve **different purposes**. Understanding this distinction is important to avoid confusion:

| Setting | Source | Purpose |
|---------|--------|---------|
| `<transport type="">` | XML file | **Field tests only** - Used for syslog header stripping in field extraction tests. NOT used for ingestion. |
| `<transport host="">` | XML file | **Requirement tests** - Overrides conf host for requirement test searches. |
| `<transport source="">` | XML file | **Requirement tests** - Overrides conf source for requirement test searches. |
| `<transport sourcetype="">` | XML file | **Field tests only** - Used in field extraction test searches. NOT used for ingestion. |
| `input_type` | Conf file | **Ingestion** - Controls how events are parsed and ingested into Splunk. |
| `host` | Conf file | **Ingestion** - Base host value for events. Defaults to sample file name. |
| `sourcetype` | Conf file | **Ingestion** - The sourcetype assigned when ingesting events. |
| `sourcetype_to_search` | Conf file | **All test searches** - The sourcetype used in search queries. |

**Key Points:**

1. The conf file's `input_type` **always** controls how events are ingested
2. The XML's `<transport type="">` only affects field test behavior (e.g., stripping syslog headers)
3. Use `sourcetype_to_search` in the conf file when your add-on transforms the sourcetype at index-time (via TRANSFORMS)
4. The XML's `host` and `source` attributes override conf values **only for requirement tests**
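For example (the stanza name is hypothetical; the sourcetype values mirror the example later in this page), a stanza for an add-on that rewrites the sourcetype at index time separates the ingestion sourcetype from the search sourcetype:

```conf
[sample_firewall.samples]
input_type = syslog_tcp
# Sourcetype Splunk receives at ingestion
sourcetype = raw:data
# Sourcetype produced by index-time TRANSFORMS; used in test searches
sourcetype_to_search = parsed:data
```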

## pytest-splunk-addon-data.conf.spec

**Default Values**:
@@ -119,78 +141,110 @@ host_prefix = {{host_prefix}}
- Example1: \[sample_file.samples\] would collect samples from file sample_file.samples
- Example2: \[sample\_\*.samples\] would collect samples from both sample_file.samples and sample_sample.samples.

**sourcetype = <sourcetype\>**
---

- sourcetype to be assigned to the sample events
### Ingestion Settings

**source = <source\>**
These settings control how events are ingested into Splunk.

- source to be assigned to the sample events
- default value: pytest-splunk-addon:{\{input_type}}
**sourcetype = <sourcetype\>**

- **Purpose:** Sourcetype assigned when **ingesting** events into Splunk
- This is the sourcetype that Splunk receives at index-time
- If your add-on uses TRANSFORMS to change the sourcetype, this should be the **original** sourcetype before transformation

**sourcetype_to_search = <sourcetype\>**
**source = <source\>**

- The sourcetype used to search events
- This would be different from the sourcetype= param in cases where TRANSFORMS is used to update the sourcetype at index time.
- **Purpose:** Source assigned when **ingesting** events into Splunk
- default value: pytest-splunk-addon:{\{input_type}}

**host_type = plugin | event**
**input_type = modinput | scripted_input | syslog_tcp | file_monitor | windows_input | uf_file_monitor | default**

- This key determines if host is assigned from event or default host should be assigned by plugin.
- If the value is plugin, the plugin will generate host with format of "stanza\_\{count}" to uniquely identify the events.
- If the value is event, the host field should be provided for a token using "token.<n\>.field = host".
- **Purpose:** Controls how events are **parsed and ingested** into Splunk
- This determines how sample files are processed:
- `modinput`, `windows_input`: One event per line in the sample file
- `file_monitor`, `scripted_input`, `syslog_tcp`, `syslog_udp`, `default`: Entire file as single event (unless breaker is specified)
- The ingestion method is chosen to match how data flows in production for accurate index-time testing
- For example, if sourcetype "alert" is ingested through syslog in production, use `input_type=syslog_tcp`

**input_type = modinput | scripted_input | syslog_tcp | file_monitor | windows_input | uf_file_monitor | default**

- The input_type used in addon to ingest data of a sourcetype used in stanza.
- The way the sample data is ingested into Splunk depends on the input_type. The most similar ingestion approach is used for each input_type to get accurate index-time testing.
- In input_type=uf_file_monitor, universal forwarder will use file monitor to read event and then it will send data to indexer.
- For example, in an Add-on, a sourcetype "alert" is ingested through syslog in live environment, provide input_type=syslog_tcp.
> **_Note:_** This is different from the XML's `<transport type="">` which only affects field test behavior (syslog header stripping). The conf file's `input_type` always controls actual ingestion.

> **_warning:_** uf_file_monitor input_type will only work with splunk-type=docker.


**index = <index\>**

- The index used to ingest the data.
- The index must be configured beforehand.
- If the index is not available then the data will not get ingested into Splunk and a warning message will be printed.
- **Purpose:** The index where events are **ingested**
- The index must be configured beforehand
- If the index is not available, data will not be ingested and a warning will be printed
- Custom index is not supported for syslog_tcp or syslog_udp

**host = <host\>**

- **Purpose:** Base host value assigned when **ingesting** events
- If not specified, defaults to the sample file name
- When `host_type = plugin`, the plugin appends `_{count}` to make each event's host unique (e.g., `myhost_1`, `myhost_2`)
- Can be overridden per-event in XML samples via `<transport host="...">` for requirement tests

---

### Search Settings

These settings control how tests search for events in Splunk.

**sourcetype_to_search = <sourcetype\>**

- **Purpose:** The sourcetype used in **search queries** during tests
- Use this when your add-on transforms the sourcetype at index-time via TRANSFORMS
- Example: If you ingest with `sourcetype=raw:data` but TRANSFORMS changes it to `sourcetype=parsed:data`, set:
- `sourcetype = raw:data` (for ingestion)
- `sourcetype_to_search = parsed:data` (for searching)
- If not specified, defaults to the value of `sourcetype`

---

### Test Behavior Settings

These settings control test generation and execution.

**host_type = plugin | event**

- **Purpose:** Determines how the host field is assigned
- `plugin`: The plugin generates unique hosts with format "stanza\_\{count}" to identify events
- `event`: The host is extracted from a token using "token.<n\>.field = host"

**sample_count = <count\>**

- The no. of events present in the sample file.
- This parameter will be used to calculate the total number of events which will be generated from the sample file.
- If `input_type = modinput`, do not provide this parameter.
- **Purpose:** Number of events present in the sample file
- Used to calculate total events generated from the sample file
- If `input_type = modinput`, do not provide this parameter (each line is an event)

**requirement_test_sample = 1**

- This parameter is used to run requirement tests for the provided sample xml file
- only supported with the xml sample file
- **Purpose:** Enables requirement tests for XML sample files
- When set to 1, the plugin parses the XML format and runs requirement tests using `cim_fields` and `other_mappings`
- Only supported with XML sample files

**expected_event_count = <count\>**

- The no. of events this sample stanza should generate.
- The parameter will be used to test the line breaking in index-time tests.
- To calculate expected_event_count 2 parameters can be used. 1) Number of events in the sample file. 2) Number of values of replacementType=all tokens in the sample file. Both the parameters can be multiplied to get expected_event_count.
- For example, if sample contains 3 lines & a token has replacement_type=all and replacement has list of 2 values, then 6 events will be generated.
- This parameter is optional, if it is not provided by the user, it will be calculated automatically by the pytest-splunk-addon.
- **Purpose:** Expected number of events for index-time line-breaking tests
- Calculated as: (events in sample) × (values in replacementType=all tokens)
- For example, if sample has 3 lines and a token has `replacement_type=all` with 2 values, then 6 events are generated
- Optional - if not provided, calculated automatically
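The calculation above can be sketched as (a hypothetical helper, not part of the plugin's API):

```python
from math import prod


def expected_event_count(sample_lines, all_token_value_counts):
    """events = lines in the sample x product of value-list sizes
    for every token with replacement_type=all."""
    return sample_lines * prod(all_token_value_counts)


# 3 lines in the sample, one replacement_type=all token with 2 values -> 6 events
```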

**timestamp_type = plugin | event**

- This key determines if \_time is assigned from event or default \_time should be assigned by plugin.
- The parameter will be used to test the time extraction in index-time tests.
- If value is plugin, the plugin will assign the time while ingesting the event.
- If value is event, that means the time will be extracted from the event and therefore, there should be a token provided with token.<n\>.field = \_time.
- **Purpose:** Determines how \_time is assigned for index-time tests
- `plugin`: The plugin assigns timestamp during ingestion
- `event`: Timestamp is extracted from the event; requires a token with `token.<n\>.field = _time`

**breaker = <regex\>**

- The breaker is used to breakdown the sample file into multiple events, based on the regex provided.
- This parameter is optional. If it is not provided by the user, the events will be ingested into Splunk as per the *input_type* provided.
- **Purpose:** Regex pattern to split sample file into multiple events
- Optional - if not provided, events are parsed according to `input_type`
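A rough illustration of breaker-based splitting (not the plugin's actual implementation):

```python
import re


def split_events(sample_text, breaker_regex):
    """Split a sample file's contents into events wherever the breaker matches."""
    events = re.split(breaker_regex, sample_text)
    # Drop empty fragments left at the boundaries
    return [e.strip() for e in events if e.strip()]
```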

**host_prefix = <host_prefix\>**
- This param is used as an identification for the **host** field, for the events which are ingested using SC4S.

- **Purpose:** Prefix for host field identification when using SC4S ingestion

## Token replacement settings

Expand Down
59 changes: 52 additions & 7 deletions pytest_splunk_addon/addon_parser/__init__.py
@@ -16,9 +16,9 @@
# -*- coding: utf-8 -*-
"""
The module provides the Add-on parsing mechanism. It can
parse the knowledge objects from an Add-on's configuration files
parse the knowledge objects from an Add-on's configuration files.

Supports: fields from props & transforms, tags, eventtypes
Supports: fields from props & transforms, tags, eventtypes, savedsearches
"""
import os
import re
@@ -30,6 +30,7 @@
from .tags_parser import TagsParser
from .eventtype_parser import EventTypeParser
from .savedsearches_parser import SavedSearchParser
from .parser_cache import ParserCache

LOGGER = logging.getLogger("pytest-splunk-addon")

@@ -49,29 +50,64 @@ def __init__(self, splunk_app_path):
self._tags_parser = None
self._eventtype_parser = None
self._savedsearch_parser = None
self._parser_cache = ParserCache()

@property
def props_parser(self):
if not self._props_parser:
self._props_parser = PropsParser(self.splunk_app_path)

def _parse_props():
parser = PropsParser(self.splunk_app_path)
return parser.props

props_data = self._parser_cache.get_or_parse(_parse_props, "props")
self._props_parser = PropsParser(
self.splunk_app_path, props_data=props_data
)
return self._props_parser

@property
def tags_parser(self):
if not self._tags_parser:
self._tags_parser = TagsParser(self.splunk_app_path)

def _parse_tags():
parser = TagsParser(self.splunk_app_path)
return parser.tags

tags_data = self._parser_cache.get_or_parse(_parse_tags, "tags")
self._tags_parser = TagsParser(self.splunk_app_path, tags_data=tags_data)
return self._tags_parser

@property
def eventtype_parser(self):
if not self._eventtype_parser:
self._eventtype_parser = EventTypeParser(self.splunk_app_path)

def _parse_eventtypes():
parser = EventTypeParser(self.splunk_app_path)
return parser.eventtypes

eventtypes_data = self._parser_cache.get_or_parse(
_parse_eventtypes, "eventtypes"
)
self._eventtype_parser = EventTypeParser(
self.splunk_app_path, eventtypes_data=eventtypes_data
)
return self._eventtype_parser

@property
def savedsearch_parser(self):
if not self._savedsearch_parser:
self._savedsearch_parser = SavedSearchParser(self.splunk_app_path)

def _parse_savedsearches():
parser = SavedSearchParser(self.splunk_app_path)
return parser.savedsearches

savedsearches_data = self._parser_cache.get_or_parse(
_parse_savedsearches, "savedsearches"
)
self._savedsearch_parser = SavedSearchParser(
self.splunk_app_path, savedsearches_data=savedsearches_data
)
return self._savedsearch_parser

def get_props_fields(self):
Expand All @@ -81,7 +117,16 @@ def get_props_fields(self):
Yields:
generator of all the supported fields
"""
return self.props_parser.get_props_fields()

def _parse_props_fields():
LOGGER.info("Building props_fields cache")
fields = list(self.props_parser.get_props_fields())
return fields

fields_data = self._parser_cache.get_or_parse(
_parse_props_fields, "props_fields"
)
return iter(fields_data or [])

def get_tags(self):
"""
4 changes: 2 additions & 2 deletions pytest_splunk_addon/addon_parser/eventtype_parser.py
@@ -35,10 +35,10 @@ class EventTypeParser(object):
splunk_app_path (str): Path of the Splunk app
"""

def __init__(self, splunk_app_path: str):
def __init__(self, splunk_app_path: str, eventtypes_data: Optional[Dict] = None):
self._conf_parser = conf_parser.TABConfigParser()
self.splunk_app_path = splunk_app_path
self._eventtypes = None
self._eventtypes = eventtypes_data

@property
def eventtypes(self) -> Optional[Dict]: