28 changes: 28 additions & 0 deletions docs/field_tests.md
@@ -39,6 +39,12 @@ To generate test cases only for knowledge objects, append the following marker t

- Plugin gets the list of defined sourcetypes by parsing props.conf
- For each sourcetype, the plugin generates an SPL search query and asserts event_count > 0.

**Note:** Sourcetypes can be defined in two ways:
- **Direct stanzas**: `[sourcetype_name]` in props.conf
- **TRANSFORMS-defined**: Sourcetypes dynamically set via TRANSFORMS directives that reference transforms.conf entries with `FORMAT = sourcetype::<sourcetype_name>`

Both types of sourcetypes are automatically discovered and tested for event coverage.
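As an illustration (the stanza names and sourcetype value here are hypothetical), a TRANSFORMS-defined sourcetype ties a props.conf stanza to a transforms.conf entry that rewrites the sourcetype at index time:

```conf
# props.conf -- hypothetical stanza
[source::...my_sample.log]
TRANSFORMS-set_sourcetype = set_custom_sourcetype

# transforms.conf -- index-time sourcetype rewrite
[set_custom_sourcetype]
REGEX = .
DEST_KEY = MetaData:Sourcetype
FORMAT = sourcetype::custom:error
```

The plugin discovers `custom:error` from the `FORMAT` value even though it never appears as a direct `[sourcetype]` stanza in props.conf.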

**2. Fields mentioned under source/sourcetype should be extracted**

@@ -155,3 +161,25 @@ For every test case failure, there is a defined structure for the stack trace.
```

Get the search query from the stack trace, execute it on the Splunk instance, and verify which specific type of events is causing the failure.

## Performance Optimization

### Caching for pytest-xdist

When running tests with pytest-xdist (multiple workers), the plugin automatically caches parsed configuration files and generated test parameters to avoid redundant work across workers.

**What is cached:**
- Parsed configuration: props.conf, transforms.conf, tags.conf, eventtypes.conf, savedsearches.conf
- Generated test parameters for all fixtures

**How it works:**
- The first worker to request a cache key parses the data and saves it
- Other workers load from the shared cache instead of re-parsing
- Per-key locking prevents deadlocks when nested cache lookups occur
- Atomic writes with integrity hashing prevent cache corruption

**Cache files:**
- Location: `{temp_dir}/pytest-splunk-addon/{testrunuid}_parser_cache`
- Cleaned up at process exit by the first worker (gw0)

**Note:** Caching only activates when running under pytest-xdist. Single-worker execution parses files directly without caching overhead.
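The caching behavior described above can be sketched roughly as follows. This is a simplified illustration, not the plugin's actual `ParserCache` implementation: the file layout, JSON serialization, and hashing scheme here are assumptions, and per-key locking is omitted for brevity.

```python
import hashlib
import json
import os
import tempfile


class ParserCache:
    """Simplified per-key file cache with atomic writes and integrity hashing."""

    def __init__(self, cache_dir=None):
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="parser_cache_")

    def _path(self, key):
        return os.path.join(self.cache_dir, f"{key}.json")

    def get_or_parse(self, parse_fn, key):
        path = self._path(key)
        if os.path.exists(path):
            with open(path) as f:
                payload = json.load(f)
            # Verify the integrity hash before trusting cached data
            body = json.dumps(payload["data"], sort_keys=True)
            if hashlib.sha256(body.encode()).hexdigest() == payload["sha256"]:
                return payload["data"]
        # Cache miss (or corrupted entry): parse and persist the result
        data = parse_fn()
        body = json.dumps(data, sort_keys=True)
        payload = {"data": data, "sha256": hashlib.sha256(body.encode()).hexdigest()}
        # Atomic write: write to a temp file, then rename into place
        fd, tmp = tempfile.mkstemp(dir=self.cache_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
        os.replace(tmp, path)
        return data
```

The atomic `os.replace` ensures a concurrent reader never observes a half-written cache file, and the SHA-256 check discards a corrupted entry instead of returning bad data.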
33 changes: 32 additions & 1 deletion docs/requirement_tests.md
@@ -3,15 +3,46 @@
## Overview

- The tests are written to verify that the field extractions of the add-on function properly.
- Requirement tests use XML sample files with embedded field expectations (`cim_fields`, `other_mappings`)

______________________________________________________________________

To generate only requirement tests, append the folowing marker to pytest command:
To generate only requirement tests, append the following marker to pytest command:

```console
-m splunk_requirements
```

## XML Sample File Structure for Requirement Tests

Requirement tests require XML format sample files with `requirement_test_sample = 1` in the conf file.

### Transport Node Usage

The `<transport>` node in XML samples has specific meaning for requirement tests:

```xml
<transport type="modinput" host="sample_host" source="test_source" sourcetype="test:sourcetype" />
```

| Attribute | Behavior in Requirement Tests |
|-----------|------------------------------|
| `type` | Used for syslog header stripping (if `type="syslog"`). NOT used for ingestion. |
| `host` | **Overrides** the host value that is recorded in both the ingested event metadata and the search query generated for that event. |
| `source` | **Overrides** the source value that is recorded in the ingested event metadata and the search query generated for that event. |
| `sourcetype` | Used in field extraction test searches (not ingestion) |

**Important:** The actual ingestion is still driven by the conf file's `input_type`, `sourcetype`, and other metadata settings, but any `<transport host>` or `<transport source>` values found in the XML are merged into the `SampleEvent` metadata before ingestion and before the search query is generated (see `pytest_splunk_addon/sample_generation/sample_stanza.py`). That means those XML overrides affect the payload submitted to Splunk and the constraints that the requirement tests evaluate, not just the search query.
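The merge described above can be sketched as follows (a hypothetical helper; the real logic lives in `sample_stanza.py` and its attribute names may differ):

```python
def merge_transport_overrides(metadata, transport_attrs):
    """Merge XML <transport> host/source overrides into event metadata.

    Only host and source override ingestion metadata; the transport's
    sourcetype and type influence searches and syslog header stripping,
    not ingestion.
    """
    merged = dict(metadata)
    for key in ("host", "source"):
        value = transport_attrs.get(key)
        if value:
            merged[key] = value
    return merged
```

With this shape, a conf-derived metadata dict keeps its sourcetype while the XML's host wins, so the ingested event and the generated search stay in sync.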

### Scenarios

- **Requirement test sample with `<transport host/source>` overrides:**
The XML parser injects these override values into `SampleEvent.metadata`, the ingestor emits them in the indexed event, and the search generator reuses them so the test looks for the same host/source.
- **Field test sample with `<transport type="syslog">`:**
Only the syslog header stripping behavior is affected; the host/source/sourcetype are still driven by the conf stanza (`host_type`, `host`, `source`, `sourcetype_to_search`) and tokens.
- **Events without XML transport overrides:**
Ingestors rely entirely on stanza metadata; searches use `host`/`source`/`sourcetype_to_search` from the conf file or tokens.

## Test Scenarios

**1. Fields should be extracted as defined in the cim_fields and other_mappings.**
Expand Down
134 changes: 94 additions & 40 deletions docs/sample_generator.md
@@ -89,6 +89,28 @@ Example live event:
```
</details>

### XML Transport Node vs. Conf File Settings

The XML `<transport>` node and `pytest-splunk-addon-data.conf` settings serve **different purposes**. Understanding this distinction is important to avoid confusion:

| Setting | Source | Purpose |
|---------|--------|---------|
| `<transport type="">` | XML file | **Field tests only** - Used for syslog header stripping in field extraction tests. NOT used for ingestion. |
| `<transport host="">` | XML file | **Requirement tests** - Overrides conf host for requirement test searches. |
| `<transport source="">` | XML file | **Requirement tests** - Overrides conf source for requirement test searches. |
| `<transport sourcetype="">` | XML file | **Field tests only** - Used in field extraction test searches. NOT used for ingestion. |
| `input_type` | Conf file | **Ingestion** - Controls how events are parsed and ingested into Splunk. |
| `host` | Conf file | **Ingestion** - Base host value for events. Defaults to sample file name. |
| `sourcetype` | Conf file | **Ingestion** - The sourcetype assigned when ingesting events. |
| `sourcetype_to_search` | Conf file | **All test searches** - The sourcetype used in search queries. |

**Key Points:**

1. The conf file's `input_type` **always** controls how events are ingested
2. The XML's `<transport type="">` only affects field test behavior (e.g., stripping syslog headers)
3. Use `sourcetype_to_search` in the conf file when your add-on transforms the sourcetype at index-time (via TRANSFORMS)
4. The XML's `host` and `source` attributes override conf values **only for requirement tests**
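For example (the stanza name is hypothetical; the sourcetype values mirror the example later in this page), a stanza for an add-on that rewrites the sourcetype at index time separates the ingestion sourcetype from the search sourcetype:

```conf
[sample_firewall.samples]
input_type = syslog_tcp
# Sourcetype Splunk receives at ingestion
sourcetype = raw:data
# Sourcetype produced by index-time TRANSFORMS; used in test searches
sourcetype_to_search = parsed:data
```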

## pytest-splunk-addon-data.conf.spec

**Default Values**:
@@ -119,78 +141,110 @@ host_prefix = {{host_prefix}}
- Example1: \[sample_file.samples\] would collect samples from file sample_file.samples
- Example2: \[sample\_\*.samples\] would collect samples from both sample_file.samples and sample_sample.samples.

**sourcetype = <sourcetype\>**
---

- sourcetype to be assigned to the sample events
### Ingestion Settings

**source = <source\>**
These settings control how events are ingested into Splunk.

- source to be assigned to the sample events
- default value: pytest-splunk-addon:{\{input_type}}
**sourcetype = <sourcetype\>**

- **Purpose:** Sourcetype assigned when **ingesting** events into Splunk
- This is the sourcetype that Splunk receives at index-time
- If your add-on uses TRANSFORMS to change the sourcetype, this should be the **original** sourcetype before transformation

**sourcetype_to_search = <sourcetype\>**
**source = <source\>**

- The sourcetype used to search events
- This would be different from the sourcetype= param in cases where TRANSFORMS is used to update the sourcetype at index time.
- **Purpose:** Source assigned when **ingesting** events into Splunk
- default value: pytest-splunk-addon:{\{input_type}}

**host_type = plugin | event**
**input_type = modinput | scripted_input | syslog_tcp | file_monitor | windows_input | uf_file_monitor | default**

- This key determines if host is assigned from event or default host should be assigned by plugin.
- If the value is plugin, the plugin will generate host with format of "stanza\_\{count}" to uniquely identify the events.
- If the value is event, the host field should be provided for a token using "token.<n\>.field = host".
- **Purpose:** Controls how events are **parsed and ingested** into Splunk
- This determines how sample files are processed:
- `modinput`, `windows_input`: One event per line in the sample file
- `file_monitor`, `scripted_input`, `syslog_tcp`, `syslog_udp`, `default`: Entire file as single event (unless breaker is specified)
- The ingestion method is chosen to match how data flows in production for accurate index-time testing
- For example, if sourcetype "alert" is ingested through syslog in production, use `input_type=syslog_tcp`

**input_type = modinput | scripted_input | syslog_tcp | file_monitor | windows_input | uf_file_monitor | default**

- The input_type used in addon to ingest data of a sourcetype used in stanza.
- The way the sample data is ingested into Splunk depends on the input_type. The most similar ingestion approach is used for each input_type to get accurate index-time testing.
- In input_type=uf_file_monitor, universal forwarder will use file monitor to read event and then it will send data to indexer.
- For example, in an Add-on, a sourcetype "alert" is ingested through syslog in live environment, provide input_type=syslog_tcp.
> **_Note:_** This is different from the XML's `<transport type="">` which only affects field test behavior (syslog header stripping). The conf file's `input_type` always controls actual ingestion.

> **_warning:_** uf_file_monitor input_type will only work with splunk-type=docker.


**index = <index\>**

- The index used to ingest the data.
- The index must be configured beforehand.
- If the index is not available then the data will not get ingested into Splunk and a warning message will be printed.
- **Purpose:** The index where events are **ingested**
- The index must be configured beforehand
- If the index is not available, data will not be ingested and a warning will be printed
- Custom index is not supported for syslog_tcp or syslog_udp

**host = <host\>**

- **Purpose:** Base host value assigned when **ingesting** events
- If not specified, defaults to the sample file name
- When `host_type = plugin`, the plugin appends `_{count}` to make each event's host unique (e.g., `myhost_1`, `myhost_2`)
- Can be overridden per-event in XML samples via `<transport host="...">` for requirement tests

---

### Search Settings

These settings control how tests search for events in Splunk.

**sourcetype_to_search = <sourcetype\>**

- **Purpose:** The sourcetype used in **search queries** during tests
- Use this when your add-on transforms the sourcetype at index-time via TRANSFORMS
- Example: If you ingest with `sourcetype=raw:data` but TRANSFORMS changes it to `sourcetype=parsed:data`, set:
- `sourcetype = raw:data` (for ingestion)
- `sourcetype_to_search = parsed:data` (for searching)
- If not specified, defaults to the value of `sourcetype`

---

### Test Behavior Settings

These settings control test generation and execution.

**host_type = plugin | event**

- **Purpose:** Determines how the host field is assigned
- `plugin`: The plugin generates unique hosts with format "stanza\_\{count}" to identify events
- `event`: The host is extracted from a token using "token.<n\>.field = host"

**sample_count = <count\>**

- The no. of events present in the sample file.
- This parameter will be used to calculate the total number of events which will be generated from the sample file.
- If `input_type = modinput`, do not provide this parameter.
- **Purpose:** Number of events present in the sample file
- Used to calculate total events generated from the sample file
- If `input_type = modinput`, do not provide this parameter (each line is an event)

**requirement_test_sample = 1**

- This parameter is used to run requirement tests for the provided sample xml file
- only supported with the xml sample file
- **Purpose:** Enables requirement tests for XML sample files
- When set to 1, the plugin parses the XML format and runs requirement tests using `cim_fields` and `other_mappings`
- Only supported with XML sample files

**expected_event_count = <count\>**

- The no. of events this sample stanza should generate.
- The parameter will be used to test the line breaking in index-time tests.
- To calculate expected_event_count 2 parameters can be used. 1) Number of events in the sample file. 2) Number of values of replacementType=all tokens in the sample file. Both the parameters can be multiplied to get expected_event_count.
- For example, if sample contains 3 lines & a token has replacement_type=all and replacement has list of 2 values, then 6 events will be generated.
- This parameter is optional, if it is not provided by the user, it will be calculated automatically by the pytest-splunk-addon.
- **Purpose:** Expected number of events for index-time line-breaking tests
- Calculated as: (events in sample) × (values in replacementType=all tokens)
- For example, if sample has 3 lines and a token has `replacement_type=all` with 2 values, then 6 events are generated
- Optional - if not provided, calculated automatically
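The calculation above can be sketched as (a hypothetical helper, not part of the plugin's API):

```python
from math import prod


def expected_event_count(sample_lines, all_token_value_counts):
    """events = lines in the sample x product of value-list sizes
    for every token with replacement_type=all."""
    return sample_lines * prod(all_token_value_counts)


# 3 lines in the sample, one replacement_type=all token with 2 values -> 6 events
```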

**timestamp_type = plugin | event**

- This key determines if \_time is assigned from event or default \_time should be assigned by plugin.
- The parameter will be used to test the time extraction in index-time tests.
- If value is plugin, the plugin will assign the time while ingesting the event.
- If value is event, that means the time will be extracted from the event and therefore, there should be a token provided with token.<n\>.field = \_time.
- **Purpose:** Determines how \_time is assigned for index-time tests
- `plugin`: The plugin assigns timestamp during ingestion
- `event`: Timestamp is extracted from the event; requires a token with `token.<n\>.field = _time`

**breaker = <regex\>**

- The breaker is used to breakdown the sample file into multiple events, based on the regex provided.
- This parameter is optional. If it is not provided by the user, the events will be ingested into Splunk as per the *input_type* provided.
- **Purpose:** Regex pattern to split sample file into multiple events
- Optional - if not provided, events are parsed according to `input_type`
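A rough illustration of breaker-based splitting (not the plugin's actual implementation):

```python
import re


def split_events(sample_text, breaker_regex):
    """Split a sample file's contents into events wherever the breaker matches."""
    events = re.split(breaker_regex, sample_text)
    # Drop empty fragments left at the boundaries
    return [e.strip() for e in events if e.strip()]
```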

**host_prefix = <host_prefix\>**
- This param is used as an identification for the **host** field, for the events which are ingested using SC4S.

- **Purpose:** Prefix for host field identification when using SC4S ingestion

## Token replacement settings

Expand Down
59 changes: 52 additions & 7 deletions pytest_splunk_addon/addon_parser/__init__.py
@@ -16,9 +16,9 @@
# -*- coding: utf-8 -*-
"""
The module provides the Add-on parsing mechanism. It can
parse the knowledge objects from an Add-on's configuration files
parse the knowledge objects from an Add-on's configuration files.

Supports: fields from props & transforms, tags, eventtypes
Supports: fields from props & transforms, tags, eventtypes, savedsearches
"""
import os
import re
@@ -30,6 +30,7 @@
from .tags_parser import TagsParser
from .eventtype_parser import EventTypeParser
from .savedsearches_parser import SavedSearchParser
from .parser_cache import ParserCache

LOGGER = logging.getLogger("pytest-splunk-addon")

@@ -49,29 +50,64 @@ def __init__(self, splunk_app_path):
self._tags_parser = None
self._eventtype_parser = None
self._savedsearch_parser = None
self._parser_cache = ParserCache()

@property
def props_parser(self):
if not self._props_parser:
self._props_parser = PropsParser(self.splunk_app_path)

def _parse_props():
parser = PropsParser(self.splunk_app_path)
return parser.props

props_data = self._parser_cache.get_or_parse(_parse_props, "props")
self._props_parser = PropsParser(
self.splunk_app_path, props_data=props_data
)
return self._props_parser

@property
def tags_parser(self):
if not self._tags_parser:
self._tags_parser = TagsParser(self.splunk_app_path)

def _parse_tags():
parser = TagsParser(self.splunk_app_path)
return parser.tags

tags_data = self._parser_cache.get_or_parse(_parse_tags, "tags")
self._tags_parser = TagsParser(self.splunk_app_path, tags_data=tags_data)
return self._tags_parser

@property
def eventtype_parser(self):
if not self._eventtype_parser:
self._eventtype_parser = EventTypeParser(self.splunk_app_path)

def _parse_eventtypes():
parser = EventTypeParser(self.splunk_app_path)
return parser.eventtypes

eventtypes_data = self._parser_cache.get_or_parse(
_parse_eventtypes, "eventtypes"
)
self._eventtype_parser = EventTypeParser(
self.splunk_app_path, eventtypes_data=eventtypes_data
)
return self._eventtype_parser

@property
def savedsearch_parser(self):
if not self._savedsearch_parser:
self._savedsearch_parser = SavedSearchParser(self.splunk_app_path)

def _parse_savedsearches():
parser = SavedSearchParser(self.splunk_app_path)
return parser.savedsearches

savedsearches_data = self._parser_cache.get_or_parse(
_parse_savedsearches, "savedsearches"
)
self._savedsearch_parser = SavedSearchParser(
self.splunk_app_path, savedsearches_data=savedsearches_data
)
return self._savedsearch_parser

def get_props_fields(self):
Expand All @@ -81,7 +117,16 @@ def get_props_fields(self):
Yields:
generator of all the supported fields
"""
return self.props_parser.get_props_fields()

def _parse_props_fields():
LOGGER.info("Building props_fields cache")
fields = list(self.props_parser.get_props_fields())
return fields

fields_data = self._parser_cache.get_or_parse(
_parse_props_fields, "props_fields"
)
return iter(fields_data or [])

def get_tags(self):
"""
4 changes: 2 additions & 2 deletions pytest_splunk_addon/addon_parser/eventtype_parser.py
@@ -35,10 +35,10 @@ class EventTypeParser(object):
splunk_app_path (str): Path of the Splunk app
"""

def __init__(self, splunk_app_path: str):
def __init__(self, splunk_app_path: str, eventtypes_data: Optional[Dict] = None):
self._conf_parser = conf_parser.TABConfigParser()
self.splunk_app_path = splunk_app_path
self._eventtypes = None
self._eventtypes = eventtypes_data

@property
def eventtypes(self) -> Optional[Dict]: