Merged
Commits
46 commits
1f4b310
Kinda working.
SharafMohamed Nov 14, 2025
dfa411e
Merge branch 'main' into log-event-boundary-new
SharafMohamed Nov 28, 2025
3ad08ea
Fix edge cases; Update tests.
SharafMohamed Dec 3, 2025
2cc83c9
Update test docstring.
SharafMohamed Dec 3, 2025
3a21375
Fix examples/common.cpp.
SharafMohamed Dec 3, 2025
fc520fd
Cleanup.
SharafMohamed Dec 3, 2025
caca5bc
Add clarifying comment.
SharafMohamed Dec 3, 2025
dd8e45e
Fix accessors for timestamp in output buffer.
SharafMohamed Dec 3, 2025
db78cd6
Fix caching.
SharafMohamed Dec 3, 2025
1351a71
reset m_has_timestamp.
SharafMohamed Dec 3, 2025
1f11aa2
Always get capture_string on wrap around.
SharafMohamed Dec 3, 2025
25b2874
Remove m_has_timestamp and replace it with optional m_timestamp.
SharafMohamed Dec 15, 2025
4bd25cb
Update reader parser to use header check.
SharafMohamed Dec 15, 2025
80376bc
Compile examples.
SharafMohamed Dec 15, 2025
db50803
Replace macros.
SharafMohamed Dec 15, 2025
3d02572
Have get_reg_ids_from_capture throw instead of returning optional.
SharafMohamed Dec 15, 2025
0dfd168
Update readme.
SharafMohamed Dec 15, 2025
82bfd86
Update example schema.
SharafMohamed Dec 15, 2025
a64e9b5
Update readme again.
SharafMohamed Dec 15, 2025
39ebb8f
Improve example schema.
SharafMohamed Dec 15, 2025
67d6e72
Improve example schema.
SharafMohamed Dec 15, 2025
df74af5
Remove extra : in example schema.
SharafMohamed Dec 15, 2025
494f8d1
Fix wording.
SharafMohamed Dec 15, 2025
725a71e
Return captures as tokens to prevent invalidating cache during nested…
SharafMohamed Dec 16, 2025
482df96
Merge branch 'main' into log-event-boundary-new
SharafMohamed Dec 16, 2025
919b598
Update readme to use as $txt$ placeholders and add examples to make i…
SharafMohamed Dec 17, 2025
bd0f4fd
Rename get_capture_token to get_sub_token.
SharafMohamed Dec 17, 2025
23d466c
Add newline.
SharafMohamed Dec 17, 2025
6cb38ea
Update get_sub_token.
SharafMohamed Dec 17, 2025
5c17c15
Allow for non escaped hyphens outside of ranges.
SharafMohamed Dec 17, 2025
7b4d618
Remove unused header.
SharafMohamed Dec 18, 2025
aa61693
Remove unused header.
SharafMohamed Dec 18, 2025
6771a4f
Replace txt with plaintext.
SharafMohamed Dec 18, 2025
48ea212
Merge branch 'log-event-boundary-new' into hyphen_fix
SharafMohamed Dec 18, 2025
c1e4390
Fix cmakelists.
SharafMohamed Dec 18, 2025
791e83e
Fix cmakelists.
SharafMohamed Dec 18, 2025
a47ab26
Merge branch 'main' into log-event-boundary-new
SharafMohamed Dec 19, 2025
351bd18
Merge branch 'main' into log-event-boundary-new
davidlion Dec 19, 2025
096ca4b
Merge branch 'main' into log-event-boundary-new
SharafMohamed Dec 19, 2025
9e538b7
Merge branch 'log-event-boundary-new' into hyphen_fix
SharafMohamed Dec 19, 2025
9a7d974
Merge branch 'log-event-boundary-new' of https://github.com/SharafMoh…
SharafMohamed Dec 19, 2025
3b4dd08
Lint.
SharafMohamed Dec 19, 2025
9002260
Merge branch 'log-event-boundary-new' into hyphen_fix
SharafMohamed Dec 19, 2025
fdc5178
Merge branch 'main' into hyphen_fix
SharafMohamed Dec 19, 2025
95aa1d2
Remove escaped hyphens.
SharafMohamed Dec 19, 2025
2a61ca7
update schema.md for hyphens.
SharafMohamed Dec 19, 2025
73 changes: 50 additions & 23 deletions docs/schema.md
@@ -16,8 +16,10 @@ There are three types of rules in a schema file:

* [Variables](#variables): Defines patterns for capturing specific pieces of the log.
* [Delimiters](#delimiters): Specifies the characters that separate tokens in the log.
* [Timestamps](#timestamps): Identifies the boundary between log events. Timestamps are also treated
as variables.
* [Headers](#headers): Identifies the boundary between log events. Headers are also treated as
variables.
* The first capture named `timestamp` matched within a header pattern is considered the log
event's timestamp.

For documentation, the schema allows for user comments by ignoring any text preceded by `//`.

@@ -26,15 +28,21 @@ For documentation, the schema allows for user comments by ignoring any text preceded by `//`.
**Syntax:**

```txt
<variable-name>:<variable-pattern>
$VARIABLE_NAME$:$VARIABLE_PATTERN$
```

* `variable-name` may contain any alphanumeric characters, but may not be the reserved names
`delimiters` or `timestamp`.
* `variable-pattern` is a regular expression using the supported
* `$VARIABLE_NAME$` may contain any alphanumeric characters, but may not use the reserved names
`delimiters`, `header`, or `timestamp`.
* `$VARIABLE_PATTERN$` is a regular expression using the supported
[syntax](#regular-expression-syntax).

Note that:
**Example:**

```txt
equalsCapture:.*=(?<equals>.*[a-zA-Z0-9].*)
```

**Note that:**

* A schema file may contain zero or more variable rules.
* Repeating the same variable name in another rule will `OR` the regular expressions (perform an
@@ -47,36 +55,54 @@ Note that:
**Syntax:**

```txt
delimiters:<characters>
delimiters:$CHARACTERS$
```

* `delimiters` is a reserved name for this rule.
* `characters` is a set of characters that should be treated as delimiters. These characters define
the boundaries between tokens in the log.
* `$CHARACTERS$` is a set of characters that should be treated as delimiters. These characters
define the boundaries between tokens in the log.

Note that:
**Example:**

```txt
delimiters: \t\r\n:,!;%
```

**Note that:**

* A schema file must contain at least one `delimiters` rule. If multiple `delimiters` rules are
specified, only the last one will be used.

### Timestamps
### Headers

**Syntax:**

```txt
timestamp:<timestamp-pattern>
header:$PREFIX$(?<timestamp>$TIMESTAMP-PATTERN$)$SUFFIX$
```

* `timestamp` is a reserved name for this rule.
* `timestamp-pattern` is a regular expression using the supported
* Multiple headers can be specified within a schema.
* The timestamp capture can be omitted if the log-event boundary does not contain a timestamp.
* Multiple timestamp captures are allowed within a header. These can exist within regex repetitions
or alternations.
* If no timestamps are parsed, the event's logtype has no timestamp.
* If one or more timestamps are parsed, the event's logtype uses the first timestamp.
* `timestamp` is a reserved name for the capture within a header rule.
* `$PREFIX$`, `$SUFFIX$`, and `$TIMESTAMP-PATTERN$` are regular expressions using the supported
[syntax](#regular-expression-syntax).

Note that:
**Example:**

```txt
header:Log (?<pid>\d+) (?<timestamp>\[\d{8}\-\d{2}:\d{2}:\d{2}\]){0,1}
```

**Note that:**

* The parser uses a timestamp to denote the start of a new log event if:
* The parser uses a header to denote the start of a new log event if:
* It appears as the first token in the input, or
* It appears after a newline character.
* Until a timestamp is found, the parser will use a newline character to denote the start of a new
* Until a header is found, the parser will use a newline character to denote the start of a new
log event.

## Example schema file
@@ -86,10 +112,11 @@
delimiters: \t\r\n:,!;%

// Keywords
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1}
timestamp:\[\d{8}\-\d{2}:\d{2}:\d{2}\]
int:\-{0,1}[0-9]+
float:\-{0,1}[0-9]+\.[0-9]+
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1})
header:Log (?<pid>\d+) (?<timestamp>\[\d{8}\-\d{2}:\d{2}:\d{2}\]){0,1}
header:--- Log:
int:\-{0,1}\d+
float:\-{0,1}\d+\.\d+

// Custom variables
hex:[a-fA-F]+
@@ -99,7 +126,7 @@ equalsCapture:.*=(?<equals>.*[a-zA-Z0-9].*)

* `delimiters: \t\r\n:,!;%` indicates that ` `, `\t`, `\r`, `\n`, `:`, `,`, `!`, `;`, and `%` are
delimiters.
* `timestamp` matches two different patterns:
* `header` matches two different timestamp patterns:
* `2023-04-19 12:32:08.064`
* `[20230419-12:32:08]`
* `int`, `float`, `hex`, `hasNumber`, and `equalsCapture` all match different user defined
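To make the boundary rules above concrete, here is an illustrative input (not taken from the PR) processed against the first header rule of the example schema, `header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}(\.\d{3}){0,1})`:

```txt
2023-04-19 12:32:08.064 Server listening on port 8080
    stack trace line one
    stack trace line two
2023-04-19 12:32:09.120 Client connected
```

Only the first and last lines begin with a matching header, so the parser would produce two log events: one spanning the first three lines, with timestamp `2023-04-19 12:32:08.064`, and one for the final line. The log messages themselves are hypothetical and serve only to illustrate the documented behaviour.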
9 changes: 3 additions & 6 deletions examples/common.cpp
@@ -31,16 +31,13 @@ auto check_input(std::vector<std::string> const& args) -> int {
}

auto print_timestamp_loglevel(LogEventView const& event, uint32_t loglevel_id) -> void {
Token* timestamp{event.get_timestamp()};
Token* loglevel{nullptr};
if (nullptr != timestamp) {
auto const& optional_timestamp{event.get_log_output_buffer()->get_timestamp()};
if (optional_timestamp.has_value()) {
if (auto const& vec{event.get_variables(loglevel_id)}; false == vec.empty()) {
loglevel = vec[0];
}
}
if (nullptr != timestamp) {
cout << "timestamp: ";
cout << timestamp->to_string_view();
cout << "timestamp: " << optional_timestamp.value();
}
if (nullptr != loglevel) {
cout << ", loglevel:";
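For context on the accessor change above, here is a minimal sketch (not part of the PR) of how a caller might branch on the optional timestamp now exposed through the output buffer. The include path and the assumption that the stored timestamp value is streamable are mine; the accessor names follow the diff.

```cpp
#include <iostream>

// Assumed include path for LogEventView; adjust to the project's actual layout.
#include <log_surgeon/LogEvent.hpp>

using log_surgeon::LogEventView;

// Prints the event's timestamp if the header that delimited it captured one.
auto print_timestamp_if_present(LogEventView const& event) -> void {
    auto const& optional_timestamp{event.get_log_output_buffer()->get_timestamp()};
    if (optional_timestamp.has_value()) {
        std::cout << "timestamp: " << optional_timestamp.value() << "\n";
    } else {
        // Either no header has been matched yet, or the header had no `timestamp` capture.
        std::cout << "timestamp: <none>\n";
    }
}
```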
46 changes: 23 additions & 23 deletions examples/schema.txt
@@ -1,53 +1,53 @@
// Timestamps (using the `timestamp` keyword)
// Timestamps (using a `header` rule with a `timestamp` named capture)
// E.g. 2015-01-31T15:50:45.392
timestamp:\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}.\d{3})
// E.g. 2015-01-31T15:50:45,392
timestamp:\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2},\d{3}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2},\d{3})
// E.g. [2015-01-31T15:50:45
timestamp:\[\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2}
header:(?<timestamp>\[\d{4}\-\d{2}\-\d{2}T\d{2}:\d{2}:\d{2})
// E.g. [20170106-16:56:41]
timestamp:\[\d{4}\d{2}\d{2}\-\d{2}:\d{2}:\d{2}\]
header:(?<timestamp>\[\d{4}\d{2}\d{2}\-\d{2}:\d{2}:\d{2}\])
// E.g. 2015-01-31 15:50:45,392
// E.g. INFO [main] 2015-01-31 15:50:45,085
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3})
// E.g. 2015-01-31 15:50:45.392
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{3})
// E.g. [2015-01-31 15:50:45,085]
timestamp:\[\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}\]
header:(?<timestamp>\[\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2},\d{3}\])
// E.g. 2015-01-31 15:50:45
// E.g. Started POST /api/v3/internal/allowed for 127.0.0.1 at 2017-06-18 00:20:44
// E.g. update-alternatives 2015-01-31 15:50:45
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2})
// E.g. Start-Date: 2015-01-31 15:50:45
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2})
// E.g. 2015/01/31 15:50:45
timestamp:\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}
header:(?<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})
// E.g. 15/01/31 15:50:45
timestamp:\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}
header:(?<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})
// E.g. 150131 9:50:45
timestamp:\d{2}\d{2}\d{2} [ 0-9]{2}:\d{2}:\d{2}
header:(?<timestamp>\d{2}\d{2}\d{2} [ 0-9]{2}:\d{2}:\d{2})
// E.g. 01 Jan 2016 15:50:17,085
timestamp:\d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2},\d{3}
header:(?<timestamp>\d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2},\d{3})
// E.g. Jan 01, 2016 3:50:17 PM
timestamp:[A-Z][a-z]{2} \d{2}, \d{4} [ 0-9]{2}:\d{2}:\d{2} [AP]M
header:(?<timestamp>[A-Z][a-z]{2} \d{2}, \d{4} [ 0-9]{2}:\d{2}:\d{2} [AP]M)
// E.g. January 31, 2015 15:50
timestamp:[A-Z][a-z]+ \d{2}, \d{4} \d{2}:\d{2}
header:(?<timestamp>[A-Z][a-z]+ \d{2}, \d{4} \d{2}:\d{2})
// E.g. E [31/Jan/2015:15:50:45
// E.g. localhost - - [01/Jan/2016:15:50:17
// E.g. 192.168.4.5 - - [01/Jan/2016:15:50:17
timestamp:\[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2}
header:(?<timestamp>\[\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2})
// E.g. 192.168.4.5 - - [01/01/2016:15:50:17
timestamp:\[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2}
header:(?<timestamp>\[\d{2}/\d{2}/\d{4}:\d{2}:\d{2}:\d{2})
// E.g. ERROR: apport (pid 4557) Sun Jan 1 15:50:45 2015
timestamp:[A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9]{2} \d{2}:\d{2}:\d{2} \d{4}
header:(?<timestamp>[A-Z][a-z]{2} [A-Z][a-z]{2} [ 0-9]{2} \d{2}:\d{2}:\d{2} \d{4})
// E.g. <<<2016-11-10 03:02:29:936
timestamp:\<\<\<\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}:\d{3}
header:(?<timestamp>\<\<\<\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}:\d{3})
// E.g. Jan 21 11:56:42
timestamp:[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}
header:(?<timestamp>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2})
// E.g. 01-21 11:56:42.392
timestamp:\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{3}
header:(?<timestamp>\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{3})
// E.g. 2016-05-08 11:34:04.083464
timestamp:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{6}
header:(?<timestamp>\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}.\d{6})

// Delimiters
delimiters: \t\r\n:,!;%
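Every rewrite in this file follows the same mechanical pattern, summarized below with `$PATTERN$` as a placeholder rather than literal schema syntax:

```txt
// Old syntax: a dedicated timestamp rule
timestamp:$PATTERN$

// New syntax: a header rule with the pattern wrapped in a `timestamp` named capture
header:(?<timestamp>$PATTERN$)
```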
6 changes: 2 additions & 4 deletions src/log_surgeon/Constants.hpp
@@ -35,8 +35,7 @@ enum class SymbolId : uint32_t {
TokenInt,
TokenFloat,
TokenHex,
TokenFirstTimestamp,
TokenNewlineTimestamp,
TokenHeader,
TokenNewline
};

@@ -45,8 +44,7 @@ constexpr char cTokenUncaughtString[] = "$UncaughtString";
constexpr char cTokenInt[] = "int";
constexpr char cTokenFloat[] = "float";
constexpr char cTokenHex[] = "hex";
constexpr char cTokenFirstTimestamp[] = "firstTimestamp";
constexpr char cTokenNewlineTimestamp[] = "newLineTimestamp";
constexpr char cTokenHeader[] = "header";
constexpr char cTokenNewline[] = "newLine";
// Buffer size cannot be odd, so always use a multiple of 2
constexpr uint32_t cStaticByteBuffSize = 48'000;
3 changes: 1 addition & 2 deletions src/log_surgeon/Lalr1Parser.tpp
@@ -59,8 +59,7 @@ Lalr1Parser<TypedNfaState, TypedDfaState>::Lalr1Parser() {
m_terminals.insert((uint32_t)SymbolId::TokenInt);
m_terminals.insert((uint32_t)SymbolId::TokenFloat);
m_terminals.insert((uint32_t)SymbolId::TokenHex);
m_terminals.insert((uint32_t)SymbolId::TokenFirstTimestamp);
m_terminals.insert((uint32_t)SymbolId::TokenNewlineTimestamp);
m_terminals.insert((uint32_t)SymbolId::TokenHeader);
m_terminals.insert((uint32_t)SymbolId::TokenNewline);
}

11 changes: 6 additions & 5 deletions src/log_surgeon/Lexer.hpp
@@ -6,6 +6,7 @@
#include <memory>
#include <optional>
#include <set>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
@@ -212,24 +213,24 @@
* Retrieves the register IDs for the start and end tags associated with a given capture.
* @param capture Pointer to the capture to search for.
* @return A pair of register IDs corresponding to the start and end tags of the capture.
* @return std::nullopt if no such capture is found.
* @throw runtime_error if capture does not have tag ids or register ids.
*/
[[nodiscard]] auto get_reg_ids_from_capture(finite_automata::Capture const* const capture) const
-> std::optional<std::pair<reg_id_t, reg_id_t>> {
-> std::pair<reg_id_t, reg_id_t> {
auto const optional_tag_id_pair{get_tag_id_pair_from_capture(capture)};
if (false == optional_tag_id_pair.has_value()) {
return std::nullopt;
throw std::runtime_error(capture->get_name() + " has no tag ids");
}
auto const [start_tag_id, end_tag_id]{optional_tag_id_pair.value()};

auto const optional_start_reg_id{get_reg_id_from_tag_id(start_tag_id)};
if (false == optional_start_reg_id.has_value()) {
return std::nullopt;
throw std::runtime_error(capture->get_name() + " has no start reg id");
}

auto const optional_end_reg_id{get_reg_id_from_tag_id(end_tag_id)};
if (false == optional_end_reg_id.has_value()) {
return std::nullopt;
throw std::runtime_error(capture->get_name() + " has no end reg id");
}

return std::make_pair(optional_start_reg_id.value(), optional_end_reg_id.value());
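Because `get_reg_ids_from_capture` now throws instead of returning `std::nullopt`, call sites need an exception-based failure path. A minimal sketch of the new calling convention (hypothetical caller; `lexer` and `capture` are assumed to be an initialized lexer and a capture registered with it):

```cpp
#include <iostream>
#include <stdexcept>

// Hypothetical caller illustrating the new error-handling contract.
template <typename LexerT, typename CaptureT>
auto lookup_capture_registers(LexerT const& lexer, CaptureT const* capture) -> void {
    try {
        auto const [start_reg_id, end_reg_id]{lexer.get_reg_ids_from_capture(capture)};
        std::cout << "start register: " << start_reg_id << ", end register: " << end_reg_id
                  << "\n";
    } catch (std::runtime_error const& e) {
        // Cases that previously surfaced as std::nullopt now arrive here.
        std::cerr << "capture lookup failed: " << e.what() << "\n";
    }
}
```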