Skip to content

Manifest placeholder escape sequence support#282

Merged
prom3theu5 merged 9 commits intoprom3theu5:mainfrom
John-Leitch:placeholder-tokenizer
Feb 16, 2025
Merged

Manifest placeholder escape sequence support#282
prom3theu5 merged 9 commits intoprom3theu5:mainfrom
John-Leitch:placeholder-tokenizer

Conversation

@John-Leitch
Copy link
Copy Markdown
Contributor

@John-Leitch John-Leitch commented Feb 5, 2025

This PR adds proper placeholder escape sequence support to the manifest processing logic, allowing for the escaping of curly braces via doubling e.g. {{ or }}. The lack of support in the previous implementation was quite noticeable in the case of .NET format string placeholders e.g. {0}, which triggered unhandled exceptions during generation.

While the Microsoft documentation makes no mention of this form of escaping, this sequence is supported by Microsoft in the Azure Developer CLI (azd).

Care was taken to ensure backwards compatibility with the unresolved placeholder behaviors of the previous implementation, while also bringing the implementation closer to parity with Microsoft's. Several unit tests were added to cover all known edge cases in reference tokenization and unescaping.

@John-Leitch John-Leitch marked this pull request as ready for review February 6, 2025 06:15
@qodo-code-review
Copy link
Copy Markdown

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling

The tokenizer and unescape logic may not handle all edge cases properly. For example, nested placeholders or malformed escape sequences could cause unexpected behavior.

public static List<JsonInterpolationToken> Tokenize(ReadOnlySpan<char> input)
{
    var state = TokenizerState.InText;
    var currentTokenIndex = 0;

    // Estimate the number of tokens.
    var tokens = new List<JsonInterpolationToken>(input.Length / 2);

    void TryAddToken(
        ref ReadOnlySpan<char> input,
        JsonInterpolationTokenType tokenType,
        int length,
        int nextTokenIndex)
    {
        if (length != 0 || tokenType != JsonInterpolationTokenType.Text)
        {
            tokens.Add(
                new JsonInterpolationToken(
                    tokenType,
                    input.Slice(currentTokenIndex, length).ToString()));
        }

        currentTokenIndex = nextTokenIndex;
    }

    for (var i = 0; i < input.Length; i++)
    {
        var currentChar = input[i];

        switch (state)
        {
            case TokenizerState.InText:

                if (currentChar == '{')
                {
                    // We are in a potential placeholder token start.
                    state = TokenizerState.InPlaceholderStart;
                }

                break;

            case TokenizerState.InPlaceholderStart:

                if (currentChar == '{')
                {
                    // This curly brace is escaped, we're still in text.
                    state = TokenizerState.InText;
                }
                else
                {
                    // We are in a placeholder token, slice the previous text.
                    TryAddToken(
                        ref input,
                        JsonInterpolationTokenType.Text,
                        i - currentTokenIndex - 1,
                        i);

                    // Advance to toke in placeholder token state.
                    state = TokenizerState.InPlaceholder;
                }

                break;

            case TokenizerState.InPlaceholder:

                // We are going for parity with the regular expression used in the
                // previous implementation: ([\w\.-]+)
                if (currentChar is
                    (>= 'a' and <= 'z') or
                    (>= 'A' and <= 'Z') or
                    (>= '0' and <= '9') or
                    '.' or
                    '-')
                {
                    // We are still in the placeholder token.
                    continue;
                }
                else if (currentChar == '}')
                {
                    // Our placeholder token is complete.
                    TryAddToken(
                        ref input,
                        JsonInterpolationTokenType.Placeholder,
                        i - currentTokenIndex,
                        i + 1);

                    // The next token is probably text.
                    state = TokenizerState.InText;
                }
                else
                {
                    // Our placeholder token has unsupported characters. As per
                    // the original implementation, we are going to treat the
                    // current lexme as text.
                    state = TokenizerState.InText;
                }

                break;

            default:
                throw new NotSupportedException();
        }
    }

    if (currentTokenIndex != input.Length - 1)
    {
        // There is a dangling token. To ensure parity with the original
        // implementation, we're going to treat it as text.

        if (state == TokenizerState.InPlaceholder)
        {
            // We were actually in a placeholder token, so we need to move
            // back a single character to ensure we catch the opening brace.
            currentTokenIndex--;
        }

        TryAddToken(
            ref input,
            JsonInterpolationTokenType.Text,
            input.Length - currentTokenIndex,
            0);
    }

    return tokens;
}
Performance

The string builder allocation and multiple string operations in the placeholder replacement logic could be optimized for better performance with large JSON documents.

var tokens = JsonInterpolation.Tokenize(input);
var transformedInput = new StringBuilder();

foreach (var token in tokens)
{
    if (token.IsText())
    {
        transformedInput.Append(token.Lexeme);
    }
    else
    {
        var jsonPath = token.Lexeme;
        var pathParts = jsonPath.Split('.');

        void AppendPlaceholderTokenAsText()
        {
            transformedInput.Append('{');
            transformedInput.Append(jsonPath);
            transformedInput.Append('}');
        }

        if (pathParts.Length == 1)
        {
            var resolvedNode = rootNode[pathParts[0]];

            if (resolvedNode != null)
            {
                transformedInput.Append(resolvedNode.ToString());
            }
            else
            {
                AppendPlaceholderTokenAsText();
            }

            continue;
        }
        else if (pathParts is [_, Literals.Bindings, ..])
        {
            transformedInput.Append(bindingProcessor.ParseBinding(pathParts, rootNode));
            continue;
        }

        var selectionPath = pathParts.AsJsonPath();
        var path = JsonPath.Parse(selectionPath);
        var result = path.Evaluate(rootNode);

        if (result.Matches.Count == 0)
        {
            AppendPlaceholderTokenAsText();
            continue;
        }

        var value = result.Matches.FirstOrDefault()?.Value;

        if (value is null)
        {
            AppendPlaceholderTokenAsText();
            continue;
        }

        transformedInput.Append(value.ToString());
    }

@qodo-code-review
Copy link
Copy Markdown

qodo-code-review Bot commented Feb 6, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add null parameter validation
Suggestion Impact:Added null checks for both Tokenize and Unescape methods to return empty results instead of throwing exceptions

code diff:

-    public static List<JsonInterpolationToken> Tokenize(string input) => Tokenize(input.AsSpan());
+    public static List<JsonInterpolationToken> Tokenize(string input) =>
+        input is null ? [] : Tokenize(input.AsSpan());
 
     public static List<JsonInterpolationToken> Tokenize(ReadOnlySpan<char> input)
     {
@@ -188,6 +189,11 @@
 
     public static string Unescape(string value)
     {
+        if (value is null)
+        {
+            return string.Empty;
+        }

Add null checks for input parameters in the Tokenize and Unescape methods to
prevent potential NullReferenceExceptions.

src/Aspirate.Processors/Transformation/Json/JsonInterpolation.cs [54-56]

-public static List<JsonInterpolationToken> Tokenize(string input) => Tokenize(input.AsSpan());
+public static List<JsonInterpolationToken> Tokenize(string input) => 
+    input is null ? new List<JsonInterpolationToken>() : Tokenize(input.AsSpan());
 
-public static string Unescape(string value)
+public static string Unescape(string value) => 
+    value is null ? string.Empty : UnescapeImpl(value);

[Suggestion has been applied]

Suggestion importance[1-10]: 7

__

Why: Adding null checks for input parameters is important for preventing NullReferenceExceptions and making the code more robust. This is a defensive programming practice that improves code reliability.

Medium
Add thread safety protection

Add thread safety protection when processing JsonArray and JsonObject elements
since ToArray() creates a copy but subsequent modifications could cause race
conditions.

src/Aspirate.Processors/Transformation/Json/JsonInterpolationUnescapeProcessor.cs [38-41]

-foreach (var kvp in jsonObject.ToArray())
+lock(jsonObject)
 {
-    UnescapeJsonExpression(kvp.Value);
+    foreach (var kvp in jsonObject.ToArray())
+    {
+        UnescapeJsonExpression(kvp.Value);
+    }
 }
  • Apply this suggestion
Suggestion importance[1-10]: 3

__

Why: While thread safety is generally good practice, the context doesn't suggest this code needs to be thread-safe. The lock would add unnecessary overhead since JsonNode operations are typically not designed for concurrent modifications.

Low

@John-Leitch
Copy link
Copy Markdown
Contributor Author

Hey @prom3theu5, sorry to bother, but this one and #277 are blocking a project I'm on. Is there anything I can do to help move these forward?

@prom3theu5
Copy link
Copy Markdown
Owner

Hi John
Thanks for this looks good - Thanks
Any chance you can rebase on the latest .net 9 update thats in main, and resolve the small conflict?
I'll get it merged in and released in the next day

@John-Leitch
Copy link
Copy Markdown
Contributor Author

Hi John Thanks for this looks good - Thanks Any chance you can rebase on the latest .net 9 update thats in main, and resolve the small conflict? I'll get it merged in and released in the next day

Sure, no worries. I should be able to do it today or tomorrow.

@John-Leitch
Copy link
Copy Markdown
Contributor Author

@prom3theu5, I believe everything's synced up now.

@prom3theu5
Copy link
Copy Markdown
Owner

Thanks again John

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants