Skip to content

Conversation

@taka-oyama
Copy link
Collaborator

@taka-oyama taka-oyama commented Nov 1, 2024

Closes #234

TODO

  • Schema: add SEARCH INDEX Support (implement Blueprint::fullText)
    • PARTITION BY
    • ORDER BY
    • OPTIONS(sort_order_sharding)
  • Schema: add TOKENLIST column type support
    • TOKENIZE_FULLTEXT
    • TOKENIZE_SUBSTRING
    • TOKENIZE_NGRAMS
    • TOKEN
    • TOKENIZE_BOOL
    • TOKENIZE_NUMBER
  • Query: add SEARCH support (Builder::searchFullText(...))
  • Query: add SEARCH_NGRAMS Builder::searchNgrams(...)
  • Query: add SEARCH_SUBSTRING Builder::searchSubstring(...)
  • README

PRs below should be merged before this to reduce size of diff.

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced full-text search capabilities in the database schema.
    • Added methods for creating token lists and search indexes.
    • Enhanced the query builder with new search methods: searchFullText, searchNgrams, and searchSubstring.
    • Support for invisible columns and improved SQL command generation.
    • Added change streams for real-time data updates and support for snapshot queries.
    • Updated minimum supported Laravel version to 11.15.0.
  • Bug Fixes

    • Resolved an issue where schema changes were applied twice.
  • Documentation

    • Updated README.md to include a section on full-text search implementation.
    • Changelog updated with new features and deprecations.
  • Tests

    • Added tests for full-text search functionalities and schema definitions.

@taka-oyama taka-oyama added the enhancement New feature or request label Nov 1, 2024
@taka-oyama taka-oyama self-assigned this Nov 1, 2024
@taka-oyama taka-oyama marked this pull request as draft November 1, 2024 02:49
@taka-oyama taka-oyama changed the title update readme feat: add FULL TEXT SEARCH support Nov 1, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
README.md (2)

352-355: Enhance the introduction and fix hyphenation.

Consider these improvements:

  1. Use "Full-Text Search" with hyphenation for better readability.
  2. Add a brief explanation of common use cases or benefits of using full-text search.

Apply this diff:

-### Full Text Search
+### Full-Text Search
 
-Spanner supports [Full Text Search](https://cloud.google.com/spanner/docs/full-text-search) which allows you to search for text in columns.
+Spanner supports [Full-Text Search](https://cloud.google.com/spanner/docs/full-text-search) which allows you to perform efficient text-based queries on your data. This is particularly useful for implementing search functionality in applications where users need to search through large text fields.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)


356-365: Add examples for all supported features.

The code example is clear but doesn't cover all the features mentioned in the PR objectives. Consider adding examples for:

  1. Additional tokenization options:

    • TOKENIZE_SUBSTRING
    • TOKENIZE_NGRAMS
    • TOKEN
    • TOKENIZE_BOOL
    • TOKENIZE_NUMBER
  2. Search index options:

    • PARTITION BY
    • ORDER BY
    • OPTIONS(sort_order_sharding)
  3. Query capabilities:

    • Builder::whereSearch(...)
    • Builder::whereSearchSubstring(...)
    • SNIPPET feature

Would you like me to help generate examples for these features?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 04fb5bc and 4d0e373.

📒 Files selected for processing (1)
  • README.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint
README.md

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)

@taka-oyama taka-oyama changed the base branch from master to feature/phpstan-support-part2 November 6, 2024 05:41
@taka-oyama taka-oyama force-pushed the feature/full-text-search branch from 22bb5f1 to a3cedc3 Compare November 6, 2024 05:44
@taka-oyama taka-oyama force-pushed the feature/phpstan-support-part2 branch from e9da42e to 0e8c1f0 Compare November 6, 2024 05:48
@taka-oyama taka-oyama force-pushed the feature/full-text-search branch from a3cedc3 to 95921a1 Compare November 6, 2024 05:48
@taka-oyama taka-oyama requested a review from a team November 6, 2024 06:55
@taka-oyama taka-oyama marked this pull request as ready for review November 6, 2024 06:56
Base automatically changed from feature/phpstan-support-part2 to master November 6, 2024 10:42
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (19)
src/Concerns/SharedGrammarCalls.php (1)

46-60: LGTM! Consider enhancing the documentation.

The implementation is robust and handles all value types appropriately. The method is well-structured using modern PHP features and follows good practices.

Consider enhancing the PHPDoc to document the return format for each type:

     /**
      * @param scalar|BackedEnum $value
+     * @return string Format:
+     *   - bool: 'true' or 'false'
+     *   - string: quoted string
+     *   - BackedEnum: quoted value if string, cast to string otherwise
+     *   - other: cast to string
      */
src/Schema/SearchIndexDefinition.php (2)

21-33: Enhance class documentation with descriptions and version information.

While the type information is well-documented, consider adding:

  1. A class description explaining its purpose in the full-text search feature
  2. Property descriptions explaining their roles
  3. Method descriptions explaining their usage
  4. @since tag for version tracking

Example enhancement:

 /**
+ * Represents a full-text search index definition for Google Cloud Spanner.
+ * This class handles the configuration of search indexes including partitioning,
+ * ordering, and specific Cloud Spanner options.
+ *
+ * @since 1.0.0
+ *
- * @property string $name
- * @property string $index
+ * @property string $name The name of the search index
+ * @property string $index The type of index being created
  * @property list<string> $columns The columns to be indexed
  * @property string|list<string> $partitionBy Columns used for partitioning
  * @property list<string>|array<string, string> $orderBy Columns and their order
  * @property bool|null $sortOrderSharding Enable/disable sort order sharding
  * @property bool|null $disableAutomaticUidColumn Enable/disable automatic UID
  * @method $this partitionBy(string|string[] $columns) Set partitioning columns
  * @method $this orderBy(string|string[] $columns) Set ordering columns
  * @method $this sortOrderSharding(bool $toggle = true) Configure sort sharding
  * @method $this disableAutomaticUidColumn(bool $toggle = true) Configure UID
  */

34-46: Consider strengthening the implementation with explicit property handling.

The current implementation relies heavily on magic methods. Consider:

  1. Adding explicit property declarations
  2. Implementing a constructor for required properties
  3. Adding property validation
  4. Using a more specific return type for getOptions()

Example enhancement:

 class SearchIndexDefinition extends IndexDefinition
 {
+    private string $name;
+    private string $index;
+    /** @var list<string> */
+    private array $columns;
+    /** @var string|list<string>|null */
+    private string|array|null $partitionBy = null;
+    /** @var list<string>|array<string, string>|null */
+    private array|null $orderBy = null;
+    private ?bool $sortOrderSharding = null;
+    private ?bool $disableAutomaticUidColumn = null;
+
+    /**
+     * @param string $name
+     * @param string $index
+     * @param list<string> $columns
+     */
+    public function __construct(string $name, string $index, array $columns)
+    {
+        $this->name = $name;
+        $this->index = $index;
+        $this->columns = $columns;
+    }

     /**
-     * @return array{ sortOrderSharding?: bool, disableAutomaticUidColumn?: bool }
+     * Returns the Cloud Spanner specific options for this search index.
+     *
+     * @return array{
+         sortOrderSharding?: bool,
+         disableAutomaticUidColumn?: bool
+     }
      */
     public function getOptions(): array
     {
         return array_filter([
             'sortOrderSharding' => $this->sortOrderSharding,
             'disableAutomaticUidColumn' => $this->disableAutomaticUidColumn,
         ], static fn($v) => $v !== null);
     }
 }
src/Query/Concerns/UsesFullTextSearch.php (1)

1-101: Consider enhancing documentation and examples

While the implementation is solid, consider these improvements for better developer experience:

  1. Add code examples in the class-level PHPDoc showing common usage patterns
  2. Document the expected format of the $options array
  3. Add @throws annotations to document possible exceptions
  4. Consider adding constants for valid search types and boolean values

These additions would make the API more discoverable and easier to use correctly.

src/Query/Builder.php (1)

32-32: LGTM! The trait addition is well-placed.

The UsesFullTextSearch trait is appropriately grouped with other query-related traits, maintaining good code organization.

Some architectural considerations for the full-text search implementation:

  • Ensure the trait methods handle large result sets efficiently
  • Consider implementing result caching for frequently used search queries
  • Add monitoring/logging for search performance metrics
src/Schema/Blueprint.php (2)

247-261: Enhance method documentation with parameter details

The implementation looks good, but the method documentation could be more descriptive:

Consider expanding the PHPDoc to describe:

  • $function: The tokenization function (e.g., TOKENIZE_FULLTEXT, TOKENIZE_NGRAMS)
  • $target: The target column being tokenized
  • $options: Available tokenization options
     /**
      * @param string $column
      * @param TokenizerFunction $function
      * @param string $target
      * @param array $options
+     * @example
+     * $table->tokenList('content_tokens', TokenizerFunction::TOKENIZE_FULLTEXT, 'content');
+     * $table->tokenList('ngram_tokens', TokenizerFunction::TOKENIZE_NGRAMS, 'title', ['min_size' => 2, 'max_size' => 4]);
      * @return ColumnDefinition
      */

288-307: Enhance fullText method with documentation and validation

The implementation looks good but could benefit from the following improvements:

  1. Add comprehensive PHPDoc with examples:
     /**
      * @param string|list<string> $columns
      * @param string|null $name
      * @param string|null $algorithm
+     * @example
+     * // Basic usage
+     * $table->fullText('content');
+     * 
+     * // With multiple columns and custom name
+     * $table->fullText(['title', 'content'], 'my_search_idx');
+     * 
+     * // With partitioning and ordering
+     * $table->fullText('content')
+     *       ->partitionBy(['user_id'])
+     *       ->orderBy(['created_at'])
+     *       ->options(['sort_order_sharding' => true]);
      * @return SearchIndexDefinition
      */
  1. Consider adding validation for supported algorithms if there are specific ones supported by Spanner.
CHANGELOG.md (1)

Line range hint 3-9: Maintain consistent hyphenation in compound adjectives.

The term "full text" should be hyphenated when used as a compound adjective modifying "search".

-- add support for full text search (#231)
+- add support for full-text search (#231)
🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#231) - add support for change ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

README.md (3)

352-355: Improve heading style consistency and hyphenation

  1. Use setext-style heading (with underlines) to match the document's style:
-### Full Text Search
+Full-Text Search
+================
  1. Hyphenate "full-text" when used as a compound adjective:
-Spanner supports [Full Text Search]
+Spanner supports [Full-Text Search]
🧰 Tools
🪛 LanguageTool

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)


356-368: Enhance schema definition documentation

The example is clear, but consider adding a note about available tokenizer functions. Based on the PR objectives, we should document all supported tokenization options:

  • TOKENIZE_FULLTEXT
  • TOKENIZE_SUBSTRING
  • TOKENIZE_NGRAMS
  • TOKEN
  • TOKENIZE_BOOL
  • TOKENIZE_NUMBER

Also, consider documenting the available options for fullText() method, such as:

  • PARTITION BY
  • ORDER BY
  • OPTIONS(sort_order_sharding)

369-376: Expand query builder usage documentation

Consider enhancing this section with:

  1. Examples for all search methods:
// Full-text search
User::query()->searchFullText('UserNameTokens', 'John OR Kevin', ['enhance_query' => true]);

// Substring search
User::query()->searchSubstring('UserNameTokens', 'Jo');

// N-grams search
User::query()->searchNgrams('UserNameTokens', 'John');
  1. Document available options for each search method
  2. Add common use cases and best practices
tests/Query/BuilderTest.php (4)

1107-1129: LGTM! Consider adding more test cases.

The test correctly validates the basic full-text search functionality. However, consider adding test cases for:

  • Empty search terms
  • Special characters in search terms
  • Case sensitivity

1131-1153: LGTM! Consider testing additional search options.

The test effectively validates the enhance_query option. Consider expanding it to test other available search options that Cloud Spanner supports.


1155-1177: LGTM! Consider adding edge case tests.

The test effectively validates n-gram search functionality. Consider adding test cases for:

  • Minimum n-gram size at boundaries
  • Empty or invalid n-gram patterns
  • Unicode character sequences

1179-1201: LGTM! Consider testing more complex substring patterns.

The test effectively validates substring search functionality. Consider adding test cases for:

  • Multi-character substrings
  • Substrings at start/end of text
  • Unicode substrings
  • Empty substrings
src/Schema/Grammar.php (1)

64-65: Nitpick: Consider using a multi-line string for better readability.

Since the SQL query spans multiple lines, consider using a multi-line string (heredoc or nowdoc syntax) for better readability and maintainability.

Apply this diff to refactor the SQL query:

- return sprintf(
-     'select index_name as `index_name` from information_schema.indexes where table_schema = \'\' and table_name = %s',
-     $this->quoteString($table),
- );
+ return <<<SQL
+     select index_name as `index_name` 
+     from information_schema.indexes
+     where table_schema = '' and table_name = {$this->quoteString($table)}
+ SQL;

Similarly, apply this diff to the compileForeignKeys method:

- return sprintf(
-     'select constraint_name as `key_name` from information_schema.table_constraints where constraint_type = "FOREIGN KEY" and table_schema = \'\' and table_name = %s',
-     $this->quoteString($table),
- );
+ return <<<SQL
+     select constraint_name as `key_name`
+     from information_schema.table_constraints
+     where constraint_type = 'FOREIGN KEY' and table_schema = '' and table_name = {$this->quoteString($table)}
+ SQL;

Also applies to: 78-79

tests/Schema/BlueprintTest.php (3)

261-304: Ensure proper exception handling during SQL statement execution

The test currently executes SQL statements without handling potential exceptions. To improve test robustness and provide meaningful feedback on failures, consider wrapping the statement execution within a try-catch block.

Apply this diff to add exception handling:

 foreach ($statements as $statement) {
-    $conn->statement($statement);
+    try {
+        $conn->statement($statement);
+    } catch (\Exception $e) {
+        $this->fail("Failed to execute statement: {$e->getMessage()}");
+    }
 }

301-303: Consider adding cleanup procedures after test execution

To maintain test isolation and prevent side effects, consider dropping the created tables ($parentTableName and $childTableName) after the test runs.

You can add the following code to clean up:

// Cleanup created tables
$blueprint = new Blueprint($childTableName);
$blueprint->drop();
$blueprint->build($conn, $grammar);

$blueprint = new Blueprint($parentTableName);
$blueprint->drop();
$blueprint->build($conn, $grammar);

261-261: Add DocBlock comment to the test method

Adding a DocBlock comment to test_tokenList_and_search_index will enhance code readability and provide context for future maintainers.

Apply this diff to include the comment:

+/**
+ * Test the creation of token list columns and full-text search indexes with interleaving.
+ */
 public function test_tokenList_and_search_index(): void
 {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4d0e373 and 68be9eb.

📒 Files selected for processing (13)
  • CHANGELOG.md (1 hunks)
  • README.md (1 hunks)
  • compose.yaml (1 hunks)
  • src/Concerns/SharedGrammarCalls.php (2 hunks)
  • src/Query/Builder.php (1 hunks)
  • src/Query/Concerns/UsesFullTextSearch.php (1 hunks)
  • src/Query/Grammar.php (2 hunks)
  • src/Schema/Blueprint.php (5 hunks)
  • src/Schema/Grammar.php (14 hunks)
  • src/Schema/SearchIndexDefinition.php (1 hunks)
  • src/Schema/TokenizerFunction.php (1 hunks)
  • tests/Query/BuilderTest.php (7 hunks)
  • tests/Schema/BlueprintTest.php (4 hunks)
✅ Files skipped from review due to trivial changes (2)
  • compose.yaml
  • src/Schema/TokenizerFunction.php
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#231) - add support for change ...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

README.md

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint
README.md

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)

🔇 Additional comments (26)
src/Concerns/SharedGrammarCalls.php (1)

20-20: Verify PHP version compatibility for BackedEnum usage.

The addition of BackedEnum implies a PHP 8.1+ requirement. Let's verify this is compatible with the project's constraints.

✅ Verification successful

BackedEnum usage is compatible with PHP version requirements

The project requires PHP ^8.2 which fully supports BackedEnum (introduced in PHP 8.1). The import and usage are safe.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check composer.json for PHP version constraint
echo "PHP Version Requirement:"
cat composer.json | jq -r '.require.php'

Length of output: 104

src/Schema/SearchIndexDefinition.php (2)

1-20: LGTM! File structure and headers are well-organized.

The file includes proper copyright notice, license information, and namespace declaration.


34-46: Verify integration with Blueprint class.

Let's verify how this class is integrated with the Blueprint class.

✅ Verification successful

Let me gather more specific information about how SearchIndexDefinition is used in Blueprint.php.


Integration with Blueprint class is properly implemented

The SearchIndexDefinition class is correctly integrated within Blueprint's fullText() method. The method instantiates SearchIndexDefinition with the required parameters (name, index, columns, algorithm) and returns the instance. The implementation aligns with the class definition we reviewed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how SearchIndexDefinition is used in Blueprint class
# Expected: Find usage in fullText() method

# Search for SearchIndexDefinition usage in Blueprint class
ast-grep --pattern 'class Blueprint {
  $$$
  fullText($$$) {
    $$$
    new SearchIndexDefinition($$$)
    $$$
  }
  $$$
}'

# Backup: Search for any direct references
rg -l "SearchIndexDefinition" "src/Schema/Blueprint.php"

Length of output: 207


Script:

#!/bin/bash
# Get the actual implementation details of how SearchIndexDefinition is used
rg -A 10 "SearchIndexDefinition" "src/Schema/Blueprint.php"

# Also check for any fullText method implementations
ast-grep --pattern 'fullText($$$) {
  $$$
}'

Length of output: 589

src/Query/Concerns/UsesFullTextSearch.php (3)

1-19: LGTM: File structure follows best practices

The file has proper licensing, PHP opening tag, and clear namespace declaration.


20-21: LGTM: Trait follows Laravel conventions

The trait is properly named and placed in the Concerns namespace, following Laravel's architectural patterns.


76-101: ⚠️ Potential issue

Enhance security in addSearchCondition method

The method should validate the $type parameter and sanitize user input:

Apply these security enhancements:

 protected function addSearchCondition(
     string $type,
     string $tokens,
     string $query,
     array $options = [],
     string $boolean = 'and',
 ): void
 {
+    $validTypes = ['SearchFullText', 'SearchNgrams', 'SearchSubstring'];
+    if (!in_array($type, $validTypes, true)) {
+        throw new InvalidArgumentException('Invalid search type');
+    }
+
+    // Sanitize user input before adding to bindings
+    $sanitizedQuery = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');
+
     $this->wheres[] = [
         'type' => $type,
         'tokens' => $tokens,
         'boolean' => $boolean,
-        'query' => $query,
+        'query' => $sanitizedQuery,
         'options' => $options,
     ];
-    $this->addBinding($query);
+    $this->addBinding($sanitizedQuery);
 }

Additionally, verify that the query builder's prepare statement mechanism properly handles parameter binding to prevent SQL injection.

src/Query/Grammar.php (3)

25-25: LGTM!

The addition of Illuminate\Support\Arr is appropriate for array manipulation in the new search functionality.


145-174: LGTM! Clean and consistent implementation of search methods.

The three search methods follow a consistent pattern and have proper type documentation. The separation into distinct methods for each search type (SEARCH, SEARCH_NGRAMS, SEARCH_SUBSTRING) aligns well with Spanner's functionality.


175-190: ⚠️ Potential issue

Add input validation and consider SQL injection risks.

While the implementation is clean, there are a few security and robustness considerations:

  1. The method lacks validation for required parameters in the $where array.
  2. Raw query string handling could be vulnerable to SQL injection if not properly sanitized upstream.
  3. Option value formatting might need additional type checking and escaping.

Consider adding these safety measures:

 protected function buildSearchFunction(string $function, array $where): string
 {
+    if (!isset($where['tokens']) || !isset($where['query'])) {
+        throw new InvalidArgumentException("Missing required 'tokens' or 'query' parameters");
+    }
+    
     $tokens = $this->wrap($where['tokens']);
-    $rawQuery = $where['query'];
+    $rawQuery = $this->sanitizeSearchQuery($where['query']);
     $options = $where['options'] ?? [];
+    
+    // Validate option values
+    foreach ($options as $value) {
+        if (!is_scalar($value)) {
+            throw new InvalidArgumentException('Search options must be scalar values');
+        }
+    }
     
     return $function . '(' . implode(', ', [
         $tokens,
         $this->quoteString($rawQuery),
         ...Arr::map($options, fn($v, $k) => "$k => {$this->formatOptionValue($v)}"),
     ]) . ')';
 }

+/**
+ * Sanitize the search query to prevent SQL injection
+ * @param string $query
+ * @return string
+ */
+protected function sanitizeSearchQuery(string $query): string
+{
+    // Add appropriate sanitization logic here
+    return $query;
+}

Let's verify the usage of these search methods across the codebase:

src/Query/Builder.php (1)

32-32: Verify the full-text search implementation.

Let's verify the trait implementation and its integration with Spanner:

✅ Verification successful

Full-text search implementation is properly integrated and tested

The verification confirms:

  • The UsesFullTextSearch trait exists and implements all required methods: searchFullText, searchNgrams, and searchSubstring
  • Each search method follows a consistent interface with proper type hints
  • Comprehensive test coverage exists in BuilderTest.php with dedicated test cases for each search method
  • Schema-level support is verified through test_tokenList_and_search_index in BlueprintTest.php
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the full-text search implementation

# Test 1: Check if the trait exists and contains the required methods
echo "Checking trait implementation..."
rg -A 10 "trait UsesFullTextSearch" src/

# Test 2: Verify the search method implementations
echo "Checking search methods..."
rg -A 5 "function search(FullText|Ngrams|Substring)" src/

# Test 3: Check for test coverage
echo "Checking test coverage..."
rg -A 5 "function test.*search" tests/

Length of output: 4544

src/Schema/Blueprint.php (2)

173-173: LGTM: Consistent formatting updates

The addition of trailing commas in array definitions improves consistency and makes future additions easier.

Also applies to: 184-184, 206-206


280-280: LGTM: Minor formatting update

src/Schema/Grammar.php (12)

42-42: LGTM!

The addition of the Invisible modifier to the $modifiers property is correct and aligns with the new functionality for invisible columns.


110-111: LGTM!

The addition of the addRowDeletionPolicy method call to the compileCreate method is correct and aligns with the new functionality for row deletion policies.


166-167: LGTM!

The change to use the wrapArray method and pass the $command->columns directly is correct and simplifies the code.


217-220: LGTM!

The implementation of the compileDropFullText method is correct and handles dropping a full-text search index.


271-271: LGTM!

The changes to use string interpolation instead of concatenation in the compileCreateSequence, compileCreateSequenceIfNotExists, and compileAlterSequence methods are correct and improve readability.

Also applies to: 281-281, 291-291


400-400: LGTM!

The change to use the formatOptions method in the formatChangeStreamOptions method is correct and ensures consistent formatting of options.


517-518: LGTM!

The change to use string interpolation instead of sprintf in the compileDropIndex method is correct and improves readability.


843-848: LGTM!

The implementation of the modifyInvisible method is correct and handles adding the HIDDEN modifier for invisible columns.


321-322: Verify the usage of the new formatOptions method.

The implementation of the formatOptions method looks correct and handles formatting options as key-value pairs. However, ensure that this method is called appropriately from other parts of the codebase where options need to be formatted.

Run the following script to verify the usage of the formatOptions method:

#!/bin/bash
# Description: Verify the usage of the `formatOptions` method.

# Test: Search for calls to the `formatOptions` method.
# Expect: Usage in places where options need to be formatted.
ast-grep --language php --pattern '
  $_ = function $_ {
    $$$
    $this->formatOptions($_)
    $$$
  };
'

Also applies to: 334-338


177-208: Verify the usage of the new compileFullText method.

The implementation of the compileFullText method looks correct and handles the various options for creating a full-text search index. However, ensure that this method is called appropriately from other parts of the codebase where full-text search indexes are created.

Run the following script to verify the usage of the compileFullText method:

#!/bin/bash
# Description: Verify the usage of the `compileFullText` method.

# Test: Search for calls to the `compileFullText` method.
# Expect: Usage in places where full-text search indexes are created.
ast-grep --language php --pattern '
  $_ = function $_ {
    $$$
    $this->compileFullText($_)
    $$$
  };
'

790-806: Verify the usage of the new typeTokenList method.

The implementation of the typeTokenList method looks correct and handles creating the column definition for a TOKENLIST type. However, ensure that this method is called appropriately from other parts of the codebase where TOKENLIST columns are defined.

Run the following script to verify the usage of the typeTokenList method:

✅ Verification successful

The typeTokenList method is correctly integrated and used in the codebase

The verification shows that:

  • The method is properly integrated into the Schema system through Blueprint::tokenList() method
  • It's being actively used in tests (tests/Schema/BlueprintTest.php, tests/Query/BuilderTest.php) with various tokenizer functions (FullText, Ngrams, Substring)
  • The implementation is demonstrated in the README.md with a practical example
  • The method correctly generates the expected SQL syntax as verified by test assertions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the `typeTokenList` method.

# Test: Search for calls to the `typeTokenList` method.
# Expect: Usage in places where `TOKENLIST` columns are defined.
ast-grep --language php --pattern '
  $_ = function $_ {
    $$$
    $this->typeTokenList($_)
    $$$
  };
'

Length of output: 396


Script:

#!/bin/bash
# Search for calls to typeTokenList method
ast-grep --pattern '$this->typeTokenList($_)'

# Search for any references to TOKENLIST in the codebase
rg -i "tokenlist" -A 3 -B 3

# Search for the class that contains typeTokenList to understand its usage
ast-grep --pattern 'class $_ {
  $$$
  protected function typeTokenList($_) {
    $$$
  }
  $$$
}'

Length of output: 6938


20-20: Verify the usage of the BackedEnum type.

Ensure that the BackedEnum type is used correctly and consistently throughout the codebase, especially in the formatOptions method where it is used to format option values.

Run the following script to verify the usage of BackedEnum:

✅ Verification successful

BackedEnum usage is correctly implemented

The verification shows that BackedEnum is properly used in the codebase:

  • It's correctly imported in both Schema/Grammar.php and Concerns/SharedGrammarCalls.php
  • It's properly typed in method signatures and docblocks
  • The formatOptionValue method in SharedGrammarCalls.php correctly handles BackedEnum values by accessing their value property
  • The formatOptions method in Grammar.php correctly accepts and processes BackedEnum values as part of its options array
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of the `BackedEnum` type.

# Test 1: Search for the `BackedEnum` type usage. 
# Expect: Usage in the `formatOptions` method and potentially other places.
rg -A 5 'BackedEnum'

# Test 2: Search for the `formatOptions` method usage. 
# Expect: Consistent usage with `BackedEnum` values.
ast-grep --language php --pattern '
  $_ = function $_ {
    $$$
    $this->formatOptions($_)
    $$$
  };
'

Length of output: 396


Script:

#!/bin/bash
# Let's try a different approach to verify BackedEnum usage

# Search for BackedEnum usage in PHP files
rg -t php 'BackedEnum' -B 2 -A 5

# Search for formatOptions method definition and usage
rg -t php 'formatOptions' -B 2 -A 5

# Search for enum declarations that might be using BackedEnum
rg -t php 'enum.*implements' -B 1 -A 3

Length of output: 4033

tests/Schema/BlueprintTest.php (2)

59-59: Successfully added tokenList column to test all valid types

Including the tokenList column in test_create_with_all_valid_types enhances the test coverage by verifying the creation of token list columns.


284-290: Verify the inclusion of pid in the full-text index

Including pid (parent ID) in the full-text search index may not be necessary if it's not intended for textual search. Please confirm whether pid should be part of the full-text index or if indexing only nameTokens is sufficient.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
src/Schema/Blueprint.php (2)

247-261: Enhance documentation and add parameter validation.

The method implementation looks good, but could benefit from the following improvements:

  1. Enhance the PHPDoc with parameter descriptions and example usage:
     /**
      * @param string $column
+     *     The name of the column to store tokens
      * @param TokenizerFunction $function
+     *     The tokenization function (e.g., TOKENIZE_FULLTEXT, TOKENIZE_NGRAMS)
      * @param string $target
+     *     The target column to generate tokens from
      * @param array $options
+     *     Additional tokenization options
+     * @example
+     *     $table->tokenList('description_tokens', TokenizerFunction::TOKENIZE_FULLTEXT, 'description');
      * @return ColumnDefinition
      */
  1. Consider adding parameter validation:
     public function tokenList(string $column, TokenizerFunction $function, string $target, array $options = []): ColumnDefinition
     {
+        if (empty($target)) {
+            throw new InvalidArgumentException('Target column cannot be empty');
+        }
+
         return $this->addColumn('tokenList', $column, [
             'function' => $function,
             'target' => $target,
             'options' => $options,
         ])->invisible()->nullable();
     }

288-307: Add type hints and enhance documentation.

The method implementation is solid but could be improved with better type safety and documentation:

  1. Add type hints and enhance PHPDoc:
     /**
+     * Create a full-text search index on the specified columns.
      * @param string|list<string> $columns
+     *     The column(s) to index for full-text search
      * @param string|null $name
+     *     Optional custom name for the index (default: auto-generated)
      * @param string|null $algorithm
+     *     Optional search algorithm to use
+     * @example
+     *     // Single column
+     *     $table->fullText('description');
+     *     
+     *     // Multiple columns with custom name
+     *     $table->fullText(['title', 'description'], 'content_search');
      * @return SearchIndexDefinition
      */
-    public function fullText($columns, $name = null, $algorithm = null)
+    public function fullText(string|array $columns, ?string $name = null, ?string $algorithm = null): SearchIndexDefinition
  1. Consider adding parameter validation:
     {
         $type = 'fullText';
         $columns = (array) $columns;
+        
+        if (empty($columns)) {
+            throw new InvalidArgumentException('At least one column must be specified for full-text search index');
+        }
src/Schema/Grammar.php (3)

170-208: Consider adding validation for search index options.

The compileFullText method implementation looks good, but consider adding validation for the search index options to prevent invalid configurations.

Consider adding validation before line 203:

+        $validOptions = ['sort_order_sharding'];
+        $invalidOptions = array_diff(array_keys($command->getOptions()), $validOptions);
+        if (!empty($invalidOptions)) {
+            throw new InvalidArgumentException(
+                'Invalid search index options: ' . implode(', ', $invalidOptions)
+            );
+        }
         $schema .= $command->getOptions() !== []
             ? ' ' . $this->formatOptions($command->getOptions())
             : '';

Line range hint 271-291: Improve sequence options formatting consistency.

The sequence-related methods have been updated to use the new formatOptions method, which is good for consistency. However, there's an inconsistency in the whitespace handling between these methods.

Apply this diff to maintain consistent whitespace:

-        return "create sequence {$this->wrap($command->sequence)}{$this->formatSequenceOptions($command)}";
+        return "create sequence {$this->wrap($command->sequence)} {$this->formatSequenceOptions($command)}";

-        return "create sequence if not exists {$this->wrap($command->sequence)}{$this->formatSequenceOptions($command)}";
+        return "create sequence if not exists {$this->wrap($command->sequence)} {$this->formatSequenceOptions($command)}";

-        return 'alter sequence ' . $this->wrap($command->sequence) . ' set' . $this->formatSequenceOptions($command);
+        return 'alter sequence ' . $this->wrap($command->sequence) . ' set ' . $this->formatSequenceOptions($command);

790-806: Enhance error handling in typeTokenList method.

The token list implementation looks good, but could benefit from additional error handling.

Consider adding validation:

     protected function typeTokenList(Fluent $column): string
     {
+        if (empty($column->target)) {
+            throw new InvalidArgumentException('Token list target column is required');
+        }
+
         $args = [];
         $args[] = $this->wrap($column->target);
tests/Schema/BlueprintTest.php (1)

261-304: LGTM: Well-structured test for full-text search features

The new test method comprehensively covers:

  • Interleaved table relationships
  • TokenList column creation with language support
  • Full-text search index creation with multiple options
  • Proper verification of generated SQL statements

Consider adding the following test cases to improve coverage:

  • Test with different tokenizer functions (NGRAMS, BOOL, NUMBER)
  • Test error cases for invalid language tags
  • Test multiple tokenList columns in the same table
tests/Query/BuilderTest.php (4)

1106-1128: Consider adding more test cases for comprehensive coverage.

The test verifies basic full-text search functionality well, but could be enhanced with additional test cases:

  • Test with AND operator
  • Test with special characters
  • Test with empty search string
  • Test with non-existent terms
  • Test error cases (e.g., searching non-indexed column)

1130-1152: Test additional search options for better coverage.

The test only verifies the 'enhance_query' option. Consider adding test cases for:

  • Multiple options combined
  • Different option values
  • Invalid options

1154-1176: Enhance n-gram search test coverage.

Consider adding test cases for:

  • Different n-gram lengths
  • Edge cases (e.g., search term shorter than min_ngrams)
  • Multiple n-gram matches
  • Non-matching n-grams

1178-1200: Add more substring search test cases.

Consider adding test cases for:

  • Substring at different positions (start, middle, end)
  • Empty substring
  • Special characters
  • Case sensitivity
  • Multiple matches
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 68be9eb and f2b4449.

📒 Files selected for processing (7)
  • src/Concerns/SharedGrammarCalls.php (2 hunks)
  • src/Query/Builder.php (1 hunks)
  • src/Query/Grammar.php (2 hunks)
  • src/Schema/Blueprint.php (2 hunks)
  • src/Schema/Grammar.php (10 hunks)
  • tests/Query/BuilderTest.php (2 hunks)
  • tests/Schema/BlueprintTest.php (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/Concerns/SharedGrammarCalls.php
  • src/Query/Builder.php
🔇 Additional comments (9)
src/Query/Grammar.php (3)

25-25: LGTM!

The Arr class import is appropriate for the array manipulation needs in the new search functionality.


145-174: LGTM! The search method implementations are well-structured.

The three search methods (whereSearchFullText, whereSearchNgrams, whereSearchSubstring) are well-documented and follow a consistent pattern.


175-191: Consider enhancing type safety and validation.

While the implementation is solid, consider these improvements:

  1. The options array type could be more specific than array<string, scalar>. Consider defining an enum or constants for valid option keys.
  2. The buildSearchFunction could validate the $function parameter against allowed values.

Example enhancement:

private const SEARCH_FUNCTIONS = [
    'search' => true,
    'search_ngrams' => true,
    'search_substring' => true,
];

private const SEARCH_OPTIONS = [
    'distance' => true,
    'min_tokens' => true,
    // ... other valid options
];

protected function buildSearchFunction(string $function, array $where): string
{
    if (!isset(self::SEARCH_FUNCTIONS[$function])) {
        throw new InvalidArgumentException("Invalid search function: {$function}");
    }
    
    // Validate options
    foreach ($where['options'] as $key => $value) {
        if (!isset(self::SEARCH_OPTIONS[$key])) {
            throw new InvalidArgumentException("Invalid search option: {$key}");
        }
    }
    
    // ... rest of the implementation
}

Let's verify the usage of formatOptionValue:

src/Schema/Grammar.php (4)

42-42: LGTM: Added 'Invisible' modifier support.

The addition of 'Invisible' to the modifiers array correctly enables support for hidden columns in the schema.


324-332: LGTM: Well-implemented options formatting.

The formatOptions method is well-implemented with proper type handling and snake case conversion for option keys.


836-848: LGTM: Clean implementation of invisible column modifier.

The modifyInvisible method is well-implemented, using the 'hidden' keyword as per Google Cloud Spanner's syntax.


Line range hint 1-849: Verify full-text search compatibility with the emulator.

Since this PR adds full-text search support, we should verify compatibility with the emulator as mentioned in issue #234.

✅ Verification successful

Based on the evidence gathered, I can now provide a final response:

Full-text search functionality is already well-tested with emulator support

The codebase shows comprehensive testing for full-text search functionality:

  • Tests cover various search types (full-text, ngrams, substring) with the emulator
  • TestCase.php includes proper emulator setup via setUpEmulatorInstance
  • Search functionality tests are running against both regular and emulator environments through getDefaultConnection()
  • No emulator-specific limitations or restrictions are documented for search functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if there are any emulator-specific conditionals or tests
# that need to be updated for full-text search support.

# Test 1: Look for emulator-related test files
fd -e php test emulator

# Test 2: Search for emulator-specific code paths
rg -l "emulator|fake|mock" --type php

# Test 3: Check for any existing full-text search tests
rg -l "fulltext|full_text|search" --type php test/

Length of output: 462


Script:

#!/bin/bash
# Let's check for emulator-specific test cases and search functionality tests

# Look for emulator-related test files in the correct test directory
fd -e php emulator tests/

# Search for emulator-specific code and configurations
rg -A 5 "emulator|fake|mock" tests/TestCase.php

# Check for any existing full-text search tests in the correct test directory
rg -l "fulltext|full_text|search" tests/

# Look for any emulator-specific limitations or configurations
rg -A 5 "emulator.*limitation|not.*supported.*emulator" tests/

Length of output: 487


Script:

#!/bin/bash
# Let's examine the specific test files that mention search functionality

# Check the search-related test implementations
rg -A 10 "fulltext|full_text|search" tests/Schema/BlueprintTest.php

# Look for any search index test cases
rg -A 10 "fulltext|full_text|search" tests/Query/BuilderTest.php

# Check for any emulator-specific setup or configurations in test cases
rg -B 5 -A 10 "setUp|beforeEach" tests/TestCase.php

# Look for any search-related grammar implementations
rg -A 5 "compileFullText|compileSearch" src/Schema/Grammar.php

Length of output: 6431

tests/Schema/BlueprintTest.php (2)

23-23: LGTM: Import statement correctly added

The TokenizerFunction import is necessary for the new full-text search functionality being tested.


59-59: LGTM: Comprehensive test case update

The test_create_with_all_valid_types method is appropriately updated to include tokenList column testing with language tag support.

Also applies to: 87-87

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
CHANGELOG.md (1)

3-3: Maintain consistent hyphenation in compound adjectives.

For better readability and grammatical correctness, "full text search" should be hyphenated as "full-text search" when used as a compound adjective.

-- add support for full text search (#231)
+- add support for full-text search (#231)
🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#231) - add support for invisib...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

src/Schema/Grammar.php (1)

324-338: Document supported option types and values.

The formatOptions method provides good type safety with BackedEnum. Consider:

  1. Adding PHPDoc to document supported option types and their expected values
  2. Validating the snake_case conversion for special cases (e.g., acronyms)
tests/Schema/BlueprintTest.php (1)

301-303: Consider enhancing test coverage.

While the test executes the SQL statements, it would be beneficial to:

  1. Add assertions to verify the actual database state after execution
  2. Add cleanup code to drop the created tables
  3. Include test cases for error scenarios (e.g., invalid language tags, missing parent table)

Example enhancement:

 foreach ($statements as $statement) {
     $conn->statement($statement);
 }
+
+// Verify database state
+$this->assertTrue(
+    $conn->getSchemaBuilder()->hasTable($childTableName),
+    'Child table should exist'
+);
+
+// Cleanup
+$blueprint = new Blueprint($childTableName);
+$blueprint->drop();
+$blueprint->build($conn, $grammar);
+
+$blueprint = new Blueprint($parentTableName);
+$blueprint->drop();
+$blueprint->build($conn, $grammar);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f2b4449 and 447f8af.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • src/Schema/Grammar.php (8 hunks)
  • tests/Schema/BlueprintTest.php (4 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#231) - add support for invisib...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🔇 Additional comments (6)
CHANGELOG.md (1)

Line range hint 4-10: LGTM! Well-structured changelog entries.

The changelog entries are clear, concise, and properly reference their corresponding PR numbers. The format is consistent and provides sufficient information about each change.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#231) - add support for invisib...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

src/Schema/Grammar.php (3)

Line range hint 271-291: LGTM! Minor formatting improvements.

The sequence-related method changes maintain consistency with the rest of the codebase.


790-806: Add validation for target column and document tokenizer functions.

The typeTokenList implementation looks good but consider:

  1. Validating that the target column exists in the table
  2. Adding PHPDoc to document supported tokenizer functions and their use cases
#!/bin/bash
# Check for existing token list columns to understand common patterns
ast-grep --pattern 'tokenlist as ($_($$))'

170-208: Consider adding input validation for partition and order by clauses.

The implementation looks solid, but consider adding validation for:

  1. Partition by columns to ensure they exist in the table
  2. Order by values to validate against allowed values (ASC/DESC)
tests/Schema/BlueprintTest.php (2)

23-23: LGTM!

The import statement is correctly placed and necessary for the new test functionality.


261-304: LGTM! Comprehensive test implementation.

The test thoroughly validates the SQL generation for tokenList and fullText search index functionality, including proper interleaving with parent tables.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 447f8af and 4865c3d.

📒 Files selected for processing (2)
  • CHANGELOG.md (1 hunks)
  • src/Schema/Blueprint.php (2 hunks)
🧰 Additional context used
🪛 LanguageTool
CHANGELOG.md

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-09-02) - add support for full text search (#235) - add support for invisib...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🔇 Additional comments (2)
src/Schema/Blueprint.php (2)

288-307: 🛠️ Refactor suggestion

Add type hints and improve documentation.

The implementation needs the following improvements:

  1. Add type hints to parameters and return type:
-    public function fullText($columns, $name = null, $algorithm = null)
+    public function fullText(string|array $columns, ?string $name = null, ?string $algorithm = null): SearchIndexDefinition
  1. Enhance PHPDoc with parameter descriptions:
     /**
-     * @param string|list<string> $columns
-     * @param string|null $name
-     * @param string|null $algorithm
+     * @param string|list<string> $columns The columns to include in the full-text search index
+     * @param string|null $name The name of the index (auto-generated if null)
+     * @param string|null $algorithm The search algorithm to use
      * @return SearchIndexDefinition
      */
  1. The past review comment about missing features (PARTITION BY, ORDER BY, OPTIONS) is still valid and should be addressed.

247-261: Verify TokenizerFunction type and enhance documentation.

The implementation looks good, but a few suggestions:

  1. The documentation could be enhanced to describe:
    • The purpose of the target parameter
    • Available options in the options array
    • Why the column is made invisible and nullable by default
  2. Let's verify the TokenizerFunction type exists and includes all tokenization options mentioned in the PR objectives.
✅ Verification successful

TokenizerFunction type exists and is well-implemented

The implementation is verified and complete:

  • TokenizerFunction is properly defined as an enum with string values
  • All required tokenization options are present: Exact, Ngrams, Substring, FullText, Bool, and Number
  • Tests demonstrate proper usage with options like 'language_tag'

Suggested documentation improvements for the tokenList method:

/**
 * Create a token list column for full-text search capabilities.
 *
 * @param string $column Name of the token list column
 * @param TokenizerFunction $function Tokenization strategy (e.g., Substring, FullText)
 * @param string $target Name of the source column to be tokenized
 * @param array $options Additional tokenization options (e.g., ['language_tag' => 'en'])
 * @return ColumnDefinition
 */
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify TokenizerFunction type exists and includes required options
# Expected: Find TokenizerFunction class/enum with TOKENIZE_FULLTEXT, TOKENIZE_SUBSTRING, etc.

# Search for TokenizerFunction definition
echo "Searching for TokenizerFunction definition..."
rg -l "class|enum.*TokenizerFunction"

# Search for tokenization options
echo "Searching for tokenization options..."
rg "TOKENIZE_(FULLTEXT|SUBSTRING|NGRAMS|BOOL|NUMBER)|TOKEN\b"

Length of output: 2047


Script:

#!/bin/bash
# Let's examine the TokenizerFunction class content
echo "=== TokenizerFunction Class Content ==="
cat src/Schema/TokenizerFunction.php

# Let's also check the tests to understand usage
echo -e "\n=== TokenizerFunction Tests ==="
rg -A 5 "tokenList|TokenizerFunction" tests/Schema/BlueprintTest.php

Length of output: 2243

@taka-oyama taka-oyama force-pushed the feature/full-text-search branch 2 times, most recently from d131225 to 8fb35db Compare November 8, 2024 04:48
@taka-oyama taka-oyama changed the base branch from master to feature/consolidate-options November 8, 2024 04:48
@colopl colopl deleted a comment from coderabbitai bot Nov 8, 2024
@taka-oyama taka-oyama requested review from a team and halnique November 8, 2024 04:49
@taka-oyama taka-oyama force-pushed the feature/full-text-search branch from 8fb35db to 4e91eb4 Compare November 8, 2024 04:51
@colopl colopl deleted a comment from coderabbitai bot Nov 8, 2024
Base automatically changed from feature/consolidate-options to master November 8, 2024 05:53
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (10)
src/Schema/TokenizerFunction.php (1)

23-31: Consider adding PHPDoc comments for better documentation.

To improve maintainability and help other developers understand the purpose of each tokenizer function, consider adding PHPDoc comments.

Here's a suggested enhancement:

 enum TokenizerFunction: string
 {
+    /**
+     * Matches exact tokens.
+     * @since 1.x.x
+     */
     case Exact = 'token';
+
+    /**
+     * Tokenizes text into n-grams for partial matching.
+     * @since 1.x.x
+     */
     case Ngrams = 'tokenize_ngrams';
+
+    /**
+     * Enables substring matching within text.
+     * @since 1.x.x
+     */
     case Substring = 'tokenize_substring';
+
+    /**
+     * Provides full-text search capabilities.
+     * @since 1.x.x
+     */
     case FullText = 'tokenize_fulltext';
+
+    /**
+     * Tokenizes boolean values.
+     * @since 1.x.x
+     */
     case Bool = 'tokenize_bool';
+
+    /**
+     * Tokenizes numeric values.
+     * @since 1.x.x
+     */
     case Number = 'tokenize_number';
 }
src/Schema/SearchIndexDefinition.php (2)

21-33: Add return type hints to @method annotations.

While the documentation is thorough, consider adding return type hints to the @method annotations for better type safety and IDE support.

-@method $this partitionBy(string|string[] $columns)
-@method $this orderBy(string|string[] $columns)
-@method $this sortOrderSharding(bool $toggle = true)
-@method $this disableAutomaticUidColumn(bool $toggle = true)
+@method self partitionBy(string|string[] $columns): static
+@method self orderBy(string|string[] $columns): static
+@method self sortOrderSharding(bool $toggle = true): static
+@method self disableAutomaticUidColumn(bool $toggle = true): static

34-46: Consider defining option keys as class constants.

To improve maintainability and prevent typos, consider defining the option keys as class constants.

 class SearchIndexDefinition extends IndexDefinition
 {
+    private const OPTION_SORT_ORDER_SHARDING = 'sortOrderSharding';
+    private const OPTION_DISABLE_AUTOMATIC_UID_COLUMN = 'disableAutomaticUidColumn';
+
     /**
      * @return array{ sortOrderSharding?: bool, disableAutomaticUidColumn?: bool }
      */
     public function getOptions(): array
     {
         return array_filter([
-            'sortOrderSharding' => $this->sortOrderSharding,
-            'disableAutomaticUidColumn' => $this->disableAutomaticUidColumn,
+            self::OPTION_SORT_ORDER_SHARDING => $this->sortOrderSharding,
+            self::OPTION_DISABLE_AUTOMATIC_UID_COLUMN => $this->disableAutomaticUidColumn,
         ], static fn($v) => $v !== null);
     }
 }
src/Query/Grammar.php (1)

180-190: Consider enhancing type safety and validation.

While the implementation is solid, consider these improvements:

  1. Add runtime validation for required array keys
  2. Define a strict type for options using a custom DTO or enum for valid option keys

Example enhancement:

     protected function buildSearchFunction(string $function, array $where): string
     {
+        if (!isset($where['tokens'], $where['query'])) {
+            throw new InvalidArgumentException('Missing required keys: tokens, query');
+        }
+
         $tokens = $this->wrap($where['tokens']);
         $rawQuery = $where['query'];
         $options = $where['options'];
src/Schema/Blueprint.php (2)

247-261: Enhance PHPDoc with examples and options documentation.

The tokenList method implementation looks good, but the documentation could be improved to help users understand how to use it correctly.

Add examples and document valid options in the PHPDoc:

     /**
      * @param string $column
      * @param TokenizerFunction $function
      * @param string $target
      * @param array $options
+     * @example
+     * $table->tokenList('description_tokens', TokenizerFunction::TOKENIZE_FULLTEXT, 'description');
+     * $table->tokenList('title_tokens', TokenizerFunction::TOKENIZE_NGRAMS, 'title', ['min_length' => 2, 'max_length' => 4]);
+     * @throws \InvalidArgumentException If invalid options are provided
      * @return ColumnDefinition
      */

294-307: Add validation for algorithm parameter.

The method accepts an algorithm parameter but doesn't validate it. Consider adding validation to ensure only supported algorithms are used.

     public function fullText($columns, $name = null, $algorithm = null)
     {
+        $validAlgorithms = ['DOCUMENT', 'HTML'];
+        if ($algorithm !== null && !in_array($algorithm, $validAlgorithms, true)) {
+            throw new \InvalidArgumentException(
+                "Invalid algorithm. Supported values are: " . implode(', ', $validAlgorithms)
+            );
+        }
+
         $type = 'fullText';
         $columns = (array) $columns;
CHANGELOG.md (1)

3-3: Use proper hyphenation for compound adjective.

The compound adjective "full text" should be hyphenated when modifying "search".

Apply this diff:

-- add support for full text search (#235)
+- add support for full-text search (#235)
🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-11-08) - add support for full text search (#235) - consolidate schema opti...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

README.md (3)

352-355: Enhance section header and introduction

  1. Use hyphenation for the compound adjective "full-text" consistently throughout the documentation.
  2. Consider expanding the introduction to highlight key benefits and use cases of full-text search.
-### Full Text Search
+### Full-Text Search

-Spanner supports [Full Text Search](https://cloud.google.com/spanner/docs/full-text-search) which allows you to search for text in columns.
+Spanner supports [Full-Text Search](https://cloud.google.com/spanner/docs/full-text-search) which allows you to perform efficient text searches across columns. This feature is particularly useful for implementing search functionality in applications that need to handle large text data with support for various languages and search patterns.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)


356-368: Expand schema definition documentation

The example is good but could be more comprehensive. Consider adding:

  1. Documentation for all available tokenizer functions (FullText, Substring, NGrams).
  2. Explanation of available options for each tokenizer function.
  3. More details about the implications of using invisible columns.
 $schemaBuilder->create('user', function (Blueprint $table) {
     $table->uuid('id')->primary();
     $table->string('name');
-    // adds an invisible column for full text search
+    // adds an invisible column for full-text search
+    // Available tokenizer functions: TokenizerFunction::FullText, TokenizerFunction::Substring, TokenizerFunction::NGrams
+    // Common options:
+    //   - language_tag: Specifies the language for tokenization (e.g., 'en', 'ja')
+    //   - case_sensitive: Controls case sensitivity (true/false)
+    //   - symbols: Controls symbol handling (true/false)
     $table->tokenList('UserNameTokens', TokenizerFunction::FullText, 'name', ['language_tag' => 'en']);
     
-    // adds a SEARCH INDEX
+    // adds a SEARCH INDEX with default options
+    // You can also specify additional options:
+    // $table->fullText(['UserNameTokens'])
+    //       ->partitionBy(['id'])
+    //       ->orderBy(['id'])
+    //       ->options(['sort_order_sharding' => true]);
     $table->fullText(['UserNameTokens']);
 });

369-376: Enhance usage examples and documentation

The current example is basic. Consider adding:

  1. Examples for all search methods (searchFullText, searchSubstring, searchNgrams).
  2. Documentation for available options and their effects.
  3. Best practices and performance considerations.
 Once the schema has been applied, you can use the search methods in the query builder to search for text in the columns as below.

 ```php
+// Full-text search with query enhancement
 User::query()->searchFullText('UserNameTokens', 'John OR Kevin', ['enhance_query' => true])->get();
+
+// Substring search for partial matches
+User::query()->searchSubstring('UserNameTokens', 'Jo', ['case_sensitive' => false])->get();
+
+// N-grams search for flexible matching
+User::query()->searchNgrams('UserNameTokens', 'John', ['min_length' => 2])->get();

+Available options for search methods:
+- enhance_query: Enables query enhancement for better search results
+- case_sensitive: Controls case sensitivity of the search
+- language_tag: Specifies the language for tokenization
+- min_length/max_length: Controls n-gram size for n-grams search
+
+Best practices:
+- Use searchFullText for natural language queries
+- Use searchSubstring for prefix/suffix matching
+- Use searchNgrams for fuzzy matching
+- Consider indexing performance when choosing tokenization options


</blockquote></details>

</blockquote></details>

<details>
<summary>📜 Review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between f3d950bc0f58fb0c5ebf3664c06d27e026f0d45f and d53a9951b6d09cd5ced43951f808d8aea1278309.

</details>

<details>
<summary>📒 Files selected for processing (12)</summary>

* `CHANGELOG.md` (1 hunks)
* `README.md` (1 hunks)
* `compose.yaml` (1 hunks)
* `src/Query/Builder.php` (1 hunks)
* `src/Query/Concerns/UsesFullTextSearch.php` (1 hunks)
* `src/Query/Grammar.php` (2 hunks)
* `src/Schema/Blueprint.php` (2 hunks)
* `src/Schema/Grammar.php` (3 hunks)
* `src/Schema/SearchIndexDefinition.php` (1 hunks)
* `src/Schema/TokenizerFunction.php` (1 hunks)
* `tests/Query/BuilderTest.php` (2 hunks)
* `tests/Schema/BlueprintTest.php` (4 hunks)

</details>

<details>
<summary>✅ Files skipped from review due to trivial changes (1)</summary>

* compose.yaml

</details>

<details>
<summary>🧰 Additional context used</summary>

<details>
<summary>🪛 LanguageTool</summary>

<details>
<summary>CHANGELOG.md</summary>

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-11-08)  - add support for full text search (#235) - consolidate schema opti...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

</details>
<details>
<summary>README.md</summary>

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ```  ### Full Text Search  Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

---

[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search  Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

</details>

</details>
<details>
<summary>🪛 Markdownlint</summary>

<details>
<summary>README.md</summary>

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)

</details>

</details>

</details>

<details>
<summary>🔇 Additional comments (17)</summary>

<details>
<summary>src/Schema/TokenizerFunction.php (1)</summary>

`23-31`: **Implementation looks good and aligns with requirements!**

The enum implementation correctly defines all necessary tokenizer functions required for the full-text search feature. The backing string values match Cloud Spanner's tokenizer function names.

</details>
<details>
<summary>src/Schema/SearchIndexDefinition.php (2)</summary>

`1-20`: **LGTM: License and namespace are properly defined.**

The license header is complete and the namespace correctly reflects the file structure.

---

`34-34`: **Verify parent class compatibility.**

Let's verify the integration with the parent class and its usage in the Blueprint class.

<details>
<summary>✅ Verification successful</summary>

Let me try a different approach to verify the integration points.

---

**Parent class and integration points verified successfully**

The verification shows proper integration:
- `SearchIndexDefinition` correctly extends `IndexDefinition` (which in turn extends `BaseIndexDefinition`)
- The class is properly used in the codebase:
  - `Blueprint::fullText()` method creates and returns `SearchIndexDefinition` instances
  - `Grammar` class handles the compilation of full-text commands using `SearchIndexDefinition`



</details>

<details>
<summary>🏁 Scripts executed</summary>

The following scripts were executed for the analysis:

Script:

```shell
#!/bin/bash
# Description: Verify parent class and integration points

# Test 1: Check parent class existence and its public interface
echo "Checking IndexDefinition class..."
ast-grep --pattern 'class IndexDefinition {
  $$$
}'

# Test 2: Verify usage in Blueprint class
echo "Checking usage in Blueprint class..."
ast-grep --pattern 'class Blueprint {
  $$$
  fullText($$$) {
    $$$
  }
  $$$
}'

Length of output: 299


Script:

#!/bin/bash
# Search for IndexDefinition class with ripgrep for better context
echo "Searching for IndexDefinition class definition..."
rg "class IndexDefinition" -A 5

# Search for SearchIndexDefinition usage
echo -e "\nSearching for SearchIndexDefinition usage..."
rg "SearchIndexDefinition" -A 3

# Search for fullText method implementations
echo -e "\nSearching for fullText method implementations..."
rg "function fullText" -A 5

Length of output: 2202

src/Query/Concerns/UsesFullTextSearch.php (3)

1-20: LGTM! File structure and licensing are properly set up.


22-74: LGTM! Well-structured search methods with consistent signatures.

The implementation follows good practices with proper type hints and maintains DRY principle by delegating to a helper method.

The previous review comment about adding input validation for the parameters is still applicable.

Consider enhancing the PHPDoc blocks with:

  1. Usage examples for each search method
  2. Description of how options affect the search behavior
  3. Explanation of when to use each search type (FullText vs Ngrams vs Substring)

Example addition:

/**
 * Performs a full-text search on specified tokens.
 *
 * @param string $tokens Column or columns to search in
 * @param string $query Search query to match
 * @param array<string, scalar> $options Search options (e.g. ['language' => 'en'])
 * @param string $boolean How to join with previous conditions ('and'/'or')
 * @return $this
 *
 * @example
 * $query->searchFullText('description', 'search term', ['language' => 'en']);
 */

76-101: Consider adding type validation and improving documentation.

The helper method correctly structures the search condition, but could benefit from additional safeguards and documentation:

 protected function addSearchCondition(
     string $type,
     string $tokens,
     string $query,
     array $options = [],
     string $boolean = 'and',
 ): void
 {
+    $allowedTypes = ['SearchFullText', 'SearchNgrams', 'SearchSubstring'];
+    if (!in_array($type, $allowedTypes, true)) {
+        throw new InvalidArgumentException("Invalid search type: {$type}");
+    }
+
     $this->wheres[] = [
         'type' => $type,
         'tokens' => $tokens,
         'boolean' => $boolean,
         'query' => $query,
         'options' => $options,
     ];
     $this->addBinding($query);
 }

Let's verify the query binding implementation:

src/Query/Grammar.php (2)

150-173: LGTM! Well-structured search implementation.

The search methods are well-documented, follow DRY principles, and properly handle query escaping. The implementation aligns well with the PR objectives for adding full-text search support.


25-25: Verify usage of Illuminate\Support\Arr.

The Arr class appears to be imported but not used in the new methods. Consider removing if not needed.

src/Query/Builder.php (1)

32-32: LGTM! Verify trait implementation.

The addition of UsesFullTextSearch trait aligns with the PR objectives to implement full-text search capabilities.

Let's verify the trait implementation:

✅ Verification successful

Let me gather more information about the trait implementation since the previous results only confirmed the file exists but didn't show the actual implementation.


Trait implementation verified and properly structured

The UsesFullTextSearch trait is well-implemented with:

  • All three expected search methods: searchFullText, searchNgrams, and searchSubstring
  • Consistent method signatures with proper type hints and return types
  • Protected helper method addSearchCondition for DRY implementation
  • Proper PHPDoc blocks with parameter type documentation
  • Fluent interface pattern maintained with return $this
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the trait exists and contains the expected search methods

# Test 1: Check if the trait file exists
fd "UsesFullTextSearch.php" src/Query/Concerns/

# Test 2: Verify the trait contains the expected search methods
ast-grep --pattern $'trait UsesFullTextSearch {
  $$$
  public function searchFullText($$$) {
    $$$
  }
  $$$
  public function searchNgrams($$$) {
    $$$
  }
  $$$
  public function searchSubstring($$$) {
    $$$
  }
  $$$
}' src/Query/Concerns/

Length of output: 340


Script:

#!/bin/bash
# Let's examine the actual content of the trait file
cat src/Query/Concerns/UsesFullTextSearch.php

Length of output: 2729

CHANGELOG.md (1)

Line range hint 4-11: LGTM!

The remaining changelog entries are well-structured and provide clear documentation of the changes.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-11-08) - add support for full text search (#235) - consolidate schema opti...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

tests/Query/BuilderTest.php (5)

1106-1128: LGTM! Comprehensive test coverage for full-text search.

The test effectively validates both the SQL generation and search results, including multiple matches with OR conditions.


1130-1152: LGTM! Good coverage of search options.

The test effectively validates the enhance_query option functionality, including proper SQL generation and result filtering.


1154-1176: LGTM! Thorough test of n-grams search.

The test effectively validates n-grams search functionality with options, including proper SQL generation and result filtering.


1178-1200: LGTM! Good coverage of substring search.

The test effectively validates substring search functionality, including proper SQL generation and result filtering.


1106-1200: Consider refactoring common test patterns.

The test methods share similar setup code for creating tables and inserting test data. The previous review comment about refactoring these patterns is still valid and would improve maintainability.

src/Schema/Grammar.php (1)

777-787: The typeTokenList method is correctly implemented

The typeTokenList method properly constructs the tokenlist type with the tokenizer function and options, ensuring accurate SQL syntax. The use of array_filter and implode effectively handles optional parameters.

tests/Schema/BlueprintTest.php (1)

287-287: ⚠️ Potential issue

Fix syntax error: Missing variable in assignment.

In line 287, the variable assignment is missing the $this keyword. It should be $this->getDefaultConnection();.

Apply this diff to fix the syntax error:

-            $conn = $->getDefaultConnection();
+            $conn = $this->getDefaultConnection();

Likely invalid or redundant comment.

@colopl colopl deleted a comment from coderabbitai bot Nov 8, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/Query/BuilderTest.php (1)

1106-1200: 🛠️ Refactor suggestion

Consider refactoring common test patterns.

The test methods share similar setup code and patterns.

Apply this refactor to reduce duplication and improve maintainability:

+private function createSearchTable(string $tableName, TokenizerFunction $tokenizer): void 
+{
+    $conn = $this->getDefaultConnection();
+    $grammar = new Grammar();
+    $blueprint = new Blueprint($tableName, function (Blueprint $table) use ($tokenizer) {
+        $table->create();
+        $table->uuid('id')->primary();
+        $table->string('name');
+        $table->tokenList('nameTokens', $tokenizer, 'name');
+        $table->fullText('nameTokens');
+    });
+    $blueprint->build($conn, $grammar);
+}
+
+private function insertTestData(string $tableName): void
+{
+    $this->getDefaultConnection()->table($tableName)->insert([
+        ['id' => $this->generateUuid(), 'name' => 'test1'],
+        ['id' => $this->generateUuid(), 'name' => 'test2'],
+    ]);
+}
+
 public function test_search(): void
 {
     $tableName = $this->generateTableName('FTS_Search');
-    $conn = $this->getDefaultConnection();
-    $grammar = new Grammar();
-    $blueprint = new Blueprint($tableName, function (Blueprint $table) {
-        $table->create();
-        $table->uuid('id')->primary();
-        $table->string('name');
-        $table->tokenList('nameTokens', TokenizerFunction::FullText, 'name');
-        $table->fullText('nameTokens');
-    });
-    $blueprint->build($conn, $grammar);
-
-    $conn->table($tableName)->insert([
-        ['id' => $this->generateUuid(), 'name' => 'test1'],
-        ['id' => $this->generateUuid(), 'name' => 'test2'],
-    ]);
+    $this->createSearchTable($tableName, TokenizerFunction::FullText);
+    $this->insertTestData($tableName);

Apply similar changes to other test methods.

🧹 Nitpick comments (8)
src/Schema/Grammar.php (1)

199-205: Refactor to handle non-empty options consistently.

You are appending “options (…)” only if getOptions() is non-empty. Ensure that future search index extensions follow the same pattern consistently so that new optional parameters integrate cleanly.

tests/Schema/BlueprintTest.php (1)

59-59: Token list column usage.

Declaring a tokenList column with substring tokenization for the 'text' column is consistent with the tested approach. Keep in mind that not all tokenizers support all column types or data.

src/Query/Concerns/UsesFullTextSearch.php (1)

40-74: Consider extracting common validation logic

Since all search methods share the same parameters and validation requirements, consider extracting the validation logic into a shared protected method to promote DRY principles.

+    /**
+     * @throws \InvalidArgumentException
+     */
+    protected function validateSearchParams(
+        string $tokens,
+        string $query,
+        array $options,
+        string $boolean
+    ): void {
+        if (!in_array($boolean, ['and', 'or'], true)) {
+            throw new InvalidArgumentException('Boolean parameter must be either "and" or "or"');
+        }
+        if (trim($tokens) === '') {
+            throw new InvalidArgumentException('Tokens parameter cannot be empty');
+        }
+        if (trim($query) === '') {
+            throw new InvalidArgumentException('Query parameter cannot be empty');
+        }
+        foreach ($options as $key => $value) {
+            if (!is_scalar($value)) {
+                throw new InvalidArgumentException("Option '$key' must have a scalar value");
+            }
+        }
+    }
README.md (1)

352-376: Ensure consistent hyphenation in documentation.

Use consistent hyphenation for the compound adjective "full-text" throughout the documentation:

-### Full Text Search
+### Full-Text Search

-Spanner supports [Full Text Search]
+Spanner supports [Full-Text Search]
🧰 Tools
🪛 LanguageTool

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint (0.37.0)

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)

tests/Query/BuilderTest.php (4)

1106-1128: Test method validates basic full-text search functionality.

The test correctly validates:

  • Table creation with full-text search capability
  • SQL query generation
  • Search results for multiple terms with OR operator

However, consider adding tests for:

  • Empty search terms
  • Special characters in search terms
  • Case sensitivity

1130-1152: Test method validates search with options.

The test correctly validates:

  • Search with enhance_query option
  • SQL query generation with options
  • Result filtering

However, consider testing:

  • Multiple search options
  • Invalid options handling

1154-1176: Test method validates n-grams search functionality.

The test correctly validates:

  • Table creation with n-grams tokenizer
  • SQL query generation with min_ngrams option
  • Result filtering for n-grams search

Consider testing:

  • Different min_ngrams values
  • Edge cases with very short/long search terms

1178-1200: Test method validates substring search functionality.

The test correctly validates:

  • Table creation with substring tokenizer
  • SQL query generation
  • Result filtering for substring search

Consider testing:

  • Empty substring
  • Substring at different positions (start, middle, end)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f3d950b and 5a2a245.

📒 Files selected for processing (12)
  • CHANGELOG.md (1 hunks)
  • README.md (1 hunks)
  • compose.yaml (1 hunks)
  • src/Query/Builder.php (1 hunks)
  • src/Query/Concerns/UsesFullTextSearch.php (1 hunks)
  • src/Query/Grammar.php (1 hunks)
  • src/Schema/Blueprint.php (2 hunks)
  • src/Schema/Grammar.php (2 hunks)
  • src/Schema/SearchIndexDefinition.php (1 hunks)
  • src/Schema/TokenizerFunction.php (1 hunks)
  • tests/Query/BuilderTest.php (2 hunks)
  • tests/Schema/BlueprintTest.php (4 hunks)
✅ Files skipped from review due to trivial changes (1)
  • compose.yaml
🧰 Additional context used
🪛 LanguageTool
README.md

[uncategorized] ~352-~352: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...angeStream('stream_name'); }); ``` ### Full Text Search Spanner supports [Full Text Sea...

(EN_COMPOUND_ADJECTIVE_INTERNAL)


[uncategorized] ~354-~354: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...### Full Text Search Spanner supports [Full Text Search](https://cloud.google.com/spanne...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

CHANGELOG.md

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-11-08) - add support for full text search (#235) - consolidate schema opti...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

🪛 Markdownlint (0.37.0)
README.md

352-352: Expected: setext; Actual: atx
Heading style

(MD003, heading-style)

🔇 Additional comments (18)
src/Schema/Grammar.php (3)

168-189: New full-text index compilation looks good.

The newly introduced compileFullText method is coherent and aligns with the existing grammar structure. The usage of columnize for columns and the additional partitioning logic looks fine.


190-197: Order by syntax issue regarding multiple columns.

This code does not handle multiple order-by columns properly. It appends them without commas, producing invalid syntax for queries with more than one column. This issue matches a previous review's concern.


775-786: Validate tokenization parameters and function usage.

The typeTokenList method is straightforward. However, consider additional validation or guards if user input for column target or options can be manipulated at runtime, to mitigate potential schema generation inconsistencies.

tests/Schema/BlueprintTest.php (5)

23-23: TokenizerFunction import recognized.

The new import indicates usage of custom tokenizers. No issues found.


87-87: Hidden token list column confirmed.

The string "hidden" ensures that the underlying tokenlist column is not directly visible. This is correct for storing token data behind the scenes.


285-294: Parent table creation test appears correct.

The parent table creation for testing the interleaved table is aligned with Spanner's best practices.


298-315: Child table, interleaved indexing, and full-text index definitions.

You define a separate child table with a token list column and a full-text index referencing the parent table. This test thoroughly exercises the new search index capabilities, including storing columns, partitioning, sorting, and custom options.


317-323: Index creation assertion.

The final assertion checks the generated SQL statements with the expected schema definitions. This is essential to ensure consistent behavior across environments, although Spanner Emulator limitations are noted.

src/Schema/TokenizerFunction.php (1)

1-31: TokenizerFunction enum is cleanly implemented.

The definition enumerates all supported tokenization methods with explicit string values, providing clarity for the schema builder.

src/Schema/SearchIndexDefinition.php (1)

1-46: Encapsulation of search index properties.

The SearchIndexDefinition class cleanly extends IndexDefinition, adding optional flags and configurations for specialized search indexes. The usage of array_filter to omit null options is concise and effective.

src/Query/Concerns/UsesFullTextSearch.php (2)

22-38: Add input validation for search parameters

The implementation would benefit from input validation to ensure data integrity and prevent potential issues.


76-100: LGTM! Well-structured implementation following Laravel patterns.

The method correctly handles the addition of search conditions and parameter binding.

src/Query/Grammar.php (3)

144-152: LGTM! Clean implementation with good separation of concerns.

The method properly delegates to buildSearchFunction for SQL generation.


154-172: LGTM! Consistent implementation across search methods.

The methods maintain consistency in implementation and documentation.


174-189: Verify SQL injection protection

While the implementation looks correct, let's verify the SQL injection protection:

  1. Ensure quoteString method properly escapes the query parameter
  2. Verify that formatOptions safely handles the options array
src/Query/Builder.php (1)

32-32: LGTM! Proper trait integration.

The UsesFullTextSearch trait is correctly integrated alongside other query builder concerns.

src/Schema/Blueprint.php (1)

247-261: LGTM! Well-structured implementation of token list column.

The method correctly implements token list column creation with appropriate parameters and follows best practices by marking it as invisible and nullable.

CHANGELOG.md (1)

3-3: 🛠️ Refactor suggestion

Enhance the full-text search changelog entry with implementation details.

The current entry should be expanded to better reflect the significance of this feature:

-- add support for full text search (#235)
+- add support for full-text search (#235)
+  - Added `Blueprint::fullText()` for SEARCH INDEX support with:
+    - PARTITION BY
+    - ORDER BY
+    - OPTIONS(sort_order_sharding)
+  - Added TOKENLIST column type with tokenization options:
+    - TOKENIZE_FULLTEXT
+    - TOKENIZE_SUBSTRING
+    - TOKENIZE_NGRAMS
+    - TOKEN
+    - TOKENIZE_BOOL
+    - TOKENIZE_NUMBER
+  - Added search query methods:
+    - `Builder::searchFullText()`
+    - `Builder::searchNgrams()`
+    - `Builder::searchSubstring()`

This change:

  1. Adds proper hyphenation for the compound adjective "full-text"
  2. Provides comprehensive details about the feature implementation
  3. Helps users understand the full scope of the new functionality

Likely invalid or redundant comment.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~3-~3: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ... v8.3.0 (2024-11-08) - add support for full text search (#235) - consolidate schema opti...

(EN_COMPOUND_ADJECTIVE_INTERNAL)

Comment on lines +288 to +307
/**
* @param string|list<string> $columns
* @param string|null $name
* @param string|null $algorithm
* @return SearchIndexDefinition
*/
public function fullText($columns, $name = null, $algorithm = null)
{
$type = 'fullText';
$columns = (array) $columns;

$this->commands[] = $command = new SearchIndexDefinition([
'name' => $type,
'index' => $name ?? $this->createIndexName($type, $columns),
'columns' => $columns,
'algorithm' => $algorithm,
]);

return $command;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add support for missing features mentioned in PR objectives.

The current implementation doesn't support the following features mentioned in the PR objectives:

  • PARTITION BY
  • ORDER BY
  • OPTIONS(sort_order_sharding)

Consider extending the SearchIndexDefinition class to support these features:

    public function fullText($columns, $name = null, $algorithm = null)
    {
        $type = 'fullText';
        $columns = (array) $columns;

        $this->commands[] = $command = new SearchIndexDefinition([
            'name' => $type,
            'index' => $name ?? $this->createIndexName($type, $columns),
            'columns' => $columns,
            'algorithm' => $algorithm,
+           'partition_by' => null,  // Add support for PARTITION BY
+           'order_by' => null,      // Add support for ORDER BY
+           'options' => [],         // Add support for OPTIONS(sort_order_sharding)
        ]);

        return $command;
    }

Example usage after implementing these features:

$table->fullText('description')
    ->partitionBy(['user_id'])
    ->orderBy(['created_at'])
    ->options(['sort_order_sharding' => true]);

@taka-oyama taka-oyama requested review from a team and halnique December 18, 2024 05:58
@colopl colopl deleted a comment from coderabbitai bot Dec 18, 2024
@taka-oyama taka-oyama merged commit 9be524f into master Dec 18, 2024
1 check passed
@taka-oyama taka-oyama deleted the feature/full-text-search branch December 18, 2024 08:13
@taka-oyama
Copy link
Collaborator Author

Thank you for the review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request waiting for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for full-text search

3 participants