library: xgo/dql #2610

Merged: xushiwei merged 5 commits into goplus:main from xushiwei:q on Feb 11, 2026

@xushiwei

No description provided.

@gemini-code-assist

Summary of Changes

Hello @xushiwei, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive dql library, significantly expanding the project's capabilities for structured data extraction and manipulation. It establishes a consistent API for interacting with diverse data formats, from web content to programming language constructs. The inclusion of a fetcher package streamlines common data retrieval patterns, while the underlying stream infrastructure ensures robust and adaptable data source handling, including caching for network efficiency.

Highlights

  • New DQL Library: Introduced a new dql (DOM Query Language) library to provide a unified interface for querying structured data from various sources.
  • Multi-Format Data Querying: The dql library now supports querying HTML, XML, JSON, YAML, Go AST (Abstract Syntax Tree), and Go reflection data.
  • Integrated Fetcher Package: A fetcher package has been added, including pre-built fetchers for common web scraping tasks such as GitHub issue tasks, GitHub repository lists, pkg.go.dev import counts, and PyTorch function documentation.
  • Flexible Stream Handling: Implemented a generic stream mechanism that supports various protocols (http, https, inline, zip) with options for HTTP caching.
  • Go Module Updates: The go.mod file has been updated to Go version 1.24.0 and includes new dependencies for YAML parsing and network operations.
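
The stream mechanism in the highlights dispatches on the URL scheme to a registered opener. The following is a minimal sketch of that idea, not the actual dql/stream API: the names `OpenFunc`, `Register`, `Open`, `registerInline`, and `ReadString` are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// OpenFunc opens the resource named by the part of the URL after "scheme:".
// Hypothetical type for illustration; not taken from dql/stream.
type OpenFunc func(rest string) (io.ReadCloser, error)

// openers maps a scheme such as "inline" to its opener.
var openers = map[string]OpenFunc{}

// Register associates a scheme with an opener.
func Register(scheme string, open OpenFunc) {
	openers[scheme] = open
}

// Open splits "scheme:rest" and dispatches to the registered opener.
func Open(url string) (io.ReadCloser, error) {
	scheme, rest, ok := strings.Cut(url, ":")
	if !ok {
		return nil, fmt.Errorf("missing scheme in %q", url)
	}
	open, found := openers[scheme]
	if !found {
		return nil, fmt.Errorf("unsupported scheme %q", scheme)
	}
	return open(rest)
}

// registerInline installs a toy "inline" opener whose payload is the text itself.
func registerInline() {
	Register("inline", func(rest string) (io.ReadCloser, error) {
		return io.NopCloser(strings.NewReader(rest)), nil
	})
}

// ReadString opens url and returns its full content as a string.
func ReadString(url string) (string, error) {
	r, err := Open(url)
	if err != nil {
		return "", err
	}
	defer r.Close()
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	registerInline()
	text, err := ReadString("inline:hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

A registry like this keeps each protocol in its own package, which matches the per-scheme file layout listed in the changelog.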

Changelog
  • .github/codecov.yml
    • Added dql to the coverage exclusion list.
  • demo/dql-links/links.xgo
    • Added a demo script to extract links from an HTML document using dql.
  • dql/README.md
    • Added a README file for the dql library.
  • dql/dql.go
    • Added core dql functionalities, including error definitions and integer parsing utilities.
  • dql/fetcher/fetch.go
    • Added the fetcher package for registering and performing data fetching operations.
  • dql/fetcher/github.com/issueTask/issueTask.xgo
    • Added a fetcher for GitHub issue tasks.
  • dql/fetcher/github.com/repoList/repoList.xgo
    • Added a fetcher for GitHub repository lists.
  • dql/fetcher/hrefs/hrefs.xgo
    • Added a generic fetcher to extract all href attributes from a page.
  • dql/fetcher/pkg.go.dev/importedBy/importedBy.xgo
    • Added a fetcher for pkg.go.dev to get package import counts.
  • dql/fetcher/pytorch.org/fndoc/fndoc.xgo
    • Added a fetcher for PyTorch function documentation.
  • dql/golang/golang.go
    • Added dql support for Go AST and reflection.
  • dql/html/html.go
    • Added dql implementation for HTML document querying.
  • dql/json/json.go
    • Added dql implementation for JSON data querying.
  • dql/maps/maps.go
    • Added core dql functionalities for map-based data structures.
  • dql/reflects/reflects.go
    • Added dql implementation for Go reflection.
  • dql/stream/http/cached/cached.go
    • Added HTTP stream opener with caching capabilities.
  • dql/stream/http/httpstrm.go
    • Added basic HTTP stream opener.
  • dql/stream/http/nocache/nocache.go
    • Added HTTP stream opener without caching.
  • dql/stream/inline/inline.go
    • Added stream opener for inline text.
  • dql/stream/stream.go
    • Added generic stream registration and opening mechanism.
  • dql/stream/stream_test.go
    • Added tests for the stream package.
  • dql/stream/zip/zipstrm.go
    • Added stream opener for files within zip archives.
  • dql/xml/xml.go
    • Added dql implementation for XML document querying.
  • dql/yaml/yaml.go
    • Added dql implementation for YAML data querying.
  • go.mod
    • Updated Go version to 1.24.0.
    • Added toolchain go1.24.2.
    • Included new dependencies github.com/goccy/go-yaml and golang.org/x/net.
  • go.sum
    • Updated checksums for go.mod changes.
Ignored Files
  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/go.yml

@codecov

codecov bot commented Feb 11, 2026

Codecov Report

❌ Patch coverage is 63.63636% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.02%. Comparing base (bf44579) to head (9c33cb4).
⚠️ Report is 16 commits behind head on main.

Files with missing lines | Patch % | Lines
parser/parser.go | 63.63% | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2610      +/-   ##
==========================================
- Coverage   94.04%   94.02%   -0.02%     
==========================================
  Files          33       32       -1     
  Lines        9905     9891      -14     
==========================================
- Hits         9315     9300      -15     
- Misses        421      422       +1     
  Partials      169      169              

@xgopilot

xgopilot bot commented Feb 11, 2026

Code Review Summary

This PR introduces the DQL (DOM Query Language) library - a comprehensive set of packages for querying HTML, XML, JSON, YAML documents and Go AST/reflect structures. The architecture is well-designed with consistent iterator-based APIs.

Key concerns requiring attention:

  1. Critical: Wrong import paths in dql/fetcher/hrefs/hrefs.xgo (uses hdq instead of xgo/dql)
  2. Critical: Incomplete stub methods in html.go (One(), ParentN(), etc.) will panic at runtime
  3. Security: SSRF vulnerability in HTTP fetching - no URL validation for internal network access
  4. Security: Path traversal risk in stream.Open() - no sanitization for file paths
  5. Correctness: storing XML CharData without copying may corrupt data, because the decoder reuses the underlying slice
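
For the XML CharData concern, the standard library documents that token bytes stay valid only until the next call to `Token`, and that `Copy` (or `xml.CopyToken`) produces an independent copy. A minimal sketch of retaining character data safely follows; `collectCharData` is a hypothetical helper, not code from dql/xml.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// collectCharData returns every CharData token in doc.
// Each token is copied before being stored, because the decoder may
// overwrite the buffer behind a token on the next call to Token.
func collectCharData(doc string) ([]xml.CharData, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var out []xml.CharData
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		if cd, ok := tok.(xml.CharData); ok {
			out = append(out, cd.Copy()) // independent of the decoder's buffer
		}
	}
}

func main() {
	texts, err := collectCharData("<a>x<b>y</b>z</a>")
	if err != nil {
		panic(err)
	}
	for _, t := range texts {
		fmt.Println(string(t))
	}
}
```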

The overall code structure is clean and follows Go idioms well. The iterator pattern using iter.Seq is consistently applied across all packages.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the new dql library, providing modules for querying various data formats and handling streams. However, a security audit identified several high and medium severity Server-Side Request Forgery (SSRF) and Path Traversal vulnerabilities. The core resource-opening logic in dql/stream lacks validation, and several site-specific fetchers are vulnerable to SSRF bypass via path traversal sequences. Beyond security, there are also areas for improvement concerning error handling (many functions use panic), type safety (reflection bypasses Go's type safety), and consistency (e.g., naming conventions and todo items). Addressing these points, especially implementing strict input validation and sanitization for file paths and URLs, will significantly improve the robustness, maintainability, and security of the dql library.
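
The input validation recommended here could look roughly like the sketch below. It is an assumption-laden illustration, not the fix applied in this PR: `checkURL` and `checkLocalPath` are hypothetical helpers, and the URL check only inspects literal IP addresses, so a complete defense would also verify the resolved address at dial time.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
	"path/filepath"
)

// checkURL rejects non-HTTP schemes, "localhost", and literal IPs in
// loopback, private, link-local, or unspecified ranges.
// Hostnames are not resolved here.
func checkURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("scheme %q not allowed", u.Scheme)
	}
	host := u.Hostname()
	if host == "" || host == "localhost" {
		return errors.New("host not allowed")
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() || ip.IsUnspecified() {
			return fmt.Errorf("address %s not allowed", ip)
		}
	}
	return nil
}

// checkLocalPath accepts only relative paths that stay inside the
// directory they are later joined to (no "..", no absolute paths).
func checkLocalPath(p string) error {
	if !filepath.IsLocal(p) {
		return fmt.Errorf("path %q escapes the base directory", p)
	}
	return nil
}

func main() {
	fmt.Println(checkURL("https://example.com/page"))
	fmt.Println(checkURL("http://169.254.169.254/latest"))
	fmt.Println(checkLocalPath("docs/a.txt"))
	fmt.Println(checkLocalPath("../secret"))
}
```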

func getCacheDir() string {
    root, err := os.UserCacheDir()
    if err != nil {
        panic(err)

high

Panicking in getCacheDir if os.UserCacheDir() returns an error is not ideal for a library function. It would be better to return the error to the caller or handle it gracefully, perhaps by falling back to a temporary directory or a default location, rather than crashing the program.
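
One way to follow this suggestion is to fall back to the system temporary directory instead of panicking. The sketch below assumes a hypothetical `cacheDir` function and a made-up `dql-cache` subdirectory name; neither is taken from the PR.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cacheDir returns a directory for cached responses. If the user cache
// directory cannot be determined, it falls back to the system temp
// directory rather than panicking. The "dql-cache" name is an assumption.
func cacheDir() string {
	root, err := os.UserCacheDir()
	if err != nil {
		root = os.TempDir()
	}
	return filepath.Join(root, "dql-cache")
}

func main() {
	fmt.Println(cacheDir())
}
```

Returning the error to the caller is the other option the comment mentions; the fallback keeps the original single-value signature.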

Comment on lines +58 to +72
    f, err := os.Create(cacheFile)
    if err != nil {
        return
    }
    defer f.Close()
    _, err = io.Copy(f, resp.Body)
    return
}

func ReadCache(cacheFile string, fi fs.FileInfo) (ret io.ReadCloser, err error) {
    return os.Open(cacheFile)
}

// -------------------------------------------------------------------------------------

Security: Cache implementation concerns:

  1. No TTL/expiration - cached content is served indefinitely
  2. No cache eviction - unbounded disk growth
  3. Race condition - concurrent requests for same URL may corrupt cache
  4. The TODO comment acknowledges missing checksum validation

Consider adding file locking and TTL-based expiration.
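
A minimal sketch of two of these mitigations follows, under stated assumptions: it uses a write-to-temp-then-rename step in place of file locking (a swapped-in technique, so that readers never see a partially written file) and a modification-time TTL check. The names `writeCacheAtomic`, `isFresh`, and `runDemo` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// writeCacheAtomic writes body to a temp file in the same directory and
// renames it into place, so a concurrent reader sees either the old
// file or the complete new one.
func writeCacheAtomic(cacheFile string, body io.Reader) (err error) {
	tmp, err := os.CreateTemp(filepath.Dir(cacheFile), "cache-*.tmp")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			os.Remove(tmp.Name()) // clean up the temp file on failure
		}
	}()
	if _, err = io.Copy(tmp, body); err != nil {
		tmp.Close()
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), cacheFile)
}

// isFresh reports whether a cache entry is younger than ttl at time now.
func isFresh(fi os.FileInfo, ttl time.Duration, now time.Time) bool {
	return now.Sub(fi.ModTime()) < ttl
}

// runDemo writes a cache entry into a temp directory, reads it back, and
// checks freshness now and two hours from now with a one-hour TTL.
func runDemo() (got string, freshNow, freshLater bool, err error) {
	dir, err := os.MkdirTemp("", "cache-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	file := filepath.Join(dir, "page.html")
	if err = writeCacheAtomic(file, strings.NewReader("<html></html>")); err != nil {
		return
	}
	data, err := os.ReadFile(file)
	if err != nil {
		return
	}
	fi, err := os.Stat(file)
	if err != nil {
		return
	}
	now := time.Now()
	return string(data), isFresh(fi, time.Hour, now), isFresh(fi, time.Hour, now.Add(2*time.Hour)), nil
}

func main() {
	fmt.Println(runDemo())
}
```

Eviction and checksum validation are not covered by this sketch.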

)

var (
    // DefaultUserAgent is the default UserAgent and is used by HTTPSource.

Performance/Safety: The HTTP client has no timeout configured. http.DefaultClient may hang indefinitely on slow/unresponsive servers. Consider setting a reasonable timeout:

Client = &http.Client{
    Timeout: 30 * time.Second,
}

@xushiwei xushiwei merged commit 280675a into goplus:main Feb 11, 2026
15 of 17 checks passed