@hoblin hoblin commented Feb 10, 2026

Summary

  • Replace per-call tree traversal in key_value? with a lazily-built Set of all node keys per locale
  • Cache invalidates automatically on data.reload
  • Reduces i18n-tasks missing runtime by ~19% on large projects (145s → 118s with 31 locales, 91 locale files)

Problem

key_value? is called thousands of times per locale pair during missing_keys computation. Each call does a full tree lookup via t(key, locale) → data.t → Siblings#get → recursive split_key navigation. For projects with many locales, this dominates Phase 4 (compute missing keys), which accounts for ~73% of total runtime.

Solution

Pre-build a flat Set<String> of all node keys per locale on first key_value? call. Subsequent lookups are O(1) hash membership checks instead of O(tree_depth) traversals. The cache is invalidated when data.reload is called.
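
The idea can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code from this PR: the `KeyLookup` class, its nested-Hash input, and the `builds` counter are hypothetical stand-ins for the real task object and locale tree.

```ruby
require 'set'

# Hypothetical stand-in for the task object; not the i18n-tasks API.
class KeyLookup
  attr_reader :builds

  # translations is a nested Hash, e.g. { 'en' => { 'user' => { 'name' => 'Name' } } }
  def initialize(translations)
    @translations = translations
    @builds = 0 # counts how often a per-locale Set is (re)built
  end

  # Set membership check; the Set is built lazily on first use per locale.
  def key_value?(key, locale)
    @key_value_cache ||= {}
    set = (@key_value_cache[locale] ||= build_key_set(locale))
    set.include?(key)
  end

  # Drop all cached Sets, e.g. after the underlying data is reloaded.
  def invalidate_key_value_cache!
    @key_value_cache = nil
  end

  private

  def build_key_set(locale)
    @builds += 1
    keys = Set.new
    collect(@translations.fetch(locale, {}), nil, keys)
    keys
  end

  # Walk the tree once, recording the dotted key of every node.
  def collect(hash, prefix, keys)
    hash.each do |name, value|
      full = prefix ? "#{prefix}.#{name}" : name.to_s
      keys << full
      collect(value, full, keys) if value.is_a?(Hash)
    end
  end
end
```

The tree is walked once per locale; every later call for that locale is a single Set lookup until `invalidate_key_value_cache!` is called.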

Test plan

  • All existing specs pass (3 pre-existing Prism scanner failures unrelated to this change)
  • remove_unused specs pass — cache correctly invalidates on data.reload
  • i18n-tasks missing produces identical results (0 false positives/negatives)
  • Benchmarked on production-scale project: 31 locales, 91 locale files, 168K lines of YAML

Replace per-call tree traversal in key_value? with a lazily-built
Set of all node keys per locale. Invalidates on data.reload.

Reduces `i18n-tasks missing` runtime by ~19% on large projects
(145s → 118s with 31 locales, 91 locale files).

Copilot AI left a comment

Pull request overview

This PR optimizes missing-keys computation by replacing repeated per-call tree lookups in key_value? with a lazily-built, per-locale Set of keys, and introduces cache invalidation tied to data.reload.

Changes:

  • Wrap the data adapter’s reload to invalidate a new key_value? cache.
  • Implement key_value? as an O(1) membership check against a cached per-locale Set.
  • Add cache builder (build_key_value_set) and invalidation helper (invalidate_key_value_cache!).

Comment on lines +102 to +105
+ def build_key_value_set(locale)
+   keys = Set.new
+   data[locale].nodes { |node| keys << node.full_key(root: false) unless node.key.nil? }
+   keys
Copilot AI Feb 11, 2026

key_value? now checks membership in a Set of node keys, which changes semantics for leaf nodes with value == nil. Previously key_value? returned false in that case because t(key, locale) returns nil when the stored translation value is nil (see Node#value_or_children_hash). With the current implementation, those keys will be treated as present, which can hide missing translations. Consider only adding keys for non-leaf nodes, or for leaf nodes where node.value is not nil (i.e., mimic t(...).nil? behavior).
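
One way to keep the old nil-value semantics while still using a Set might look like the sketch below. The `Node` struct here is a hypothetical stand-in with only the three fields the example needs; the real tree node class has a richer interface, and the method takes a plain list of nodes rather than a locale.

```ruby
require 'set'

# Hypothetical minimal node; not the i18n-tasks node class.
Node = Struct.new(:full_key, :value, :children) do
  # Assumption: a node without children is a leaf.
  def leaf?
    children.nil?
  end
end

# Collect keys of branch nodes and of leaves whose value is not nil,
# so that a nil-valued leaf is still reported as missing.
def build_key_value_set(nodes)
  nodes.each_with_object(Set.new) do |node, keys|
    next if node.leaf? && node.value.nil?

    keys << node.full_key
  end
end
```

With this filter, a key such as `user.email` stored with a nil value stays out of the Set, matching what `!t(key, locale).nil?` would have returned.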

Comment on lines 65 to +68
  def key_value?(key, locale = base_locale)
-   !t(key, locale).nil?
+   @key_value_cache ||= {}
+   locale_cache = (@key_value_cache[locale] ||= build_key_value_set(locale))
+   locale_cache.include?(key)
Copilot AI Feb 11, 2026

key_value? previously accepted Symbol keys (they were coerced via Siblings#get(full_key.to_s)), but locale_cache.include?(key) will not match because the Set is populated with Strings. To preserve existing behavior, coerce key to a String before lookup (and ideally do the same normalization when building the set).
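
The mismatch is easy to reproduce with a plain Set, since a Symbol is never equal to a String. The helper name `key_in_set?` below is hypothetical and only illustrates the suggested coercion.

```ruby
require 'set'

# Hypothetical helper: coerce the key before the membership check so that
# Symbol and String callers behave the same.
def key_in_set?(keys, key)
  keys.include?(key.to_s)
end

keys = Set.new(%w[user.name user.email])

raw_symbol_hit = keys.include?(:'user.name')    # Symbol vs. String: no match
coerced_hit    = key_in_set?(keys, :'user.name') # coerced to "user.name": match
```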

  # whether the value for key exists in locale (defaults: base_locale)
  def key_value?(key, locale = base_locale)
-   !t(key, locale).nil?
+   @key_value_cache ||= {}
Copilot AI Feb 11, 2026

The cache is keyed by locale as passed in. Elsewhere the data layer normalizes locales to strings (FileSystemBase#get(locale) calls locale.to_s), so calling key_value? with both :en and "en" will build duplicate caches and waste memory. Consider normalizing locale = locale.to_s before indexing @key_value_cache and before calling build_key_value_set.

Suggested change
-   @key_value_cache ||= {}
+   @key_value_cache ||= {}
+   locale = locale.to_s
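
The effect of normalizing the locale can be shown with a small, self-contained cache. `LocaleCache` and its `builds` counter are hypothetical names used only for this illustration.

```ruby
# Hypothetical cache wrapper; the builder block stands in for build_key_value_set.
class LocaleCache
  attr_reader :builds

  def initialize(&builder)
    @builder = builder
    @cache = {}
    @builds = 0 # counts how many times the builder ran
  end

  # Normalize first, so :en and "en" share one cache entry.
  def fetch(locale)
    locale = locale.to_s
    @cache[locale] ||= begin
      @builds += 1
      @builder.call(locale)
    end
  end
end
```

Calling `fetch(:en)` and then `fetch('en')` runs the builder once and returns the same object both times.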

Comment on lines +25 to +27
+ adapter.define_singleton_method(:reload) do
+   task.invalidate_key_value_cache!
+   super()
Copilot AI Feb 11, 2026

Wrapping adapter.reload via define_singleton_method and calling super() assumes every configured data adapter implements reload with a compatible signature. Since data.adapter is configurable, this can raise NoMethodError (no reload) or break if an adapter defines reload(*args) later. Consider guarding with respond_to?(:reload) and forwarding args/kwargs/block (super(*args, **kwargs, &block)), or wrapping via Module#prepend to avoid signature mismatches.

Suggested change
- adapter.define_singleton_method(:reload) do
-   task.invalidate_key_value_cache!
-   super()
+ if adapter.respond_to?(:reload)
+   adapter.define_singleton_method(:reload) do |*args, **kwargs, &block|
+     task.invalidate_key_value_cache!
+     super(*args, **kwargs, &block)
+   end
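
The Module#prepend alternative mentioned above could be sketched roughly as follows. `FakeAdapter`, `FakeTask`, and `wrap_reload` are hypothetical names for this example; only `invalidate_key_value_cache!` and `reload` come from the PR.

```ruby
# Hypothetical adapter with a reload method of unknown signature.
class FakeAdapter
  attr_reader :reloads

  def initialize
    @reloads = 0
  end

  def reload(*_args)
    @reloads += 1
    self
  end
end

# Hypothetical task that only records cache invalidations.
class FakeTask
  attr_reader :invalidations

  def initialize
    @invalidations = 0
  end

  def invalidate_key_value_cache!
    @invalidations += 1
  end
end

# Prepend an anonymous module to the adapter's singleton class so that
# reload first invalidates the cache, then forwards everything to the original.
def wrap_reload(adapter, task)
  return adapter unless adapter.respond_to?(:reload)

  hook = Module.new do
    define_method(:reload) do |*args, **kwargs, &block|
      task.invalidate_key_value_cache!
      super(*args, **kwargs, &block)
    end
  end
  adapter.singleton_class.prepend(hook)
  adapter
end
```

Adapters without `reload` are returned untouched, and the original return value of `reload` is preserved. Calling `wrap_reload` twice on the same adapter would stack two hooks, so a real implementation would likely want to guard against that.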
