Skip to content

Conversation

@preciz
Copy link
Contributor

@preciz preciz commented Dec 17, 2025

mapper is only called for [:text, :comment] cases and takes a list so it's safe to refactor the Stream into a recursion I believe.

There is a 10x speedup in this benchmark:

  Mix.install([
    {:benchee, "~> 1.0"},
    {:floki, path: "."}
  ])

  read_file = fn name ->
    __ENV__.file
    |> Path.dirname()
    |> Path.join(name)
    |> File.read!()
    |> Floki.parse_document!()
  end

  doc = read_file.("big.html")
  doc_with_comments = [{"div", [], [{:comment, "comment 1"}, "text", {:comment, "comment 2"}]} | doc]


  IO.puts "Document parsed. Running benchmark..."

  Benchee.run(
    %{
      "filter_out :comment" => fn -> Floki.filter_out(doc_with_comments, :comment) end,
      "filter_out :text" => fn -> Floki.filter_out(doc_with_comments, :text) end
    },
    time: 5,
    memory_time: 2
  )

Copilot AI review requested due to automatic review settings December 17, 2025 14:19
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes the filter_out function by replacing Stream-based operations with direct tail recursion, achieving a reported 10x performance improvement. The optimization specifically targets the filtering of :text and :comment nodes from HTML trees.

Key changes:

  • Replaces Stream.filter and Stream.map with a tail-recursive do_mapper helper function
  • Uses an accumulator pattern with a final reverse to maintain list order
  • Eliminates Stream enumeration overhead while preserving identical behavior

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@philss philss merged commit ab1d361 into philss:main Dec 18, 2025
12 checks passed
@philss
Copy link
Owner

philss commented Dec 18, 2025

🚀 🚀 🚀 thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants