New tool: ingest_email.py #111

gvanrossum-ms · 2025-12-03T06:49:35Z

This does the same thing as @add_messages in test_gmail.py, but the storage provider interface has changed so that the "is it ingested" flag is stored per message id in the database, which makes ingestion transactionally safe.
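To illustrate what "transactionally safe" means here, the following is a minimal sketch using plain `sqlite3`, not the actual typeagent storage provider. The table names (`messages`, `ingested_sources`), the helper names, and the `fail` flag are all hypothetical and exist only for demonstration. The message row and its "ingested" marker are written in a single transaction, so a failure in between leaves neither behind.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (body TEXT NOT NULL)")
    conn.execute("CREATE TABLE ingested_sources (source_id TEXT PRIMARY KEY)")
    return conn


def add_message_with_marker(
    conn: sqlite3.Connection, source_id: str, body: str, *, fail: bool = False
) -> None:
    """Store a message and its 'ingested' marker in one transaction.

    The `fail` flag is only a stand-in for a crash between the two writes.
    """
    # `with conn:` commits if the block succeeds and rolls back on exception.
    with conn:
        conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        if fail:
            raise RuntimeError("simulated failure before the marker is written")
        conn.execute(
            "INSERT INTO ingested_sources (source_id) VALUES (?)", (source_id,)
        )


def count(conn: sqlite3.Connection, table: str) -> int:
    """Return the number of rows in one of the two known tables."""
    assert table in {"messages", "ingested_sources"}
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

With this shape, a rerun after a failure cannot see a message that was stored but never marked as ingested, or the reverse.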

(Almost all by Claude Opus 4.5 Preview.)

I'm not in a hurry to remove test_email.py -- we need to make sure that all its use cases are now incorporated into either ingest_email.py or query.py. I think @parse_messages is the only one left.

gvanrossum

Let's treat this as a draft PR

robgruen · 2025-12-03T17:04:39Z

tools/ingest_email.py

+    """Ingest email files into a database."""
+
+    # Collect all .eml files
+    if verbose:


Is there a cleaner way to do this verbose printing? Could there be a printVerbose() method that we call instead of print() and then it decides to print the message or not based on the verbose flag?

It would feel a little "clever", so I'll skip it.
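For reference, the helper suggested in this thread could look roughly like the sketch below. It was not adopted in the PR, and the name `make_vprint` is hypothetical. It returns a print-like function that stays silent unless the verbose flag is set.

```python
from typing import Any, Callable


def make_vprint(verbose: bool) -> Callable[..., None]:
    """Return a print-like function that only prints when `verbose` is true."""

    def vprint(*args: Any, **kwargs: Any) -> None:
        if verbose:
            print(*args, **kwargs)

    return vprint
```

A caller would write `vprint = make_vprint(verbose)` once and then call `vprint(...)` in place of each `if verbose: print(...)` pair.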

robgruen · 2025-12-03T17:09:15Z

tools/ingest_email.py

+                    print(f"      {preview}")
+
+            # Pass source_id to mark as ingested atomically with the message
+            source_ids = [email_id] if email_id else None


Maybe emit a warning when the email id doesn't exist.

Do we try to create one based on the from/to/timestamp/subject hash if we don't have one otherwise?

We could. I'll defer doing that until there's demand.
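If that demand does arise, one possible shape for such a fallback is sketched below. This is not part of the PR: the function name, the `synthetic:` prefix, and the choice of SHA-256 are assumptions made for illustration. The fields are joined with a separator that is unlikely to appear in headers, and recipients are sorted so that their order does not change the id.

```python
import hashlib
from typing import Iterable


def fallback_email_id(
    sender: str, recipients: Iterable[str], timestamp: str, subject: str
) -> str:
    """Derive a deterministic id from header fields (hypothetical helper)."""
    # Sort recipients so the same message always hashes to the same id.
    parts = [sender, ",".join(sorted(recipients)), timestamp, subject]
    # Use the ASCII unit separator to keep field boundaries unambiguous.
    payload = "\x1f".join(parts).encode("utf-8")
    return "synthetic:" + hashlib.sha256(payload).hexdigest()
```

Two distinct messages that share sender, recipients, timestamp, and subject would collide under this scheme, which is one reason to treat it as a last resort.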

robgruen · 2025-12-03T17:13:25Z

typeagent/storage/sqlite/provider.py

+    def mark_source_ingested(self, source_id: str) -> None:
+        """Mark a source as ingested.
+
+        This performs an INSERT but does NOT commit. It should be called within


I haven't tried this, but the docs say you can check `in_transaction` on the connection to see whether there's an active transaction. That could keep someone from calling this when no transaction is active.

Meh, this function only exists so add_messages_with_indexing can call it. I'm not too worried about users calling it wrongly.
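For completeness, the guard discussed here could be sketched as follows. In Python's `sqlite3` module, `in_transaction` is an attribute of the `Connection` object. This is a standalone illustration rather than the provider's actual method: the free-function form, the table name `ingested_sources`, and the `INSERT OR IGNORE` statement are assumptions.

```python
import sqlite3


def mark_source_ingested(conn: sqlite3.Connection, source_id: str) -> None:
    """Insert an 'ingested' marker without committing (illustrative sketch).

    Raises RuntimeError if the caller has not opened a transaction.
    """
    if not conn.in_transaction:
        raise RuntimeError("mark_source_ingested requires an active transaction")
    # No commit here: the caller's transaction decides when this becomes durable.
    conn.execute(
        "INSERT OR IGNORE INTO ingested_sources (source_id) VALUES (?)",
        (source_id,),
    )
```

The check costs one attribute lookup, so the trade-off is mostly about how much defensive code a helper with a single caller deserves.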

gvanrossum-ms added 2 commits December 2, 2025 22:18

New tools/ingest_email.py

4e27059

Update to store 'already indexed' status in database

2ea1614

gvanrossum-ms requested a review from robgruen December 3, 2025 06:49

gvanrossum-ms temporarily deployed to build-pipeline December 3, 2025 06:49 — with GitHub Actions Inactive

gvanrossum reviewed Dec 3, 2025

gvanrossum-ms marked this pull request as draft December 3, 2025 16:58

gvanrossum-ms added 3 commits December 3, 2025 12:01

Update docs to describe how to run gmail_dump.py

4169f0d

Make source_ids a keyword-only parameter

4fcf182

Sort glob result; error on non-email files

037ede4

gvanrossum-ms temporarily deployed to build-pipeline December 3, 2025 20:21 — with GitHub Actions Inactive

gvanrossum-ms marked this pull request as ready for review December 3, 2025 20:23

robgruen approved these changes Dec 3, 2025

gvanrossum merged commit 18e22b5 into main Dec 3, 2025
15 checks passed

gvanrossum deleted the email branch December 3, 2025 20:49
