Skip to content

Conversation

@gvanrossum-ms
Copy link
Collaborator

@gvanrossum-ms gvanrossum-ms commented Dec 3, 2025

This does the same as @add_messages in test_gmail.py does, but now the storage provider interface has changed to allow storing the "is it ingested" flag per message id in the database, so it is transactionally safe.

(Almost all by Claude Opus 4.5 Preview.)

I'm not in a hurry to remove test_email.py -- we need to make sure that all its use cases are now incorporated into either ingest_email.py ro query.py. I think @parse_messages is the only one left.

Copy link
Collaborator

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's treat this as a draft PR

@gvanrossum-ms gvanrossum-ms marked this pull request as draft December 3, 2025 16:58
@gvanrossum-ms gvanrossum-ms marked this pull request as ready for review December 3, 2025 20:23
"""Ingest email files into a database."""

# Collect all .eml files
if verbose:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a cleaner way to do this verbose printing? Could there be a printVerbose() method that we call instead of print() and then it decides to print the message or not based on the verbose flag?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would feel a little "clever", so I'll skip it.

print(f" {preview}")

# Pass source_id to mark as ingested atomically with the message
source_ids = [email_id] if email_id else None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe emit a warning when the email id doesn't exist.

Do we try to create one based on the from/to/timestamp/subject hash if we don't have one otherwise?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could. I'll defer doing that until there's demand.

def mark_source_ingested(self, source_id: str) -> None:
"""Mark a source as ingested.
This performs an INSERT but does NOT commit. It should be called within
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tried this, but the docs say you can do cursor.in_transaction to see if there's an active transaction. That could keep someone from calling this if there's no active transaction.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh, this function only exists so add_messages_with_indexing can call it. I'm not too worried about users calling it wrongly.

@gvanrossum gvanrossum merged commit 18e22b5 into main Dec 3, 2025
15 checks passed
@gvanrossum gvanrossum deleted the email branch December 3, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants