Improve email ingestion: handle inline replies and track quoted content attribution #114

gvanrossum · 2025-12-03T22:52:04Z

(AI-generated description)

Summary

Enhances email import to handle inline replies (where the sender interleaves responses with quoted text) and to track whether each text chunk is original content or quoted from someone else.

Changes

New Features

- Inline reply detection: Recognizes emails where the sender responds inline to > quoted text, not just top-posted replies
- Chunk source attribution: New chunk_sources field on EmailMessage that parallels text_chunks:
  - None = original content from the email sender
  - str = quoted content (the string is the quoted person's name, or " " if unknown)
- Quoted person extraction: Parses "On Mon, Dec 10, 2020 at 10:30 AM John Doe wrote:" headers to extract the quoted person's name

Implementation

- New parse_email_chunks() function returns list[tuple[str, str | None]] with full text and source attribution
- Preserves quoted content unabbreviated (previously it was discarded or summarized)
- Handles signature markers (-- ) to exclude signatures from parsed content

Why This Matters

Higher-level ingestion code can now decide how to index quoted text so it doesn't get incorrectly attributed to the email's sender. This enables more accurate knowledge extraction from email threads.

Testing

- Added comprehensive tests for is_inline_reply() and parse_email_chunks()
- All existing tests continue to pass

gvanrossum-ms added 3 commits December 3, 2025 14:01

Recognize inline replies, even in the presence of unquoted trailers

7267ae4

Rename bottom-posting to top-posting

37006c1

Add chunk_sources to track quoted vs original email content

3c1bc26

gvanrossum temporarily deployed to build-pipeline December 3, 2025 22:52 — with GitHub Actions Inactive

gvanrossum marked this pull request as draft December 3, 2025 22:54

Add CODEOWNERS

21282ba

gvanrossum-ms temporarily deployed to build-pipeline December 3, 2025 23:00 — with GitHub Actions Inactive

gvanrossum changed the title ~~Improve email import: handle inline replies and track quoted content attribution~~ Improve email ingestion: handle inline replies and track quoted content attribution Dec 4, 2025
