Skip to content

Conversation

@bmerkle
Copy link
Contributor

@bmerkle bmerkle commented Dec 3, 2025

Refactoring for issue #112

all inline imports are removed

formatted imports and from imports
(sorted, arranged etc, via ruff tool) ruff check --select I --fix .

circular dependecies are removed, a new types.py file was introduced

searchlib.py improved handling of pydantic classes

note that the patch set is quite large (99 files) but i looked through to assert that nothing broke.

unit tests ran fine. (except mcp server)

- Rearranged import statements for better readability and consistency.
- Consolidated imports from the same module into single lines where applicable.
- Removed unused imports and organized them to follow a logical order.
- Introduced a new `types.py` file to hold
shared type definitions, reducing circular dependencies.
- Updated version number to 0.3.3 in `uv.lock`.
- searchlib.py improved handling of pydantic classes
@bmerkle bmerkle changed the title Refactoring for issue https://github.com/microsoft/typeagent-py/issues/112 Refactoring for issue 112 Dec 3, 2025
@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 3, 2025

(typeagent) PS C:\work\microsoft\typeagent-py> .\make.bat test
Running unit tests...
================================================================= test session starts =================================================================
platform win32 -- Python 3.13.5, pytest-8.4.2, pluggy-1.6.0
rootdir: C:\work\microsoft\typeagent-py
configfile: pyproject.toml
testpaths: test
plugins: anyio-4.11.0, logfire-4.14.2, asyncio-1.2.0, mock-3.15.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=function, asyncio_default_test_loop_scope=function
collected 358 items

test\test_add_messages_with_indexing.py ... [ 0%]
test\test_auth.py ....... [ 2%]
test\test_collections.py ........................... [ 10%]
test\test_conversation_metadata.py .................... [ 15%]
test\test_demo.py . [ 16%]
test\test_embedding_consistency.py ... [ 17%]
test\test_embeddings.py ........... [ 20%]
test\test_factory.py ... [ 20%]
test\test_incremental_index.py .. [ 21%]
test\test_interfaces.py ...................... [ 27%]
test\test_knowledge.py .... [ 28%]
test\test_kplib.py ...... [ 30%]
test\test_mcp_server.py ss [ 31%]
test\test_message_text_index_population.py . [ 31%]
test\test_message_text_index_serialization.py .. [ 31%]
test\test_messageindex.py ....s... [ 34%]
test\test_online.py . [ 34%]
test\test_podcast_incremental.py .. [ 34%]
test\test_podcasts.py . [ 35%]
test\test_property_index_population.py . [ 35%]
test\test_propindex.py ............ [ 38%]
test\test_query.py .................................. [ 48%]
test\test_query_method.py .. [ 48%]
test\test_related_terms_fast.py . [ 49%]
test\test_related_terms_index_population.py . [ 49%]
test\test_reltermsindex.py ............. [ 53%]
test\test_searchlib.py ....................................................... [ 68%]
test\test_secindex.py ... [ 69%]
test\test_secindex_storage_integration.py . [ 69%]
test\test_semrefindex.py ...................... [ 75%]
test\test_serialization.py ........ [ 77%]
test\test_sqlite_indexes.py ..................... [ 83%]
test\test_sqlitestore.py ..... [ 85%]
test\test_storage_providers_unified.py .............................. [ 93%]
test\test_timestampindex.py . [ 93%]
test\test_transcripts.py ....... [ 95%]
test\test_utils.py .... [ 96%]
test\test_vectorbase.py ........... [100%]

====================================================== 355 passed, 3 skipped in 64.52s (0:01:04) ======================================================

Copy link
Collaborator

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should install and run isort as part of 'make format'...

Comment on lines -6 to +9
import pytest
import sqlite3

import numpy as np
import pytest
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 3rd party modules should be in one block. But fixtures is local and needs to be separated by a blank line.

# Licensed under the MIT License.

import pytest

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put that blank line back please.

Comment on lines 113 to 117
assert group.terms[0] == term1
assert group.terms[1] == term2
assert pydantic_dataclass_to_dict(group.terms[0]) == pydantic_dataclass_to_dict(term1)
assert pydantic_dataclass_to_dict(group.terms[1]) == pydantic_dataclass_to_dict(term2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious why this was needed. Doesn't pydantic's dataclass create an __eq__ method etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use Pydantic dataclasses, you get the normal Python dataclass dunder methods, including eq.
If you use Pydantic models, equality is handled differently, there is no generation of eq etc.

you are right that we are currently using dataclasses and not models, but if we would switch to models, the new code above would also work (so in this case it is more generic and prepared for changes in the future) .

Copy link
Collaborator

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flight's leaving.

@@ -0,0 +1,14 @@
"""Legacy debug script placeholder.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file probably shouldn't be imported. If it should, it needs a copyright header.

from pathlib import Path
from typing import List, Tuple


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mess with this spacing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I missed make format


import colorama

# Azure SDK imports
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather change this comment than move the imports.

Comment on lines -11 to +12
from colorama import init as colorama_init, Back, Fore, Style
from colorama import Back, Fore, Style
from colorama import init as colorama_init
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Comment on lines -5 to +6
from dataclasses import dataclass
import time
from dataclasses import dataclass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original was better...

if self._storage_provider is None:
# Import here to avoid circular import
from ..storage.memory import MemoryStorageProvider

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't 'make format' put this blank line back?

from typing import Any, cast


def pydantic_dataclass_to_dict(obj: Any) -> Any:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? It seems significant but I fail to see why.

Comment on lines 8 to 10
MemoryStorageProvider,
MemoryMessageCollection,
MemorySemanticRefCollection,
MemoryMessageCollection,
MemorySemanticRefCollection,
MemoryStorageProvider,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad formatting. Use 'make format'

exc_tb: object,
) -> None:
"""Exit transaction context. No-op for in-memory storage."""
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to preserve such 'pass' lines.

import typing

from ...knowpro import interfaces, serialization
from .schema import ShreddedMessage, ShreddedSemanticRef
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! My guiding principle here is that the "closest" import comes last.

ValueError: If embeddings in the database don't match the expected size.
"""
from .schema import deserialize_embedding

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you delete the last online import, also delete the blank line (which was imposed by the formatter).

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 4, 2025

@gvanrossum question: can we have a simple, low-threshold communcation channel (like chat, discord, ms-teams, or linked-in chat) to brainstrom or discuss in advance things ? Then i might explain beforehand and we can easily chat if it make sense. :-)

in the PR here I used ruff instead of isort and format. It might sort thing differently than isort so this change caused some noise or disagreement about sorting and formatting i think. Appended is a short comparison of ruff vs. isort and format.

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 4, 2025

✅ What Ruff brings — and where it may “beat” isort

Ruff is a fast, all-in-one linter + formatter + import-sorter — it tries to replace not only isort, but also tools like linters, formatters, and more in one go.

If you enable the import-sorting rules (rule group “I”), Ruff can sort imports — so you don’t need a separate tool just for imports.
Medium

For large code bases or in continuous-integration / pre-commit settings, Ruff tends to be much faster than combining multiple tools (formatter + linter + isort) because it’s implemented in Rust and optimized for performance.

For convenience: with Ruff you can have linting + formatting + import handling in a single command or setup, reducing the number of dependencies or tools you need to maintain.

So if you want a unified toolchain, care about speed/performance, and are okay with potentially slightly different import-sorting behavior — Ruff is a compelling modern choice.

⚠️ Where isort still shines (or where Ruff may fall short)

Import sorting in Ruff is “intended to be near-equivalent” to isort (especially if you configure isort with certain profiles), but there are known differences — e.g. in how aliased imports are grouped, or how inline comments are handled.

Ruff’s import-sorting capabilities are “good enough” for many, but if you rely on very precise import ordering/grouping rules, or need maximum control over formatting decisions, isort might be more predictable and configurable.

Because Ruff is broader in scope (linter + formatter + sorter), if something goes wrong — configuration issues, conflicts with formatting — it may be harder to isolate and debug than a dedicated import sorter.

Some users report edge cases or inconsistencies when mixing Ruff’s sorting + formatting + other tools (especially if also using other formatters or import tools).

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 4, 2025

i did not run make format check , and the refactoring broke some typechecking tests.
sorry i will rework the PR.

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 4, 2025

@gvanrossum okay, much of the noise was caused by the reformatting of imports at the top of the file.
I did not only fix inline imports, but at the same time rearranged the imports because i had to touch them anyway.
Via ruff i did a automatic reformatting, but it should conform to PEP 8 and be consistent because it is automatic.
So hopefully this explains the modifications regarding imports. I looked through all files and they LGTM.

ruff formats imports according to the PEP 8 import ordering rules, which generally means:

  • first import ...
  • then from ... import ...

Ruff’s import sorter (ruff --fix, rule I001, internally inspired by isort) enforces:

  • Standard library imports
  • Third-party imports
  • Local application imports

…cies

Added dataclasses.py to wrap pydantic’s decorator with dataclass_transform + overloads so pyright understands generated initializers and fields;
updated all knowpro modules, tests, and related utilities to import the wrapper instead of pydantic.dataclasses.dataclass.

Adjusted dependent modules (interfaces.py, kplib.py, search.py, search_query_schema.py, date_time_schema.py, answer_response_schema.py, universal_message.py, email_message.py, and several tests) to use the new wrapper while preserving import ordering and existing behavior.

all other changes a import and formatting related modifications (which should be ok)
make format, check, test is running clean
@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 4, 2025

relevant changes:

Added dataclasses.py to wrap pydantic’s decorator with dataclass_transform + overloads so pyright understands generated initializers and fields;
updated all knowpro modules, tests, and related utilities to import the wrapper instead of pydantic.dataclasses.dataclass.

Adjusted dependent modules (interfaces.py, kplib.py, search.py, search_query_schema.py, date_time_schema.py, answer_response_schema.py, universal_message.py, email_message.py, and several tests) to use the new wrapper while preserving import ordering and existing behavior.

note:

all other changes a import and formatting related modifications (which should be ok)
make format, check, test is running clean

@bmerkle bmerkle changed the title Refactoring for issue 112 Refactoring inline imports and resolve circular dependencies Dec 4, 2025
@gvanrossum
Copy link
Collaborator

chat

I'm on Discord but I don't want to reveal my handle here and I have heavy privacy barriers around it. If you email me privately at [email protected] we can arrange something.

@KRRT7
Copy link
Contributor

KRRT7 commented Dec 7, 2025

Generally in vibe-coded projects there tend to be a lot of inline imports because LLMs are bad at writing code properly and tend to hack around things by adding inline imports vs structuring and importing the right way the first time around.

My suggestion would be to reset against main, implement the necessary changes, and then run make format only, strictly resolving the inline imports / implementing the refactor. This way you'll have an easy-to-review PR for this specific concern.

Later on you could make a separate PR switching over to ruff and avoiding the 101 files diff in this PR and have it happen in that subsequent PR

Another thing: instead of asking LLMs about the differences between isort and ruff import sorting, I'd highly suggest reading over https://docs.astral.sh/ruff/faq/#how-does-ruffs-import-sorting-compare-to-isort as it's up to date.

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 7, 2025

My suggestion would be to reset against main, implement the necessary changes, and then run make format only, strictly resolving the inline imports / implementing the refactor. This way you'll have an easy-to-review PR for this specific concern.

Later on you could make a separate PR switching over to ruff and avoiding the 101 files diff in this PR and have it happen in that subsequent PR

Hi @KRRT7 thanks for your comments.
regarding the PR i am open to reset against main, and do it again, but i think we will end up with the same number of modification.

Refactoring the inline import is a kind of mass operations when it affects several files.
I also automatically rearranged the import and from statements of the beginning of the files.
While they follow the accepted PEP style and have been done consistent and automatically with ruff, they have raised some discussion (alphabetical sort in the list, etc).

The interesting part of this PR is the removal of circular dependencies. I can outline and explain this to better understand the PR.

The rest of the PR is pretty trivial as it concerns only the above mentioned import and inline rearrangement and it by nature a mass change. I reviewed this in VSCode and you have only to really check the beginning sections of the files. And the inline imports which have been pulled up to the top of the file.

You could argue that i have done all 3 things (remove circular dependencies, remove inline import, reformat imports) in one step, but i think it made sense.

Please lets me know what you think. I can resubmit 3 PR for each step, but if we reformat the imports, number of modifications we be the same.

@gvanrossum
Copy link
Collaborator

I strongly prefer to undo all the import reformats for this PR, if possible. The other two are intertwined and should probably be kept together.

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 7, 2025

OK i will create a new PR.

@bmerkle
Copy link
Contributor Author

bmerkle commented Dec 9, 2025

please cancel/close this PR as it is too large and covers too much different aspects.
I will create 3 separate, smaller PR for each aspect:

@bmerkle bmerkle closed this Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants