Skip to content

Conversation

@walter9388
Copy link

@walter9388 walter9388 commented Feb 20, 2025

Relating to #982

Took me a while to get round to this, but here we are....

I think there are three different levels of approaching this issue:

  1. Remove extra None types only:
    -def f(x: Optional[Union[int, None]]): pass
    +def f(x: int | None): pass
    -def g(x: Union[Optional[int], None]): pass
    +def g(x: int | None): pass
  2. Remove any duplicated scalar types at the same depth by name in Union blocks:
    -def f(x: Union[Union[Union[Union[a, b], c], d], a]): pass
    +def f(x: a | b | c | d): pass
    -def g(x: Union[a.b | a.c, a.b, list[str], str]): pass
    +def g(x: a.b | a.c | list[str] | str): pass
  3. General deduplication at any depth on any block:
    -def f(x: Union[list[Union[int, str]], list[Union[str, int]]]): pass
    +def f(x: list[int | str]): pass

I settled on level 2 as this was still possible with a single pass and seemed more useful than just focusing on None types.
I couldn't see a way of approaching the general problem (level 3) without recursively making a tree structure and then assessing the leaf nodes. However, I am not deeply familiar with the standard python libraries for parsing ASTs etc., so if there are simple built in methods for problems like this I would be interested to know!

I used the existing scan in _fix_union to determine the delimitators at depth==1 between the types. This seemed to work well, but I definitely ran into some interesting edge cases when it came to handling comments, whitespace and multilines.

I have managed to get this working for a variety of test cases, and I would be interested to hear your feedback.

Btw I enjoy your YouTube content! I have learnt a lot of niche things I would have struggled to pick up otherwise. So thank you for that!

Copy link
Owner

@asottile asottile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick first pass -- will look more closely later

Copy link
Author

@walter9388 walter9388 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just highlighting a few things to be aware of.

def _fix_optional(i: int, tokens: list[Token]) -> None:
j = find_op(tokens, i, '[')
k = find_closing_bracket(tokens, j)
k, contains_none = _find_closing_bracket_and_if_contains_none(tokens, j)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified the general find_closing_bracket function to also check for whether the optional block already contains None in the same pass.

Comment on lines +30 to +34
tokens[k:k + 1] = [
Token('UNIMPORTANT_WS', ' '),
Token('CODE', '| '),
Token('CODE', 'None'),
]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for changing the single token containing | None to explicit whitespace, | and None is for the deduplication and whitespace removal functions used in _fix_union. This also applies to the multiline version a few lines below.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be helpful to do all the tokens then -- | would be 'OP' and None would be 'NAME'

Comment on lines +137 to +139
to_delete += _remove_consecutive_unimportant_ws(
tokens, [x for x in range(j, k) if x not in to_delete],
)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not convinced this is the best approach to remove whitespace, but not sure about what to do in situations where lines are completely deleted other than comments. I have written a niche test for this situation in test id='duplicated types in multi-line nested unions or optionals'.

def _fix_optional(i: int, tokens: list[Token]) -> None:
j = find_op(tokens, i, '[')
k = find_closing_bracket(tokens, j)
k, contains_none = _find_closing_bracket_and_if_contains_none(tokens, j)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically I add helper functions above where they're called rather than below

) -> list[int]:
to_delete = []
prev_name = ''
for kk in idxs:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why kk?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants