VSCode pylance auto-completion for `HfArgumentParser` (limited support) #27275

vwxyzjn · 2023-11-03T20:38:38Z

Problem description

This PR adds limited support for VSCode auto-completion with HfArgumentParser. Currently, the return type of parse_args_into_dataclasses is Tuple[DataClass, ...], where DataClass = NewType("DataClass", Any), which limits pylance's ability to infer types even if we explicitly assert a dataclass type for the args.

What does this PR do?

This PR modifies the return type to be List[TypeVar("T")]. So when we assert the args to be of a certain type (e.g., assert isinstance(args, RewardConfig)), auto-completion works as expected.
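For illustration, a minimal sketch of the assertion pattern described above. `parse_untyped` is a hypothetical stand-in for a parser whose return type carries no dataclass information; it is not the library's API, and the `weight` field is only an example.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class RewardConfig:
    # Illustrative field only.
    weight: float = 0.01


def parse_untyped() -> Tuple[Any, ...]:
    # Hypothetical stand-in: a real parser would read the command line.
    return (RewardConfig(),)


(args,) = parse_untyped()
# At this point a static analyzer only knows `args` as `Any`,
# so it has no attributes to suggest.
assert isinstance(args, RewardConfig)
# After the assertion the analyzer can narrow `args` to `RewardConfig`,
# so `args.weight` can be completed.
print(args.weight)
```

The assertion also runs at runtime, so a wrong type fails fast instead of silently producing misleading completions.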

Alternatives considered

Some argparse libraries such as tyro can automatically infer the type of the args, but it doesn't seem to work with the current HfArgumentParser paradigm for two reasons:

Inference seems to support only a single type: the inferred type for args2 comes out as RewardConfig | Config2, so we can't parse multiple dataclasses in parse_args_into_dataclasses and have each type inferred correctly.

Automatic type detection would also require us to change the workflow: we would need to call parse_args_into_dataclasses([RewardConfig, Config2]) instead of parse_args_into_dataclasses().

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

CC @muellerzr, @pacman100, @lewtun, @brentyi

HuggingFaceDocBuilderDev · 2023-11-03T20:53:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

brentyi · 2023-11-03T21:11:22Z

src/transformers/hf_argparser.py

 import yaml


 DataClass = NewType("DataClass", Any)


Any idea why these NewType()s were used originally instead of a simple alias? Like DataClass = Any, DataclassType = Any.

brentyi · 2023-11-03T21:11:33Z

src/transformers/hf_argparser.py

        args_filename=None,
        args_file_flag=None,
-    ) -> Tuple[DataClass, ...]:
+    ) -> Tuple[T, ...]:


To me this looks the same as annotating the return type as Tuple[Any, ...], since T isn't used anywhere else.

The usual use of TypeVar is to infer an output type from either an input type (if T is used as the annotation for an argument, like identity(x: T) -> T) or from a concretized version of a generic class (like __getitem__() returning T for a list[T]).
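A small sketch of the two usual TypeVar patterns mentioned above. `identity` follows the comment's own example; `Box` is a hypothetical container added only for illustration.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


def identity(x: T) -> T:
    # T is bound by the argument, so the result type follows the input type.
    return x


class Box(Generic[T]):
    # T is bound when the class is concretized, e.g. Box[str].
    def __init__(self, items: List[T]) -> None:
        self.items = items

    def __getitem__(self, index: int) -> T:
        return self.items[index]


number = identity(3)        # a checker can infer int
box = Box(["a", "b"])       # a checker can infer Box[str]
letter = box[1]             # and therefore str here
```

In both cases T appears somewhere the checker can solve for it. A TypeVar that appears only in a return annotation gives the checker nothing to solve against, which is the concern raised in the comment.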

I have some ideas for suggestions that I can leave in a comment!

vwxyzjn · 2023-11-03

Thanks so much for the comment. I think the ideal workflow would be something like the code below working, but I'm not sure if it's possible 👀

class HfArgumentParser:
    def __init__(self, test: List[Type[T]]):
        self.test = test

    def parse_args_into_dataclasses(self) -> List[T]:
        return self.test

parser = HfArgumentParser([RewardConfig, Config2])
args, args2 = parser.parse_args_into_dataclasses()

brentyi · 2023-11-03

The first option I listed below should do this exactly! The trick is that HfArgumentParser needs to inherit from Generic[T].
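A minimal sketch of what inheriting from Generic[T] could look like, under stated assumptions: `SketchParser`, the field names, and the default-constructing body are hypothetical and stand in for real argument parsing; this is not the implementation linked in the thread.

```python
from dataclasses import dataclass
from typing import Generic, Sequence, Tuple, Type, TypeVar

DataclassT = TypeVar("DataclassT")


@dataclass
class RewardConfig:
    weight: float = 0.01  # illustrative field


@dataclass
class Config2:
    name: str = "demo"  # illustrative field


class SketchParser(Generic[DataclassT]):
    def __init__(self, dataclass_types: Sequence[Type[DataclassT]]) -> None:
        # DataclassT is solved from the types passed in here.
        self.dataclass_types = list(dataclass_types)

    def parse_args_into_dataclasses(self) -> Tuple[DataclassT, ...]:
        # A real parser would read sys.argv; this sketch just builds defaults.
        return tuple(cls() for cls in self.dataclass_types)


parser = SketchParser([RewardConfig, Config2])
args, args2 = parser.parse_args_into_dataclasses()
```

With a single type variable, every element of the returned tuple shares one inferred type, so with two dataclasses a checker can at best report a union for each element. An `assert isinstance(...)` is still needed to pin down which one is which.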

brentyi · 2023-11-03T21:45:49Z

Hello!

As some alternative suggestions, typing.Generic does make specificity in parse_args_into_dataclasses() possible.

(1) This implementation gets us:

(2) typing.overload can also help, although there are some limitations from Python's type system¹. This implementation gets us:

As an FYI, tyro should correctly resolve the types of:

import dataclasses
import tyro

@dataclasses.dataclass
class TrainArgs:
    lr: float = 3e-4

@dataclasses.dataclass
class RewardConfig:
    weight: float = 0.01

# This currently prefixes arguments with `--0.` for TrainArgs, and `--1.` for RewardConfig. Could be made configurable.
train, reward = tyro.cli(tuple[TrainArgs, RewardConfig])

A similar API in HfArgumentParser could make the types much cleaner, but comes with all of the obvious downsides of a breaking change.

¹ As in the shared code, a maximum number of dataclasses needs to be hardcoded. This might be solved in the future with better variadic generic support in the Python type system: https://github.com/python/mypy/issues/16394

vwxyzjn · 2023-11-06T13:56:55Z

@brentyi thanks so much for the detailed suggestions! I personally like option (2), typing.overload, with the maximum set to something like 10, but the code would look quite hacky... I would like some input from the transformers maintainers.

amyeroberts · 2023-11-07T16:20:31Z

@vwxyzjn @brentyi Thanks for opening this PR and the work on improving the codebase!

As a general note, the type annotations in the library are not intended to be complete or fully compatible with type checkers e.g. running mypy will throw a bunch of errors. They are there as general documentation to guide the user.

So, in this case, DataClass is a more descriptive annotation as a return type than T, even though they both effectively represent "Any".

vwxyzjn · 2023-11-07T19:23:05Z

Hi @amyeroberts, thanks for the comment! I agree that type annotations do not necessarily need to work with mypy. The main thing I was thinking about is the developer experience with auto-completion.

With @brentyi's option 1, we would still get the descriptive API like before

def parse_args_into_dataclasses(self) -> Tuple[DataclassT, ...]:

but by default it's unable to recognize which dataclass type each element is, so for usage like HfArgumentParser((ModelArguments, RewardModelArguments, EvaluationArguments)), the auto-completion can cause some confusion (in the screenshot below, pylance thinks train's type is either TrainArgs or RewardConfig).

With @brentyi's option 2

Our internal code would become uglier like

    @overload
    def parse_args_into_dataclasses(self: HfArgumentParser[T1, None, None, None]) -> T1:
        return self._parse_args_into_dataclasses()

    @overload
    def parse_args_into_dataclasses(
        self: HfArgumentParser[T1, T2, None, None]
    ) -> Tuple[T1, T2]:
        return self._parse_args_into_dataclasses()

    @overload
    def parse_args_into_dataclasses(
        self: HfArgumentParser[T1, T2, T3, None]
    ) -> Tuple[T1, T2, T3]:
        return self._parse_args_into_dataclasses()

    @overload
    def parse_args_into_dataclasses(
        self: HfArgumentParser[T1, T2, T3, T4]
    ) -> Tuple[T1, T2, T3, T4]:
        return self._parse_args_into_dataclasses()

    def parse_args_into_dataclasses(self) -> Any:
        return self._parse_args_into_dataclasses()

but the user will have a more seamless experience (pylance would correctly recognize train's type is TrainArgs)
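To make the overload fragment above concrete, here is a self-contained sketch reduced to two type parameters. `SketchParser`, the overloaded `__init__`, and the default-constructing body are assumptions for illustration, not the implementation shared in the thread, and the sketch has not been validated against any particular type checker (some may report overlapping overloads). Unlike the fragment above, the single-dataclass case here returns a one-element tuple.

```python
from dataclasses import dataclass
from typing import Any, Generic, Tuple, Type, TypeVar, overload

T1 = TypeVar("T1")
T2 = TypeVar("T2")


@dataclass
class TrainArgs:
    lr: float = 3e-4


@dataclass
class RewardConfig:
    weight: float = 0.01


class SketchParser(Generic[T1, T2]):
    # One constructor overload per supported number of dataclasses;
    # unused slots are filled with None.
    @overload
    def __init__(self: "SketchParser[T1, None]", dataclass_types: Tuple[Type[T1]]) -> None: ...
    @overload
    def __init__(self: "SketchParser[T1, T2]", dataclass_types: Tuple[Type[T1], Type[T2]]) -> None: ...
    def __init__(self, dataclass_types: Any) -> None:
        self.dataclass_types = tuple(dataclass_types)

    # The return type is selected from how `self` was parameterized.
    @overload
    def parse_args_into_dataclasses(self: "SketchParser[T1, None]") -> Tuple[T1]: ...
    @overload
    def parse_args_into_dataclasses(self: "SketchParser[T1, T2]") -> Tuple[T1, T2]: ...
    def parse_args_into_dataclasses(self) -> Any:
        # A real parser would read sys.argv; this sketch just builds defaults.
        return tuple(cls() for cls in self.dataclass_types)


parser = SketchParser((TrainArgs, RewardConfig))
train, reward = parser.parse_args_into_dataclasses()
```

Overload stubs conventionally use `...` as their body, since only the final undecorated definition runs. Each extra supported dataclass adds one more overload pair, which is the maintenance cost discussed here.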

Would you be in favor of either option? The first option would of course require the least change. Both options are non-breaking and just improve pylance's auto-completion in different ways.

brentyi · 2023-11-07T19:55:34Z

As a user of the library I'd personally appreciate the option 2 approach; it's nice when completions work! But I also agree it adds maintenance burden.

If folks would prefer to avoid typing.Generic, an option 3 is to just swap the

DataClass = NewType("DataClass", Any)
DataClassType = NewType("DataClassType", Any)

for

DataClass = Any
DataClassType = Any  # or type / typing.Type

This will make the assertions in the PR description useful for correct autocompletion; NewType has static analysis implications (in this case, preventing type narrowing) + usage connotations that IMO would be nice to avoid here.
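A minimal sketch contrasting the two spellings. At runtime they behave the same; the difference described above exists only for static analysis. The names `DataClassNew`, `DataClassAlias`, and the two `parse_*` helpers are hypothetical, chosen so both variants can sit side by side.

```python
from dataclasses import dataclass
from typing import Any, NewType

# Spelling currently used in hf_argparser.py, per the diff context above.
DataClassNew = NewType("DataClassNew", Any)
# Option 3: a plain alias.
DataClassAlias = Any


@dataclass
class RewardConfig:
    weight: float = 0.01  # illustrative field


def parse_new() -> DataClassNew:
    # Calling a NewType is an identity function at runtime.
    return DataClassNew(RewardConfig())


def parse_alias() -> DataClassAlias:
    return RewardConfig()


a = parse_new()
b = parse_alias()

# Both assertions pass at runtime. Statically, `b` is plain `Any` and can be
# narrowed by the assertion; per the comment above, the NewType spelling is
# what gets in the way of narrowing `a`.
assert isinstance(a, RewardConfig)
assert isinstance(b, RewardConfig)
```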

amyeroberts · 2023-11-09T12:11:05Z

I completely understand the motivation behind this. However, adding additional code for type checking (in particular overloads) is something which has been proposed and rejected before. The primary reason for this is that we don't formally support type checking and we don't want to add additional code we need to maintain in order to support it. For example: this comment and related PR, or this comment.

github-actions · 2023-12-04T08:03:31Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

enable typing annotion override

6d78510

black

301f5b5

brentyi reviewed Nov 3, 2023

quick fix

b87f068

amyeroberts mentioned this pull request Nov 9, 2023

Overload pipeline to return the appropriate type for a task #26125

Closed

github-actions bot closed this Dec 12, 2023
