Skip to content

Conversation

@guyrosin
Copy link
Contributor

@guyrosin guyrosin commented Dec 15, 2020

What does this PR do?

When calling a Pipeline, the kwargs argument is not passed to the tokenizer (it is actually not used at all).
I think the intended behavior is to pass it (as the base tokenizer's __call__() method already supports kwargs), and that's what this PR does.
[Related to #8180]

The call order is:

SpecificPipeline.__call__(..., **kwargs)
# Which calls
Pipeline.__call__(..., **kwargs)
# Which calls
SpecificPipeline._parse_and_tokenize(..., **kwargs)
# Which in turn calls
self.tokenizer(...) # No kwargs in this call

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@sgugger @LysandreJik

@guyrosin guyrosin reopened this Jan 15, 2021
@sgugger
Copy link
Collaborator

sgugger commented Jan 15, 2021

cc @Narsil who probably knows better.

@Narsil
Copy link
Contributor

Narsil commented Jan 15, 2021

Hi @guyrosin , we actually don't want to enable that.
The problem is that kwargs are used both by _parse_and_tokenize and by generate/forward.
See discussion here: #9432 (comment)

I'm guessing you want to override a tokenizer argument at runtime in the pipeline. The best way to do that is to whitelist all arguments of the tokenizer (like we did with truncation). and only pass **kwargs to generate. That's the best way to isolate arguments of both functionalities without creating a mess. Hopefully there won't be any arguments with the same name in both function calls..

The **kwargs in the function signature, is legacy for now as it simply captures previous arguments that used to be sent, and prevents triggering an error for previously written code.

@guyrosin
Copy link
Contributor Author

Ohh, got it. Thanks for the explanation @Narsil!

@guyrosin guyrosin closed this Jan 16, 2021
@guyrosin guyrosin deleted the pipeline-args-fix branch January 16, 2021 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants