Make GoogleTranslator multiprocessing-safe #1543
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
GoogleTranslator stores a live Google Cloud Translation client on the instance, which cannot be pickled when parallel_attempts > 1 and causes a PicklingError during multiprocessing startup. This change prevents the client from being pickled and allows it to be recreated per worker process, following the same pattern used by RivaTranslator.
Signed-off-by: Harsh Motla <[email protected]>
7092f34 to 97d5644
recheck
@cla-assistant check
@motlaharsh0909-lgtm "You can sign the DCO by just posting a Pull Request Comment same as the below format. I have read the DCO Document and I hereby sign the DCO"
I have read the DCO Document and I hereby sign the DCO
recheck
jmartin-tech left a comment
Testing of this is queued; the guard suggestion offered is precautionary.
        self._tested = True

    def _translate(self, text: str) -> str:
        retry = 5
Currently only the primary process is expected to access langproviders, meaning client should always be set during a call to translate. This guard, however, would ensure the object is restored if access were to occur on an object that has passed through pickle:
        retry = 5
        if not self.client:
            self._load_langprovider()
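The effect of this guard can be sketched in isolation. The class below is a hypothetical stand-in, not the garak implementation: only the names `client`, `_translate`, and `_load_langprovider` come from the diff, while `GuardedTranslator`, `load_count`, and the placeholder return value are assumptions made for illustration.

```python
class GuardedTranslator:
    """Illustrative stand-in; not the garak GoogleTranslator class."""

    def __init__(self):
        # Start with no client, as the object would look after unpickling.
        self.client = None
        self.load_count = 0  # hypothetical counter, only for the demo

    def _load_langprovider(self):
        # Hypothetical stand-in: the real method would build the API client.
        self.client = object()
        self.load_count += 1

    def _translate(self, text: str) -> str:
        # The guard: rebuild the client lazily if it is missing.
        if not self.client:
            self._load_langprovider()
        return text.upper()  # placeholder for the real translation call


translator = GuardedTranslator()
translator._translate("first call loads the client")
translator._translate("second call reuses it")
print(translator.load_count)
```

With this shape the client is created at most once per process, on first use, so the guard costs nothing on the normal path where the client is already set.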
jmartin-tech left a comment
Testing shows the need to clear one more attribute:
cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'module' object
Also some minor formatting asks.
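This class of error can be reproduced without garak. The sketch below assumes the failing attribute holds an imported module, as the traceback suggests; `HoldsModule` and `helper` are hypothetical names, and `json` simply stands in for whichever module is stored on the instance.

```python
import json  # any module works as the unpicklable attribute
import pickle


class HoldsModule:
    """Hypothetical class that keeps a module on the instance."""

    def __init__(self):
        self.helper = json  # module stored as an instance attribute


try:
    pickle.dumps(HoldsModule())
    error_message = None
except TypeError as exc:
    # pickle refuses to serialize module objects found in the instance state.
    error_message = str(exc)

print(error_message)
```

Setting such an attribute to None in `__getstate__` removes it from the pickled state, which is why it needs the same treatment as the client.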
    def __getstate__(self):
        state = dict(self.__dict__)
        state["client"] = None
Testing shows that the ftfy attribute also cannot be pickled:
-    def __getstate__(self):
-        state = dict(self.__dict__)
-        state["client"] = None
+    def __getstate__(self):
+        state = dict(self.__dict__)
+        state["client"] = None
+        state["ftfy"] = None
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.client = None
Minor whitespace nitpick:
-        self.client = None
+        self.client = None
Tell us what this change does
GoogleTranslator stores a live Google Cloud Translation client on the instance, which cannot be pickled when parallel_attempts > 1 and causes a PicklingError during multiprocessing startup.
This change prevents the client from being pickled and allows it to be recreated per worker process, following the same pattern used by RivaTranslator.
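A minimal sketch of the pattern described here, under stated assumptions: `FakeClient` and `SketchTranslator` are hypothetical stand-ins (a `threading.Lock` is used only because it is a well-known unpicklable object), and the real class has more attributes and a real client loader.

```python
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a live API client; the lock makes it unpicklable."""

    def __init__(self):
        self._lock = threading.Lock()


class SketchTranslator:
    """Illustrative only; not the garak GoogleTranslator class."""

    def __init__(self):
        self.target_lang = "ar"
        self.client = FakeClient()

    def __getstate__(self):
        # Copy the instance dict and drop the live client before pickling.
        state = dict(self.__dict__)
        state["client"] = None
        return state

    def __setstate__(self, state):
        # Restore plain attributes; the client is recreated later in the worker.
        self.__dict__.update(state)
        self.client = None


# Round trip: configuration survives, the live client does not.
restored = pickle.loads(pickle.dumps(SketchTranslator()))
print(restored.target_lang, restored.client)
```

Because multiprocessing pickles objects handed to worker processes, dropping the client in `__getstate__` lets the instance cross the process boundary, and each worker can then build its own client on demand.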
Fixes #1515
Related issue: #1515
Verification
Supporting configuration such as generator configuration file
run:
  target_lang: ar
  parallel_attempts: 4
  langproviders:
    model_type: remote.GoogleTranslator
Command used for verification
python -m garak --config test_google_translate.yaml --model_type openai --model_name gpt-4o
Run the tests and ensure they pass
python -m pytest tests/
Verify the thing does what it should
Garak starts successfully with parallel_attempts > 1
GoogleTranslator is initialized per worker process
No PicklingError occurs during multiprocessing startup
Probes are queued and begin execution
Verify the thing does not do what it should not
The Google Cloud Translation client is not pickled
Multiprocessing worker initialization does not fail
Additional information
No hardware-specific requirements
Uses Google Cloud Translation API for verification
No new dependencies introduced