-
Notifications
You must be signed in to change notification settings - Fork 357
feat: Replace cloudpickle with native Fory serialization #2417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Removed cloudpickle dependency and related logic. * Replaced cloudpickle serialization with Fory's native serialization for unsupported types. * Updated serializers for Python arrays, NumPy arrays, and Objects to use native serialization. * Modified tests to reflect the removal of cloudpickle and use of native serialization. * Added global helper functions for testing to avoid issues with local function serialization. * Updated README.md with a note about bazel caching issues.
|
@chaokunyang I did not realize the xlang tests were being skipped in Python. I set |
|
It's called from java CrossLanguageTest.java. Java serialize object into bytes and write to file, then execute python tests and pass that filepath as inputs |
|
I think a comment the python xlang tests explain this would have helped, along with documentation of how to run them locally. |
Yes, a comment would be helpful, thanks for this suggestion, I added a comment in #2420 |
|
Hi @esafak , would you like to continue working on this? I did a benchmark several days before. It shows that pyfory is faster than cpickle and has smaller size compared to pickle for data in cpython benchmark . If we can drop-in replace pickle, it would be very promising. |
|
I'm busy, so feel free to take over. |
I implemented in #2629 |
<!-- **Thanks for contributing to Apache Fory™.** **If this is your first time opening a PR on fory, you can refer to [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fory™** community has requirements on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md). - Apache Fory™ has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> ## Why? Implement serialization for any pickleable objects, so that pyfory can be used to replace pickle for smaller size and faster speed. ## What does this PR do? <!-- Describe the details of this PR. --> ## Related issues Closes apache#2417 ## Does this PR introduce any user-facing change? <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? ## Benchmark <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
<!-- **Thanks for contributing to Apache Fory™.** **If this is your first time opening a PR on fory, you can refer to [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fory™** community has requirements on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md). - Apache Fory™ has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> Implement serialization for any pickleable objects, so that pyfory can be used to replace pickle for smaller size and faster speed. <!-- Describe the details of this PR. --> Closes apache#2417 <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
<!-- **Thanks for contributing to Apache Fory™.** **If this is your first time opening a PR on fory, you can refer to [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fory™** community has requirements on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md). - Apache Fory™ has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> Implement serialization for any pickleable objects, so that pyfory can be used to replace pickle for smaller size and faster speed. <!-- Describe the details of this PR. --> Closes apache#2417 <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
<!-- **Thanks for contributing to Apache Fory™.** **If this is your first time opening a PR on fory, you can refer to [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).** Contribution Checklist - The **Apache Fory™** community has requirements on the naming of pr titles. You can also find instructions in [CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md). - Apache Fory™ has a strong focus on performance. If the PR you submit will have an impact on performance, please benchmark it first and provide the benchmark result here. --> Implement serialization for any pickleable objects, so that pyfory can be used to replace pickle for smaller size and faster speed. <!-- Describe the details of this PR. --> Closes #2417 <!-- If any user-facing interface changes, please [open an issue](https://github.com/apache/fory/issues/new/choose) describing the need to do so and update the document if necessary. Delete section if not applicable. --> - [ ] Does this PR introduce any public API change? - [ ] Does this PR introduce any binary protocol compatibility change? <!-- When the PR has an impact on performance (if you don't know whether the PR will have an impact on performance, you can submit the PR first, and if it will have impact on performance, the code reviewer will explain it), be sure to attach a benchmark data here. Delete section if not applicable. -->
What does this PR do?
Related issues
Contributes to #2409
Notes
Next
PickleSerializerwill be removed bit by bit.