
@esafak
Contributor

@esafak esafak commented Jul 15, 2025

What does this PR do?

  • Removed cloudpickle dependency and related logic.
  • Replaced cloudpickle serialization with Fory's native serialization for unsupported types.
  • Updated serializers for Python arrays, NumPy arrays, and Objects to use native serialization.
  • Modified tests to reflect the removal of cloudpickle and use of native serialization.
  • Added global helper functions for testing to avoid issues with local function serialization.
  • Updated README.md with a note about bazel caching issues.
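
The point about global helper functions can be illustrated with the standard library alone. The sketch below uses `pickle` (not Fory) to show the general by-reference behaviour: a module-level function is stored by its qualified name, while a function defined inside another function has no importable name and is rejected. The helper names `global_double`, `make_local_double`, and `can_pickle` are hypothetical and exist only for this illustration. Whether Fory's native serializer fails in exactly the same way is an assumption based on the bullet above.

```python
import pickle


def global_double(x):
    """Module-level helper: pickle stores only its qualified name."""
    return 2 * x


def make_local_double():
    """Return a function defined inside another function."""

    def local_double(x):
        return 2 * x

    return local_double


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# The module-level function serializes; the local one does not.
global_ok = can_pickle(global_double)
local_ok = can_pickle(make_local_double())

# A by-reference round trip returns a callable with the same behaviour.
restored = pickle.loads(pickle.dumps(global_double))
```

Moving test helpers to module scope, as the PR does, sidesteps this class of failure without changing what the tests check.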

Related issues

Contributes to #2409

Notes

Next, PickleSerializer will be removed bit by bit.

@esafak esafak requested a review from chaokunyang as a code owner July 15, 2025 15:40
@esafak esafak marked this pull request as draft July 15, 2025 16:00
@esafak
Contributor Author

esafak commented Jul 15, 2025

@chaokunyang I did not realize the xlang tests were being skipped in Python. I set ENABLE_CROSS_LANGUAGE_TESTS, and now pytest complains that fixture 'data_file_path' is not found. I could not find a definition for it, either in the current code or in the history. How is it supposed to work?

@chaokunyang
Collaborator

chaokunyang commented Jul 15, 2025

The tests are called from Java's CrossLanguageTest.java. Java serializes an object into bytes and writes them to a file, then runs the Python tests and passes that file path as input.
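
A minimal sketch of that handoff, using only the standard library. It is not the Fory test code: `struct` stands in for the real serializer, and the names `check_int32_round_trip`, `run_named_test`, and `simulate_driver` are hypothetical. It also shows why pytest reports a missing fixture when it collects such a function directly: pytest treats every test-function parameter, here `data_file_path`, as a fixture name, and nothing defines one because the path is meant to come from the driver process.

```python
import struct
import tempfile
from pathlib import Path


def check_int32_round_trip(data_file_path):
    """Hypothetical test body: read peer-written bytes, verify, write back."""
    path = Path(data_file_path)
    (value,) = struct.unpack("<i", path.read_bytes())  # stand-in for deserialize
    assert value == 42
    path.write_bytes(struct.pack("<i", value))  # stand-in for serialize
    return value


def run_named_test(argv):
    """Dispatch [test_name, data_file_path], as a driver process might."""
    test_name, data_file_path = argv
    test_func = globals()[test_name]
    return test_func(data_file_path)


def simulate_driver():
    """Play the role of the Java side: write bytes, then invoke the test."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        data_file = Path(tmp_dir) / "payload.bin"
        data_file.write_bytes(struct.pack("<i", 42))
        result = run_named_test(["check_int32_round_trip", str(data_file)])
        # The driver can now read the file back and compare the bytes.
        return result, data_file.read_bytes()
```

The function is deliberately not named `test_...` so that pytest does not collect it and look for a `data_file_path` fixture.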

@esafak
Contributor Author

esafak commented Jul 15, 2025

I think a comment in the Python xlang tests explaining this would have helped, along with documentation of how to run them locally.

@chaokunyang
Collaborator

chaokunyang commented Jul 16, 2025

> I think a comment in the Python xlang tests explaining this would have helped, along with documentation of how to run them locally.

Yes, a comment would be helpful. Thanks for the suggestion; I added one in #2420.

@chaokunyang
Collaborator

chaokunyang commented Sep 10, 2025

Hi @esafak, would you like to continue working on this? I ran a benchmark a few days ago. It shows that pyfory is faster than cPickle and produces smaller output than pickle for the data in the CPython benchmark. If pyfory can become a drop-in replacement for pickle, that would be very promising.
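
For anyone who wants to reproduce this kind of comparison, here is a rough harness sketch. It is not the benchmark referred to above: the payload is illustrative, `measure` is a hypothetical helper, and only standard-library codecs are exercised. Comparing pyfory would mean passing its serialize and deserialize callables as `dumps` and `loads`; that call is left out here because the exact API is not shown in this thread.

```python
import json
import pickle
import timeit


def measure(name, dumps, loads, payload, repeat=200):
    """Time `repeat` round trips and record the encoded size for one codec."""
    encoded = dumps(payload)
    assert loads(encoded) == payload  # sanity check before timing
    seconds = timeit.timeit(lambda: loads(dumps(payload)), number=repeat)
    return {"name": name, "size_bytes": len(encoded), "seconds": seconds}


# Illustrative payload; not the data used in the benchmark mentioned above.
payload = {"ids": list(range(1000)), "name": "example", "scores": [0.5] * 100}

results = [
    measure(
        "pickle",
        lambda obj: pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL),
        pickle.loads,
        payload,
    ),
    measure(
        "json",
        lambda obj: json.dumps(obj).encode("utf-8"),
        json.loads,
        payload,
    ),
]

for row in results:
    print(f"{row['name']:>8}: {row['size_bytes']} bytes, {row['seconds']:.4f} s")
```

Timings from a harness like this depend heavily on the payload shape and the machine, so the numbers are only meaningful relative to each other within a single run.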

@esafak
Contributor Author

esafak commented Sep 10, 2025

I'm busy, so feel free to take over.

@chaokunyang
Collaborator

> I'm busy, so feel free to take over.

I implemented this in #2629.

urlyy pushed a commit to urlyy/fory that referenced this pull request Sep 19, 2025
<!--
**Thanks for contributing to Apache Fory™.**

**If this is your first time opening a PR on fory, you can refer to
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).**

Contribution Checklist

- The **Apache Fory™** community has requirements on the naming of pr
titles. You can also find instructions in
[CONTRIBUTING.md](https://github.com/apache/fory/blob/main/CONTRIBUTING.md).

- Apache Fory™ has a strong focus on performance. If the PR you submit
will have an impact on performance, please benchmark it first and
provide the benchmark result here.
-->

## Why?

Implement serialization for any picklable object, so that pyfory can
replace pickle with smaller output and faster speed.

## What does this PR do?

<!-- Describe the details of this PR. -->

## Related issues

Closes apache#2417 

## Does this PR introduce any user-facing change?

<!--
If any user-facing interface changes, please [open an
issue](https://github.com/apache/fory/issues/new/choose) describing the
need to do so and update the document if necessary.

Delete section if not applicable.
-->

- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?

## Benchmark

<!--
When the PR has an impact on performance (if you don't know whether the
PR will have an impact on performance, you can submit the PR first, and
if it will have impact on performance, the code reviewer will explain
it), be sure to attach a benchmark data here.

Delete section if not applicable.
-->
chaokunyang added a commit to chaokunyang/fory that referenced this pull request Sep 19, 2025
@chaokunyang chaokunyang mentioned this pull request Sep 19, 2025
4 tasks
chaokunyang added a commit to chaokunyang/fory that referenced this pull request Sep 19, 2025
chaokunyang added a commit that referenced this pull request Sep 19, 2025