
Writing columns containing large binary blobs will result in huge, untruncated statistics headers & compression is ~100,000x worse than pyarrow #23498

@jonasdedden

Description

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars

# A single row containing a 16 MiB binary blob of repeated b"a" bytes.
df = polars.DataFrame({"foo": [b"a" * 1024 * 1024 * 16]})

# Write once with the native Polars writer, once with the pyarrow-backed writer.
df.write_parquet("test.parquet")
df.write_parquet("test_pyarrow.parquet", use_pyarrow=True)

Log output

Issue description

This snippet writes the same DataFrame, containing a single row with 16 MiB of binary data, to the file system twice: once with the native Polars writer and once via pyarrow.
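The metadata blow-up can also be quantified without parquet-viewer: per the Parquet format, every file ends with the Thrift-serialized FileMetaData, a 4-byte little-endian length of that metadata, and the magic bytes `PAR1`. A minimal stdlib sketch (demonstrated on a synthetic footer; on the repro above you would call it on `test.parquet` and `test_pyarrow.parquet`):

```python
import os
import struct
import tempfile

def parquet_footer_metadata_len(path):
    """Length in bytes of the Thrift-encoded file metadata.

    A Parquet file ends with: FileMetaData, a 4-byte little-endian
    length of that metadata, and the magic bytes b"PAR1".
    """
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        length_bytes, magic = f.read(4), f.read(4)
    if magic != b"PAR1":
        raise ValueError(f"{path} is not a Parquet file")
    return struct.unpack("<I", length_bytes)[0]

# Synthetic file with a 42-byte "metadata" section, just to show the layout.
fake_meta = b"\x00" * 42
with tempfile.NamedTemporaryFile(delete=False, suffix=".parquet") as f:
    f.write(b"PAR1" + fake_meta + struct.pack("<I", len(fake_meta)) + b"PAR1")
print(parquet_footer_metadata_len(f.name))  # 42
os.unlink(f.name)
```

Run against the two files from the repro, this should reproduce the 342 Bytes vs. 32 MiB difference shown above.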

The two files analysed with https://github.com/XiangpengHao/parquet-viewer:

[screenshot: polars output]
[screenshot: pyarrow output]

Differences:

|          | File size   | Metadata size | Compression % | Uncompressed | Compressed row groups |
|----------|-------------|---------------|---------------|--------------|-----------------------|
| pyarrow  | 944 Bytes   | 342 Bytes     | 0.00%         | 16 MiB       | 598 Bytes             |
| polars   | 128 MiB     | 32 MiB        | 66.67%        | 48 MiB       | 32 MiB                |
| relative | 142180x (!) | 98112x (!)    | -             | 3x           | 56111x (!)            |

Expected behavior

  • The individual row group should compress roughly 50,000x better; it appears the row is not compressed at all (598 Bytes with pyarrow vs. 32 MiB with Polars).
  • pyarrow does not seem to write statistics for this column (even with statistics=True). Even if Polars behaves differently here, it should truncate the statistics to ~128 bytes or so instead of writing the entire row, uncompressed, into the header.
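To put the expected ratio in perspective: even stock DEFLATE (weaker than the Snappy/Zstd codecs Parquet writers typically default to) shrinks this payload by roughly three orders of magnitude, so a 32 MiB "compressed" row group means the value was effectively stored uncompressed. A minimal stdlib sketch:

```python
import zlib

# The same payload as the repro: 16 MiB of repeated b"a" bytes.
payload = b"a" * (16 * 1024 * 1024)

compressed = zlib.compress(payload)
ratio = len(payload) / len(compressed)
print(f"{len(compressed)} bytes, ~{ratio:.0f}x")  # roughly 1000x even with plain DEFLATE
```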

This is extremely similar to an issue that was already solved in the upstream arrow-rs crate:
apache/arrow-rs#7555
apache/arrow-rs#7489

Installed versions

Details
>>> polars.show_versions()
--------Version info---------
Polars:              1.31.0
Index type:          UInt32
Platform:            Linux-6.15.3-1-MANJARO-x86_64-with-glibc2.41
Python:              3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                2.2.5
openpyxl             <not installed>
pandas               2.2.3
polars_cloud         <not installed>
pyarrow              20.0.0
pydantic             2.11.4
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>


    Labels

    A-io-parquet (Area: reading/writing Parquet files), bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)
