Dataset sharding non-contiguous? #4570

@cakiki

Describe the bug

I'm not sure if this is a bug; it's more likely normal behavior, but I wanted to double-check.
Is it normal that datasets.shard does not produce chunks that, when concatenated, reproduce the original ordering of the sharded dataset?

This might be related to pull request #4466, but I have to admit I did not look into its changes in detail.
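
To illustrate, here is a small example of the selection pattern I am seeing (a minimal sketch, assuming the default contiguous=False behaviour of Dataset.shard):

from datasets import Dataset

ds = Dataset.from_dict({"idx": list(range(10))})

# With the defaults, shard 0 of 3 appears to pick every 3rd row (0, 3, 6, 9)
# rather than a contiguous block at the start of the dataset.
print(ds.shard(num_shards=3, index=0)["idx"])  # -> [0, 3, 6, 9]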

Steps to reproduce the bug

import os

from datasets.utils.py_utils import convert_file_size_to_int  # helper used internally by `datasets` to parse sizes like "300MB"

# `dataset` is the already tokenized Dataset to be written out in ~300MB shards
max_shard_size = convert_file_size_to_int('300MB')
dataset_nbytes = dataset.data.nbytes
num_shards = int(dataset_nbytes / max_shard_size) + 1
num_shards = max(num_shards, 1)
print(f"{num_shards=}")
for shard_index in range(num_shards):
    shard = dataset.shard(num_shards=num_shards, index=shard_index)
    shard.to_parquet(f"tokenized/tokenized-{shard_index:03d}.parquet")
os.listdir('tokenized/')

Expected results

I expected the shards to preserve the order of the data in the original dataset, e.g. dataset[10] being the same as shard_1[10].

Actual results

Only the first element is the same, i.e. dataset[0] is the same as shard_1[0].
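
If this stride-based selection is indeed the intended default, then passing contiguous=True to Dataset.shard seems to give the ordering I expected; a minimal sketch, assuming that parameter behaves as documented:

from datasets import Dataset

ds = Dataset.from_dict({"idx": list(range(10))})

# contiguous=True gives each shard a consecutive block of rows, so concatenating
# the shards in index order reproduces the original row order.
shards = [ds.shard(num_shards=3, index=i, contiguous=True) for i in range(3)]
print([s["idx"] for s in shards])  # -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]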

Environment info

  • datasets version: 2.3.2
  • Platform: Linux-4.15.0-176-generic-x86_64-with-glibc2.31
  • Python version: 3.10.4
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.2
