Problematic rank after calling `split_dataset_by_node` twice

### Describe the bug

I'm trying to split `IterableDataset` by `split_dataset_by_node`.
But when doing split on a already split dataset, the resulting `rank` is greater than `world_size`.

### Steps to reproduce the bug

Here is the minimal code for reproduction:
```py
>>> from datasets import load_dataset
>>> from datasets.distributed import split_dataset_by_node
>>> dataset = load_dataset('fla-hub/slimpajama-test', split='train', streaming=True)      
>>> dataset = split_dataset_by_node(dataset, 1, 32)
>>> dataset._distributed
DistributedConfig(rank=1, world_size=32)
>>> dataset = split_dataset_by_node(dataset, 1, 15)
>>> dataset._distributed                           
DistributedConfig(rank=481, world_size=480)
```
As you can see, the second rank 481 > 480, which is problematic.

### Expected behavior

I think this error comes from this line @lhoestq
https://github.com/huggingface/datasets/blob/a6ccf944e42c1a84de81bf326accab9999b86c90/src/datasets/iterable_dataset.py#L2943-L2944
We may need to obtain the rank first. Then the above code gives
```py
>>> dataset._distributed                           
DistributedConfig(rank=16, world_size=480)
```

### Environment info

datasets==2.20.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Problematic rank after calling `split_dataset_by_node` twice #6990

Describe the bug

Steps to reproduce the bug

Expected behavior

Environment info

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

	world_size = world_size * dataset._distributed.world_size
	rank = world_size * dataset._distributed.rank + rank

Problematic rank after calling split_dataset_by_node twice #6990

Description

Describe the bug

Steps to reproduce the bug

Expected behavior

Environment info

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Problematic rank after calling `split_dataset_by_node` twice #6990