4 changes: 2 additions & 2 deletions docs/docs/api_docs/bodo_parallel_apis/barrier.md
@@ -9,9 +9,9 @@ Synchronize all processes. Block process from proceeding until all processes rea
A typical example is to make sure all processes see side effects simultaneously.
For example, a process can delete files from storage while
others wait before writing to file.
-The following example uses [SPMD launch mode](../../bodo_parallelism/bodo_parallelism_basics.md#spmd):
+The following example uses [SPMD launch mode][spmd]:

```py
import shutil, os
import numpy as np

8 changes: 4 additions & 4 deletions docs/docs/api_docs/bodo_parallel_apis/random_shuffle.md
@@ -1,19 +1,19 @@
# bodo.random_shuffle

`bodo.random_shuffle(data, seed=None, dests=None, parallel=False)`
Manually shuffle data evenly across selected ranks.

### Arguments

- ``data``: data to shuffle.
- ``seed``: number to initialize random number generator.
- ``dests``: selected ranks to distribute shuffled data to. By default, distribution includes all ranks.
- ``parallel``: flag to indicate whether data is distributed. Default: `False`. Inside JIT code, the default depends on Bodo's distribution analysis of the data passed (for more information, see the Data Distribution section below).
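
To make the roles of `seed` and `dests` concrete, here is a minimal pure-Python sketch. It is not Bodo's implementation: `shuffle_and_deal` is a hypothetical helper, ranks are simulated as dictionary keys, and dealing rows round-robin is an assumption chosen only because it keeps per-rank sizes within one row of each other.

```python
import random


def shuffle_and_deal(rows, n_ranks, seed=None, dests=None):
    """Hypothetical helper: shuffle rows, then deal them to the destination ranks."""
    # By default every rank is a destination, mirroring the `dests` description above.
    dests = list(range(n_ranks)) if dests is None else list(dests)

    # A dedicated, seeded generator makes the permutation reproducible.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    # Ranks that are not destinations end up with no rows.
    parts = {rank: [] for rank in range(n_ranks)}
    for i, rank in enumerate(dests):
        # Round-robin dealing (an assumption made for this sketch).
        parts[rank] = shuffled[i::len(dests)]
    return parts


parts = shuffle_and_deal(range(10), n_ranks=4, seed=42, dests=[1, 3])
for rank, chunk in parts.items():
    print(rank, len(chunk))
```

With the same `seed`, repeated calls produce the same assignment; with `seed=None`, each call may differ.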

### Example Usage

-Note that this example uses [SPMD launch mode](../../bodo_parallelism/bodo_parallelism_basics.md#spmd).
+Note that this example uses [SPMD launch mode][spmd].

```py
import bodo
import pandas as pd
30 changes: 15 additions & 15 deletions docs/docs/api_docs/bodo_parallel_apis/rebalance.md
@@ -1,41 +1,41 @@
# bodo.rebalance

`bodo.rebalance(data, dests=None, random=False, random_seed=None, parallel=False)`
Manually redistribute data evenly across [selected] ranks.

### Arguments

- ``data``: data to rebalance.
- ``dests``: selected ranks to distribute data to. By default, distribution includes all ranks.
- ``random``: flag to randomize order of the rows of the data. Default: `False`.
- ``random_seed``: number to initialize random number generator.
- ``parallel``: flag to indicate whether data is distributed. Default: `False`. Inside JIT code, the default depends on Bodo's distribution analysis of the data passed (for more information, see the Data Distribution section below).
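
As a rough illustration of what "evenly" means here, the sketch below computes target row counts per rank from uneven starting counts. It is an assumption-laden model, not Bodo's algorithm: `balanced_counts` is a hypothetical helper, the starting counts are made up, and giving the leftover rows to the first destination ranks is a choice made for this example.

```python
def balanced_counts(counts, dests=None):
    """Hypothetical helper: target rows per rank after an even redistribution."""
    n_ranks = len(counts)
    # By default every rank is a destination.
    dests = list(range(n_ranks)) if dests is None else list(dests)

    total = sum(counts)
    base, extra = divmod(total, len(dests))

    # Ranks outside `dests` keep no rows.
    target = [0] * n_ranks
    for i, rank in enumerate(dests):
        # Assumption: the first `extra` destinations take one additional row.
        target[rank] = base + (1 if i < extra else 0)
    return target


before = [5, 60, 40, 20]  # made-up row counts on 4 ranks
print(balanced_counts(before))                # [32, 31, 31, 31]
print(balanced_counts(before, dests=[1, 3]))  # [0, 63, 0, 62]
```

The total number of rows is unchanged in both cases; only where the rows live differs.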

### Example Usage

- Example with just the `parallel` flag set to `True`:

```py
import bodo
import pandas as pd

@bodo.jit
def mean_power():
    df = pd.read_parquet("data/cycling_dataset.pq")
    df = df.sort_values("power")[df["power"] > 400]
    print(df.shape)
    df = bodo.rebalance(df, parallel=True)
    print("After rebalance: ", df.shape)

mean_power()
```

Save the code in ``test_rebalance.py`` and run it with 4 processes.

```shell
BODO_NUM_WORKERS=4 python test_rebalance.py
```

```console
[stdout:0]
(5, 10)
@@ -54,33 +54,33 @@ Manually redistribute data evenly across [selected] ranks.
- Example to distribute the data from all ranks to a subset of ranks using the ``dests`` argument.

!!! note
-The following example uses [SPMD launch mode](../../bodo_parallelism/bodo_parallelism_basics.md#spmd).
+The following example uses [SPMD launch mode][spmd].


```py

import bodo
import pandas as pd

@bodo.jit(spawn=False)
def mean_power():
    df = pd.read_parquet("data/cycling_dataset.pq")
    df = df.sort_values("power")[df["power"] > 400]
    return df

df = mean_power()
print(df.shape)
df = bodo.rebalance(df, dests=[1,3], parallel=True)
print("After rebalance: ", df.shape)
```
Save the code in ``test_rebalance.py`` and run it with 4 processes.

```shell
mpiexec -n 4 python test_rebalance.py
```

Output:

```console
[stdout:0]
(5, 10)
44 changes: 22 additions & 22 deletions docs/docs/api_docs/bodo_parallel_apis/scatterv.md
@@ -1,71 +1,71 @@
# bodo.scatterv

`bodo.scatterv(data, warn_if_dist=True)`
<br>
<br>
Distribute data manually by *scattering* data from one process to all processes.

### Arguments

- ``data``: data to distribute.
- ``warn_if_dist``: flag to print a BodoWarning if ``data`` is already distributed.
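
The scatter pattern itself can be modeled without MPI. The following sketch simulates it in plain Python under stated assumptions: `simulate_scatterv` is a hypothetical helper, each list entry stands for what one rank passed in, rank 0 is the root, and the contiguous block split is chosen for illustration rather than taken from Bodo's source.

```python
def simulate_scatterv(per_rank_data, root=0):
    """Hypothetical helper: split the root's data into one chunk per rank."""
    n_ranks = len(per_rank_data)
    # Only the root's value is used; whatever other ranks passed is ignored.
    data = per_rank_data[root]

    base, extra = divmod(len(data), n_ranks)
    chunks, start = [], 0
    for rank in range(n_ranks):
        # Assumption: the first `extra` ranks receive one additional element.
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


# Rank 0 holds the data; the other ranks pass None.
inputs = [list(range(10)), None, None, None]
chunks = simulate_scatterv(inputs)
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Passing non-`None` values for the other entries gives the same result in this model, since only the root's entry is read.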

!!! note
Currently, `bodo.scatterv` only supports scattering from rank 0.

!!! note
-The following examples use [SPMD launch mode](../../bodo_parallelism/bodo_parallelism_basics.md#spmd).
+The following examples use [SPMD launch mode][spmd].

### Example Usage

- When used outside of JIT code, we recommend that the argument be set to ``None`` for all ranks except rank 0.
For example:

```py
import bodo
import pandas as pd


@bodo.jit(spawn=False, distributed=["df"])
def mean_power(df):
    x = df.power.mean()
    return x

df = None
# only rank 0 reads the data
if bodo.get_rank() == 0:
    df = pd.read_parquet("data/cycling_dataset.pq")

df = bodo.scatterv(df)
res = mean_power(df)
print(res)
```

Save the code in ``test_scatterv.py`` file and run with `mpiexec`.

```shell
mpiexec -n 4 python test_scatterv.py
```

Output:

```console
[stdout:0] 102.07842132239877
[stdout:1] 102.07842132239877
[stdout:2] 102.07842132239877
[stdout:3] 102.07842132239877
```

!!! note
`data/cycling_dataset.pq` is located in the Bodo tutorial
[repo](https://github.com/bodo-ai/Bodo-tutorial).

- This is not a strict requirement. However, since this might be bad practice in certain situations,
Bodo will throw a warning if the data is not None on other ranks.

```py
import bodo
import pandas as pd

df = pd.read_parquet("data/cycling_dataset.pq")
df = bodo.scatterv(df)
# mean_power is the JIT function defined in the previous example
res = mean_power(df)
@@ -82,7 +82,7 @@ Distribute data manually by *scattering* data from one process to all processes.

```console
BodoWarning: bodo.scatterv(): A non-None value for 'data' was found on a rank other than the root. This data won't be sent to any other ranks and will be overwritten with data from rank 0.

[stdout:0] 102.07842132239877
[stdout:1] 102.07842132239877
[stdout:2] 102.07842132239877
@@ -92,10 +92,10 @@ Distribute data manually by *scattering* data from one process to all processes.
- When using ``scatterv`` inside of JIT code, the argument must have the same type on each rank due to Bodo's typing constraints.
All inputs except for rank 0 are ignored.

```py
import bodo
import pandas as pd

@bodo.jit(spawn=False)
def impl():
    if bodo.get_rank() == 0:
@@ -113,7 +113,7 @@ Distribute data manually by *scattering* data from one process to all processes.
```

Output:

```console
[stdout:6]
A
41 changes: 0 additions & 41 deletions docs/docs/dataframe_library/index.md

This file was deleted.
