Commit 7641e9b

mariosasko authored and albertvillanova committed
Deprecate download_custom (#6093)
* Deprecate `download_custom`
* Better msg
1 parent e291b1d commit 7641e9b

2 files changed: +2 -7 lines changed

docs/source/about_dataset_load.mdx

Lines changed: 0 additions & 6 deletions
@@ -86,12 +86,6 @@ There are three main methods in [`DatasetBuilder`]:
 
 Once the files are downloaded, [`SplitGenerator`] organizes them into splits. The [`SplitGenerator`] contains the name of the split, and any keyword arguments that are provided to the [`DatasetBuilder._generate_examples`] method. The keyword arguments can be specific to each split, and typically comprise at least the local path to the data files for each split.
 
-<Tip>
-
-[`DownloadManager.download_and_extract`] can download files from a wide range of sources. If the data files are hosted on a special access server, you should use [`DownloadManger.download_custom`]. Refer to the reference of [`DownloadManager`] for more details.
-
-</Tip>
-
 3. [`DatasetBuilder._generate_examples`] reads and parses the data files for a split. Then it yields dataset examples according to the format specified in the `features` from [`DatasetBuilder._info`]. The input of [`DatasetBuilder._generate_examples`] is actually the `filepath` provided in the keyword arguments of the last method.
 
 The dataset is generated with a Python generator, which doesn't load all the data in memory. As a result, the generator can handle large datasets. However, before the generated samples are flushed to the dataset file on disk, they are stored in an `ArrowWriter` buffer. This means the generated samples are written by batch. If your dataset samples consumes a lot of memory (images or videos), then make sure to specify a low value for the `DEFAULT_WRITER_BATCH_SIZE` attribute in [`DatasetBuilder`]. We recommend not exceeding a size of 200 MB.
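The last context paragraph of this hunk describes a generator whose output is buffered and written by batch. The sketch below illustrates that idea in plain Python only. It does not use `ArrowWriter` or `DatasetBuilder`; the names `generate_examples`, `write_in_batches`, and `flush` are hypothetical stand-ins, and `batch_size` plays the role the docs give to `DEFAULT_WRITER_BATCH_SIZE`.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Example = Dict[str, object]


def generate_examples(lines: Iterable[str]) -> Iterator[Tuple[int, Example]]:
    """Hypothetical generator: yields (key, example) pairs lazily, one at a time."""
    for key, line in enumerate(lines):
        yield key, {"text": line}


def write_in_batches(
    examples: Iterable[Tuple[int, Example]],
    batch_size: int,
    flush: Callable[[List[Example]], None],
) -> int:
    """Buffer up to `batch_size` examples, then pass the batch to `flush`.

    Only one batch is held in memory at a time, so a smaller `batch_size`
    lowers peak memory when individual examples are large.
    Returns the number of flushes performed.
    """
    buffer: List[Example] = []
    n_flushes = 0
    for _, example in examples:
        buffer.append(example)
        if len(buffer) >= batch_size:
            flush(buffer)
            n_flushes += 1
            buffer = []  # start a fresh list; the flushed batch is left untouched
    if buffer:  # write the final, possibly smaller, batch
        flush(buffer)
        n_flushes += 1
    return n_flushes


# Usage: collect batches in a list instead of writing to disk.
flushed: List[List[Example]] = []
n = write_in_batches(generate_examples(["a", "b", "c", "d", "e"]), 2, flushed.append)
```

With five examples and a batch size of 2, the sink receives two full batches and one partial batch. In a real builder the sink would be the on-disk writer rather than a list.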

src/datasets/download/download_manager.py

Lines changed: 2 additions & 1 deletion
@@ -28,7 +28,7 @@
 from typing import Callable, Dict, Generator, Iterable, List, Optional, Tuple, Union
 
 from .. import config
-from ..utils.deprecation_utils import DeprecatedEnum
+from ..utils.deprecation_utils import DeprecatedEnum, deprecated
 from ..utils.file_utils import cached_path, get_from_cache, hash_url_to_filename, is_relative_path, url_or_path_join
 from ..utils.info_utils import get_size_checksum_dict
 from ..utils.logging import get_logger, is_progress_bar_enabled, tqdm
@@ -349,6 +349,7 @@ def _record_sizes_checksums(self, url_or_urls: NestedDataStructure, downloaded_p
                 path, record_checksum=self.record_checksums
             )
 
+    @deprecated("Use `.download`/`.download_and_extract` with `fsspec` URLs instead.")
     def download_custom(self, url_or_urls, custom_download):
         """
         Download given urls(s) by calling `custom_download`.
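For readers unfamiliar with the pattern, here is a minimal sketch of how a message-carrying deprecation decorator of this kind could work. It is not the implementation in `datasets.utils.deprecation_utils`: `deprecated_sketch` and `ToyDownloadManager` are hypothetical names, and the choice of `FutureWarning` and the exact wording of the base message are assumptions.

```python
import functools
import warnings


def deprecated_sketch(help_message=None):
    """Hypothetical decorator factory: warn on each call, then run the wrapped function."""

    def decorator(func):
        message = f"{func.__name__} is deprecated and will be removed in a future version."
        if help_message:
            message += " " + help_message

        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller, not to this wrapper
            warnings.warn(message, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class ToyDownloadManager:
    """Stand-in class for illustration only, not the real DownloadManager."""

    @deprecated_sketch("Use `.download`/`.download_and_extract` with `fsspec` URLs instead.")
    def download_custom(self, url_or_urls, custom_download):
        return custom_download(url_or_urls)


# Usage: the call still works, but a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyDownloadManager().download_custom(
        "https://example.com/data.csv",
        lambda url: "/tmp/" + url.rsplit("/", 1)[-1],  # pretend download, returns a local path
    )
```

The wrapped method keeps its behavior, so existing callers continue to run while being pointed at the replacement named in the message.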
