Deprecate download_custom (#6093)

mariosasko · albertvillanova · commit 7641e9b6ea1f · 2023-10-24T09:09:27.000+02:00
* Deprecate `download_custom`

* Improve the deprecation message
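
For reviewers unfamiliar with the decorator pattern used here, the sketch below shows roughly how a message-carrying deprecation decorator behaves. It is an illustration only, not the code in `utils/deprecation_utils.py`: the decorator body, the `FutureWarning` category, and the `DownloadManagerSketch` class with its placeholder method body are all assumptions made for the example.

```python
import functools
import warnings


def deprecated(help_message=None):
    """Hypothetical stand-in for a deprecation decorator (not the `datasets` implementation)."""

    def decorator(func):
        # Build the warning text once, when the function is decorated.
        message = f"{func.__name__} is deprecated and will be removed in a future version."
        if help_message:
            message += " " + help_message

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Assumption: FutureWarning is used so the warning is visible to end users by default.
            warnings.warn(message, FutureWarning, stacklevel=2)
            return func(*args, **kwargs)

        return wrapper

    return decorator


class DownloadManagerSketch:
    """Hypothetical class used only to show the decorator applied to a method."""

    @deprecated("Use `.download`/`.download_and_extract` with `fsspec` URLs instead.")
    def download_custom(self, url_or_urls, custom_download):
        # Placeholder body: simply delegate to the user-supplied callable.
        return custom_download(url_or_urls)


if __name__ == "__main__":
    manager = DownloadManagerSketch()
    # Emits a FutureWarning carrying the help message, then runs the method as before.
    print(manager.download_custom("memory://data.txt", lambda url: url.upper()))
```

With a decorator of this shape, existing callers of `download_custom` keep working and only see a warning that points them to the suggested replacement.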
diff --git a/docs/source/about_dataset_load.mdx b/docs/source/about_dataset_load.mdx
@@ -86,12 +86,6 @@ There are three main methods in [`DatasetBuilder`]:
 
    Once the files are downloaded, [`SplitGenerator`] organizes them into splits. The [`SplitGenerator`] contains the name of the split, and any keyword arguments that are provided to the [`DatasetBuilder._generate_examples`] method. The keyword arguments can be specific to each split, and typically comprise at least the local path to the data files for each split.
 
-<Tip>
-
-[`DownloadManager.download_and_extract`] can download files from a wide range of sources. If the data files are hosted on a special access server, you should use [`DownloadManger.download_custom`]. Refer to the reference of [`DownloadManager`] for more details.
-
-</Tip>
-
 3. [`DatasetBuilder._generate_examples`] reads and parses the data files for a split. Then it yields dataset examples according to the format specified in the `features` from [`DatasetBuilder._info`]. The input of [`DatasetBuilder._generate_examples`] is actually the `filepath` provided in the keyword arguments of the last method.
 
    The dataset is generated with a Python generator, which doesn't load all the data in memory. As a result, the generator can handle large datasets. However, before the generated samples are flushed to the dataset file on disk, they are stored in an `ArrowWriter` buffer. This means the generated samples are written by batch. If your dataset samples consumes a lot of memory (images or videos), then make sure to specify a low value for the `DEFAULT_WRITER_BATCH_SIZE` attribute in [`DatasetBuilder`]. We recommend not exceeding a size of 200 MB.
diff --git a/src/datasets/download/download_manager.py b/src/datasets/download/download_manager.py
@@ -28,7 +28,7 @@
 from typing import Callable, Dict, Generator, Iterable, List, Optional, Tuple, Union
 
 from .. import config
-from ..utils.deprecation_utils import DeprecatedEnum
+from ..utils.deprecation_utils import DeprecatedEnum, deprecated
 from ..utils.file_utils import cached_path, get_from_cache, hash_url_to_filename, is_relative_path, url_or_path_join
 from ..utils.info_utils import get_size_checksum_dict
 from ..utils.logging import get_logger, is_progress_bar_enabled, tqdm
@@ -349,6 +349,7 @@ def _record_sizes_checksums(self, url_or_urls: NestedDataStructure, downloaded_p
                 path, record_checksum=self.record_checksums
             )
 
+    @deprecated("Use `.download`/`.download_and_extract` with `fsspec` URLs instead.")
     def download_custom(self, url_or_urls, custom_download):
         """
         Download given urls(s) by calling `custom_download`.