diff --git a/docs/source/de/guides/inference.md b/docs/source/de/guides/inference.md index 6bd3e111dd..21e06cf070 100644 --- a/docs/source/de/guides/inference.md +++ b/docs/source/de/guides/inference.md @@ -8,7 +8,6 @@ Inferenz ist der Prozess, bei dem ein trainiertes Modell verwendet wird, um Vorh - [Inferenz API](https://huggingface.co/docs/api-inference/index): ein Service, der Ihnen ermöglicht, beschleunigte Inferenz auf der Infrastruktur von Hugging Face kostenlos auszuführen. Dieser Service ist eine schnelle Möglichkeit, um anzufangen, verschiedene Modelle zu testen und AI-Produkte zu prototypisieren. - [Inferenz Endpunkte](https://huggingface.co/inference-endpoints/index): ein Produkt zur einfachen Bereitstellung von Modellen im Produktivbetrieb. Die Inferenz wird von Hugging Face in einer dedizierten, vollständig verwalteten Infrastruktur auf einem Cloud-Anbieter Ihrer Wahl durchgeführt. -Diese Dienste können mit dem [`InferenceClient`] Objekt aufgerufen werden. Dieser fungiert als Ersatz für den älteren [`InferenceApi`] Client und fügt spezielle Unterstützung für Aufgaben und das Ausführen von Inferenz hinzu, sowohl auf [Inferenz API](https://huggingface.co/docs/api-inference/index) als auch auf [Inferenz Endpunkten](https://huggingface.co/docs/inference-endpoints/index). Im Abschnitt [Legacy InferenceAPI client](#legacy-inferenceapi-client) erfahren Sie, wie Sie zum neuen Client migrieren können. @@ -89,34 +88,34 @@ Die Authentifizierung ist NICHT zwingend erforderlich, wenn Sie die Inferenz API Das Ziel von [`InferenceClient`] ist es, die einfachste Schnittstelle zum Ausführen von Inferenzen auf Hugging Face-Modellen bereitzustellen. Es verfügt über eine einfache API, die die gebräuchlichsten Aufgaben unterstützt. Hier ist eine Liste der derzeit unterstützten Aufgaben: -| Domäne | Aufgabe | Unterstützt | Dokumentation | -|--------|--------------------------------|--------------|------------------------------------| -| Audio | [Audio Classification](https://huggingface.co/tasks/audio-classification) | ✅ | [`~InferenceClient.audio_classification`] | -| | [Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition) | ✅ | [`~InferenceClient.automatic_speech_recognition`] | -| | [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | ✅ | [`~InferenceClient.text_to_speech`] | -| Computer Vision | [Image Classification](https://huggingface.co/tasks/image-classification) | ✅ | [`~InferenceClient.image_classification`] | -| | [Image Segmentation](https://huggingface.co/tasks/image-segmentation) | ✅ | [`~InferenceClient.image_segmentation`] | -| | [Image-to-Image](https://huggingface.co/tasks/image-to-image) | ✅ | [`~InferenceClient.image_to_image`] | -| | [Image-to-Text](https://huggingface.co/tasks/image-to-text) | ✅ | [`~InferenceClient.image_to_text`] | -| | [Object Detection](https://huggingface.co/tasks/object-detection) | ✅ | [`~InferenceClient.object_detection`] | -| | [Text-to-Image](https://huggingface.co/tasks/text-to-image) | ✅ | [`~InferenceClient.text_to_image`] | -| | [Zero-Shot-Image-Classification](https://huggingface.co/tasks/zero-shot-image-classification) | ✅ | [`~InferenceClient.zero_shot_image_classification`] | -| Multimodal | [Documentation Question Answering](https://huggingface.co/tasks/document-question-answering) | ✅ | [`~InferenceClient.document_question_answering`] | -| | [Visual Question Answering](https://huggingface.co/tasks/visual-question-answering) | ✅ | [`~InferenceClient.visual_question_answering`] | -| NLP | 
[Conversational](https://huggingface.co/tasks/conversational) | ✅ | [`~InferenceClient.conversational`] | -| | [Feature Extraction](https://huggingface.co/tasks/feature-extraction) | ✅ | [`~InferenceClient.feature_extraction`] | -| | [Fill Mask](https://huggingface.co/tasks/fill-mask) | ✅ | [`~InferenceClient.fill_mask`] | -| | [Question Answering](https://huggingface.co/tasks/question-answering) | ✅ | [`~InferenceClient.question_answering`] | -| | [Sentence Similarity](https://huggingface.co/tasks/sentence-similarity) | ✅ | [`~InferenceClient.sentence_similarity`] | -| | [Summarization](https://huggingface.co/tasks/summarization) | ✅ | [`~InferenceClient.summarization`] | -| | [Table Question Answering](https://huggingface.co/tasks/table-question-answering) | ✅ | [`~InferenceClient.table_question_answering`] | -| | [Text Classification](https://huggingface.co/tasks/text-classification) | ✅ | [`~InferenceClient.text_classification`] | -| | [Text Generation](https://huggingface.co/tasks/text-generation) | ✅ | [`~InferenceClient.text_generation`] | -| | [Token Classification](https://huggingface.co/tasks/token-classification) | ✅ | [`~InferenceClient.token_classification`] | -| | [Translation](https://huggingface.co/tasks/translation) | ✅ | [`~InferenceClient.translation`] | -| | [Zero Shot Classification](https://huggingface.co/tasks/zero-shot-classification) | ✅ | [`~InferenceClient.zero_shot_classification`] | -| Tabular | [Tabular Classification](https://huggingface.co/tasks/tabular-classification) | ✅ | [`~InferenceClient.tabular_classification`] | -| | [Tabular Regression](https://huggingface.co/tasks/tabular-regression) | ✅ | [`~InferenceClient.tabular_regression`] | +| Domäne | Aufgabe | Unterstützt | Dokumentation | +| --------------- | --------------------------------------------------------------------------------------------- | ----------- | --------------------------------------------------- | +| Audio | [Audio Classification](https://huggingface.co/tasks/audio-classification) | ✅ | [`~InferenceClient.audio_classification`] | +| | [Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition) | ✅ | [`~InferenceClient.automatic_speech_recognition`] | +| | [Text-to-Speech](https://huggingface.co/tasks/text-to-speech) | ✅ | [`~InferenceClient.text_to_speech`] | +| Computer Vision | [Image Classification](https://huggingface.co/tasks/image-classification) | ✅ | [`~InferenceClient.image_classification`] | +| | [Image Segmentation](https://huggingface.co/tasks/image-segmentation) | ✅ | [`~InferenceClient.image_segmentation`] | +| | [Image-to-Image](https://huggingface.co/tasks/image-to-image) | ✅ | [`~InferenceClient.image_to_image`] | +| | [Image-to-Text](https://huggingface.co/tasks/image-to-text) | ✅ | [`~InferenceClient.image_to_text`] | +| | [Object Detection](https://huggingface.co/tasks/object-detection) | ✅ | [`~InferenceClient.object_detection`] | +| | [Text-to-Image](https://huggingface.co/tasks/text-to-image) | ✅ | [`~InferenceClient.text_to_image`] | +| | [Zero-Shot-Image-Classification](https://huggingface.co/tasks/zero-shot-image-classification) | ✅ | [`~InferenceClient.zero_shot_image_classification`] | +| Multimodal | [Documentation Question Answering](https://huggingface.co/tasks/document-question-answering) | ✅ | [`~InferenceClient.document_question_answering`] | +| | [Visual Question Answering](https://huggingface.co/tasks/visual-question-answering) | ✅ | [`~InferenceClient.visual_question_answering`] | +| NLP | 
[Conversational](https://huggingface.co/tasks/conversational) | ✅ | [`~InferenceClient.conversational`] | +| | [Feature Extraction](https://huggingface.co/tasks/feature-extraction) | ✅ | [`~InferenceClient.feature_extraction`] | +| | [Fill Mask](https://huggingface.co/tasks/fill-mask) | ✅ | [`~InferenceClient.fill_mask`] | +| | [Question Answering](https://huggingface.co/tasks/question-answering) | ✅ | [`~InferenceClient.question_answering`] | +| | [Sentence Similarity](https://huggingface.co/tasks/sentence-similarity) | ✅ | [`~InferenceClient.sentence_similarity`] | +| | [Summarization](https://huggingface.co/tasks/summarization) | ✅ | [`~InferenceClient.summarization`] | +| | [Table Question Answering](https://huggingface.co/tasks/table-question-answering) | ✅ | [`~InferenceClient.table_question_answering`] | +| | [Text Classification](https://huggingface.co/tasks/text-classification) | ✅ | [`~InferenceClient.text_classification`] | +| | [Text Generation](https://huggingface.co/tasks/text-generation) | ✅ | [`~InferenceClient.text_generation`] | +| | [Token Classification](https://huggingface.co/tasks/token-classification) | ✅ | [`~InferenceClient.token_classification`] | +| | [Translation](https://huggingface.co/tasks/translation) | ✅ | [`~InferenceClient.translation`] | +| | [Zero Shot Classification](https://huggingface.co/tasks/zero-shot-classification) | ✅ | [`~InferenceClient.zero_shot_classification`] | +| Tabular | [Tabular Classification](https://huggingface.co/tasks/tabular-classification) | ✅ | [`~InferenceClient.tabular_classification`] | +| | [Tabular Regression](https://huggingface.co/tasks/tabular-regression) | ✅ | [`~InferenceClient.tabular_regression`] | @@ -190,93 +189,3 @@ Einige Aufgaben erfordern binäre Eingaben, zum Beispiel bei der Arbeit mit Bild [{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...] ``` -## Legacy InferenceAPI client - -Der [`InferenceClient`] dient als Ersatz für den veralteten [`InferenceApi`]-Client. Er bietet spezifische Unterstützung für Aufgaben und behandelt Inferenz sowohl auf der [Inferenz API](https://huggingface.co/docs/api-inference/index) als auch auf den [Inferenz Endpunkten](https://huggingface.co/docs/inference-endpoints/index). - -Hier finden Sie eine kurze Anleitung, die Ihnen hilft, von [`InferenceApi`] zu [`InferenceClient`] zu migrieren. - -### Initialisierung - -Ändern Sie von - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN) -``` - -zu - -```python ->>> from huggingface_hub import InferenceClient ->>> inference = InferenceClient(model="bert-base-uncased", token=API_TOKEN) -``` - -### Ausführen einer bestimmten Aufgabe - -Ändern Sie von - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction") ->>> inference(...) -``` - -zu - -```python ->>> from huggingface_hub import InferenceClient ->>> inference = InferenceClient() ->>> inference.feature_extraction(..., model="paraphrase-xlm-r-multilingual-v1") -``` - - - -Dies ist der empfohlene Weg, um Ihren Code an [`InferenceClient`] anzupassen. Dadurch können Sie von den aufgabenspezifischen Methoden wie `feature_extraction` profitieren. 
- - - -### Eigene Anfragen ausführen - -Ändern Sie von - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="bert-base-uncased") ->>> inference(inputs="The goal of life is [MASK].") -[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}] -``` -zu - -```python ->>> from huggingface_hub import InferenceClient ->>> client = InferenceClient() ->>> response = client.post(json={"inputs": "The goal of life is [MASK]."}, model="bert-base-uncased") ->>> response.json() -[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}] -``` - -### Mit Parametern ausführen - -Ändern Sie von - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli") ->>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!" ->>> params = {"candidate_labels":["refund", "legal", "faq"]} ->>> inference(inputs, params) -{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]} -``` - -zu - -```python ->>> from huggingface_hub import InferenceClient ->>> client = InferenceClient() ->>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!" ->>> params = {"candidate_labels":["refund", "legal", "faq"]} ->>> response = client.post(json={"inputs": inputs, "parameters": params}, model="typeform/distilbert-base-uncased-mnli") ->>> response.json() -{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]} -``` diff --git a/docs/source/en/guides/inference.md b/docs/source/en/guides/inference.md index 23cbfdd8c5..a5e31e833d 100644 --- a/docs/source/en/guides/inference.md +++ b/docs/source/en/guides/inference.md @@ -11,10 +11,6 @@ The `huggingface_hub` library provides a unified interface to run inference acro 2. [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index): a product to easily deploy models to production. Inference is run by Hugging Face in a dedicated, fully managed infrastructure on a cloud provider of your choice. 3. Local endpoints: you can also run inference with local inference servers like [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com/), [vLLM](https://github.com/vllm-project/vllm), [LiteLLM](https://docs.litellm.ai/docs/simple_proxy), or [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) by connecting the client to these local endpoints. -These services can all be called from the [`InferenceClient`] object. It acts as a replacement for the legacy -[`InferenceApi`] client, adding specific support for tasks and third-party providers. -Learn how to migrate to the new client in the [Legacy InferenceAPI client](#legacy-inferenceapi-client) section. - [`InferenceClient`] is a Python client making HTTP calls to our APIs. 
If you want to make the HTTP calls directly using diff --git a/docs/source/en/package_reference/inference_client.md b/docs/source/en/package_reference/inference_client.md index eae0edc755..1a92641077 100644 --- a/docs/source/en/package_reference/inference_client.md +++ b/docs/source/en/package_reference/inference_client.md @@ -34,16 +34,3 @@ pip install --upgrade huggingface_hub[inference] ## InferenceTimeoutError [[autodoc]] InferenceTimeoutError - -## InferenceAPI - -[`InferenceAPI`] is the legacy way to call the Inference API. The interface is more simplistic and requires knowing -the input parameters and output format for each task. It also lacks the ability to connect to other services like -Inference Endpoints or AWS SageMaker. [`InferenceAPI`] will soon be deprecated so we recommend using [`InferenceClient`] -whenever possible. Check out [this guide](../guides/inference#legacy-inferenceapi-client) to learn how to switch from -[`InferenceAPI`] to [`InferenceClient`] in your scripts. - -[[autodoc]] InferenceApi - - __init__ - - __call__ - - all diff --git a/docs/source/ko/guides/inference.md b/docs/source/ko/guides/inference.md index f3ddb3e795..a6f9e5f0d1 100644 --- a/docs/source/ko/guides/inference.md +++ b/docs/source/ko/guides/inference.md @@ -8,7 +8,6 @@ rendered properly in your Markdown viewer. - [추론 API](https://huggingface.co/docs/api-inference/index): Hugging Face의 인프라에서 가속화된 추론을 실행할 수 있는 서비스로 무료로 제공됩니다. 이 서비스는 추론을 시작하고 다양한 모델을 테스트하며 AI 제품의 프로토타입을 만드는 빠른 방법입니다. - [추론 엔드포인트](https://huggingface.co/docs/inference-endpoints/index): 모델을 제품 환경에 쉽게 배포할 수 있는 제품입니다. 사용자가 선택한 클라우드 환경에서 완전 관리되는 전용 인프라에서 Hugging Face를 통해 추론이 실행됩니다. -이러한 서비스들은 [`InferenceClient`] 객체를 사용하여 호출할 수 있습니다. 이는 이전의 [`InferenceApi`] 클라이언트를 대체하는 역할을 하며, 작업에 대한 특별한 지원을 추가하고 [추론 API](https://huggingface.co/docs/api-inference/index) 및 [추론 엔드포인트](https://huggingface.co/docs/inference-endpoints/index)에서 추론 작업을 처리합니다. 새 클라이언트로의 마이그레이션에 대한 자세한 내용은 [레거시 InferenceAPI 클라이언트](#legacy-inferenceapi-client) 섹션을 참조하세요. @@ -89,35 +88,35 @@ Hugging Face Hub에는 20만 개가 넘는 모델이 있습니다! [`InferenceCl [`InferenceClient`]의 목표는 Hugging Face 모델에서 추론을 실행하기 위한 가장 쉬운 인터페이스를 제공하는 것입니다. 이는 가장 일반적인 작업들을 지원하는 간단한 API를 가지고 있습니다. 
현재 지원되는 작업 목록은 다음과 같습니다: -| 도메인 | 작업 | 지원 여부 | 문서 | -|--------|--------------------------------|--------------|------------------------------------| -| 오디오 | [오디오 분류](https://huggingface.co/tasks/audio-classification) | ✅ | [`~InferenceClient.audio_classification`] | -| 오디오 | [오디오 투 오디오](https://huggingface.co/tasks/audio-to-audio) | ✅ | [`~InferenceClient.audio_to_audio`] | -| | [자동 음성 인식](https://huggingface.co/tasks/automatic-speech-recognition) | ✅ | [`~InferenceClient.automatic_speech_recognition`] | -| | [텍스트 투 스피치](https://huggingface.co/tasks/text-to-speech) | ✅ | [`~InferenceClient.text_to_speech`] | -| 컴퓨터 비전 | [이미지 분류](https://huggingface.co/tasks/image-classification) | ✅ | [`~InferenceClient.image_classification`] | -| | [이미지 분할](https://huggingface.co/tasks/image-segmentation) | ✅ | [`~InferenceClient.image_segmentation`] | -| | [이미지 투 이미지](https://huggingface.co/tasks/image-to-image) | ✅ | [`~InferenceClient.image_to_image`] | -| | [이미지 투 텍스트](https://huggingface.co/tasks/image-to-text) | ✅ | [`~InferenceClient.image_to_text`] | -| | [객체 탐지](https://huggingface.co/tasks/object-detection) | ✅ | [`~InferenceClient.object_detection`] | -| | [텍스트 투 이미지](https://huggingface.co/tasks/text-to-image) | ✅ | [`~InferenceClient.text_to_image`] | -| | [제로샷 이미지 분류](https://huggingface.co/tasks/zero-shot-image-classification) | ✅ | [`~InferenceClient.zero_shot_image_classification`] | -| 멀티모달 | [문서 질의 응답](https://huggingface.co/tasks/document-question-answering) | ✅ | [`~InferenceClient.document_question_answering`] | -| | [시각적 질의 응답](https://huggingface.co/tasks/visual-question-answering) | ✅ | [`~InferenceClient.visual_question_answering`] | -| 자연어 처리 | [대화형](https://huggingface.co/tasks/conversational) | ✅ | [`~InferenceClient.conversational`] | -| | [특성 추출](https://huggingface.co/tasks/feature-extraction) | ✅ | [`~InferenceClient.feature_extraction`] | -| | [마스크 채우기](https://huggingface.co/tasks/fill-mask) | ✅ | [`~InferenceClient.fill_mask`] | -| | [질의 응답](https://huggingface.co/tasks/question-answering) | ✅ | [`~InferenceClient.question_answering`] | -| | [문장 유사도](https://huggingface.co/tasks/sentence-similarity) | ✅ | [`~InferenceClient.sentence_similarity`] | -| | [요약](https://huggingface.co/tasks/summarization) | ✅ | [`~InferenceClient.summarization`] | -| | [테이블 질의 응답](https://huggingface.co/tasks/table-question-answering) | ✅ | [`~InferenceClient.table_question_answering`] | -| | [텍스트 분류](https://huggingface.co/tasks/text-classification) | ✅ | [`~InferenceClient.text_classification`] | -| | [텍스트 생성](https://huggingface.co/tasks/text-generation) | ✅ | [`~InferenceClient.text_generation`] | -| | [토큰 분류](https://huggingface.co/tasks/token-classification) | ✅ | [`~InferenceClient.token_classification`] | -| | [번역](https://huggingface.co/tasks/translation) | ✅ | [`~InferenceClient.translation`] | -| | [제로샷 분류](https://huggingface.co/tasks/zero-shot-classification) | ✅ | [`~InferenceClient.zero_shot_classification`] | -| 타블로 | [타블로 작업 분류](https://huggingface.co/tasks/tabular-classification) | ✅ | [`~InferenceClient.tabular_classification`] | -| | [타블로 회귀](https://huggingface.co/tasks/tabular-regression) | ✅ | [`~InferenceClient.tabular_regression`] | +| 도메인 | 작업 | 지원 여부 | 문서 | +| ----------- | --------------------------------------------------------------------------------- | --------- | --------------------------------------------------- | +| 오디오 | [오디오 분류](https://huggingface.co/tasks/audio-classification) | ✅ | [`~InferenceClient.audio_classification`] | +| 오디오 | [오디오 투 
오디오](https://huggingface.co/tasks/audio-to-audio) | ✅ | [`~InferenceClient.audio_to_audio`] | +| | [자동 음성 인식](https://huggingface.co/tasks/automatic-speech-recognition) | ✅ | [`~InferenceClient.automatic_speech_recognition`] | +| | [텍스트 투 스피치](https://huggingface.co/tasks/text-to-speech) | ✅ | [`~InferenceClient.text_to_speech`] | +| 컴퓨터 비전 | [이미지 분류](https://huggingface.co/tasks/image-classification) | ✅ | [`~InferenceClient.image_classification`] | +| | [이미지 분할](https://huggingface.co/tasks/image-segmentation) | ✅ | [`~InferenceClient.image_segmentation`] | +| | [이미지 투 이미지](https://huggingface.co/tasks/image-to-image) | ✅ | [`~InferenceClient.image_to_image`] | +| | [이미지 투 텍스트](https://huggingface.co/tasks/image-to-text) | ✅ | [`~InferenceClient.image_to_text`] | +| | [객체 탐지](https://huggingface.co/tasks/object-detection) | ✅ | [`~InferenceClient.object_detection`] | +| | [텍스트 투 이미지](https://huggingface.co/tasks/text-to-image) | ✅ | [`~InferenceClient.text_to_image`] | +| | [제로샷 이미지 분류](https://huggingface.co/tasks/zero-shot-image-classification) | ✅ | [`~InferenceClient.zero_shot_image_classification`] | +| 멀티모달 | [문서 질의 응답](https://huggingface.co/tasks/document-question-answering) | ✅ | [`~InferenceClient.document_question_answering`] | +| | [시각적 질의 응답](https://huggingface.co/tasks/visual-question-answering) | ✅ | [`~InferenceClient.visual_question_answering`] | +| 자연어 처리 | [대화형](https://huggingface.co/tasks/conversational) | ✅ | [`~InferenceClient.conversational`] | +| | [특성 추출](https://huggingface.co/tasks/feature-extraction) | ✅ | [`~InferenceClient.feature_extraction`] | +| | [마스크 채우기](https://huggingface.co/tasks/fill-mask) | ✅ | [`~InferenceClient.fill_mask`] | +| | [질의 응답](https://huggingface.co/tasks/question-answering) | ✅ | [`~InferenceClient.question_answering`] | +| | [문장 유사도](https://huggingface.co/tasks/sentence-similarity) | ✅ | [`~InferenceClient.sentence_similarity`] | +| | [요약](https://huggingface.co/tasks/summarization) | ✅ | [`~InferenceClient.summarization`] | +| | [테이블 질의 응답](https://huggingface.co/tasks/table-question-answering) | ✅ | [`~InferenceClient.table_question_answering`] | +| | [텍스트 분류](https://huggingface.co/tasks/text-classification) | ✅ | [`~InferenceClient.text_classification`] | +| | [텍스트 생성](https://huggingface.co/tasks/text-generation) | ✅ | [`~InferenceClient.text_generation`] | +| | [토큰 분류](https://huggingface.co/tasks/token-classification) | ✅ | [`~InferenceClient.token_classification`] | +| | [번역](https://huggingface.co/tasks/translation) | ✅ | [`~InferenceClient.translation`] | +| | [제로샷 분류](https://huggingface.co/tasks/zero-shot-classification) | ✅ | [`~InferenceClient.zero_shot_classification`] | +| 타블로 | [타블로 작업 분류](https://huggingface.co/tasks/tabular-classification) | ✅ | [`~InferenceClient.tabular_classification`] | +| | [타블로 회귀](https://huggingface.co/tasks/tabular-regression) | ✅ | [`~InferenceClient.tabular_regression`] | @@ -190,73 +189,3 @@ pip install --upgrade huggingface_hub[inference] >>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg") [{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...] ``` - -## 레거시 InferenceAPI 클라이언트[[legacy-inferenceapi-client]] - -[`InferenceClient`]는 레거시 [`InferenceApi`] 클라이언트를 대체하여 작동합니다. 특정 작업에 대한 지원을 제공하고 [추론 API](https://huggingface.co/docs/api-inference/index) 및 [추론 엔드포인트](https://huggingface.co/docs/inference-endpoints/index)에서 추론을 처리합니다. - -아래는 [`InferenceApi`]에서 [`InferenceClient`]로 마이그레이션하는 데 도움이 되는 간단한 가이드입니다. 
- -### 초기화[[initialization]] - -변경 전: - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="bert-base-uncased", token=API_TOKEN) -``` - -변경 후: - -```python ->>> from huggingface_hub import InferenceClient ->>> inference = InferenceClient(model="bert-base-uncased", token=API_TOKEN) -``` - -### 특정 작업에서 실행하기[[run-on-a-specific-task]] - -변경 전: - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="paraphrase-xlm-r-multilingual-v1", task="feature-extraction") ->>> inference(...) -``` - -변경 후: - -```python ->>> from huggingface_hub import InferenceClient ->>> inference = InferenceClient() ->>> inference.feature_extraction(..., model="paraphrase-xlm-r-multilingual-v1") -``` - - - -위의 방법은 코드를 [`InferenceClient`]에 맞게 조정하는 권장 방법입니다. 이렇게 하면 `feature_extraction`과 같이 작업에 특화된 메소드를 활용할 수 있습니다. - - - -### 사용자 정의 요청 실행[[run-custom-request]] - -변경 전: - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="bert-base-uncased") ->>> inference(inputs="The goal of life is [MASK].") -[{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}] -``` - -### 매개변수와 함께 실행하기[[run-with-parameters]] - -변경 전: - -```python ->>> from huggingface_hub import InferenceApi ->>> inference = InferenceApi(repo_id="typeform/distilbert-base-uncased-mnli") ->>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!" ->>> params = {"candidate_labels":["refund", "legal", "faq"]} ->>> inference(inputs, params) -{'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]} -``` diff --git a/docs/source/ko/package_reference/inference_client.md b/docs/source/ko/package_reference/inference_client.md index 686c9282a9..0930a75351 100644 --- a/docs/source/ko/package_reference/inference_client.md +++ b/docs/source/ko/package_reference/inference_client.md @@ -35,13 +35,3 @@ pip install --upgrade huggingface_hub[inference] ## 반환 유형[[return-types]] 대부분의 작업에 대해, 반환 값은 내장된 유형(string, list, image...)을 갖습니다. 보다 복잡한 유형을 위한 목록은 다음과 같습니다. - - -## 추론 API[[huggingface_hub.InferenceApi]] - -[`InferenceAPI`]는 추론 API를 호출하는 레거시 방식입니다. 이 인터페이스는 더 간단하며 각 작업의 입력 매개변수와 출력 형식을 알아야 합니다. 또한 추론 엔드포인트나 AWS SageMaker와 같은 다른 서비스에 연결할 수 있는 기능이 없습니다. [`InferenceAPI`]는 곧 폐지될 예정이므로 가능한 경우 [`InferenceClient`]를 사용하는 것을 권장합니다. 스크립트에서 [`InferenceAPI`]를 [`InferenceClient`]로 전환하는 방법에 대해 알아보려면 [이 가이드](../guides/inference#legacy-inferenceapi-client)를 참조하세요. 
-
-[[autodoc]] InferenceApi
-    - __init__
-    - __call__
-    - all
diff --git a/setup.py b/setup.py
index ec5ebfbb39..9a755682a6 100644
--- a/setup.py
+++ b/setup.py
@@ -1,3 +1,5 @@
+import sys
+
 from setuptools import find_packages, setup
@@ -77,7 +79,7 @@ def get_version() -> str:
     + [
         "jedi",
         "Jinja2",
-        "pytest>=8.1.1,<8.2.2",  # at least until 8.2.3 is released with https://github.com/pytest-dev/pytest/pull/12436
+        "pytest>=8.4.2",  # we need https://github.com/pytest-dev/pytest/pull/12436
         "pytest-cov",
         "pytest-env",
         "pytest-xdist",
@@ -88,13 +90,18 @@ def get_version() -> str:
         "urllib3<2.0",  # VCR.py broken with urllib3 2.0 (see https://urllib3.readthedocs.io/en/stable/v2-migration-guide.html)
         "soundfile",
         "Pillow",
-        "gradio>=4.0.0",  # to test webhooks
         # pin to avoid issue on Python3.12
         "requests",  # for gradio
         "numpy",  # for embeddings
         "fastapi",  # To build the documentation
     ]
 )
+if sys.version_info >= (3, 10):
+    # We need gradio to test the webhooks server,
+    # but gradio 5.0+ only supports Python 3.10+, so we skip it on earlier versions.
+    extras["testing"].append("gradio>=5.0.0")
+    extras["testing"].append("requests")  # see https://github.com/gradio-app/gradio/pull/11830
+
 # Typing extra dependencies list is duplicated in `.pre-commit-config.yaml`
 # Please make sure to update the list there when adding a new typing dependency.
 extras["typing"] = [
diff --git a/src/huggingface_hub/README.md b/src/huggingface_hub/README.md
index cd5c1e2beb..b0e5cd65d9 100644
--- a/src/huggingface_hub/README.md
+++ b/src/huggingface_hub/README.md
@@ -112,242 +112,3 @@ With the `HfApi` class there are methods to query models, datasets, and Spaces b
 - `space_info()`
 
 These lightly wrap around the API Endpoints. Documentation for valid parameters and descriptions can be found [here](https://huggingface.co/docs/hub/endpoints).
-
-
-### Advanced programmatic repository management
-
-The `Repository` class helps manage both offline Git repositories and Hugging
-Face Hub repositories. Using the `Repository` class requires `git` and `git-lfs`
-to be installed.
-
-Instantiate a `Repository` object by calling it with a path to a local Git
-clone/repository:
-
-```python
->>> from huggingface_hub import Repository
->>> repo = Repository("<path>/<to>/<folder>")
-```
-
-The `Repository` takes a `clone_from` string as a parameter. This can stay as
-`None` for offline management, but can also be set to any URL pointing to a Git
-repo to clone that repository in the specified directory:
-
-```python
->>> repo = Repository("huggingface-hub", clone_from="https://github.com/huggingface/huggingface_hub")
-```
-
-The `clone_from` parameter can also take any Hugging Face model ID as input, and
-will clone that repository:
-
-```python
->>> repo = Repository("w2v2", clone_from="facebook/wav2vec2-large-960h-lv60")
-```
-
-If the repository you're cloning is one of yours or one of your organisation's, then having the ability to commit and push to that repository is important.
In order to do that, you should make sure to be logged in using `hf auth login`:
-
-```python
->>> repo = Repository("my-model", clone_from="<user>/<model_id>")
-```
-
-This works for model, dataset, and Space repositories; but you will need to
-explicitly specify the type for the last two options:
-
-```python
->>> repo = Repository("my-dataset", clone_from="<user>/<dataset_id>", repo_type="dataset")
-```
-
-You can also switch between branches:
-
-```python
->>> repo = Repository("huggingface-hub", clone_from="<user>/<model_id>", revision='branch1')
->>> repo.git_checkout("branch2")
-```
-
-Finally, you can choose to specify the Git username and email attributed to that
-clone directly by using the `git_user` and `git_email` parameters. When
-committing to that repository, Git will therefore be aware of who you are and
-who will be the author of the commits:
-
-```python
->>> repo = Repository(
-...     "my-dataset",
-...     clone_from="<user>/<dataset_id>",
-...     repo_type="dataset",
-...     git_user="MyName",
-...     git_email="me@cool.mail"
-... )
-```
-
-The repository can be managed through this object, through wrappers of
-traditional Git methods:
-
-- `git_add(pattern: str, auto_lfs_track: bool)`. The `auto_lfs_track` flag
-  triggers auto tracking of large files (>10MB) with `git-lfs`
-- `git_commit(commit_message: str)`
-- `git_pull(rebase: bool)`
-- `git_push()`
-- `git_checkout(branch)`
-
-The `git_push` method has a parameter `blocking` which is `True` by default. When set to `False`, the push will
-happen behind the scenes - which can be helpful if you would like your script to continue on while the push is
-happening.
-
-LFS-tracking methods:
-
-- `lfs_track(pattern: Union[str, List[str]], filename: bool)`. Setting
-  `filename` to `True` will use the `--filename` parameter, which will consider
-  the pattern(s) as filenames, even if they contain special glob characters.
-- `lfs_untrack()`.
-- `auto_track_large_files()`: automatically tracks files that are larger than
-  10MB. Make sure to call this after adding files to the index.
-
-On top of these unitary methods lie some useful additional methods:
-
-- `push_to_hub(commit_message)`: consecutively does `git_add`, `git_commit` and
-  `git_push`.
-- `commit(commit_message: str, track_large_files: bool)`: a context
-  manager utility that handles committing to a repository. It automatically
-  tracks large files (>10MB) with `git-lfs`. The `track_large_files` argument can
-  be set to `False` if you wish to disable that behavior.
-
-These two methods also support the `blocking` parameter.
-
-Examples using the `commit` context manager:
-```python
->>> with Repository("text-files", clone_from="<user>/text-files").commit("My first file :)"):
-...     with open("file.txt", "w+") as f:
-...         f.write(json.dumps({"hey": 8}))
-```
-
-```python
->>> import torch
->>> model = torch.nn.Transformer()
->>> with Repository("torch-model", clone_from="<user>/torch-model").commit("My cool model :)"):
-...     torch.save(model.state_dict(), "model.pt")
-```
-
-### Non-blocking behavior
-
-The pushing methods have access to a `blocking` boolean parameter to indicate whether the push should happen
-asynchronously.
-
-In order to see if the push has finished or its status code (to spot a failure), one should use the `command_queue`
-property on the `Repository` object.
-
-For example:
-
-```python
-from huggingface_hub import Repository
-
-repo = Repository("<local_folder>", clone_from="<user>/<model_id>")
-
-with repo.commit("Commit message", blocking=False):
-    # Save data
-    ...
-
-last_command = repo.command_queue[-1]
-
-# Status of the push command
-last_command.status
-# Will return the status code
-#     -> -1 will indicate the push is still ongoing
-#     -> 0 will indicate the push has completed successfully
-#     -> non-zero code indicates the error code if there was an error
-
-# If there was an error, the stderr may be inspected
-last_command.stderr
-
-# Whether the command finished or if it is still ongoing
-last_command.is_done
-
-# Whether the command errored-out
-last_command.failed
-```
-
-When using `blocking=False`, the commands will be tracked and your script will exit only when all pushes are done, even
-if other errors happen in your script (a failed push counts as done).
-
-
-### Need to upload very large (>5GB) files?
-
-To upload large files (>5GB 🔥) from the git command line, you need to install the custom transfer agent
-for git-lfs, bundled in this package.
-
-To install it, just run:
-
-```bash
-$ hf lfs-enable-largefiles .
-```
-
-This should be executed once for each model repo that contains a model file
->5GB. If you just try to push a file bigger than 5GB without running that
-command, you will get an error with a message reminding you to run it.
-
-Finally, there's a `hf lfs-multipart-upload` command but that one
-is internal (called by lfs directly) and is not meant to be called by the user.
-
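For code that still depends on the `Repository` workflows documented above, the supported replacement is the HTTP-based upload API. Here is a minimal sketch using `HfApi.upload_folder` (the repo ID and paths are placeholders, not values taken from this changeset):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up the token stored by `hf auth login`

# One commit over HTTP: no local git/git-lfs setup required,
# and large files are chunked and uploaded automatically.
api.upload_folder(
    folder_path="path/to/local/folder",  # placeholder path
    repo_id="user/my-model",             # placeholder repo ID
    repo_type="model",
    commit_message="Upload model weights",
)

# Non-blocking variant, similar in spirit to `blocking=False` above:
future = api.upload_folder(
    folder_path="path/to/local/folder",
    repo_id="user/my-model",
    run_as_future=True,
)
future.result()  # wait for the push to finish
```

The `run_as_future=True` variant returns a standard `concurrent.futures.Future`, which plays the same role as the `command_queue` polling shown earlier.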
-
-## Using the Inference API wrapper
-
-`huggingface_hub` comes with a wrapper client to make calls to the Inference
-API! You can find some examples below, but we encourage you to visit the
-Inference API
-[documentation](https://api-inference.huggingface.co/docs/python/html/detailed_parameters.html)
-to review the specific parameters for the different tasks.
-
-When you instantiate the wrapper for the Inference API, you specify the model
-repository id. The pipeline (`text-classification`, `text-to-speech`, etc.) is
-automatically extracted from the
-[repository](https://huggingface.co/docs/hub/main#how-is-a-models-type-of-inference-api-and-widget-determined),
-but you can also override it as shown below.
-
-
-### Examples
-
-Here is a basic example of calling the Inference API for a `fill-mask` task
-using the `bert-base-uncased` model. The `fill-mask` task only expects a string
-(or list of strings) as input.
-
-```python
-from huggingface_hub.inference_api import InferenceApi
-inference = InferenceApi("bert-base-uncased", token=API_TOKEN)
-inference(inputs="The goal of life is [MASK].")
->> [{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}]
-```
-
-This is an example of a task (`question-answering`) which requires a dictionary
-as input that has the `question` and `context` keys.
-
-```python
-inference = InferenceApi("deepset/roberta-base-squad2", token=API_TOKEN)
-inputs = {"question":"What's my name?", "context":"My name is Clara and I live in Berkeley."}
-inference(inputs)
->> {'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'}
-```
-
-Some tasks might also require additional params in the request. Here is an
-example using a `zero-shot-classification` model.
-
-```python
-inference = InferenceApi("typeform/distilbert-base-uncased-mnli", token=API_TOKEN)
-inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!"
-params = {"candidate_labels":["refund", "legal", "faq"]}
-inference(inputs, params)
->> {'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]}
-```
-
-Finally, there are some models that might support multiple tasks. For example,
-`sentence-transformers` models can do `sentence-similarity` and
-`feature-extraction`. You can override the configured task when initializing the
-API.
- -```python -inference = InferenceApi("bert-base-uncased", task="feature-extraction", token=API_TOKEN) -``` diff --git a/src/huggingface_hub/__init__.py b/src/huggingface_hub/__init__.py index c1d2c6658f..472b0020f5 100644 --- a/src/huggingface_hub/__init__.py +++ b/src/huggingface_hub/__init__.py @@ -471,9 +471,6 @@ "inference._mcp.mcp_client": [ "MCPClient", ], - "inference_api": [ - "InferenceApi", - ], "keras_mixin": [ "KerasModelHubMixin", "from_pretrained_keras", @@ -529,7 +526,6 @@ "CorruptedCacheException", "DeleteCacheStrategy", "HFCacheInfo", - "HfFolder", "HfHubAsyncTransport", "HfHubTransport", "cached_assets_path", @@ -662,7 +658,6 @@ "HfFileSystemFile", "HfFileSystemResolvedPath", "HfFileSystemStreamFile", - "HfFolder", "HfHubAsyncTransport", "HfHubTransport", "ImageClassificationInput", @@ -686,7 +681,6 @@ "ImageToVideoOutput", "ImageToVideoParameters", "ImageToVideoTargetSize", - "InferenceApi", "InferenceClient", "InferenceEndpoint", "InferenceEndpointError", @@ -1501,7 +1495,6 @@ def __dir__(): ) from .inference._mcp.agent import Agent # noqa: F401 from .inference._mcp.mcp_client import MCPClient # noqa: F401 - from .inference_api import InferenceApi # noqa: F401 from .keras_mixin import ( KerasModelHubMixin, # noqa: F401 from_pretrained_keras, # noqa: F401 @@ -1555,7 +1548,6 @@ def __dir__(): CorruptedCacheException, # noqa: F401 DeleteCacheStrategy, # noqa: F401 HFCacheInfo, # noqa: F401 - HfFolder, # noqa: F401 HfHubAsyncTransport, # noqa: F401 HfHubTransport, # noqa: F401 cached_assets_path, # noqa: F401 diff --git a/src/huggingface_hub/constants.py b/src/huggingface_hub/constants.py index c1445ffc9d..20c5b5d970 100644 --- a/src/huggingface_hub/constants.py +++ b/src/huggingface_hub/constants.py @@ -234,43 +234,6 @@ def _as_int(value: Optional[str]) -> Optional[int]: # Allows to add information about the requester in the user-agent (eg. partner name) HF_HUB_USER_AGENT_ORIGIN: Optional[str] = os.environ.get("HF_HUB_USER_AGENT_ORIGIN") -# List frameworks that are handled by the InferenceAPI service. Useful to scan endpoints and check which models are -# deployed and running. Since 95% of the models are using the top 4 frameworks listed below, we scan only those by -# default. We still keep the full list of supported frameworks in case we want to scan all of them. -MAIN_INFERENCE_API_FRAMEWORKS = [ - "diffusers", - "sentence-transformers", - "text-generation-inference", - "transformers", -] - -ALL_INFERENCE_API_FRAMEWORKS = MAIN_INFERENCE_API_FRAMEWORKS + [ - "adapter-transformers", - "allennlp", - "asteroid", - "bertopic", - "doctr", - "espnet", - "fairseq", - "fastai", - "fasttext", - "flair", - "k2", - "keras", - "mindspore", - "nemo", - "open_clip", - "paddlenlp", - "peft", - "pyannote-audio", - "sklearn", - "spacy", - "span-marker", - "speechbrain", - "stanza", - "timm", -] - # If OAuth didn't work after 2 redirects, there's likely a third-party cookie issue in the Space iframe view. # In this case, we redirect the user to the non-iframe view. 
OAUTH_MAX_REDIRECTS = 2 diff --git a/src/huggingface_hub/hf_api.py b/src/huggingface_hub/hf_api.py index 951a19dcc0..6e209d9055 100644 --- a/src/huggingface_hub/hf_api.py +++ b/src/huggingface_hub/hf_api.py @@ -106,7 +106,6 @@ from .repocard_data import DatasetCardData, ModelCardData, SpaceCardData from .utils import ( DEFAULT_IGNORE_PATTERNS, - HfFolder, # noqa: F401 # kept for backward compatibility LocalTokenNotFoundError, NotASafetensorsRepoError, SafetensorsFileMetadata, diff --git a/src/huggingface_hub/inference_api.py b/src/huggingface_hub/inference_api.py deleted file mode 100644 index 16c2812864..0000000000 --- a/src/huggingface_hub/inference_api.py +++ /dev/null @@ -1,217 +0,0 @@ -import io -from typing import Any, Optional, Union - -from . import constants -from .hf_api import HfApi -from .utils import build_hf_headers, get_session, is_pillow_available, logging, validate_hf_hub_args -from .utils._deprecation import _deprecate_method - - -logger = logging.get_logger(__name__) - - -ALL_TASKS = [ - # NLP - "text-classification", - "token-classification", - "table-question-answering", - "question-answering", - "zero-shot-classification", - "translation", - "summarization", - "conversational", - "feature-extraction", - "text-generation", - "text2text-generation", - "fill-mask", - "sentence-similarity", - # Audio - "text-to-speech", - "automatic-speech-recognition", - "audio-to-audio", - "audio-classification", - "voice-activity-detection", - # Computer vision - "image-classification", - "object-detection", - "image-segmentation", - "text-to-image", - "image-to-image", - # Others - "tabular-classification", - "tabular-regression", -] - - -class InferenceApi: - """Client to configure httpx and make calls to the HuggingFace Inference API. - - Example: - - ```python - >>> from huggingface_hub.inference_api import InferenceApi - - >>> # Mask-fill example - >>> inference = InferenceApi("bert-base-uncased") - >>> inference(inputs="The goal of life is [MASK].") - [{'sequence': 'the goal of life is life.', 'score': 0.10933292657136917, 'token': 2166, 'token_str': 'life'}] - - >>> # Question Answering example - >>> inference = InferenceApi("deepset/roberta-base-squad2") - >>> inputs = { - ... "question": "What's my name?", - ... "context": "My name is Clara and I live in Berkeley.", - ... } - >>> inference(inputs) - {'score': 0.9326569437980652, 'start': 11, 'end': 16, 'answer': 'Clara'} - - >>> # Zero-shot example - >>> inference = InferenceApi("typeform/distilbert-base-uncased-mnli") - >>> inputs = "Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!" 
- >>> params = {"candidate_labels": ["refund", "legal", "faq"]} - >>> inference(inputs, params) - {'sequence': 'Hi, I recently bought a device from your company but it is not working as advertised and I would like to get reimbursed!', 'labels': ['refund', 'faq', 'legal'], 'scores': [0.9378499388694763, 0.04914155602455139, 0.013008488342165947]} - - >>> # Overriding configured task - >>> inference = InferenceApi("bert-base-uncased", task="feature-extraction") - - >>> # Text-to-image - >>> inference = InferenceApi("stabilityai/stable-diffusion-2-1") - >>> inference("cat") - - - >>> # Return as raw response to parse the output yourself - >>> inference = InferenceApi("mio/amadeus") - >>> response = inference("hello world", raw_response=True) - >>> response.headers - {"Content-Type": "audio/flac", ...} - >>> response.content # raw bytes from server - b'(...)' - ``` - """ - - @validate_hf_hub_args - @_deprecate_method( - version="1.0", - message=( - "`InferenceApi` client is deprecated in favor of the more feature-complete `InferenceClient`. Check out" - " this guide to learn how to convert your script to use it:" - " https://huggingface.co/docs/huggingface_hub/guides/inference#legacy-inferenceapi-client." - ), - ) - def __init__( - self, - repo_id: str, - task: Optional[str] = None, - token: Optional[str] = None, - gpu: bool = False, - ): - """Inits headers and API call information. - - Args: - repo_id (``str``): - Id of repository (e.g. `user/bert-base-uncased`). - task (``str``, `optional`, defaults ``None``): - Whether to force a task instead of using task specified in the - repository. - token (`str`, `optional`): - The API token to use as HTTP bearer authorization. This is not - the authentication token. You can find the token in - https://huggingface.co/settings/token. Alternatively, you can - find both your organizations and personal API tokens using - `HfApi().whoami(token)`. - gpu (`bool`, `optional`, defaults `False`): - Whether to use GPU instead of CPU for inference(requires Startup - plan at least). - """ - self.options = {"wait_for_model": True, "use_gpu": gpu} - self.headers = build_hf_headers(token=token) - - # Configure task - model_info = HfApi(token=token).model_info(repo_id=repo_id) - if not model_info.pipeline_tag and not task: - raise ValueError( - "Task not specified in the repository. Please add it to the model card" - " using pipeline_tag" - " (https://huggingface.co/docs#how-is-a-models-type-of-inference-api-and-widget-determined)" - ) - - if task and task != model_info.pipeline_tag: - if task not in ALL_TASKS: - raise ValueError(f"Invalid task {task}. Make sure it's valid.") - - logger.warning( - "You're using a different task than the one specified in the" - " repository. Be sure to know what you're doing :)" - ) - self.task = task - else: - assert model_info.pipeline_tag is not None, "Pipeline tag cannot be None" - self.task = model_info.pipeline_tag - - self.api_url = f"{constants.INFERENCE_ENDPOINT}/pipeline/{self.task}/{repo_id}" - - def __repr__(self): - # Do not add headers to repr to avoid leaking token. - return f"InferenceAPI(api_url='{self.api_url}', task='{self.task}', options={self.options})" - - def __call__( - self, - inputs: Optional[Union[str, dict, list[str], list[list[str]]]] = None, - params: Optional[dict] = None, - data: Optional[bytes] = None, - raw_response: bool = False, - ) -> Any: - """Make a call to the Inference API. - - Args: - inputs (`str` or `dict` or `list[str]` or `list[list[str]]`, *optional*): - Inputs for the prediction. 
- params (`dict`, *optional*): - Additional parameters for the models. Will be sent as `parameters` in the - payload. - data (`bytes`, *optional*): - Bytes content of the request. In this case, leave `inputs` and `params` empty. - raw_response (`bool`, defaults to `False`): - If `True`, the raw `Response` object is returned. You can parse its content - as preferred. By default, the content is parsed into a more practical format - (json dictionary or PIL Image for example). - """ - # Build payload - payload: dict[str, Any] = { - "options": self.options, - } - if inputs: - payload["inputs"] = inputs - if params: - payload["parameters"] = params - - # Make API call - response = get_session().post(self.api_url, headers=self.headers, json=payload, content=data) - - # Let the user handle the response - if raw_response: - return response - - # By default, parse the response for the user. - content_type = response.headers.get("Content-Type") or "" - if content_type.startswith("image"): - if not is_pillow_available(): - raise ImportError( - f"Task '{self.task}' returned as image but Pillow is not installed." - " Please install it (`pip install Pillow`) or pass" - " `raw_response=True` to get the raw `Response` object and parse" - " the image by yourself." - ) - - from PIL import Image - - return Image.open(io.BytesIO(response.content)) - elif content_type == "application/json": - return response.json() - else: - raise NotImplementedError( - f"{content_type} output type is not implemented yet. You can pass" - " `raw_response=True` to get the raw `Response` object and parse the" - " output by yourself." - ) diff --git a/src/huggingface_hub/utils/__init__.py b/src/huggingface_hub/utils/__init__.py index 52838fe000..bf25d66950 100644 --- a/src/huggingface_hub/utils/__init__.py +++ b/src/huggingface_hub/utils/__init__.py @@ -50,7 +50,6 @@ from ._fixes import SoftTemporaryDirectory, WeakFileLock, yaml_dump from ._git_credential import list_credential_helpers, set_git_credential, unset_git_credential from ._headers import build_hf_headers, get_token_to_send -from ._hf_folder import HfFolder from ._http import ( ASYNC_CLIENT_FACTORY_T, CLIENT_FACTORY_T, diff --git a/src/huggingface_hub/utils/_hf_folder.py b/src/huggingface_hub/utils/_hf_folder.py deleted file mode 100644 index 6418bf2fd2..0000000000 --- a/src/huggingface_hub/utils/_hf_folder.py +++ /dev/null @@ -1,68 +0,0 @@ -# coding=utf-8 -# Copyright 2022-present, the HuggingFace Inc. team. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -"""Contain helper class to retrieve/store token from/to local cache.""" - -from pathlib import Path -from typing import Optional - -from .. import constants -from ._auth import get_token - - -class HfFolder: - # TODO: deprecate when adapted in transformers/datasets/gradio - # @_deprecate_method(version="1.0", message="Use `huggingface_hub.login` instead.") - @classmethod - def save_token(cls, token: str) -> None: - """ - Save token, creating folder as needed. - - Token is saved in the huggingface home folder. 
You can configure it by setting - the `HF_HOME` environment variable. - - Args: - token (`str`): - The token to save to the [`HfFolder`] - """ - path_token = Path(constants.HF_TOKEN_PATH) - path_token.parent.mkdir(parents=True, exist_ok=True) - path_token.write_text(token) - - # TODO: deprecate when adapted in transformers/datasets/gradio - # @_deprecate_method(version="1.0", message="Use `huggingface_hub.get_token` instead.") - @classmethod - def get_token(cls) -> Optional[str]: - """ - Get token or None if not existent. - - This method is deprecated in favor of [`huggingface_hub.get_token`] but is kept for backward compatibility. - Its behavior is the same as [`huggingface_hub.get_token`]. - - Returns: - `str` or `None`: The token, `None` if it doesn't exist. - """ - return get_token() - - # TODO: deprecate when adapted in transformers/datasets/gradio - # @_deprecate_method(version="1.0", message="Use `huggingface_hub.logout` instead.") - @classmethod - def delete_token(cls) -> None: - """ - Deletes the token from storage. Does not fail if token does not exist. - """ - try: - Path(constants.HF_TOKEN_PATH).unlink() - except FileNotFoundError: - pass diff --git a/tests/test_inference_api.py b/tests/test_inference_api.py deleted file mode 100644 index a057ec4450..0000000000 --- a/tests/test_inference_api.py +++ /dev/null @@ -1,140 +0,0 @@ -# Copyright 2020 The HuggingFace Team. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -import unittest -from pathlib import Path -from unittest.mock import patch - -import pytest -from PIL import Image - -from huggingface_hub import hf_hub_download -from huggingface_hub.inference_api import InferenceApi - -from .testing_utils import expect_deprecation, with_production_testing - - -@pytest.mark.vcr -@with_production_testing -class InferenceApiTest(unittest.TestCase): - def read(self, filename: str) -> bytes: - return Path(filename).read_bytes() - - @classmethod - @with_production_testing - def setUpClass(cls) -> None: - cls.image_file = hf_hub_download(repo_id="Narsil/image_dummy", repo_type="dataset", filename="lena.png") - return super().setUpClass() - - @expect_deprecation("huggingface_hub.inference_api") - def test_simple_inference(self): - api = InferenceApi("bert-base-uncased") - inputs = "Hi, I think [MASK] is cool" - results = api(inputs) - self.assertIsInstance(results, list) - - result = results[0] - self.assertIsInstance(result, dict) - self.assertTrue("sequence" in result) - self.assertTrue("score" in result) - - @unittest.skip("Model often not loaded") - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_with_params(self): - api = InferenceApi("typeform/distilbert-base-uncased-mnli") - inputs = "I bought a device but it is not working and I would like to get reimbursed!" 
- params = {"candidate_labels": ["refund", "legal", "faq"]} - result = api(inputs, params) - self.assertIsInstance(result, dict) - self.assertTrue("sequence" in result) - self.assertTrue("scores" in result) - - @unittest.skip("Model often not loaded") - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_with_dict_inputs(self): - api = InferenceApi("distilbert-base-cased-distilled-squad") - inputs = { - "question": "What's my name?", - "context": "My name is Clara and I live in Berkeley.", - } - result = api(inputs) - self.assertIsInstance(result, dict) - self.assertTrue("score" in result) - self.assertTrue("answer" in result) - - @unittest.skip("Model often not loaded") - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_with_audio(self): - api = InferenceApi("facebook/wav2vec2-base-960h") - file = hf_hub_download( - repo_id="hf-internal-testing/dummy-flac-single-example", - repo_type="dataset", - filename="example.flac", - ) - data = self.read(file) - result = api(data=data) - self.assertIsInstance(result, dict) - self.assertTrue("text" in result, f"We received {result} instead") - - @unittest.skip("Model often not loaded") - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_with_image(self): - api = InferenceApi("google/vit-base-patch16-224") - data = self.read(self.image_file) - result = api(data=data) - self.assertIsInstance(result, list) - for classification in result: - self.assertIsInstance(classification, dict) - self.assertTrue("score" in classification) - self.assertTrue("label" in classification) - - @expect_deprecation("huggingface_hub.inference_api") - def test_text_to_image(self): - api = InferenceApi("stabilityai/stable-diffusion-2-1") - with patch("huggingface_hub.inference_api.get_session") as mock: - mock().post.return_value.headers = {"Content-Type": "image/jpeg"} - mock().post.return_value.content = self.read(self.image_file) - output = api("cat") - self.assertIsInstance(output, Image.Image) - - @expect_deprecation("huggingface_hub.inference_api") - def test_text_to_image_raw_response(self): - api = InferenceApi("stabilityai/stable-diffusion-2-1") - with patch("huggingface_hub.inference_api.get_session") as mock: - mock().post.return_value.headers = {"Content-Type": "image/jpeg"} - mock().post.return_value.content = self.read(self.image_file) - output = api("cat", raw_response=True) - # Raw response is returned - self.assertEqual(output, mock().post.return_value) - - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_overriding_task(self): - api = InferenceApi( - "sentence-transformers/paraphrase-albert-small-v2", - task="feature-extraction", - ) - inputs = "This is an example again" - result = api(inputs) - self.assertIsInstance(result, list) - - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_overriding_invalid_task(self): - with self.assertRaises(ValueError, msg="Invalid task invalid-task. 
Make sure it's valid."): - InferenceApi("bert-base-uncased", task="invalid-task") - - @expect_deprecation("huggingface_hub.inference_api") - def test_inference_missing_input(self): - api = InferenceApi("deepset/roberta-base-squad2") - result = api({"question": "What's my name?"}) - self.assertIsInstance(result, dict) - self.assertTrue("error" in result) diff --git a/tests/test_utils_headers.py b/tests/test_utils_headers.py index d6c00874e4..ff61cec932 100644 --- a/tests/test_utils_headers.py +++ b/tests/test_utils_headers.py @@ -19,8 +19,6 @@ NO_AUTH_HEADER = {"user-agent": DEFAULT_USER_AGENT} -# @patch("huggingface_hub.utils._headers.HfFolder") -# @handle_injection class TestAuthHeadersUtil(unittest.TestCase): def test_use_auth_token_str(self) -> None: self.assertEqual(build_hf_headers(use_auth_token=FAKE_TOKEN), FAKE_TOKEN_HEADER) diff --git a/tests/test_utils_hf_folder.py b/tests/test_utils_hf_folder.py deleted file mode 100644 index 5857fa4df8..0000000000 --- a/tests/test_utils_hf_folder.py +++ /dev/null @@ -1,53 +0,0 @@ -# Copyright 2020 The HuggingFace Team. All rights reserved. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -"""Contain tests for `HfFolder` utility.""" - -import os -import unittest -from uuid import uuid4 - -from huggingface_hub.utils import HfFolder - - -def _generate_token() -> str: - return f"token-{uuid4()}" - - -class HfFolderTest(unittest.TestCase): - def test_token_workflow(self): - """ - Test the whole token save/get/delete workflow, - with the desired behavior with respect to non-existent tokens. - """ - token = _generate_token() - HfFolder.save_token(token) - self.assertEqual(HfFolder.get_token(), token) - HfFolder.delete_token() - HfFolder.delete_token() - # ^^ not an error, we test that the - # second call does not fail. - self.assertEqual(HfFolder.get_token(), None) - # test TOKEN in env - self.assertEqual(HfFolder.get_token(), None) - with unittest.mock.patch.dict(os.environ, {"HF_TOKEN": token}): - self.assertEqual(HfFolder.get_token(), token) - - def test_token_strip(self): - """ - Test the workflow when the token is mistakenly finishing with new-line or space character. 
- """ - token = _generate_token() - HfFolder.save_token(" " + token + "\n") - self.assertEqual(HfFolder.get_token(), token) - HfFolder.delete_token() diff --git a/tests/test_webhooks_server.py b/tests/test_webhooks_server.py index 8284b14bea..c8d5c4c2db 100644 --- a/tests/test_webhooks_server.py +++ b/tests/test_webhooks_server.py @@ -110,28 +110,28 @@ } -def test_deserialize_payload_example_with_comment() -> None: - """Confirm that the test stub can actually be deserialized.""" - payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_CREATE_DISCUSSION) - assert payload.event.scope == WEBHOOK_PAYLOAD_CREATE_DISCUSSION["event"]["scope"] - assert payload.comment is not None - assert payload.comment.content == "Add co2 emissions information to the model card" - - -def test_deserialize_payload_example_without_comment() -> None: - """Confirm that the test stub can actually be deserialized.""" - payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_UPDATE_DISCUSSION) - assert payload.event.scope == WEBHOOK_PAYLOAD_UPDATE_DISCUSSION["event"]["scope"] - assert payload.comment is None - - -def test_deserialize_payload_example_with_updated_refs() -> None: - """Confirm that the test stub can actually be deserialized.""" - payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_WITH_UPDATED_REFS) - assert payload.updatedRefs is not None - assert payload.updatedRefs[0].ref == "refs/pr/5" - assert payload.updatedRefs[0].oldSha is None - assert payload.updatedRefs[0].newSha == "227c78346870a85e5de4fff8a585db68df975406" +@requires("gradio") +class TestWebhookPayload(unittest.TestCase): + def test_deserialize_payload_example_with_comment(self) -> None: + """Confirm that the test stub can actually be deserialized.""" + payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_CREATE_DISCUSSION) + assert payload.event.scope == WEBHOOK_PAYLOAD_CREATE_DISCUSSION["event"]["scope"] + assert payload.comment is not None + assert payload.comment.content == "Add co2 emissions information to the model card" + + def test_deserialize_payload_example_without_comment(self) -> None: + """Confirm that the test stub can actually be deserialized.""" + payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_UPDATE_DISCUSSION) + assert payload.event.scope == WEBHOOK_PAYLOAD_UPDATE_DISCUSSION["event"]["scope"] + assert payload.comment is None + + def test_deserialize_payload_example_with_updated_refs(self) -> None: + """Confirm that the test stub can actually be deserialized.""" + payload = WebhookPayload.model_validate(WEBHOOK_PAYLOAD_WITH_UPDATED_REFS) + assert payload.updatedRefs is not None + assert payload.updatedRefs[0].ref == "refs/pr/5" + assert payload.updatedRefs[0].oldSha is None + assert payload.updatedRefs[0].newSha == "227c78346870a85e5de4fff8a585db68df975406" @requires("gradio")