
Commit 0e0c460

Prepare 2.2.0-beta.1 release (Part 1) (#337)
### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` handles this automatically, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.

### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` properties in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with how other collections are exposed on request types. The dictionaries remain writable and support both initializer syntax and range copies to produce the same effect.
1 parent 4cd8529 commit 0e0c460

File tree

271 files changed

+7717
-2200
lines changed


CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -1,5 +1,24 @@

# Release History

## 2.2.0-beta.1 (Unreleased)

### Features added

- Chat completion now supports audio input and output!
  - To configure a chat completion to request audio output using the `gpt-4o-audio-preview` model, use `ChatResponseModalities.Text | ChatResponseModalities.Audio` as the value for `ChatCompletionOptions.ResponseModalities` and create a `ChatAudioOptions` instance for `ChatCompletionOptions.AudioOptions`.
  - Input chat audio is provided to `UserChatMessage` instances using `ChatContentPart.CreateInputAudioPart()`.
  - Output chat audio is provided on the `OutputAudio` property of `ChatCompletion`.
  - References to prior assistant audio are provided via `OutputAudioReference` instances on the `AudioReference` property of `AssistantChatMessage`; `AssistantChatMessage(chatCompletion)` handles this automatically, too.
  - For more information, see the example in the README.
- Predicted output can be used with chat completion: the new `OutputPrediction` property on `ChatCompletionOptions` can be populated with `ChatMessageContentPart` instances via `ChatOutputPrediction.CreateStaticContentPrediction()` to substantially accelerate some varieties of requests.
- For `o3-mini`, `o1`, and later models with reasoning capabilities:
  - The new `DeveloperChatMessage`, which replaces `SystemChatMessage`, can be used to provide instructions to the model.
  - `ChatCompletionOptions` can specify a `ReasoningEffortLevel` property to adjust the level of token consumption the model will attempt to apply.
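The predicted output feature listed above can be sketched as follows. This is a hedged illustration built only from the names the changelog mentions (`OutputPrediction`, `ChatOutputPrediction.CreateStaticContentPrediction()`); the exact overloads may differ in the released package:

```csharp
// Hedged sketch based on the names in the changelog; not a verified sample.
// Predicted output lets the service reuse large, mostly-unchanged content
// (here, a source file being lightly edited) to accelerate generation.
using OpenAI.Chat;

ChatClient client = new("gpt-4o", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

string existingCode = File.ReadAllText("Program.cs");

ChatCompletionOptions options = new()
{
    // The prediction is the content the response is expected to largely repeat.
    OutputPrediction = ChatOutputPrediction.CreateStaticContentPrediction(
        [ChatMessageContentPart.CreateTextPart(existingCode)]),
};

ChatCompletion completion = client.CompleteChat(
    [new UserChatMessage($"Rename the 'total' variable to 'sum' in this file:\n{existingCode}")],
    options);

Console.WriteLine(completion.Content[0].Text);
```

Predictions help most when the response is expected to echo the prediction with small edits; unrelated prompts gain nothing.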
### `[Experimental]` Breaking changes

- The `IDictionary<string, string> Metadata` properties in several request options types in the Assistants and RealtimeConversation areas have had their setters removed, aligning them with how other collections are exposed on request types. The dictionaries remain writable and support both initializer syntax and range copies to produce the same effect.
## 2.1.0 (2024-12-04)

### Features added

README.md

Lines changed: 70 additions & 0 deletions
@@ -18,6 +18,7 @@ It is generated from our [OpenAPI specification](https://github.com/openai/opena

- [How to use chat completions with streaming](#how-to-use-chat-completions-with-streaming)
- [How to use chat completions with tools and function calling](#how-to-use-chat-completions-with-tools-and-function-calling)
- [How to use chat completions with structured outputs](#how-to-use-chat-completions-with-structured-outputs)
- [How to use chat completions with audio](#how-to-use-chat-completions-with-audio)
- [How to generate text embeddings](#how-to-generate-text-embeddings)
- [How to generate images](#how-to-generate-images)
- [How to transcribe audio](#how-to-transcribe-audio)
@@ -354,6 +355,75 @@ foreach (JsonElement stepElement in structuredJson.RootElement.GetProperty("step
}
```

## How to use chat completions with audio

Starting with the `gpt-4o-audio-preview` model, chat completions can process audio input and output.

This example demonstrates:

1. Configuring the client with the supported `gpt-4o-audio-preview` model
1. Supplying user audio input on a chat completion request
1. Requesting model audio output from the chat completion operation
1. Retrieving audio output from a `ChatCompletion` instance
1. Using past audio output as `ChatMessage` conversation history

```csharp
// Chat audio input and output are only supported on specific models, beginning with gpt-4o-audio-preview
ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

// Input audio is provided to a request by adding an audio content part to a user message
string audioFilePath = Path.Combine("Assets", "realtime_whats_the_weather_pcm16_24khz_mono.wav");
byte[] audioFileRawBytes = File.ReadAllBytes(audioFilePath);
BinaryData audioData = BinaryData.FromBytes(audioFileRawBytes);
List<ChatMessage> messages =
[
    new UserChatMessage(ChatMessageContentPart.CreateInputAudioPart(audioData, ChatInputAudioFormat.Wav)),
];

// Output audio is requested by configuring ChatCompletionOptions to include the appropriate
// ResponseModalities values and corresponding AudioOptions.
ChatCompletionOptions options = new()
{
    ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
    AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3),
};

ChatCompletion completion = client.CompleteChat(messages, options);

void PrintAudioContent()
{
    if (completion.OutputAudio is ChatOutputAudio outputAudio)
    {
        Console.WriteLine($"Response audio transcript: {outputAudio.Transcript}");
        string outputFilePath = $"{outputAudio.Id}.mp3";
        using (FileStream outputFileStream = File.OpenWrite(outputFilePath))
        {
            outputFileStream.Write(outputAudio.AudioBytes);
        }
        Console.WriteLine($"Response audio written to file: {outputFilePath}");
        Console.WriteLine($"Valid on followup requests until: {outputAudio.ExpiresAt}");
    }
}

PrintAudioContent();

// To refer to past audio output, create an assistant message from the earlier ChatCompletion, use the earlier
// response content part, or use ChatMessageContentPart.CreateAudioPart(string) to manually instantiate a part.
messages.Add(new AssistantChatMessage(completion));
messages.Add("Can you say that like a pirate?");

completion = client.CompleteChat(messages, options);

PrintAudioContent();
```

Streaming closely parallels this pattern: `StreamingChatCompletionUpdate` instances can include an `OutputAudioUpdate` that may contain any of:

- The `Id` of the streamed audio content, which can be referenced by subsequent `AssistantChatMessage` instances via `ChatAudioReference` once the streaming response is complete; this may appear across multiple `StreamingChatCompletionUpdate` instances but will always be the same value when present
- The `ExpiresAt` value that describes when the `Id` will no longer be valid for use with `ChatAudioReference` in subsequent requests; this typically appears once and only once, in the final `StreamingOutputAudioUpdate`
- Incremental `TranscriptUpdate` and/or `AudioBytesUpdate` values, which can be incrementally consumed and, when concatenated, form the complete audio transcript and audio output for the overall response; many of these typically appear
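One way to consume those streamed update properties is to accumulate them into a transcript and an audio buffer. This is a hedged sketch assuming a `CompleteChatStreaming` method and the update member names described above (`Id`, `TranscriptUpdate`, `AudioBytesUpdate`), which may differ slightly in the released package:

```csharp
// Hedged sketch of accumulating streamed audio output; not a verified sample.
using System.Text;
using OpenAI.Chat;

ChatClient client = new("gpt-4o-audio-preview", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

ChatCompletionOptions options = new()
{
    ResponseModalities = ChatResponseModalities.Text | ChatResponseModalities.Audio,
    AudioOptions = new(ChatOutputAudioVoice.Alloy, ChatOutputAudioFormat.Mp3),
};

string audioId = null;
StringBuilder transcript = new();
using MemoryStream audioBytes = new();

foreach (StreamingChatCompletionUpdate update in client.CompleteChatStreaming(
    [new UserChatMessage("Tell me a short joke.")], options))
{
    if (update.OutputAudioUpdate is { } audioUpdate)
    {
        audioId ??= audioUpdate.Id;                      // same value whenever present
        transcript.Append(audioUpdate.TranscriptUpdate); // incremental transcript text
        if (audioUpdate.AudioBytesUpdate is not null)
        {
            audioBytes.Write(audioUpdate.AudioBytesUpdate); // incremental MP3 bytes
        }
    }
}

Console.WriteLine($"Transcript: {transcript}");
File.WriteAllBytes($"{audioId}.mp3", audioBytes.ToArray());
```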
## How to generate text embeddings

In this example, you want to create a trip-planning website that allows customers to write a prompt describing the kind of hotel that they are looking for and then offers hotel recommendations that closely match this description. To achieve this, it is possible to use text embeddings to measure the relatedness of text strings. In summary, you can get embeddings of the hotel descriptions, store them in a vector database, and use them to build a search index that you can query using the embedding of a given customer's prompt.
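The embedding-and-ranking step described here can be sketched as follows. This is a hedged illustration: the `EmbeddingClient`, `OpenAIEmbedding`, and `ToFloats()` names follow this library's Embeddings area as I understand it, the model name is illustrative, and in practice the scored lookup would run against a vector database rather than an in-memory loop:

```csharp
// Hedged sketch: embed hotel descriptions and a customer prompt, then rank
// by cosine similarity. A real site would store the vectors in a database.
using OpenAI.Embeddings;

EmbeddingClient client = new("text-embedding-3-small", Environment.GetEnvironmentVariable("OPENAI_API_KEY"));

string[] hotelDescriptions =
[
    "Beachfront resort with a spa and family suites.",
    "Downtown boutique hotel near museums and nightlife.",
];
string customerPrompt = "A quiet hotel by the sea for a relaxing week.";

OpenAIEmbeddingCollection descriptionEmbeddings = client.GenerateEmbeddings(hotelDescriptions);
OpenAIEmbedding promptEmbedding = client.GenerateEmbedding(customerPrompt);

// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|)
static float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    float dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a.Span[i] * b.Span[i];
        magA += a.Span[i] * a.Span[i];
        magB += b.Span[i] * b.Span[i];
    }
    return dot / (MathF.Sqrt(magA) * MathF.Sqrt(magB));
}

foreach (OpenAIEmbedding embedding in descriptionEmbeddings)
{
    float similarity = CosineSimilarity(promptEmbedding.ToFloats(), embedding.ToFloats());
    Console.WriteLine($"{hotelDescriptions[embedding.Index]}: {similarity:F3}");
}
```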
