Transcription Errors When Using Nova-3 #1503
Hi, we have a voice agent service built in Python, where we use Deepgram’s Speech-to-Text WebSocket API for real-time speech transcription. During live sessions, the user’s speech is transcribed via Flux. After the session ends, we send the session’s recording in an HTTP request to get a transcription of the whole session, using the Nova-3 model.

Note that all our recordings have 2 channels: the user and the AI agent. The agent channel contains a synthetic voice generated by a Text-to-Speech model, and the user channel is organic speech.

Our question concerns the second step, where we transcribe the whole recording. In the sample whose request ID is given below, around 0:10, the user says “Hello”, but it is not transcribed by Nova-3.

Request ID: ca1e7450-09b3-404b-91aa-4213f45e84ab

We would appreciate your help and insights into why this might be happening and how we can improve such cases. Thanks in advance.
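For readers who want to reproduce the second step, here is a minimal sketch of it. It assumes the pre-recorded `/v1/listen` endpoint with `model=nova-3` and `multichannel=true`, a WAV recording, and that the user is on channel 0; the thread does not confirm any of these. The helper names (`build_listen_url`, `transcribe_recording`, `words_in_window`) and the sample response are hypothetical and are not taken from the original poster's service.

```python
import json
import urllib.request
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str = "nova-3", multichannel: bool = True) -> str:
    """Build the pre-recorded request URL with per-channel transcription enabled."""
    query = urlencode({"model": model, "multichannel": str(multichannel).lower()})
    return f"{DEEPGRAM_LISTEN_URL}?{query}"


def transcribe_recording(path: str, api_key: str) -> dict:
    """POST the recording and return the parsed JSON response.

    Assumption: the recording is a WAV file; adjust Content-Type otherwise.
    """
    with open(path, "rb") as audio:
        request = urllib.request.Request(
            build_listen_url(),
            data=audio.read(),
            headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"},
            method="POST",
        )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


def words_in_window(response: dict, channel: int, start_s: float, end_s: float) -> list:
    """Return the words on one channel whose start time lies in [start_s, end_s]."""
    alternatives = response["results"]["channels"][channel]["alternatives"]
    if not alternatives:
        return []
    return [w for w in alternatives[0].get("words", []) if start_s <= w["start"] <= end_s]


# Hand-written fragment shaped like a multichannel response, for illustration only.
# Channel 0 (assumed: user) has no words; channel 1 (assumed: agent) has two.
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"transcript": "", "words": []}]},
            {"alternatives": [{"transcript": "hi there", "words": [
                {"word": "hi", "start": 1.2, "end": 1.5, "confidence": 0.99},
                {"word": "there", "start": 1.5, "end": 1.9, "confidence": 0.98},
            ]}]},
        ]
    }
}
```

Calling `words_in_window(sample_response, 0, 9.0, 11.0)` returns an empty list, which is how a missed “Hello” around 0:10 would show up on the user channel. Checking per-channel word timings this way makes it easy to compare the post-session result against what was transcribed live.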
Replies: 3 comments
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
@afurkankjf this is something that Deepgram's product team is actively trying to improve. We have a weakness in what we consider "short utterances", like a single "hello". I'm honestly impressed that your bot responded to it at all.

There isn't much that you can do to improve Deepgram's transcription of "Hello." Our models have a generative component that becomes more accurate as it gets more context, so the more words the user says, the better.

On your side, you could have the bot prompt the user with something like "Hi, my name is {bot's name} and I'm here to help. Could you provide me your name and what I can do for you today?" This may discourage a generic "hello" in the future. But, again, this is something that Deepgram is working on!