Transcription Errors When Using Nova-3 #1501
Hi,

We have a voice agent service built in Python, where we use Deepgram’s Speech-to-Text WebSocket API for real-time speech transcription. During live sessions, the user’s speech is transcribed via Flux. After the session ends, we make an HTTP request with the session’s recording to get a transcription of the whole session, using the Nova-3 model.

Note that all our recordings have two channels: the user and the AI agent. The agent channel contains a synthetic voice generated by a Text-to-Speech model, while the user channel is organic.

Our question concerns the second step, where we transcribe the whole recording. In the sample whose request ID is given below, the user’s audio is not transcribed at all; listening to it carefully, the user seems to be saying something like “make a for one day session so that they can”.

Request ID: 1067c63f-5c68-4909-9607-0ac0445d51d8

The audio is relatively low-volume, but we were wondering if there is anything we can do to improve this. Thanks in advance.
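For reference, a minimal sketch of the kind of post-session batch request described above, using only the Python standard library. The endpoint and query parameters follow Deepgram's pre-recorded `/v1/listen` API; the recording path and the `DEEPGRAM_API_KEY` environment variable are placeholders, not details from the original post:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.deepgram.com/v1/listen"


def build_listen_url(model: str, multichannel: bool) -> str:
    # Query parameters per Deepgram's pre-recorded /v1/listen API:
    # `model` selects the model, `multichannel=true` transcribes the
    # user and agent channels separately.
    flag = "true" if multichannel else "false"
    return f"{BASE_URL}?model={model}&multichannel={flag}"


url = build_listen_url("nova-3", True)

# Only send the request if a key is configured (placeholder env var).
api_key = os.environ.get("DEEPGRAM_API_KEY")
if api_key:
    with open("session_recording.wav", "rb") as f:  # placeholder path
        req = urllib.request.Request(
            url,
            data=f.read(),
            headers={
                "Authorization": f"Token {api_key}",
                "Content-Type": "audio/wav",
            },
            method="POST",
        )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))
```

With `multichannel=true`, the response contains a separate transcript per channel, so the user and agent sides can be inspected independently.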
Replies: 3 comments
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
There is nothing to be done here from Deepgram's side. I personally couldn't hear or understand the speech, which means our models won't be able to either. In a scenario like this, it is best to prompt the user to speak louder or to transfer them to a human agent.

One thing you may find you can do, if you have control over the user's audio stream, is to increase the volume of the user's channel. This isn't usually required or advised, but this is a special case.
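The volume-boost suggestion above can be sketched in plain Python. This is a minimal gain function for 16-bit PCM audio, not code from the original thread: it scales each sample by a factor and clips to the int16 range to avoid wrap-around distortion. It assumes native-endian 16-bit samples (the common case for WAV audio); the function name and factor are illustrative:

```python
import array


def boost_gain(pcm_bytes: bytes, factor: float) -> bytes:
    """Scale 16-bit PCM samples by `factor`, clipping to the int16 range.

    Assumes native-endian signed 16-bit samples, as produced by e.g.
    wave.Wave_read.readframes() for a 16-bit WAV file.
    """
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    boosted = array.array(
        "h",
        # Clip to [-32768, 32767] so loud samples saturate instead of wrapping.
        (max(-32768, min(32767, int(s * factor))) for s in samples),
    )
    return boosted.tobytes()
```

Applied only to the low-volume user channel before the batch request, a modest factor (e.g. 2.0) can help; too large a factor just clips the signal and may hurt transcription quality further.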