GOAL: Develop a self-hosted, real-time voice chat application backed by a self-hosted AI. Everything must be self-hosted and run in real time. You will interact with the LLM using voice instead of typing.
- Self-host Ollama on Windows/WSL with any model. This model will be used for interaction and may be switched to a different model later.
- Test it.
- Start Ollama with the OLLAMA_HOST environment variable set to 0.0.0.0 so that requests coming from WSL 2 are accepted.
- Serve Ollama with the updated environment.
- Find the Windows host IP from WSL 2. We will need this IP to interact with Ollama.
- Test that we can reach the Ollama API from the WSL 2 CLI (try hitting any Ollama endpoint).
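The two steps above can be sketched in Node/TypeScript. In WSL 2, the Windows host is reachable at the nameserver address listed in /etc/resolv.conf; the port 11434 is Ollama's default, and `/api/tags` is a cheap endpoint to verify connectivity (the equivalent of `curl http://<windows-ip>:11434/api/tags`).

```typescript
// Sketch, assuming Node 18+ (global fetch) running inside WSL 2.
import { readFileSync } from "node:fs";

// Pure helper: pull the first `nameserver` entry out of resolv.conf text.
// Under WSL 2's default networking, this is the Windows host's IP.
function parseWindowsHostIp(resolvConf: string): string | null {
  const match = resolvConf.match(/^nameserver\s+(\S+)/m);
  return match ? match[1] : null;
}

// Build the Ollama base URL and hit /api/tags to verify connectivity.
async function checkOllama(): Promise<void> {
  const ip = parseWindowsHostIp(readFileSync("/etc/resolv.conf", "utf8"));
  if (!ip) throw new Error("no nameserver entry found in /etc/resolv.conf");
  const res = await fetch(`http://${ip}:11434/api/tags`);
  console.log(res.ok ? "Ollama reachable" : `HTTP ${res.status}`);
}
```

Note that some WSL setups override /etc/resolv.conf; if so, the Windows IP can also be read from `ip route` (the default gateway), which this sketch does not cover.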
- Start a NestJS project in WSL 2.
- Create an Ollama module.
- Create a controller and service for it.
- Connect with Ollama running on Windows 10. (NOTE: the endpoint needs to point to the Windows machine, not localhost:11434.)
- Create an endpoint POST /ollama/chat with body { prompt: 'Howdy!!' }.
- Pass the prompt to Ollama.
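A framework-agnostic sketch of the service logic behind POST /ollama/chat (in the NestJS project this would live in the Ollama service, called from a controller method decorated with @Post('chat')). The base URL fallback and the model name "llama3" are assumptions; swap in whichever model you pulled.

```typescript
// Sketch, assuming Node 18+ and Ollama reachable at OLLAMA_BASE_URL.
// The fallback IP below is a hypothetical Windows host address.
const OLLAMA_BASE_URL =
  process.env.OLLAMA_BASE_URL ?? "http://172.22.0.1:11434";

// Pure helper: turn the request body { prompt } into an Ollama
// /api/generate payload. stream:false asks for one complete response.
function buildGenerateRequest(prompt: string, model = "llama3") {
  return { model, prompt, stream: false };
}

// Forward the prompt to Ollama and return the full response text.
async function chat(prompt: string): Promise<string> {
  const res = await fetch(`${OLLAMA_BASE_URL}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(prompt)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { response: string };
  return data.response;
}
```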
- Stream Ollama's response back to the client instead of waiting for the complete response.
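For the streaming step: Ollama's API streams newline-delimited JSON, one object per token, with a `response` field and a final object where `done` is true. A small parser for those lines (a sketch; wiring it to `res.write()` in the NestJS handler is left as a comment):

```typescript
// Parse one line of Ollama's streaming (NDJSON) output.
// Returns null for blank lines; throws on malformed JSON.
function parseStreamLine(
  line: string,
): { token: string; done: boolean } | null {
  if (!line.trim()) return null;
  const chunk = JSON.parse(line) as { response?: string; done?: boolean };
  return { token: chunk.response ?? "", done: chunk.done ?? false };
}

// In the controller (sketch): fetch /api/generate with stream:true, split
// the response body on newlines, then for each line:
//   const parsed = parseStreamLine(line);
//   if (parsed) res.write(parsed.token);
//   if (parsed?.done) res.end();
```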
- Test it
- Set up PulseAudio on Windows to allow audio pass-through from Windows to WSL 2.
- Install SoX / arecord in WSL 2 to receive audio from PulseAudio.
- Test it.
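One common way to do the pass-through (a sketch; the module arguments and the ACL range are assumptions you should adapt to your actual WSL subnet) is to run a PulseAudio server on Windows that accepts TCP clients, then point WSL 2 at it:

```
# On Windows, in PulseAudio's default.pa: accept TCP clients.
# 172.16.0.0/12 is a guess that covers typical WSL 2 addresses.
load-module module-native-protocol-tcp auth-ip-acl=127.0.0.1;172.16.0.0/12 auth-anonymous=1

# In WSL 2: tell Pulse clients (sox, parec, the mic package) where the
# server is. Replace <windows-ip> with the IP found earlier.
export PULSE_SERVER=tcp:<windows-ip>
```

With this in place, recording tools in WSL 2 read from the Windows microphone through the TCP-exposed Pulse server.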
- Create a new module VoiceChat. All voice-chat-related code (audio recording / streaming / processing / STT, etc.) will live here.
- Create an endpoint GET /voice/chat.
- Integrate the npm package mic here to capture audio and stream it to a file.
- Test it.
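The capture step can be sketched without the mic package by spawning `arecord` directly (mic wraps a very similar arecord/sox invocation under the hood). This assumes arecord from alsa-utils is installed in WSL 2 and audio routing is already working; 16 kHz / 16-bit / mono is chosen because it is a common input format for speech-to-text models.

```typescript
// Sketch: record from the default input device to a WAV file.
import { spawn } from "node:child_process";

// Pure helper: arecord arguments for 16 kHz, 16-bit little-endian, mono.
function buildArecordArgs(outFile: string): string[] {
  return ["-f", "S16_LE", "-r", "16000", "-c", "1", outFile];
}

// Record for `seconds` seconds (arecord's -d flag) and resolve when done.
function recordToFile(outFile: string, seconds: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const args = [...buildArecordArgs(outFile), "-d", String(seconds)];
    const proc = spawn("arecord", args);
    proc.on("error", reject);
    proc.on("close", (code) =>
      code === 0
        ? resolve()
        : reject(new Error(`arecord exited with code ${code}`)),
    );
  });
}
```

In the GET /voice/chat handler, the same capture could be switched from a file sink to an in-memory stream feeding the STT stage.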
- Integrate real-time speech-to-text. (This can live in the same application, or a separate server can be hosted for it.)
- Using a cloud service (not an option, since everything has to be self-hosted).
- Using a pre-existing solution that converts audio to text in real time, hosted locally on a server.
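One self-hosted option for the last bullet is a locally hosted Whisper server (e.g. whisper.cpp's server example), which accepts an uploaded WAV file over HTTP. The sketch below assumes such a server on port 8080 with an `/inference` path and a `file` form field; verify both against the version you actually build, as the HTTP API is not standardized.

```typescript
// Sketch: send a recorded WAV file to a local STT server and return text.
import { readFileSync } from "node:fs";

// Pure helper: sanity-check that a buffer is a RIFF/WAVE file before upload.
function isWav(buf: Buffer): boolean {
  return (
    buf.length >= 12 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE"
  );
}

async function transcribe(wavPath: string): Promise<string> {
  const audio = readFileSync(wavPath);
  if (!isWav(audio)) throw new Error(`${wavPath} does not look like a WAV file`);
  const form = new FormData(); // global in Node 18+
  form.append("file", new Blob([audio], { type: "audio/wav" }), "audio.wav");
  // Hypothetical endpoint: adjust host/path to your STT server.
  const res = await fetch("http://127.0.0.1:8080/inference", {
    method: "POST",
    body: form,
  });
  const data = (await res.json()) as { text?: string };
  return data.text ?? "";
}
```

For true real-time output, the same server would need to be fed short rolling chunks of audio rather than one finished file; that chunking logic belongs in the VoiceChat module.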