Voice Based AI Agents
We're about to release voice-based AI Agents. This allows you to literally speak to your computer, have the LLM understand your intentions, and have it respond using a natural voice. It's based upon our AI Expert System, and combines OpenAI's GPT models with their audio and text-to-speech APIs. To understand what I mean, watch the following video.
You can try it here if you wish, at which point you can click the blue microphone button to trigger speech. It's a bit quirky in Safari on iPhones, because Safari doesn't allow playing audio without a user interaction - so if you're testing it from your iPhone, you'll need to click the mic icon in the top right corner of the output once it's available.
How it works
The first thing that happens when you speak is that your audio is transmitted to OpenAI and transcribed. Then we retrieve RAG data from our database based upon the transcript and associate it with your request as "context". OpenAI returns text back to our Magic cloudlet, which sometimes might include "AI function invocations". Such function invocations can be rapidly created using our low-code and no-code platform, and can do any imaginable task, such as:
- Send an email
- Scrape a website
- Search the web
- Search for records in a database
- Invoke some API
- Etc ...
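The flow up to this point can be sketched as follows. This is an illustrative sketch only, assuming OpenAI's Python SDK; `retrieve_rag_context()` is a hypothetical stand-in for the database lookup, not AINIRO's actual API.

```python
# Sketch of the first half of the pipeline: speech -> transcript -> RAG
# context -> chat completion. Only build_messages() is concrete here;
# the API calls are shown hedged in comments below.

def build_messages(transcript: str, rag_context: str) -> list[dict]:
    """Combine the transcribed speech with retrieved RAG data so the
    LLM answers grounded in your own data rather than its training set."""
    return [
        {"role": "system",
         "content": "Answer using only the context below.\n\n" + rag_context},
        {"role": "user", "content": transcript},
    ]

# The surrounding calls would look roughly like this (assumption, not
# the cloudlet's real code):
#
#   from openai import OpenAI
#   client = OpenAI()
#   transcript = client.audio.transcriptions.create(
#       model="whisper-1", file=open("speech.wav", "rb")).text
#   context = retrieve_rag_context(transcript)  # hypothetical DB lookup
#   reply = client.chat.completions.create(
#       model="gpt-4o", messages=build_messages(transcript, context))
```

The key point is that the model never sees raw audio at this stage; it sees a transcript plus whatever context the RAG lookup returned.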
If the response contains AI functions, the cloudlet will invoke these and transmit the results of those function invocations back to OpenAI, which then generates text based upon your live data. Finally, parts of the response are sent to OpenAI's text-to-speech API, resulting in a friendly voice answering you based upon your RAG data, your AI function invocations, and whatever your prompt happens to request from the AI. Basically ...
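The invocation loop described above can be sketched like this. The function registry and its entries are assumptions made up for illustration; the cloudlet's real AI functions are created through the low-code platform.

```python
# Illustrative sketch of the function-invocation loop: the model's
# response names functions to run; we run them and collect results
# to send back to OpenAI for the final answer.

FUNCTIONS = {
    # Hypothetical stand-ins for real AI functions.
    "send_email": lambda args: "email sent to " + args["to"],
    "search_web": lambda args: "top results for " + args["query"],
}

def invoke_functions(calls: list[dict]) -> list[dict]:
    """Run each AI function the model asked for and package the
    results as tool messages for the follow-up completion."""
    results = []
    for call in calls:
        fn = FUNCTIONS[call["name"]]
        results.append({
            "role": "tool",
            "name": call["name"],
            "content": fn(call["arguments"]),
        })
    return results

# Once the follow-up completion produces the final text, it would be
# handed to OpenAI's text-to-speech endpoint, roughly (assumption):
#
#   audio = client.audio.speech.create(
#       model="tts-1", voice="alloy", input=reply_text)
```

Notice the round trip: the model doesn't execute anything itself; it only asks, the cloudlet executes, and a second completion turns the results into the text that eventually becomes speech.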
AI Agents based upon voice commands and speech!
Wrapping up
There are a lot of others out there that can do real-time voice AI chatbots; ChatGPT is one, and Microsoft Copilot also has very strong support for real-time conversations. However, as far as we know, we're the only ones that can integrate voice commands with agentic AI behaviour, allowing for more than simple conversations. This lets you have an AI Agent driven by voice commands and speech.
The flipside is that we can't use OpenAI's realtime API, because its model is simply not capable of executing AI functions by returning JSON to our cloudlet - so there's latency after the engine has generated its text response and before it speaks the output. But in general I think that's a minor trade-off, considering it's actually able to execute functions and "do stuff".
Have a Custom AI Solution
At AINIRO we specialise in delivering custom AI solutions and AI chatbots with AI agent features. If you want to talk to us about how we can help you implement your next custom AI solution, you can reach out to us below.