Voice Based AI Agents
We're about to release voice-based AI Agents. This allows you to literally speak to your computer, have the LLM understand your intentions, and have it respond using a natural voice. It's based upon our AI Expert System, and combines OpenAI's GPT models with their audio and text-to-speech APIs. To understand what I mean, watch the following video.
You can try it here if you wish, at which point you can click the blue microphone button to trigger speech. It's a bit quirky in Safari on iPhones, because Safari doesn't allow playing audio without a user interaction - so if you're testing it from your iPhone, you'll need to click the mic icon in the top right corner of the output once it's available.
How it works
The first thing that happens when you speak is that your audio is transmitted to OpenAI and transcribed. Then we retrieve RAG data from our database based upon the transcript and associate it with your request as "context". OpenAI returns text back to our Magic cloudlet, which sometimes might include "AI function invocations". Such function invocations can be rapidly created using our low-code and no-code platform, and can do any imaginable task, such as:
- Send an email
- Scrape a website
- Search the web
- Search for records in a database
- Invoke some API
- Etc ...
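The flow up to this point can be sketched as follows. This is an illustrative sketch only, assuming OpenAI's Python SDK; `retrieve_rag_context()` is a hypothetical stand-in for the database lookup, not AINIRO's actual API.

```python
# Sketch of the first half of the pipeline: speech -> transcript -> RAG
# context -> chat completion. Only build_messages() is concrete here;
# the API calls are shown hedged in comments below.

def build_messages(transcript: str, rag_context: str) -> list[dict]:
    """Combine the transcribed speech with retrieved RAG data so the
    LLM answers grounded in your own data rather than its training set."""
    return [
        {"role": "system",
         "content": "Answer using only the context below.\n\n" + rag_context},
        {"role": "user", "content": transcript},
    ]

# The surrounding calls would look roughly like this (assumption, not
# the cloudlet's real code):
#
#   from openai import OpenAI
#   client = OpenAI()
#   transcript = client.audio.transcriptions.create(
#       model="whisper-1", file=open("speech.wav", "rb")).text
#   context = retrieve_rag_context(transcript)  # hypothetical DB lookup
#   reply = client.chat.completions.create(
#       model="gpt-4o", messages=build_messages(transcript, context))
```

The key point is that the model never sees raw audio at this stage; it sees a transcript plus whatever context the RAG lookup returned.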
If the response contains AI functions, the cloudlet will invoke these and transmit the results of those function invocations back to OpenAI, which then generates text based upon your live data. Finally, parts of the response are sent to OpenAI's text-to-speech API, resulting in a friendly voice answering you based upon your RAG data, your AI function invocations, and whatever your prompt happens to request from the AI. Basically ...
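The invocation loop described above can be sketched like this. The function registry and its entries are assumptions made up for illustration; the cloudlet's real AI functions are created through the low-code platform.

```python
# Illustrative sketch of the function-invocation loop: the model's
# response names functions to run; we run them and collect results
# to send back to OpenAI for the final answer.

FUNCTIONS = {
    # Hypothetical stand-ins for real AI functions.
    "send_email": lambda args: "email sent to " + args["to"],
    "search_web": lambda args: "top results for " + args["query"],
}

def invoke_functions(calls: list[dict]) -> list[dict]:
    """Run each AI function the model asked for and package the
    results as tool messages for the follow-up completion."""
    results = []
    for call in calls:
        fn = FUNCTIONS[call["name"]]
        results.append({
            "role": "tool",
            "name": call["name"],
            "content": fn(call["arguments"]),
        })
    return results

# Once the follow-up completion produces the final text, it would be
# handed to OpenAI's text-to-speech endpoint, roughly (assumption):
#
#   audio = client.audio.speech.create(
#       model="tts-1", voice="alloy", input=reply_text)
```

Notice the round trip: the model doesn't execute anything itself; it only asks, the cloudlet executes, and a second completion turns the results into the text that eventually becomes speech.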
AI Agents based upon voice commands and speech!
Wrapping up
There are a lot of others out there that can do real-time voice AI chatbots; ChatGPT is one, and Microsoft Copilot also has very strong support for real-time conversations. However, as far as we know, we're the only ones that can integrate voice commands with agentic AI behaviour, allowing for more than simple conversations. This lets you have an AI Agent driven by voice commands and speech.
The flipside is that we can't use OpenAI's realtime API, because its model is simply not capable of executing AI functions by returning JSON to our cloudlet - so there's latency after the engine has generated its text response and before it speaks the output. But in general I think that's a minor trade-off, considering it's actually able to execute functions and "do stuff".
Have a Custom AI Solution
At AINIRO we specialise in delivering custom AI solutions and AI chatbots with AI agent features. If you want to talk to us about how we can help you implement your next custom AI solution, you can reach out to us below.