Voice Based AI Agents

Voice Based AI Agents

We're about to release voice based AI Agents. This allows you to literally speak to your computer, have the LLM understand your incentives, and respond to you using natural voice. It's based upon our AI Expert System, and combines OpenAI's GPT models with their audio and TTS APIs. To understand what I mean watch the following video.

You can try it here if you wish, at which point you can click the blue microphone button to trigger speech. It's a bit quirky on Safari on iPhones, because it doesn't allow for playing audio without having a user interaction - So if you're testing it from your iPhone you'll need to click the mic icon in the top / right corner of the output once it's available.

How it works

The first thing that happens when you speak, is that your audio is transmitted to OpenAI and transcribed. Then we retrieve RAG data from our database based upon the transcript and associate with your request as "context". OpenAI returns text back to our Magic cloudlet, which sometimes might include "AI function invocations". Such function invocations can be rapidly created using our low-code and no-code platform, and can do any imaginable task, such as for instance.

  • Send an email
  • Scrape website
  • Search the web
  • Search for records in database
  • Invoke some API
  • Etc ...

If the response contains AI functions, the cloudlet will invoke these, and transmit the response from such function invocations back to OpenAI again, which generates text based upon your live data. Finally, parts of the response will be sent to OpenAI's text to speech API, resulting in a friendly voice providing you with voice based information based upon your RAG, AI function invocations, and whatever your prompt happens to be requesting from the AI. Basically ...

AI Agents based upon voice commands and speech!

Wrapping up

There's a lot of others out there that can do real time voice AI chatbots. ChatGPT being one, but also Microsoft CoPilot has very strong support for real time conversations. However, as far as we know, we're the only ones that can integrate voice commands with AI agentic behaviour, allowing for anything more than simple conversations. This allows you to have an AI Agent based upon voice commands and speech.

The flipside is that we can't use OpenAI's realtime API, because its model is simply not capable of executing AI functions by returning JSON to our cloudlet - So there's latency after the engine has generated a text response before it speaks its output. But in general I think that's a minor trade off considering it's able to actually execute functions and "do stuff".

Have a Custom AI Solution

At AINIRO we specialise in delivering custom AI solutions and AI chatbots with AI agent features. If you want to talk to us about how we can help you implement your next custom AI solution, you can reach out to us below.

Thomas Hansen

Thomas Hansen I am the CEO and Founder of AINIRO.IO, Ltd. I am a software developer with more than 25 years of experience. I write about Machine Learning, AI, and how to help organizations adopt said technologies. You can follow me on LinkedIn if you want to read more of what I write.

Published 17. Dec 2024

A Free Website AI Chatbot Widget for X-Mas

Until the end of December we're giving away a free website AI chatbot to the first 25 people to qualify. Got website? Need AI? We've got you!

Read More

RAG versus AI Agents and SQL

RAG is amazing, and it's arguably 80% of our revenue. But RAG doesn't always work for our use cases. If this is true for you, you should create an AI Agent instead doing lookups into an SQL database.

Read More

Turn your CSV file into an AI Agent

In this article I am going to explain how you can turn your CSV file into an AI Agent in 5 minutes using Low-Code, No-Code, and AINIRO's Magic Cloud.

Read More