How to add Memory to ChatGPT

How to add Memory to ChatGPT

Yesterday OpenAI added memory to ChatGPT. Such a thing is actually surprisingly easy to implement, and we've been dabbling with it ourselves in our own AI chatbots.

We've never actually implemented rich memory, besides our the AI chatbot that knows your name feature, since for most of our use cases it's irrelevant - However, if somebody out there are interested in an AI chatbot with extended memory, and are willing to pay for it, we'd love to get in touch with you.

Notice, such a thing would be a completely custom software project and would therefor require deep pockets!

How memory works in ChatGPT

As a specialist on RAG-based AI chatbots we probably know more about how to add memory to your AI chatbot than most. If you look at our platform, you will see that it actually (anonymously) stores each question and answer associated with a user ID.

Historical AI chatbot requests

This allows us to know which questions each individual user have asked without privacy concerns. The rest of the task is simply to vectorise or create embeddings based upon the input provided by the user, which allows us to later retrieve these entries using dot products as the user is asking more questions. Then if the match is above some sort of threshold, such as for instance 0.5 or something, we associate "the memory" with the context we're sending to OpenAI.

To further refine the process, we can also attach individual top scoring "memory entries" to the query we're using as we're matching records from our training snippets. In addition, it would be fairly easy to implement a "classification system" for "persistent memories", allowing the chatbot to use some entries specified by the user as "static memories" that are always attached to the query when matching training snippets and context data.

This allows us to extract static memories, that are basic facts that should always be a part of the queries we're using towards our RAG database - In addition to dynamically matched memories that are only matched if relevant to the question at hand.

An example use case

Imagine you're on a real estate website, and you're searching for an apartment. The chatbot replies and provides you with a beautiful apartment with one bedroom and a sea view. You inform the chatbot that you've got 2 children, and need at least 3 bedrooms. Your classification algorithm kicks in and informs you that this is "personal and relevant information about the user", so you create embeddings for the reply from the user, and adds it to the "static memory bank" for your chatbot.

One week later the same user comes back, and says "I'm looking for a house". The static memory kicks in and extracts the "This user has 3 children and needs 3 bedrooms" memory, which is then appended to the query as the chatbot is trying to match training data. What we're now looking for when using RAG isn't only the question the user asked, but in fact the following.

I'm looking for a house. This user has 3 children and needs 3 bedrooms

The last sentence originating from the "static memory bank". The embeddings API will then prioritise snippets with 3 bedrooms further up in the context, and recommend a 3 bedroom houses instead of the previously matched 1 bedroom apartment. One could imagine a response from OpenAI resembling the following at that point.

Hi John, since you've got 3 adorable children, you might want to look at the following 3 bedroom house ...

The process becomes a little bit more complex when we're dealing with a RAG database, something OpenAI doesn't really care that much about - But fundamentally it's similar enough to be able to easily implement. The way OpenAI (probably) implemented this, is by creating a simple RAG database of individual records, for then to match towards this as the user is querying ChatGPT.

Prioritising RAG items

We've done similar things with several of our clients. One of our clients, being a real estate agency wanted to have the AI chatbot recommend properties that only they were selling. They had tagged all of these properties with some custom tag in their CMS system. As we're extracting items through our synchronization import once each day, we extract these tags, and inject these early into the prompt we're storing. The tag could be for instance "PREFERRED_SNIPPET".

Then we've got a "Search postfix" setting on the model, which will append all queries towards our RAG database with this static value. You can see a screenshot below.

Search postfix prioritising context data

When the user is asking for "Apartment with swimming pool", what's actually sent to our RAG database is as follows.

Apartment with swimming pool - PREFERRED_SNIPPET

This will ensure the embeddings containing "PREFERRED_SNIPPET" will be prioritised, and such snippets will "bubble" to the top. The process with long term memory is actually surprisingly similar.

Use cases

We've done this with several of our clients. Real estate companies being one example. Another example is a video streaming service that's one of our clients. The latter wanted to have "most popular" videoes being prioritised, so we extracted the tag called "most popular", and created a static search postfix, ensuring items that were most popular bubbles to the top.

Another use case of course, is our "the chatbot that remembers your name" feature. The last one is built using questionnaires though, but the process is fundamentally the same. You can read about the latter one below if you haven't already clicked the link above.

Especially when you're dealing with sales, personalization becomes crucial. It's the stuff that truly makes your AI chatbot pops out and convert.


We obviously don't know the exact details of how OpenAI implemented long term memory, but it's actually surprisingly easy. One could imagine a RAG database, a summary created from the memory bank being sent as an invisible message 1, a classification system classifying factual data about the user as facts, for then to use RAG as the user is asking questions, etc. A summary that's constantly added to as user adds more facts, etc.

If you want to have a meeting with us to see how we can help you out with a personalized AI chatbot, you can contact us below. The stuff that OpenAI did with long term memory is actually surprisingly easy to implement.

Thomas Hansen

Thomas Hansen I am the CEO and Founder of AINIRO.IO, Ltd. I am a software developer with more than 25 years of experience. I write about Machine Learning, AI, and how to help organizations adopt said technologies. You can follow me on LinkedIn if you want to read more of what I write.

Published 14. Feb 2024

AINIRO Influencer Program - Get a Free Web Scraper

If you're an influencer in the AI space, we've got some great news to you; We'll give you access to our web scraper for free in return for some marketing.

Read More

How to Scrape a URL for your Custom GPTs

If you want to create truly amazing custom GPTs, you must have the ability to scrape a URL. Here I explain how to do just that using Magic.

Read More

How to Connect your AI Chatbot to Shopify

If you connect your AI Chatbot to the Shopify API, you can typically expect much higher quality because now it's got access to semantic data.

Read More