We have just implemented streaming of tokens from OpenAI, improving our chatbot's response time roughly 30-fold. The reason is that the chatbot now starts writing out words as they are returned from OpenAI, which means it typically takes no more than 1 to 3 seconds before you can start reading its response.
Previously we would wait for the whole response from OpenAI before returning it from our server. Now we use WebSockets and OpenAI's streaming feature to return words the moment we receive them.
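To illustrate the idea, here is a minimal sketch of the pattern described above: instead of buffering the whole response, each chunk is forwarded to the client as soon as it arrives. The names `fakeTokenStream`, `streamToClient`, and `send` are illustrative assumptions, not our actual server API; in production the token source would be the OpenAI API called with streaming enabled, and `send` would write to the WebSocket connection.

```typescript
// Hypothetical stand-in for OpenAI's streaming response: tokens trickle
// in one at a time instead of arriving all at once.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) {
    // In production this delay is OpenAI's per-token generation time.
    await new Promise((resolve) => setTimeout(resolve, 10));
    yield t;
  }
}

// Forward each token to the client the moment it arrives, while also
// accumulating the full response for e.g. logging or session history.
async function streamToClient(
  tokens: AsyncIterable<string>,
  send: (chunk: string) => void
): Promise<string> {
  let full = "";
  for await (const chunk of tokens) {
    send(chunk); // in the real server: the WebSocket connection's send
    full += chunk;
  }
  return full;
}
```

The key design point is that the client's perceived latency becomes the time to the *first* chunk, not the time to the last one.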
This makes our chatbot roughly 30 times faster for all practical purposes.
Note that it still spends the same total amount of time before it's done, but since it starts writing words as soon as it receives them, it feels as if the chatbot is 30 times faster, and you can start reading its response 30 times sooner. Previously a single chatbot response would take 30 to 50 seconds; now you can start reading it within 1 to 3 seconds.
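The 30x figure is a back-of-the-envelope estimate from the numbers above: perceived speedup is the old time-to-anything-readable divided by the new time-to-first-words.

```typescript
// Perceived speedup: how much sooner you can start reading, not how much
// sooner the full response finishes (that total time is unchanged).
function perceivedSpeedup(fullResponseSeconds: number, firstWordsSeconds: number): number {
  return fullResponseSeconds / firstWordsSeconds;
}
```

With 45 seconds for a full response and 1.5 seconds to first words, this gives exactly 30; the extremes of the quoted ranges span roughly 10x to 50x, hence "roughly 30 times".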
Remembering Conversations in Sessions
In addition to the above, the chatbot will now remember conversations across page views within the same domain. This allows you to click links and navigate the website while the conversation "follows" you around, letting you ask follow-up questions and continue the dialogue, as if it were a human sales assistant following you around in a physical store. You can see both of these features in the following video.
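Conceptually, continuing a dialogue across page views only requires that the same conversation identifier is reused on every page within the domain. The sketch below is an assumption about how such a feature could work, not the embed script's actual internals; `getOrCreateConversationId` and the `Map`-based store (standing in for, say, a cookie or session storage scoped to the domain) are hypothetical names.

```typescript
// Hypothetical per-domain store; in a browser this role could be played
// by a cookie or sessionStorage entry scoped to the domain.
type SessionStore = Map<string, string>;

// Returns the existing conversation id for this domain, or creates one,
// so the same dialogue continues across page views on the same site.
function getOrCreateConversationId(store: SessionStore, domain: string): string {
  const existing = store.get(domain);
  if (existing) return existing;
  const id = `conv-${Math.random().toString(36).slice(2)}`;
  store.set(domain, id);
  return id;
}
```

Because the id is keyed per domain, a conversation started on one website does not leak into another.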
Over the next couple of days we will update all partner and client cloudlets to take advantage of this new feature. However, to avoid breaking backwards compatibility, you need to add the following query parameter to your embed script.
Without this query parameter the chatbot falls back to its default behaviour, which is not to stream tokens.
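The actual parameter name is not reproduced in this excerpt, so purely as a hypothetical illustration of the shape of the change (both the URL and the `stream=true` name below are made up, not the real values):

```html
<!-- Hypothetical example only: substitute your real embed URL and the
     query parameter from the snippet above for "stream=true". -->
<script src="https://example.com/chatbot-embed.js?stream=true"></script>
```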