O1 not Good Enough (for us)

The last couple of days we've been playing around with OpenAI's latest O1 models. Initially we were super jazzed about these new models, and thought they could be a valuable addition to our product - But after having played around with these models and done some experiments, our conclusion is that they're simply not good enough for us, and suffers from many problems making them basically useless for our use cases.

Problems with these models

First of all, all the O1 models are using a different API. You cannot add system messages to them, implying you'll have to change your middleware to use these models. In addition, you cannot change its temperature, and the counting of tokens is completely different due to the models' reliance upon "reasoning tokens." This forced me to completely change the API middleware to simply be able to test these models with our tech.

Even after having done the above changes, we had to rip out streaming support. The new O1 models doesn't support stream-events, implying it can't stream tokens as it proceeds with its answer. This is a big deal for us, since we're using a CDN provider with a maximum timeout of 60 seconds. Since these models sometimes spends more than 60 seconds answering questions, this results in a timeout from our CDN provider.

Basically, these O1 models are useless for us

Overkill IQ

Below is a screenshot of gpt-4o evaluating a company based upon historical accounting figures as reported to the government by the company itself.

Evaluating a company with AI

Notice how it's perfectly capable of breaking down the evaluation process into a step by step method, using the discounted cash flow evaluation process. This is a fairly complex mathematical process, and requires hours of calculations for a human being. gpt-4o is just magically capable of performing the whole process by itself, by looking up accounting data from its database, and performing the whole evaluation process - 100% autonomously.

The complete result is several pages long, and goes into details such as:

Free Cash Flow
Weighted Average Cost of Capital
Forecast Future Cash Flows
Terminal Value
Discount Future Cash Flow
Enterprise Value
Equity Value
Etc, etc, etc.

This is currently the most complex calculation requirement we've got, and it's a project we're doing for one of our customers who wants to create an AI chatbot SaaS company giving investment advice based upon historical company data from a database with some roughly 5 to 10 million records. The project is using gpt-4o, and its average response time is less than 20 seconds due to using gpt-4o. The project is delivered as an AI chatbot SaaS company based upon our AI Expert System leveraging our AI Agent capabilities. In addition, it's got a database of roughly 5 to 10 million records with historical financial data it's doing lookups into.

Adding gpt-o1 as our model for something such as the above, would simply be overkill and not interesting for us. The only thing we'd achieve is higher expenses and lower quality user experience.

No reasons to use a nuke when a shotgun is sufficient

gpt-4o is good enough

OpenAI's existing models, in particular gpt-4o, is simply good enough for us. In addition it's a fraction of the cost, it supports streaming, and it answers much faster. Using o1 instead would accomplish nothing for us.

These new o1 models might be truly incredible for some use cases, in particular complex research, used on scientific problems. According to OpenAI themselves, it's got the reasoning capability of a PhD student, where 4o has the reasoning capability of a high school student.

Our problem is that 100% of our current use cases can easily be solved with high school math. Implying the only thing we'd achieve by using o1 is higher costs, lower quality UX, and slower responses.

Conclusion

Until O1 can be used with a similar API as gpt-4o, and it supports streaming, O1 is simply not interesting for us - Or our customers. These models might be incredibly useful for complex research and science, but for us they're simply overkill.

In fact, the only reason we're using gpt-4o and not gpt-4o-mini, is because the mini model isn't good enough at following instructions, so OpenAI's gpt-4o model is currently our weapon of choice for our customers.

My wish list for OpenAI is not stronger models such as the o1 suite of models - But rather better "weak" models. Improve gpt-4o-mini to allow it to better follow instructions would be high on my wish list.

O1 might be incredibly valuable for complex research, but none of our problems currently requires complex research capabilities. Our current use cases are simply easily soved using high school capabilities. So at least in the foreseeable future, we see no reasons to spend time implementing support for o1. This might change in the future, but for now these models simply aren't interesting to us ...

O1 not Good Enough (for us)

Problems with these models

Overkill IQ

gpt-4o is good enough

Conclusion

Thomas Hansen

Adding Web Browsing to GPT-O1

Project Strawberry, AKA GPT5 Released

Monetise your GPTs

Solutions

Misc

Legal

Solutions

Case Studies

Contact Us