OpenAI's O3-mini versus DeepSeek R1

After having been dethroned by DeepSeek a couple of weeks ago, OpenAI managed to climb back onto the throne last night with their new flagship model O3-mini. At AINIRO we officially support both DeepSeek's R1 and OpenAI's O3, so having factual information about each model's performance is crucial for us.
Notice, you can try both of these models in our AI Expert System.
Measuring Quality and Performance
We put both models to the test, using GPT-4o as our baseline, and the conclusion is crystal clear: OpenAI's O3-mini is kick-ass good, and if you need an extremely smart LLM, it should definitely be one of your options as you evaluate LLMs for your future projects. We're already in the process of upgrading several of our clients to O3-mini, and we'll be using it ourselves in our own solutions.
The Cost of Reasoning
Both O3-mini and DeepSeek's R1 are "reasoning models". This implies that they will spend some time "thinking" before they start responding. With DeepSeek you can actually see this thinking, which I think is super cool, but with O3-mini you cannot see its thoughts. Below you can see how DeepSeek is "reasoning" before it starts answering.
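If you're curious what this looks like over the API, below is a minimal sketch of streaming R1 through HuggingFace so you can watch the reasoning arrive token by token. R1 wraps its chain of thought in `<think>` tags before the actual answer. The model id, the huggingface_hub usage, and the prompt are assumptions that may need adjusting to your setup.

```python
# Minimal sketch: stream DeepSeek R1 through HuggingFace and watch it think.
# Assumes the huggingface_hub package and a valid HF token; the model id
# and prompt are illustrative.
from huggingface_hub import InferenceClient

client = InferenceClient("deepseek-ai/DeepSeek-R1")

stream = client.chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    max_tokens=1024,
    stream=True,
)

# Everything between <think> and </think> is the visible reasoning;
# what follows afterwards is the final answer.
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
```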
On average, O3-mini spends about 5 seconds reasoning before it starts producing output for complex prompts. For some, this will be an overhead they cannot tolerate, since GPT-4o starts answering after 0.5 seconds on average.
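If you want to measure this overhead yourself, here is a minimal sketch that times how long a model "thinks" before its first visible token arrives. It assumes the official openai Python SDK with an OPENAI_API_KEY in your environment, and the prompt is just an illustrative placeholder.

```python
# Minimal sketch: measure "reasoning overhead" as time-to-first-token.
import time
from openai import OpenAI

client = OpenAI()

def time_to_first_token(model: str, prompt: str) -> float:
    """Returns seconds elapsed before the first content token arrives."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:  # first visible token ends the "thinking" phase
            return time.monotonic() - start
    return time.monotonic() - start

prompt = "Write a SQL query joining customers and orders, newest first."
for model in ("o3-mini", "gpt-4o"):
    print(model, round(time_to_first_token(model, prompt), 2), "seconds")
```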
For DeepSeek R1, however, this thinking process can require 30 to 60 seconds, depending upon the complexity of your prompt. I suspect this is because we're running DeepSeek through HuggingFace, and they're not giving us enough hardware to run DeepSeek optimally - so I would avoid concluding that O3-mini is 50x faster than DeepSeek's R1, since this might change drastically with more hardware. However, when testing O3-mini through our systems, it unfortunately seems to be about 50 times faster than DeepSeek's R1. You can reproduce this for yourself below.
- Compare DeepSeek's R1 and OpenAI O3 - Log in with your Gmail account.
Which model?
We're running DeepSeek through HuggingFace, which makes it ridiculously slow compared to O3-mini. I suspect that with better hardware - which unfortunately costs anywhere from $2,500 to $25,000 per month - the speed differences would become much smaller. However, at the speed we can currently run R1 through HuggingFace, I'd recommend OpenAI's O3-mini for now, unless privacy is an absolute must, at which point we could host R1 on a private server with zero spying occurring.
Yes, we must assume OpenAI is spying on us. With a self-hosted solution built on DeepSeek's R1 this wouldn't be possible even in theory!
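To make the privacy point concrete, below is a minimal sketch of what a fully private R1 setup could look like: you point an OpenAI-compatible client at a self-hosted endpoint instead of OpenAI's servers. The URL and model tag are assumptions based on something like Ollama or vLLM serving an R1 model locally - adjust them for whatever your server actually exposes.

```python
# Minimal sketch of a fully private R1 setup: an OpenAI-compatible client
# talking to a self-hosted endpoint. The base_url and model tag below are
# illustrative (e.g. Ollama's default local endpoint), not prescriptive.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # your self-hosted endpoint
    api_key="not-needed-locally",          # dummy key; nothing leaves the box
)

response = client.chat.completions.create(
    model="deepseek-r1",  # whatever tag your local server exposes
    messages=[{"role": "user", "content": "Summarize our internal policy doc."}],
)
print(response.choices[0].message.content)
```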
When it comes to building AI agents, DeepSeek is not as good at understanding when and how to execute functions - but I suspect that's mostly a prompt engineering issue, and/or something that could be fixed with further fine-tuning. O3-mini, however, performs perfectly here, and is definitely on par with or better than GPT-4o. Hence, for people who can tolerate the 5 seconds of overhead before the model starts answering, we'll probably recommend O3-mini to our clients going forward - unless absolute privacy is a must, at which point we'd go with a self-hosted R1 solution.
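For context, the kind of function calling we test agents with looks roughly like the sketch below. The OpenAI tools API is real; the get_order_status function and its schema are hypothetical, purely for illustration. A model that "understands when and how to execute functions" should return a tool call here rather than guessing an answer in plain text.

```python
# Minimal function-calling sketch. The tools API is OpenAI's; the
# get_order_status function and its schema are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical backend function
        "description": "Look up the status of a customer order.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID."},
            },
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": "Where is order 1234?"}],
    tools=tools,
)

# A capable agent model returns a structured tool call here.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```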
GPT-4o, however, is faster, because it doesn't reason before answering, and for many of our clients it will probably remain our goto model for that reason (pun intended). But in general, O3-mini is about 40% of the cost, implying it's both less expensive and smarter - which is basically a value proposition too good to ignore for most of our clients, and we'll be advising most of our clients to switch to O3-mini as fast as possible.
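To see what that 40% figure means in practice, here's a back-of-the-envelope calculation. The per-million-token prices below are my understanding of launch pricing and may be outdated - verify current rates before relying on these numbers.

```python
# Back-of-the-envelope cost comparison. Prices are USD per 1M tokens and
# are assumed launch pricing - check current rates before relying on them.
PRICES = {
    "gpt-4o":  {"input": 2.50, "output": 10.00},
    "o3-mini": {"input": 1.10, "output": 4.40},
}

def monthly_cost(model: str, input_tokens_m: float, output_tokens_m: float) -> float:
    p = PRICES[model]
    return p["input"] * input_tokens_m + p["output"] * output_tokens_m

# Example workload: 100M input tokens and 20M output tokens per month.
for model in PRICES:
    print(model, f"${monthly_cost(model, 100, 20):,.2f}")
# gpt-4o: $450.00, o3-mini: $198.00 -> roughly 44% of the cost.
```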
Wrapping up
For a couple of weeks, OpenAI was not the leading LLM provider out there. Regardless of what you think about Chinese surveillance, that's a good thing, since it creates a sense of urgency in the West, resulting in less expensive products and higher quality for us as consumers.
I am personally super happy that we now have a serious competitor from outside the US that can go toe to toe with the leading LLM providers from Silicon Valley. For us as a "middleware glue-code provider" this is just spectacular news, and we will always support both R1 and O3, plus whatever these guys throw at us in the future. However, we will never use DeepSeek's API, and will exclusively use their LLMs through either HuggingFace, our own infrastructure, or some provider delivering access to these models from the EU or US.
However, my conclusion so far is that for 90% of our customers, we'll probably end up advising them to use O3-mini, starting from today!
Bravo OpenAI, O3 was really good!
Have a Custom AI Solution
At AINIRO we specialise in delivering custom AI solutions and AI chatbots with AI agent features. If you want to talk to us about how we can help you implement your next custom AI solution, you can reach out to us below.