10x Better Scraper

I suspect we already had the best web scraper in the industry for AI and LLMs. However, as of today, we have somehow managed to make it 10x better. If you want to try it out, you can create a demo chatbot here.

Why bother, you may ask? Well, the problem with AI, and ChatGPT specifically, is that it suffers from exactly the same problem as everything else in computing, which is as follows ...

Garbage in, garbage out!

Unless you provide it with super high quality input, it will simply return inferior quality output.

If you do provide it with super high quality input, however, it will return super high quality output. And it all starts with how your website is scraped, which is why the quality of the scraper is crucial for creating high quality AI chatbots.

How it works

In the article linked above, we've already explained the basics, such as how our scraper chops up pages into multiple "training snippets". What's new in the current release is that we can now also use DIV elements from your page. In addition, we now create dedicated training snippets for images, allowing images to be loosely associated with queries. The latter is a big deal, since the images relevant to a query are not necessarily found only where they physically appear on your website.

This allows our chatbot technology to find and display images much more frequently than before, and also typically display more relevant images.
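To make the idea of training snippets more concrete, below is a minimal Python sketch using only the standard library's `html.parser`. It is not AINIRO's implementation, and the splitting rule is an assumption chosen for illustration: a new text snippet starts at every heading, text inside DIV elements is collected like any other text, and every `img` with an `alt` attribute becomes its own image snippet tagged with the nearest preceding heading. Storing the heading alongside the image is what lets an image be matched loosely against a query, independently of where it sits on the page. The names `SnippetParser` and `split_into_snippets` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical sketch of snippet extraction; not AINIRO's actual scraper.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style", "nav", "footer"}  # content we never index


class SnippetParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.snippets = []    # finished snippets, in the order they are produced
        self.current = []     # text fragments of the section being built
        self.heading = ""     # text of the nearest preceding heading
        self.in_heading = False
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element

    def flush(self):
        """Turn the accumulated text into one text snippet."""
        text = " ".join(self.current).strip()
        if text:
            self.snippets.append({"type": "text", "heading": self.heading, "content": text})
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.flush()  # a heading closes the previous section
            self.heading = ""
            self.in_heading = True
        elif tag == "img":
            attributes = dict(attrs)
            alt = (attributes.get("alt") or "").strip()
            src = attributes.get("src")
            if alt and src:
                # Separate image snippet, associated with the section heading
                self.snippets.append(
                    {"type": "image", "heading": self.heading, "content": alt, "src": src})

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in HEADINGS:
            self.in_heading = False

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        if self.in_heading:
            self.heading = (self.heading + " " + text).strip()
        else:
            self.current.append(text)  # text from div, p, li, ... is treated alike


def split_into_snippets(html):
    parser = SnippetParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit the last section
    return parser.snippets


if __name__ == "__main__":
    page = ("<h1>Pricing</h1><div>Our plans start at $49.</div>"
            "<img src='/plans.png' alt='Pricing table'>"
            "<h2>Support</h2><p>Email us.</p>")
    for snippet in split_into_snippets(page):
        print(snippet)
```

On such a page, the image ends up as its own snippet under the "Pricing" heading, next to the text snippet from the DIV. A production scraper would additionally have to cope with malformed HTML, other kinds of boilerplate, and very long sections that need further splitting.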

In addition, we've updated the scraper to tolerate almost anything. As long as your site is not a single-page application (SPA) and does not block scrapers, our scraper will somehow be able to extract meaningful content from it. Our scraper will now also respect your robots.txt file, and will not scrape your site unless given permission. If you want to prevent our scraper from scraping your website, you can block it the same way you block OpenAI's scraper, except ours is named AINIRO. Finally, we can now also scrape password protected websites, although this requires a small amount of manual work on our side.
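Blocking AINIRO works through ordinary robots.txt rules. The sketch below shows a robots.txt that blocks both OpenAI's crawler (whose user-agent token is `GPTBot`) and a crawler identifying itself as `AINIRO`, as named above, and checks the rules with Python's standard `urllib.robotparser`. The exact token AINIRO's scraper matches on is an assumption based on the name given in this post, and `may_scrape` is a hypothetical helper, not part of AINIRO's product.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: block OpenAI's GPTBot and (assumed token) AINIRO,
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: AINIRO
Disallow: /

User-agent: *
Allow: /
"""


def may_scrape(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("AINIRO", "GPTBot", "SomeOtherBot"):
        print(agent, may_scrape(ROBOTS_TXT, agent, "https://example.com/about"))
```

To block only AINIRO, you would keep just the `User-agent: AINIRO` group; `Disallow: /` blocks the whole site, while a narrower path such as `Disallow: /private/` blocks only that part of it.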

Below is a screenshot of our website scraper at work.

[Screenshot: ChatGPT chatbot scraper]

New demos for all

To celebrate our new scraper, and to allow everybody to test its quality, we've decided to let everybody who has previously created a demo chatbot create one more. If you tried our demo previously and weren't satisfied, you can now try it again to see the difference in quality.

Notice: if you tried to create a demo chatbot previously and it didn't work, it will very likely work now. In fact, as far as we know, the only sites our scraper doesn't tolerate are SPA sites and sites explicitly blocking scrapers. And even when it doesn't work, the new scraper will tell you why, allowing you to fix your site so that we can scrape it later.
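As an illustration of the kind of feedback described above, here is a hypothetical Python heuristic for spotting an SPA: if the HTML the server returns contains almost no visible text but does contain `script` tags, the content is most likely rendered by JavaScript in the browser, where a scraper that does not execute JavaScript cannot see it. The 200-character threshold and the `diagnose` function are assumptions made for this sketch, not how AINIRO's scraper actually decides.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = ("script", "style", "noscript")  # text here is not page content


class _TextCounter(HTMLParser):
    """Counts visible (non-whitespace) text characters and <script> tags."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._hidden_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1
            if tag == "script":
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS:
            self._hidden_depth = max(0, self._hidden_depth - 1)

    def handle_data(self, data):
        if not self._hidden_depth:
            self.text_chars += len("".join(data.split()))


def diagnose(html, min_text_chars=200):
    """Return None if the page looks scrapable, else a human-readable reason."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    if counter.text_chars >= min_text_chars:
        return None
    if counter.scripts:
        return ("Very little text in the served HTML, but JavaScript is present: "
                "the site is probably a single-page application (SPA).")
    return "Very little text in the served HTML: nothing meaningful to scrape."
```

A typical SPA shell such as `<div id="root"></div><script src="/app.js"></script>` yields the SPA message, which points the site owner towards server-side rendering or pre-rendering as the fix.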

Thomas Hansen

I am the CTO of AINIRO.IO AS and the CEO of AINIRO.IO, Ltd. I am a software developer with more than 25 years of experience. I write about Machine Learning, AI, and how to help organizations adopt said technologies. You can follow me on LinkedIn if you want to read more of what I write.

Published 12. Oct 2023

