Proof of Work as CAPTCHA

The definition of CAPTCHA according to the Merriam-Webster dictionary is: "A test to prevent spamming software from accessing a website by requiring visitors to the site to solve a puzzle in order to gain access."

The paradox is that the above task can actually be completely automated by using Proof of Work, instead of bothering users with weird-looking images and asking them to solve challenging puzzles. The point is that spamming is only lucrative because of economies of scale. If you eliminate the economies of scale, most bots will no longer be interested in spamming you.

Proof Of Work

If I can send out 10,000 emails to 10,000 random recipients for $1, then economies of scale tell me it is highly lucrative if I can earn $10 in total on these emails. If the cost of sending out these emails increases from $1 to $100, it is no longer lucrative, and I'm instead losing money on sending out my emails. This fact can be exploited to create CAPTCHA logic that makes spamming obsolete.
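The arithmetic above can be spelled out in a few lines. This is only an illustration using the numbers from the paragraph; the function name `campaignProfit` is made up for this sketch.

```javascript
// Illustration only: the numbers are the ones from the paragraph above.
// Profit of one batch of 10,000 emails, in dollars.
function campaignProfit(revenue, costPerBatch) {
  return revenue - costPerBatch;
}

console.log(campaignProfit(10, 1));   // 9, spamming pays
console.log(campaignProfit(10, 100)); // -90, spamming loses money
```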

You hurt the bot where it's most painful, in its wallet!

Proof of Work is used in the Bitcoin network to secure the ledger as a whole. For Bitcoin, the task is to generate hash values over and over until you, by pure chance, end up with one below a given target, which in practice means a hash value starting with a long run of zeros. This requires electricity, a lot of electricity, ensuring only participants seriously interested in actually verifying the ledger will want to have a go at solving the problem.

BlowFish as Proof of Work

A similar technique can actually be used instead of CAPTCHA. BlowFish, or strictly speaking bcrypt, the password hashing algorithm built on the BlowFish cipher, is a "slow hashing algorithm". In fact, its primary feature is that it is slow, and therefore uses a lot of CPU. This has a cost, implying you're paying with electricity to create a BlowFish hash. The workload, meaning the amount of electricity required to create a valid BlowFish hash, can also be configured. With my current BlowFish JavaScript library, at workload 16 on a MacBook Pro, it takes 100 milliseconds to create a BlowFish hash of a 32 byte string.

This implies that even on my new and expensive MacBook Pro with an M3 CPU, I can generate at most 10 such hash values per second with a workload of 16. For a legitimate human user, an additional 0.1 seconds when clicking a button to submit a form is irrelevant.
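To make the throughput limit concrete, here is a small sketch of the arithmetic. The 100 millisecond figure is the one measured above; the function names are hypothetical, and the calculation assumes a single CPU core working flat out.

```javascript
// Upper bound on tokens one CPU core can create per second.
function tokensPerSecond(msPerHash) {
  return 1000 / msPerHash;
}

// CPU time in seconds needed to create `count` tokens on one core.
function cpuSecondsFor(count, msPerHash) {
  return (count * msPerHash) / 1000;
}

console.log(tokensPerSecond(100));       // 10
console.log(cpuSecondsFor(10000, 100));  // 1000 seconds, roughly 17 minutes
```

A human submitting one form never notices the delay, while a bot that wants 10,000 invocations has to pay for all of that CPU time.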

By exploiting this simple fact, I can now create "a puzzle" that makes most spammers lose interest in spamming me by invoking HTTP endpoints on my server over and over again. Their cost for doing such a thing is simply too high.

So now all I have to do is create a unique BlowFish token in my frontend code before I invoke my HTTP endpoint, and then verify this token in my backend as my endpoint is executing. If the hash value is not valid, I abort the execution of my endpoint's code.

Implementation

I've created a JavaScript file in Magic Cloud that creates a BlowFish hash value. The JavaScript file is dynamically created and associates a unique "public key" with its code. The public key is created specifically for one server using a server-side secret that's never shared with the client. Basically, the secret is my JWT secret, double hashed with SHA1. I double hash to avoid brute force attacks on my auth secret.
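A minimal sketch of how such a key could be derived in Node.js follows. It assumes that "double hashed" means the SHA1 of the hex encoded SHA1 of the secret, and that the result is used directly as the public key; neither detail is confirmed above, and the names `sha1Hex` and `derivePublicKey` are made up for illustration.

```javascript
const { createHash } = require('node:crypto');

// SHA1 of a string, hex encoded.
function sha1Hex(value) {
  return createHash('sha1').update(value, 'utf8').digest('hex');
}

// Assumption: "double hashed" means sha1(sha1(secret)) with hex encoding
// between the two rounds. The real implementation may differ.
function derivePublicKey(jwtSecret) {
  return sha1Hex(sha1Hex(jwtSecret));
}

// 'example-secret' is a placeholder, never use a real secret like this.
console.log(derivePublicKey('example-secret'));
```

The key is stable for a given secret, so the server can re-create it on every request without storing anything.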

When the frontend requires a new CAPTCHA token, it invokes a function in this file, with a callback function that's invoked when the hash value is ready. This function takes the current Unix timestamp, appends it to the public key, and BlowFish hashes the concatenated result. It then appends a semicolon to the hash value, followed by the timestamp it used to generate the hash. The end result is an "invocation token". Using my library resembles the following.

mcaptcha.token(token => {
   // Pass in token to server
});

The token looks as follows.

$2a$10$17134600313740.016285uHzevxskhOFg/i7R5ss9bhF6xeuN.njC;1713460031

The above is a standard BlowFish hash value, with a unique salt and its workload, followed by a semicolon and the Unix timestamp for when the token was generated. On the server, I've got a simple piece of Hyperlambda resembling the following.

execute:magic.auth.captcha-verify
    token:x:@.arguments/*/captcha

The above code assumes the token is supplied as a "captcha" parameter to the endpoint.
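On the client, passing the token along could look roughly like this. It is a sketch under assumptions: the helper `withCaptcha` and the endpoint path are hypothetical, and sending the token as a query parameter rather than in a request body is a guess.

```javascript
// Hypothetical helper: attaches the token as a "captcha" query parameter.
// URLSearchParams takes care of escaping characters such as "$", "/" and ";".
function withCaptcha(endpoint, params, token) {
  const query = new URLSearchParams({ ...params, captcha: token });
  return `${endpoint}?${query.toString()}`;
}

// Usage in a browser, where `mcaptcha` comes from the generated file:
//
// mcaptcha.token(token => {
//   fetch(withCaptcha('/hypothetical/endpoint', { email: 'jane@example.com' }, token));
// });
```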

The Hyperlambda slot above extracts the Unix timestamp from the tail of my token, and verifies it's not more than 5 seconds old. This implies I've got 5 seconds from when the token is generated to reach my server with it, to be able to successfully use the token.
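The freshness check can be sketched in JavaScript as follows. This is not the Hyperlambda slot's actual code; `parseToken` and `isFresh` are hypothetical names, and rejecting tokens dated in the future is an assumption.

```javascript
const MAX_AGE_SECONDS = 5;

// Splits "<hash>;<unix timestamp>" at the last semicolon.
function parseToken(token) {
  const sep = token.lastIndexOf(';');
  if (sep === -1) throw new Error('Malformed token');
  const rawTimestamp = token.slice(sep + 1);
  if (!/^\d+$/.test(rawTimestamp)) throw new Error('Malformed timestamp');
  return { hash: token.slice(0, sep), timestamp: Number(rawTimestamp) };
}

// True when the token is at most MAX_AGE_SECONDS old.
// Assumption: tokens with a timestamp in the future are rejected too.
function isFresh(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const age = nowSeconds - parseToken(token).timestamp;
  return age >= 0 && age <= MAX_AGE_SECONDS;
}

const sample = '$2a$10$17134600313740.016285uHzevxskhOFg/i7R5ss9bhF6xeuN.njC;1713460031';
console.log(isFresh(sample, 1713460033)); // true, 2 seconds old
console.log(isFresh(sample, 1713460040)); // false, 9 seconds old
```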

The server then re-creates the public key, appends a semicolon and the timestamp to it, and verifies that the BlowFish hash value matches the re-created public key + timestamp.
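The hash comparison could be sketched like this. Two things are assumptions and differ from the real setup: Node.js has no built-in bcrypt, so scrypt is swapped in as the slow hash, which means the token here carries an explicit salt in a made-up `salt$hash;timestamp` layout; and the public key and timestamp are joined with a semicolon. All function names are hypothetical.

```javascript
const { scryptSync, randomBytes, timingSafeEqual } = require('node:crypto');

// Stand-in for the slow hash. NOT bcrypt: scrypt is used only because it
// ships with Node.js. Returns a 64 character hex string.
function slowHash(input, saltHex) {
  return scryptSync(input, Buffer.from(saltHex, 'hex'), 32).toString('hex');
}

// Client side: pay the CPU cost once per token.
function createToken(publicKey, nowSeconds) {
  const saltHex = randomBytes(16).toString('hex');
  const hash = slowHash(`${publicKey};${nowSeconds}`, saltHex);
  return `${saltHex}$${hash};${nowSeconds}`;
}

// Server side: re-create the hashed input and compare in constant time.
function verifyHash(token, publicKey) {
  const sep = token.lastIndexOf(';');
  if (sep === -1) return false;
  const [saltHex = '', hash = ''] = token.slice(0, sep).split('$');
  const timestamp = token.slice(sep + 1);
  if (!/^[0-9a-f]{32}$/.test(saltHex)) return false;
  if (!/^[0-9a-f]{64}$/.test(hash)) return false;
  if (!/^\d+$/.test(timestamp)) return false;
  const expected = slowHash(`${publicKey};${timestamp}`, saltHex);
  return timingSafeEqual(Buffer.from(hash, 'hex'), Buffer.from(expected, 'hex'));
}

const token = createToken('example-public-key', 1713460031);
console.log(verifyHash(token, 'example-public-key')); // true
console.log(verifyHash(token, 'another-public-key')); // false
```

Because the timestamp is part of the hashed input, changing the timestamp at the tail of a token invalidates the hash, so an old token cannot be refreshed without redoing the work.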

I've now created a "password" that's valid for 5 seconds

The point is that it took me 100 milliseconds to create the above "password", implying few bots will have any interest in trying to automate the process to generate thousands of such tokens, because I've now taken away "the economy of scale" from their business.

If the token is invalid, the Hyperlambda slot will throw an exception. If the token is valid, it will do nothing. On my server the cost associated with verifying the token is negligible, around 25 milliseconds. However, creating the token on the client requires 100 milliseconds. The end result is that most spammers would probably shy away from trying to brute force multiple tokens in a loop, realising the cost associated with creating one such token is simply too high.

Future Improvements

A possible improvement would be to actually implement an algorithm similar to the one Bitcoin is using, since verifying a simple SHA1 on the server is dirt cheap, and can probably be done in less than 1 millisecond. If I do, I will of course not be able to use more than 2 to 3 zeros at the end, allowing the average client to generate a SHA1 "by chance" in some 100 to 1,000 attempts, implying the client would spend 100 to 1,000 milliseconds creating the hash, while my server only needs 1 millisecond to verify it.
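A minimal sketch of that idea, in the spirit of Hashcash, could look as follows. It is not a finished design: the challenge string, the `challenge;nonce` layout and the function names are all hypothetical, and counting zeros as trailing hexadecimal digits is an assumption. Under that assumption each extra zero multiplies the expected number of attempts by 16, so 2 zeros take about 256 attempts on average and 3 zeros about 4,096.

```javascript
const { createHash } = require('node:crypto');

function sha1Hex(value) {
  return createHash('sha1').update(value, 'utf8').digest('hex');
}

// Client: try nonces until the hash ends with `zeros` hex zeros.
function mint(challenge, zeros) {
  const suffix = '0'.repeat(zeros);
  for (let nonce = 0; ; nonce++) {
    if (sha1Hex(`${challenge};${nonce}`).endsWith(suffix)) {
      return nonce;
    }
  }
}

// Server: a single hash is enough to check the client's work.
function verify(challenge, nonce, zeros) {
  return sha1Hex(`${challenge};${nonce}`).endsWith('0'.repeat(zeros));
}

// Hypothetical challenge: public key plus the token's timestamp.
const challenge = 'example-public-key;1713460031';
const nonce = mint(challenge, 2);
console.log(nonce, verify(challenge, nonce, 2));
```

The asymmetry is the whole point: the client loops until it gets lucky, while the server computes exactly one SHA1.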

In addition, I might want to store my tokens on the server for at least 10 seconds, to verify each token is only used once.

However, all in all, I think the above is simply brilliant, because now spammers need to spend $100 to create 1,000 HTTP invocations, while previously they could generate 1,000 HTTP invocations for probably less than $0.10.

Conclusion

Although I personally believe Bitcoin and crypto are the dumbest things we've ever collectively come up with in this world, there are a lot of good ideas in the underlying technology. Proof of Work being one example ...

Over the next couple of days I will refine my algorithm, probably by generating SHA1 values on the client until I've got a SHA1 value with 2 to 3 trailing zeros, which implies 100 to 1,000 attempts on average to create a valid token. This allows the server to verify the token in 1 millisecond, while the client probably needs to spend between 100 and 1,000 milliseconds to generate it.

When I'm done, I will unceremoniously SHIFT+DELETE Google's reCAPTCHA as the junkware it is, and in the process improve the page load speed of our website, probably by at least 20 points.

Maybe I'll even create some open sauce code for you to download, such that we collectively as a species can forever bury Google's reCAPTCHA, and in the process probably make the internet a bajillion times faster ...

Thomas Hansen

I am the CEO and Founder of AINIRO.IO, Ltd. I am a software developer with more than 25 years of experience. I write about Machine Learning, AI, and how to help organizations adopt said technologies. You can follow me on LinkedIn if you want to read more of what I write.

Published 18. Apr 2024