“Protected by reCaptcha” means that a form has to be accompanied by a g-recaptcha-response field, which the backend then verifies through a request to the reCaptcha server. Based on a probability calculated by a machine learning algorithm, reCaptcha may give you captchas ranging in difficulty from nonexistent to very hard. Instead of solving the captcha by hand, this method uses another valid browser session cookie, one which Google deems “human”, to effectively bypass the captcha. These “valid” browser sessions can be farmed en masse. According to this report, “[…] a checkbox captcha is obtained after the beginning of the 9th day from the cookie’s creation, without requiring any browsing activities and type of network connection […]. Our experiment also revealed that each cookie can receive up to 8 checkbox captchas in a day.”
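For reference, the backend verification step described above can be sketched roughly as follows. This is a minimal sketch, not the code of any particular site: it assumes Node.js 18+ (for the global `fetch`), and the helper names `buildVerifyBody` and `verifyRecaptcha` are made up for illustration. The endpoint and the `secret`, `response` and `remoteip` parameters come from Google's documented siteverify API.

```javascript
// Endpoint documented by Google for server-side token verification.
const VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// Build the form-encoded body. `secret` is the site's private key,
// `token` is the submitted g-recaptcha-response value, and
// `remoteIp` is optional in the API.
function buildVerifyBody(secret, token, remoteIp) {
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) body.set("remoteip", remoteIp);
  return body;
}

// Hypothetical helper: resolves to true when Google reports the token as valid.
// `fetchImpl` is injectable so the function can be exercised without network access.
async function verifyRecaptcha(secret, token, remoteIp, fetchImpl = fetch) {
  const res = await fetchImpl(VERIFY_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: buildVerifyBody(secret, token, remoteIp).toString(),
  });
  const data = await res.json();
  return data.success === true;
}
```

The JSON reply also carries fields such as `hostname` and `error-codes`. A backend that only looks at `success`, as this sketch does, learns that the token is valid but nothing about which browser session produced it.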
I used a simple Node.js server that serves a form to collect the Google reCaptcha response. The caveat: the browser must load that form under the same hostname as the real website, which can be achieved by editing the /etc/hosts file or by running a DNS server.
On the other side, I am using a browser automation tool to submit a form that is protected by reCaptcha. The form needs a g-recaptcha-response value in a hidden field, so once the page has loaded, we wait for a valid token.
Waiting for a token in turn sends a message to all connected harvesters via WebSocket.
Once the automated browser window has a valid reCaptcha token, it needs to fill the hidden field and call the callback function. There is also a way to farm these valid harvester sessions such that the first 20 or so reCaptcha verifications are bypassed. Great work, Google!