This blog publishes the latest announcements and changes for scrapeulous.com. We will also publish instructions and general tutorials about scraping here. Questions regarding the content of this blog are answered by firstname.lastname@example.org
Breaking Google's reCAPTCHA
A captcha is a mechanism to distinguish human users from automated programs (bots). There are many service providers on the Internet that have a major incentive to prevent bots from (ab)using their systems. Imagine if there was a reliable method to break Google’s famous reCAPTCHA v2 or the new reCAPTCHA v3 (released in late 2018). The following scenarios would be possible: Mass creation of accounts on sites such as Reddit.
Scraping search engines in 2019
When I started developing a simple Python script in 2012, which would later become the open source software GoogleScraper, I needed to stay up to date with the latest web technology. Since then, I have developed a love-hate relationship with web scraping. On the one hand, being able to scrape websites in large quantities gives you instant access to the most up-to-date information in the world. On the other hand, web scraping is an inherently unstable business, because you are a third-party source and rely on the provider (such as Google or Bing).