Web scrape troubleshooting: Difference between revisions

369 bytes added, 22 December 2025

@@ Line 15: / Line 15: @@
 . The IP was banned by the server
-* Set a delay (sleep time) between requests, e.g.: [http://php.net/manual/en/function.sleep.php PHP: sleep - Manual], [http://doc.scrapy.org/en/1.0/topics/autothrottle.html#topics-autothrottle AutoThrottle extension — Scrapy 1.0.3 documentation] or [[Sleep | Sleep random seconds in programming]].
+* Random Delays: Set a delay (sleep time) between requests, e.g.: [http://php.net/manual/en/function.sleep.php PHP: sleep - Manual], [http://doc.scrapy.org/en/1.0/topics/autothrottle.html#topics-autothrottle AutoThrottle extension — Scrapy 1.0.3 documentation] or [[Sleep | Sleep random seconds in programming]].
 * The server responded with a status of 403: '[https://zh.wikipedia.org/wiki/HTTP_403 403 forbidden]' --> Change the network IP
+* Smart Retry: '''Automatic retry''' or '''Exponential Backoff'''<ref>[https://en.wikipedia.org/wiki/Exponential_backoff Exponential backoff - Wikipedia]: "Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate."</ref> on network errors (up to 3 times)
 . [https://en.wikipedia.org/wiki/CAPTCHA CAPTCHA]
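
The "Random Delays" and "Smart Retry" items above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the method used by any particular scraper: the helper names (<code>random_delay</code>, <code>backoff_delays</code>, <code>fetch_with_retry</code>), the 1-second base delay, the doubling factor and the 0.5-second jitter are all hypothetical choices, and "up to 3 times" is read here as three retries after the first attempt. The <code>fetch</code> argument is any callable that raises <code>OSError</code> on a network failure (<code>urllib.error.URLError</code> is a subclass of it).

```python
import random
import time


def random_delay(low=1.0, high=3.0):
    """Pick a random pause (in seconds) to sleep between two requests."""
    return random.uniform(low, high)


def backoff_delays(max_retries=3, base=1.0, factor=2.0):
    """Waits before each retry: base, base*factor, base*factor**2, ..."""
    return [base * factor ** attempt for attempt in range(max_retries)]


def fetch_with_retry(fetch, url, max_retries=3, base=1.0, jitter=0.5,
                     sleep=time.sleep):
    """Call fetch(url); on a network error, wait and retry.

    The wait grows exponentially and gets a small random jitter so that
    retries do not arrive at perfectly regular intervals. After
    max_retries failed retries the last error is re-raised.
    """
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_retries:
                raise  # retries exhausted, give up
            sleep(delays[attempt] + random.uniform(0, jitter))


# Hypothetical usage (performs real network requests, so left commented out):
# from urllib.request import urlopen
# for page_url in ["https://example.com/a", "https://example.com/b"]:
#     body = fetch_with_retry(lambda u: urlopen(u, timeout=10).read(), page_url)
#     time.sleep(random_delay())
```

Passing <code>sleep</code> as a parameter is a convenience for testing: a stub can record the waits instead of actually pausing. Retrying will not help with a persistent 403 response; in that case the IP change mentioned above is still needed.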
@@ Line 84: / Line 86: @@
 * ''target website'' + download / downloader site:github.com
 * ''target website'' + browser client site:github.com
 == Common Web Scraping Issues and Solutions ==