Web scrape troubleshooting: Difference between revisions

#* Backup the HTML text of parent DOM element
#* (optional) Complete HTML file backup
# The IP was banned by the server
#* Set a delay (sleep time) between page requests, e.g. [http://php.net/manual/en/function.sleep.php PHP: sleep - Manual], [http://doc.scrapy.org/en/1.0/topics/autothrottle.html#topics-autothrottle AutoThrottle extension - Scrapy 1.0.3 documentation]
#* If the server responds with status 403 ([https://zh.wikipedia.org/wiki/HTTP_403 403 Forbidden]), change the network IP
# CAPTCHA
# AJAX
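The delay between pages and the 403 check from the IP ban item can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the names <code>crawl_politely</code> and <code>fetch_with_urllib</code>, the 2 second default delay, and the choice to stop at the first 403 are all assumptions made for the example. Scrapy users would normally rely on the AutoThrottle extension instead.

```python
import time
import urllib.error
import urllib.request


def fetch_with_urllib(url):
    """Return (status_code, body_text) for url using only the standard library."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx/5xx responses, including 403.
        return error.code, ""


def crawl_politely(urls, fetch=fetch_with_urllib, delay_seconds=2.0, sleep=time.sleep):
    """Fetch urls one by one, pausing between requests (hypothetical helper).

    Returns (pages, blocked_url): pages maps each successfully fetched URL
    to its body, and blocked_url is the first URL answered with 403, or None.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause between pages, not before the first one
        status, body = fetch(url)
        if status == 403:
            # Likely an IP ban: stop here instead of sending more requests.
            return pages, url
        if status == 200:
            pages[url] = body
    return pages, None
```

Calling <code>crawl_politely(list_of_urls, delay_seconds=5.0)</code> returns the pages collected so far together with the URL that triggered the 403, if any, so the run can be resumed from that URL after the network IP has been changed. The <code>fetch</code> and <code>sleep</code> parameters are injectable only so the loop can be exercised without network access.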
