Web scrape troubleshooting

From LemonWiki共筆

Revision as of 10:14, 13 June 2017

List of technical issues

Website revision: the expected web content (of a DOM element) was empty
- Extract each column from multiple sources, e.g. different HTML DOM elements that hold the same column value.
- Back up the HTML text of the parent DOM element.
- (Optional) Back up the complete HTML file.
Server IP ban
- Set a delay (sleep time) between page requests, e.g. PHP: sleep - Manual; AutoThrottle extension — Scrapy 1.0.3 documentation
CAPTCHA
AJAX
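
The first two items in the list can be sketched in Python as follows. This is only a minimal illustration, not a definitive implementation: the `price` column, the two regular-expression patterns, and the function names `extract_first`, `extract_or_backup`, and `polite_delay` are all hypothetical, and a real scraper would more likely use an HTML parser than regular expressions.

```python
import random
import re
import time
from pathlib import Path

# Hypothetical patterns: two page layouts that carry the same "price" column.
PRICE_PATTERNS = [
    re.compile(r'<span class="price">([^<]+)</span>'),
    re.compile(r'<div id="product-price"[^>]*>([^<]+)</div>'),
]


def extract_first(html, patterns):
    """Return the first non-empty match among several sources, else None."""
    for pattern in patterns:
        match = pattern.search(html)
        if match and match.group(1).strip():
            return match.group(1).strip()
    return None


def extract_or_backup(html, patterns, backup_path):
    """Extract a value; if every source is empty, save the HTML for inspection."""
    value = extract_first(html, patterns)
    if value is None:
        # The site was probably revised: keep the raw HTML so the
        # patterns can be updated later without re-fetching the page.
        Path(backup_path).write_text(html, encoding="utf-8")
    return value


def polite_delay(base_seconds=2.0, jitter_seconds=1.0, sleep=time.sleep):
    """Sleep between page requests and return the delay that was used."""
    delay = base_seconds + random.uniform(0, jitter_seconds)
    sleep(delay)
    return delay
```

Calling `polite_delay()` between page requests spaces them out by a slightly randomized interval, which may lower the chance of an IP ban. The `sleep` parameter is there only so the delay can be replaced in tests.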

Further reading

Why say that HTTP is a stateless protocol? - Stack Overflow

Retrieved from "https://wiki.planetoid.info/index.php?title=Web_scrape_troubleshooting&oldid=18204"