Web scrape troubleshooting: Difference between revisions

From LemonWiki共筆


Further reading
* [https://stackoverflow.com/questions/13200152/why-say-that-http-is-a-stateless-protocol Why say that HTTP is a stateless protocol? - Stack Overflow]


[[Category:Programming]]
[[Category:Data science]]
[[Category:Data collecting]]

Revision as of 10:14, 13 June 2017

List of technical issues

  1. Website revision: the expected content of a DOM element comes back empty because the site changed its markup
    • Extract each column from multiple sources, such as different DOM elements that carry the same value.
    • Back up the HTML text of the parent DOM element.
    • (Optional) Back up the complete HTML file.
  2. Server IP ban
  3. CAPTCHA
  4. AJAX
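
The mitigations listed under item 1 can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the markup (`span class="price"`, `meta itemprop="price"`), the helper names, and the regex-based extraction are all assumptions made for the example; a real scraper would more likely use an HTML parser.

```python
import re
from pathlib import Path


def first_non_empty(html, extractors):
    """Return the first non-empty value produced by any extractor, else None."""
    for extract in extractors:
        value = extract(html)
        if value and value.strip():
            return value.strip()
    return None


def backup_html(html, backup_dir, name):
    """Save the raw HTML so a failed extraction can be inspected later."""
    path = Path(backup_dir) / f"{name}.html"
    path.write_text(html, encoding="utf-8")
    return path


def regex_extractor(pattern):
    """Build an extractor from a regex with one capture group (illustration only)."""
    compiled = re.compile(pattern, re.DOTALL)

    def extract(html):
        match = compiled.search(html)
        return match.group(1) if match else None

    return extract


# Hypothetical page: the same value appears in a <span> and in a <meta> tag.
PRICE_SOURCES = [
    regex_extractor(r'<span class="price">(.*?)</span>'),
    regex_extractor(r'<meta itemprop="price" content="(.*?)"'),
]


def scrape_price(parent_html, backup_dir, name):
    """Try every source for the column; back up the parent HTML if all are empty."""
    price = first_non_empty(parent_html, PRICE_SOURCES)
    if price is None:
        # Every source came back empty: keep the evidence for debugging.
        backup_html(parent_html, backup_dir, name)
    return price
```

Trying several sources for the same column lets the scraper survive a partial redesign, and the saved parent HTML shows what the site actually returned when every source fails.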
