Web scrape troubleshooting

From LemonWiki共筆


Further reading
* stateless: [https://stackoverflow.com/questions/13200152/why-say-that-http-is-a-stateless-protocol Why say that HTTP is a stateless protocol? - Stack Overflow]
* stateful: [http://www.webopedia.com/TERM/S/stateful.html What is stateful? Webopedia Definition]


[[Category:Programming]]
[[Category:Data science]]
[[Category:Data collecting]]

Revision as of 10:17, 13 June 2017

List of technical issues

  1. Website revision: the expected content of a DOM element is empty after the site changes its markup
    • Extract the same column from multiple sources, e.g. different DOM elements that carry the same value.
    • Back up the HTML text of the parent DOM element.
    • (Optional) Back up the complete HTML file.
  2. Server IP ban
  3. CAPTCHA
  4. AJAX
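
The mitigations listed under item 1 can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a complete scraper: the `price` column, the two regular expressions, and the names `extract_with_fallback` and `scrape_price` are all hypothetical and would need adapting to the real page. A proper HTML parser would normally replace the regular expressions.

```python
import re
from pathlib import Path

# Hypothetical patterns: two places where the same "price" value may appear.
# Assumption: the real site exposes the value in more than one DOM element.
PRICE_PATTERNS = [
    re.compile(r'<span id="price">\s*([^<]+?)\s*</span>'),
    re.compile(r'<meta itemprop="price" content="([^"]+)"'),
]


def extract_with_fallback(html, patterns):
    """Return the first non-empty match among several sources of the same column."""
    for pattern in patterns:
        match = pattern.search(html)
        if match and match.group(1).strip():
            return match.group(1).strip()
    return None


def scrape_price(html, backup_dir, page_id):
    """Extract the value; if every source is empty, save the HTML for later inspection."""
    value = extract_with_fallback(html, PRICE_PATTERNS)
    if value is None:
        # Keep the raw HTML (parent element or whole page) so the new layout
        # can be examined after a website revision.
        backup_path = Path(backup_dir) / f"{page_id}.html"
        backup_path.write_text(html, encoding="utf-8")
    return value
```

Passing only the parent element's HTML as `html` corresponds to the second bullet, while passing the whole page corresponds to the optional full backup.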
