Web scrape troubleshooting: Difference between revisions

Line 61: Line 61:
* Understanding web technology
** HTTP GET/POST ★★☆☆☆
** HTML/CSS/JavaScript ★★☆☆☆
** CSS selectors and DOM (Document Object Model) elements ★★☆☆☆
** AJAX (Asynchronous JavaScript and XML) ★★★★☆
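
The GET/POST and CSS selector items above can be illustrated with a minimal sketch using only the Python standard library. The helper names (<code>build_get</code>, <code>build_post</code>, <code>ClassTextCollector</code>) and the example.com URL are hypothetical and chosen for illustration. The collector only approximates a simple <code>tag.class</code> selector, and no request is actually sent.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url, params):
    """Build a GET request: parameters travel in the query string."""
    return Request(f"{url}?{urlencode(params)}")


def build_post(url, form):
    """Build a POST request: parameters travel in the request body."""
    return Request(url, data=urlencode(form).encode("utf-8"))


class ClassTextCollector(HTMLParser):
    """Collect the text of elements matching `tag.class_name`.

    This is a rough stand-in for a simple CSS selector such as `li.item`;
    it does not implement the full selector syntax.
    """

    def __init__(self, tag, class_name):
        super().__init__()
        self.tag = tag
        self.class_name = class_name
        self._depth = 0      # > 0 while inside a matching element
        self.matches = []    # text content of each matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if tag == self.tag and self.class_name in classes:
                self._depth = 1
                self.matches.append("")
        elif tag == self.tag:
            # Nested element with the same tag name: track it so the
            # matching end tag is found correctly.
            self._depth += 1

    def handle_endtag(self, tag):
        if self._depth and tag == self.tag:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.matches[-1] += data


def select_text(html, tag, class_name):
    """Return the text of every `tag.class_name` element in `html`."""
    collector = ClassTextCollector(tag, class_name)
    collector.feed(html)
    return collector.matches
```

For real pages, a dedicated HTML parsing library with full CSS selector support is usually the more practical choice. The sketch only shows how a selector maps onto DOM elements.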
Line 73: Line 74:
* Detection of abnormal data
** [https://en.wikipedia.org/wiki/List_of_HTTP_status_codes HTTP status codes] ★★☆☆☆
** Data is wrong even though the server returns an HTTP 200 status code ★★★☆☆
* Etiquette of web scraping
** Limits on web requests ★★☆☆☆
* Tom and Jerry
** VPN and proxy ★★☆☆☆
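
Two of the items above, wrong data behind an HTTP 200 status code and limits on web requests, can be sketched as follows. This is a minimal illustration under stated assumptions, not a complete solution. The names <code>check_response</code> and <code>Throttle</code>, the block phrases, and the idea of a "required marker" (a string that a healthy page is expected to contain) are all hypothetical.

```python
import time


def check_response(status, body, required_marker,
                   block_phrases=("captcha", "access denied")):
    """Return a list of problems; an empty list means the page looks usable.

    A 200 status alone is not trusted: the body must be non-empty, must not
    contain typical block-page phrases, and must contain the marker that a
    healthy page is assumed to include.
    """
    problems = []
    if status != 200:
        problems.append(f"unexpected status {status}")
    text = body.lower()
    if not text.strip():
        problems.append("empty body")
    for phrase in block_phrases:
        if phrase in text:
            problems.append(f"block phrase found: {phrase}")
    if required_marker.lower() not in text:
        problems.append(f"required marker missing: {required_marker}")
    return problems


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested offline
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would call <code>Throttle.wait()</code> before each request and pass the resulting status code and body to <code>check_response</code>. A suitable interval depends on the target site and its terms of use.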