Web scrape troubleshooting: Difference between revisions
== Skill tree of web scraping ==
* Understanding the web technology
** HTTP GET/POST ★★☆☆☆
** HTML/CSS/JavaScript ★★☆☆☆
** CSS selectors and DOM (Document Object Model) elements ★★☆☆☆
** AJAX (Asynchronous JavaScript and XML) ★★★★☆
* Detection of abnormal data
** [https://en.wikipedia.org/wiki/List_of_HTTP_status_codes HTTP status codes] ★★☆☆☆
** Data is wrong even though the server returns an HTTP 200 status code ★★★☆☆
* Etiquette of web scraping
** Limiting the rate of web requests ★★☆☆☆
* Tom and Jerry
** VPN and proxy ★★☆☆☆
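Two items in the tree, detecting abnormal data behind an HTTP 200 and limiting the rate of web requests, can be illustrated with a minimal Python sketch. It assumes nothing about the HTTP client: the check takes a status code and a body string from whatever library you use. The names `is_abnormal` and `RateLimiter`, the "expected marker" idea, and the list of suspicious phrases are hypothetical heuristics, not taken from the original page.

```python
import time
from typing import Callable, Optional

# Hypothetical phrases that block pages or CAPTCHA pages often contain
# even when the server answers with HTTP 200. Adjust per target site.
SUSPICIOUS_HINTS = ("captcha", "access denied", "too many requests")


def is_abnormal(status_code: int, body: str, expected_marker: str) -> Optional[str]:
    """Return a reason string if the response looks abnormal, else None.

    `expected_marker` is an assumed string (e.g. a CSS class name) that
    a valid page is expected to contain.
    """
    if status_code != 200:
        return f"HTTP status {status_code}"
    if not body.strip():
        return "empty body despite HTTP 200"
    lowered = body.lower()
    for hint in SUSPICIOUS_HINTS:
        if hint in lowered:
            return f"suspicious text {hint!r} despite HTTP 200"
    if expected_marker not in body:
        return f"expected marker {expected_marker!r} not found"
    return None


class RateLimiter:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until at least `min_interval` has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

For example, with the third-party `requests` library one might call `limiter.wait()` before each `resp = requests.get(url)` and then `is_abnormal(resp.status_code, resp.text, "price")`, logging or retrying when a reason is returned. The clock and sleep functions are injectable only so the delay logic can be checked without real waiting.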