Web scrape troubleshooting
== Skill tree of web scraping ==
** Submit the form without logging in ★★☆☆☆
** Submit the form after logging in to the account ★★★☆☆
* Detection of abnormal data
** [https://en.wikipedia.org/wiki/List_of_HTTP_status_codes HTTP status codes] ★★☆☆☆
** Data is wrong even though the server returns HTTP 200 ★★★☆☆
* Etiquette of web scraping
** Limiting the rate of web requests ★★☆☆☆
* Data cleaning, e.g. removing unprintable characters ★★★☆☆
* [https://en.wikipedia.org/wiki/Regular_expression Regular expression] ★★★☆☆
* Selection of database engine ★★★★☆
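For the two form-submission items, here is a minimal sketch using only Python's standard library. A form is usually sent as a POST request whose body is encoded as application/x-www-form-urlencoded, and submitting after login depends on reusing the session cookie that the login response sets. The helper names build_form_request and make_session_opener, and the URLs and field names in the comments, are hypothetical. Real forms often also carry hidden fields (for example CSRF tokens) that must first be copied from the form page.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_form_request(url: str, fields: dict[str, str]) -> urllib.request.Request:
    """Hypothetical helper: encode form fields as
    application/x-www-form-urlencoded and wrap them in a POST request."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def make_session_opener() -> tuple[urllib.request.OpenerDirector, http.cookiejar.CookieJar]:
    """Hypothetical helper: create an opener that stores cookies, so the
    session cookie set by a login request is sent with later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


# Usage (not executed here; URLs and field names are hypothetical):
# opener, jar = make_session_opener()
# opener.open(build_form_request("https://example.com/login",
#                                {"username": "user", "password": "secret"}))
# opener.open(build_form_request("https://example.com/form", {"comment": "hello"}))
```

Submitting without logging in only needs build_form_request with urllib.request.urlopen; the cookie-aware opener matters once the site ties the form to a session.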
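For detecting abnormal data, a sketch that checks the status code first and then the content, because a block page, a CAPTCHA or an empty template can still arrive with HTTP 200. The suspicious phrases, the required marker and the 500-character minimum are assumptions to tune per site, not established values, and check_page is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class PageCheck:
    ok: bool
    reason: str


# Assumed phrases that often indicate a block page or soft error despite HTTP 200.
SUSPICIOUS_PHRASES = ("captcha", "access denied", "too many requests")


def check_page(status: int, body: str, required_marker: str,
               min_length: int = 500) -> PageCheck:
    """Flag a response as abnormal by status code or by content."""
    # Anything outside 2xx is treated as abnormal; redirects are normally
    # followed by the HTTP client before the response reaches this check.
    if not 200 <= status < 300:
        return PageCheck(False, f"HTTP status {status}")
    lowered = body.lower()
    for phrase in SUSPICIOUS_PHRASES:
        if phrase in lowered:
            return PageCheck(False, f"suspicious phrase: {phrase!r}")
    # A marker that every genuine page contains, e.g. the table holding the data.
    if required_marker not in body:
        return PageCheck(False, f"missing marker: {required_marker!r}")
    if len(body) < min_length:
        return PageCheck(False, f"body too short ({len(body)} chars)")
    return PageCheck(True, "ok")
```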
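For limiting the rate of web requests, one common approach is a per-host minimum delay between consecutive requests. This sketch assumes a fixed interval; the Throttle class is hypothetical, and its clock and sleep functions are injectable so the behaviour can be tested without real waiting. Honouring robots.txt (for example with Python's urllib.robotparser) is a complementary courtesy not shown here.

```python
import time
from typing import Callable


class Throttle:
    """Hypothetical helper: enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: dict[str, float] = {}  # host -> time of the last request

    def wait(self, host: str) -> float:
        """Sleep just long enough to respect min_interval; return the delay used."""
        now = self._clock()
        last = self._last.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last[host] = now + delay
        return delay


# Usage: call throttle.wait(host) immediately before each request to that host.
# throttle = Throttle(min_interval=2.0)
```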
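For data cleaning, one way to remove unprintable characters is to drop every character whose Unicode general category starts with "C" (control, format, surrogate, private-use, unassigned) while keeping tabs and newlines. Converting non-breaking spaces to ordinary spaces is an added assumption that often helps with text scraped from HTML; remove_unprintable is a hypothetical name.

```python
import unicodedata


def remove_unprintable(text: str) -> str:
    """Drop characters in Unicode categories C* (e.g. NUL, zero-width space,
    byte-order mark) but keep tabs and newlines."""
    # Assumption: treat non-breaking spaces as ordinary spaces.
    text = text.replace("\u00a0", " ")
    kept = []
    for ch in text:
        if ch in "\n\t" or not unicodedata.category(ch).startswith("C"):
            kept.append(ch)
    return "".join(kept)
```

For example, remove_unprintable("Price:\u00a0100\u200b\x00 USD") gives "Price: 100 USD".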
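For regular expressions, a small sketch that pulls dollar prices such as "$1,299.99" out of text that has already been extracted from a page. The pattern and the extract_prices name are assumptions for this illustration; the HTML structure itself is usually parsed more reliably with an HTML parser than with regular expressions.

```python
import re

# Assumed pattern: "$", optional space, digits with optional thousands
# separators, then an optional decimal part.
PRICE_RE = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)")


def extract_prices(text: str) -> list[float]:
    """Return every dollar amount found in text as a float."""
    # findall returns the captured group; commas are removed before conversion.
    return [float(m.replace(",", "")) for m in PRICE_RE.findall(text)]
```

For example, extract_prices("Was $1,299.99, now $999") gives [1299.99, 999.0].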
== Further reading ==