Web scrape troubleshooting: Difference between revisions

Line 71: Line 71:
** Submit the form without logging in ★★☆☆☆
** Submit the form after logging in to the account ★★★☆☆
* Detection of abnormal data
** [https://en.wikipedia.org/wiki/List_of_HTTP_status_codes HTTP status codes] ★★☆☆☆
** Data is wrong even though the response returns HTTP 200 ★★★☆☆
* Etiquette of web scraping
** Limit the rate of web requests ★★☆☆☆
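
The two points above (responses that look fine at the HTTP level but carry wrong data, and limiting the request rate) can be sketched roughly as follows. This is only an illustrative sketch: the names <code>looks_valid</code>, <code>Throttle</code>, and the <code>required_marker</code> parameter are hypothetical and not part of any library, and the right marker and delay depend on the site being scraped.

```python
import time


def looks_valid(status_code, body, required_marker):
    """Return True only if the status is 2xx AND the body contains
    a marker that the expected page is assumed to always include.

    A block page or an empty template can still arrive with HTTP 200,
    so the status code alone is not enough.
    """
    if not 200 <= status_code < 300:
        return False
    return required_marker in body


class Throttle:
    """Enforce a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Calling <code>wait()</code> before each request and <code>looks_valid()</code> on each response keeps both checks in one place; which marker string to test for is a per-site assumption.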
Line 82: Line 85:
* Data cleaning e.g. unprintable characters ★★★☆☆
* [https://en.wikipedia.org/wiki/Regular_expression Regular expression] ★★★☆☆
* Selection of database engine ★★★☆☆
* Selection of database engine ★★★★☆
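
As a rough illustration of the data-cleaning and regular-expression items above, the following sketch removes unprintable characters and normalizes whitespace. The function names <code>strip_unprintable</code> and <code>clean</code> are hypothetical, and treating every Unicode "C" category character (control, format, and so on) as unprintable is an assumption that may be too aggressive for scripts that rely on format characters such as the zero-width joiner.

```python
import re
import unicodedata


def strip_unprintable(text):
    """Drop characters in Unicode general categories starting with "C"
    (control, format, surrogate, private use, unassigned),
    but keep tab and newline."""
    return "".join(
        ch for ch in text
        if ch in "\t\n" or unicodedata.category(ch)[0] != "C"
    )


def clean(text):
    """Strip unprintable characters, then collapse runs of spaces,
    tabs, and non-breaking spaces into a single space."""
    text = strip_unprintable(text)
    return re.sub(r"[ \t\u00a0]+", " ", text).strip()
```

For example, <code>clean("  Price:\x07\u00a0 42 ")</code> yields <code>"Price: 42"</code>.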


== Further reading ==