Web scrape troubleshooting: Difference between revisions
m
no edit summary
mNo edit summary |
|||
| Line 32: | Line 32: | ||
<tr> | <tr> | ||
<td>Advanced</td> | <td>Advanced</td> | ||
<td></td> | <td>Interactive websites</td> | ||
<td>The URL is the dataset resource. Requires simulating a POST form submission with the form data or a [[User agent|user agent]].</td> | <td>The URL is the dataset resource. Requires simulating a POST form submission with the form data or a [[User agent|user agent]].</td> | ||
<td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td> | <td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td> | ||
| Line 38: | Line 38: | ||
<tr> | <tr> | ||
<td>More difficult</td> | <td>More difficult</td> | ||
<td></td> | <td>Interactive websites</td> | ||
<td>Requires simulating user behavior in a browser, such as clicking a button, submitting a form, and finally obtaining the file.</td> | <td>Requires simulating user behavior in a browser, such as clicking a button, submitting a form, and finally obtaining the file.</td> | ||
<td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td> | <td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td> | ||
| Line 44: | Line 44: | ||
<tr> | <tr> | ||
<td>Difficult</td> | <td>Difficult</td> | ||
<td></td> | <td>Interactive websites</td> | ||
<td>Ajax</td> | <td>Ajax</td> | ||
<td></td> | <td></td> | ||