Web scrape troubleshooting: Difference between revisions
→List of technical issues
| Line 20: | Line 20: | ||
<tr> | <tr> | ||
<th>Difficulty in implementing</th> | <th>Difficulty in implementing</th> | ||
<th>Description</th> | |||
<th>Approach</th> | <th>Approach</th> | ||
<th>Comments</th> | <th>Comments</th> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td> | <td>Easy</td> | ||
<td>The URL is the dataset resource</td> | <td>Well-formatted HTML elements</td> | ||
<td>The URL is the dataset resource.</td> | |||
<td></td> | <td></td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td> | <td>Advanced</td> | ||
<td></td> | |||
<td>The URL is the dataset resource. Requires simulating a POST form submission (with the form data) or a [[User agent|user agent]]</td> | <td>The URL is the dataset resource. Requires simulating a POST form submission (with the form data) or a [[User agent|user agent]]</td> | ||
<td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td> | <td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td> | ||
| Line 35: | Line 38: | ||
<tr> | <tr> | ||
<td>More difficult</td> | <td>More difficult</td> | ||
<td></td> | |||
<td>Requires simulating user behavior in the browser, such as clicking a button, submitting a form, and finally obtaining the file.</td> | <td>Requires simulating user behavior in the browser, such as clicking a button, submitting a form, and finally obtaining the file.</td> | ||
<td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td> | <td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td> | ||
</tr> | </tr> | ||
<tr> | <tr> | ||
<td> | <td>Difficult</td> | ||
<td></td> | |||
<td>Ajax</td> | <td>Ajax</td> | ||
<td></td> | <td></td> | ||