Web scrape troubleshooting
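In the easy case of the troubleshooting table below, the URL itself serves the dataset as well-formatted HTML, so a plain download followed by parsing is usually enough. The following is a minimal sketch that uses only the Python standard library. It assumes the data sits in an HTML `<table>` whose `<tr>`, `<th>` and `<td>` tags are all properly closed. `TableExtractor` and `scrape_table` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableExtractor(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row.

    Assumes well-formatted HTML: every cell and row tag is explicitly closed.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> is kept as part of the cell.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(url):
    """Download a page whose URL serves the data directly and return its table rows."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset)
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

`scrape_table(url)` returns one list of strings per table row, header row included. If the page is not well formatted (for example, if it omits closing `</td>` tags), this simple parser will lose cells, and a more tolerant approach is needed.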

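In the advanced case, the server returns the data only after a form is submitted, or it answers only requests that carry a browser-like [[User agent|user agent]]. The request can then be rebuilt by hand, much as one would with PHP cURL. The following is a minimal sketch using Python's `urllib.request`. It assumes the form is submitted with POST using the default `application/x-www-form-urlencoded` encoding. The URL, the form fields, the `BROWSER_UA` string and the helper names are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical browser-like User-Agent string; some sites reject the
# default "Python-urllib/x.y" agent, so a scraper may need to send its own.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleScraper/0.1"


def build_form_post(url, form_data, user_agent=BROWSER_UA):
    """Build a POST request that mimics a browser submitting an HTML form."""
    # Encode the fields exactly as a browser does for a default HTML form.
    body = urlencode(form_data).encode("ascii")
    headers = {
        "User-Agent": user_agent,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")


def submit_form(url, form_data, user_agent=BROWSER_UA):
    """Send the simulated form submission and return the response body as text."""
    request = build_form_post(url, form_data, user_agent)
    with urlopen(request) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Building the request separately from sending it makes it easy to compare the simulated submission with what the browser actually sends, for example as captured with an HTTP request and response inspection tool.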
  <tr>
    <th>Difficulty in implementing</th>
    <th>Description</th>
    <th>Approach</th>
    <th>Comments</th>
  </tr>
  <tr>
    <td>Easy</td>
    <td>Well-formatted HTML elements</td>
    <td>The URL itself serves the dataset; fetch the page and parse the HTML.</td>
    <td></td>
  </tr>
  <tr>
    <td>Advanced</td>
    <td></td>
    <td>The URL serves the dataset, but the request must simulate a POST form submission with the form data, or send a suitable [[User agent|user agent]].</td>
    <td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td>
  </tr>
  <tr>
    <td>More difficult</td>
    <td></td>
    <td>Requires simulating user behavior in a browser, such as clicking buttons and submitting forms, before the file can finally be obtained.</td>
    <td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td>
  </tr>
  <tr>
    <td>Difficult</td>
    <td></td>
    <td>Ajax</td>
    <td></td>