Web scrape troubleshooting: Difference between revisions

107 bytes added , 25 May 2021

Anonymous user

@@ Line 20: / Line 20: @@
    <tr>
      <th>Difficulty in implementing</th>
+    <th>Description</th>
      <th>Approach</th>
      <th>Comments</th>
    </tr>
    <tr>
-     <td>easy</td>
+     <td>Easy</td>
-     <td>Url is the resource of dataset</td>
+    <td>Well-formatted HTML elements</td>
+     <td>The URL points directly to the dataset resource.</td>
      <td></td>
    </tr>
    <tr>
-     <td>more difficult</td>
+     <td>Advanced</td>
+    <td></td>
      <td>The URL points directly to the dataset resource, but the request requires simulating a POST form submission with the form data or setting a [[User agent|user agent]].</td>
      <td>Using [[HTTP request and response data tool]] or [http://php.net/curl PHP: cURL]</td>
@@ Line 35: / Line 38: @@
    <tr>
      <td>more difficult</td>
+    <td></td>
      <td>Requires simulating user behavior in the browser, such as clicking a button and submitting a form, in order to obtain the file.</td>
      <td>Using [https://www.seleniumhq.org/ Selenium] or [https://developers.google.com/web/updates/2017/04/headless-chrome Headless Chrome]</td>
    </tr>
    <tr>
-     <td>difficult</td>
+     <td>Difficult</td>
+    <td></td>
      <td>Ajax</td>
      <td></td>