Webcrawling


The contents are from Brown CS1951A.

Example

Time Food Calorie
breakfast egg, milk, cereal, avocado 600
lunch chicken breast, brown rice, lettuce 700
dinner steak, sweet potato, broccoli 800
     

The following are the source code for this table.

<table rules="groups">
  <thead>
    <tr>
      <th style="text-align: left">Time</th>
      <th style="text-align: left">Food</th>
      <th style="text-align: right">Calorie</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">breakfast</td>
      <td style="text-align: left">egg, milk, cereal, avocado</td>
      <td style="text-align: right">600</td>
    </tr>
    <tr>
      <td style="text-align: left">lunch</td>
      <td style="text-align: left">chicken breast, brown rice, lettuce</td>
      <td style="text-align: right">700</td>
    </tr>
    <tr>
      <td style="text-align: left">dinner</td>
      <td style="text-align: left">steak, sweet potato, broccoli</td>
      <td style="text-align: right">800</td>
    </tr>
  </tbody>
  <tbody>
    <tr>
      <td style="text-align: left"> </td>
      <td style="text-align: left"> </td>
      <td style="text-align: right"> </td>
    </tr>
  </tbody>
</table>

Now lets’ use BeautifulSoup to parse the html.