Discussion 8
Webcrawling
The contents are from Brown CS1951A.
Example
Time | Food | Calorie |
---|---|---|
breakfast | egg, milk, cereal, avocado | 600 |
lunch | chicken breast, brown rice, lettuce | 700 |
dinner | steak, sweet potato, broccoli | 800 |
The following is the source code for this table.
<table rules="groups">
<thead>
<tr>
<th style="text-align: left">Time</th>
<th style="text-align: left">Food</th>
<th style="text-align: right">Calorie</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left">breakfast</td>
<td style="text-align: left">egg, milk, cereal, avocado</td>
<td style="text-align: right">600</td>
</tr>
<tr>
<td style="text-align: left">lunch</td>
<td style="text-align: left">chicken breast, brown rice, lettuce</td>
<td style="text-align: right">700</td>
</tr>
<tr>
<td style="text-align: left">dinner</td>
<td style="text-align: left">steak, sweet potato, broccoli</td>
<td style="text-align: right">800</td>
</tr>
</tbody>
<tbody>
<tr>
<td style="text-align: left"> </td>
<td style="text-align: left"> </td>
<td style="text-align: right"> </td>
</tr>
</tbody>
</table>
Now let's use BeautifulSoup to parse the HTML.
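As a minimal sketch of what that parsing might look like, the snippet below loads a copy of the table with the `style` attributes dropped for brevity and pulls the rows out as dictionaries. It assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser`. The variable names (`html`, `headers`, `rows`, `total`) are illustrative, not taken from the original material.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the table source (style attributes omitted).
html = """
<table rules="groups">
<thead>
<tr><th>Time</th><th>Food</th><th>Calorie</th></tr>
</thead>
<tbody>
<tr><td>breakfast</td><td>egg, milk, cereal, avocado</td><td>600</td></tr>
<tr><td>lunch</td><td>chicken breast, brown rice, lettuce</td><td>700</td></tr>
<tr><td>dinner</td><td>steak, sweet potato, broccoli</td><td>800</td></tr>
</tbody>
<tbody>
<tr><td> </td><td> </td><td> </td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Column names come from the <th> cells in the header row.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Skip the header row (no <td> cells) and the blank trailing row.
    if any(cells):
        rows.append(dict(zip(headers, cells)))

# Cell text is always a string, so convert before doing arithmetic.
total = sum(int(row["Calorie"]) for row in rows)

for row in rows:
    print(row["Time"], row["Food"], row["Calorie"])
print("total:", total)
```

Note that `find_all("tr")` returns every row in the table, including the header row and the empty row in the second `<tbody>`, which is why the loop filters on whether any cell has text.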