13.11. BeautifulSoup with Requests
BeautifulSoup makes it easy to extract the data you need from an HTML or XML page. You can download and install the BeautifulSoup library from:
Information on installing BeautifulSoup with the Python Package Index tool is available at:
We will use the requests library to get a response object from a URL, create a BeautifulSoup object from the HTML in the response, then retrieve the href attributes from the anchor (a) tags. Anchor tags are also known as link tags.
This will find all of the ‘a’ tags and print the href for each of them.
The program reads the HTML page from “http://www.dr-chuck.com/page1.htm”, creates a BeautifulSoup object from the content of that page, and gets a list of the ‘a’ tags. It then loops through the list and prints the ‘href’ attribute of each tag, or ‘None’ if the tag has no ‘href’ attribute.
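The steps just described can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact program from the text: the function names extract_hrefs and print_links, the timeout value, and the small sample page are assumptions made for this example. The parsing step is kept in its own function so it can be tried on a string without network access.

```python
import requests
from bs4 import BeautifulSoup


def extract_hrefs(html):
    """Return the href of every 'a' tag in html (None when a tag has no href)."""
    soup = BeautifulSoup(html, "html.parser")
    tags = soup.find_all("a")  # list of all anchor tags on the page
    return [tag.get("href", None) for tag in tags]


def print_links(url):
    """Download url with requests and print the href of each anchor tag."""
    response = requests.get(url, timeout=10)  # timeout chosen for illustration
    response.raise_for_status()               # stop early on an HTTP error
    for href in extract_hrefs(response.text):
        print(href)


# Offline check on a small hypothetical page (not the page from the text).
sample = (
    '<p><a href="http://example.com/page2.htm">Second Page</a> '
    '<a name="top">An anchor without an href</a></p>'
)
hrefs = extract_hrefs(sample)
print(hrefs)
```

To run it against the page mentioned above, call print_links("http://www.dr-chuck.com/page1.htm"); this requires network access and that both requests and BeautifulSoup are installed.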
You can also use BeautifulSoup to pull out various parts of each tag:
This will find the first ‘a’ tag and print the information for it.
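As a rough sketch of what pulling out the parts of a tag might look like, the example below parses a short hypothetical HTML string (an assumption for illustration; a real program would use the text of the response from requests) and prints the whole tag, its href, its first piece of content, and its attribute dictionary.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML for illustration; in practice this would be response.text.
html = (
    "<h1>The First Page</h1>"
    '<p><a href="http://example.com/page2.htm">Second Page</a></p>'
)

soup = BeautifulSoup(html, "html.parser")
tag = soup.find("a")  # the first 'a' tag, or None if the page has none

if tag is not None:
    print("TAG:", tag)                   # the whole tag, including its markup
    print("URL:", tag.get("href"))       # value of the href attribute, or None
    print("Contents:", tag.contents[0])  # first child of the tag (here, its text)
    print("Attrs:", tag.attrs)           # dictionary of all attributes on the tag
```

Checking for None before using the tag avoids an error on pages that contain no anchor tags.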
html.parser is the HTML parser that is included in the Python 3 standard library.
Information on other HTML parsers is available at: