15.2. Parsing XML

Look at the code below and predict what will be printed.

Here is a simple application that parses some XML and extracts some data elements from the XML:

Run this to see what it prints.

The triple single quote ('''), as well as the triple double quote ("""), allow for the creation of strings in Python that span multiple lines.

Calling fromstring converts the string representation of the XML into a “tree” of XML elements. When the XML is in a tree, we have a series of methods we can call to extract portions of data from the XML string. The find function searches through the XML tree and retrieves the element that matches the specified tag. The get gets the value of the attribute in that tag.

Using an XML parser such as ElementTree has the advantage that while the XML in this example is quite simple, it turns out there are many rules regarding valid XML, and using ElementTree allows us to extract data from XML without worrying about the rules of XML syntax.

15.2.1. Using get to get the value of an attribute

Why do we use get to get the value of an attribute?

Look at the code below and predict what will be printed.

Run this to see what it prints.

Note

Just like with dictionaries we can use get to get the value of an attribute and if the attribute isn’t there the default is to return None.

15.2.2. Getting Data from the First Element of a Type in XML

You can use find to get the first element of the XML of a specified type. You can the use find on that element to get children tags of that element.

Run the code to see what this prints.

What do you think would happen if we looked for the first ‘author’ in tree rather than in book? Modify the code to see what happens.

15.2.3. Fixing Errors in XML

If your XML has errors, what do you think will happen?

The following XML has errors. Try to run the code first to see what happens and then fix the XML so that the code runs correctly.

The following XML has errors. Try to run the code first to see what happens and then fix the XML so that the code runs correctly.

15.2.4. Write Code to Process XML

Write code to print the book title, category, author, and year.

Write code to print the note’s to, from, body, and time (with region).

You have attempted of activities on this page