Soup with bugs (debugging Beautiful Soup)

Bogdan Samoletskyi
2 min read · Dec 26, 2019


For web scraping with Python, the best library (and the only one I know) is Beautiful Soup.

How does it work?

According to the documentation, when we parse an HTML (or XML) document with Beautiful Soup, it gives us a BeautifulSoup object, which represents the document as a nested data structure.

In other words, the object mirrors the document’s hierarchy of tags.
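To make the nested structure concrete, here is a minimal sketch on a tiny made-up document (the markup and variable names are illustrative assumptions, not part of the API discussed below). It uses "html.parser", the parser bundled with Python.

```python
from bs4 import BeautifulSoup

# A tiny made-up document, used only for illustration.
markup = "<html><body><div><p>first</p><p>second</p></div></body></html>"

# "html.parser" ships with Python, so no extra parser has to be installed.
soup = BeautifulSoup(markup, "html.parser")

# Attribute access walks down the tree to the first matching tag.
first_paragraph = soup.body.div.p
print(first_paragraph.get_text())  # first

# find_all() collects every matching descendant of the tag it is called on.
paragraphs = [p.get_text() for p in soup.div.find_all("p")]
print(paragraphs)  # ['first', 'second']

# Every tag knows its parent, which is what makes the structure hierarchical.
print(first_paragraph.parent.name)  # div
```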

At first sight everything looks good… until you try it in practice.

Let’s say we are trying to parse the document returned by this web API: http://ergast.com/api/f1/2019/14/results

It’s clearly visible that the results are in the <ResultsList> node, with each <Result> node nested inside it.

So we should write something like this:

soup = url_to_soup('http://ergast.com/api/f1/' + str(year) + '/' + str(gp_round) + '/results')
for result_row in soup.find('ResultsList').find_all('Result'):
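The url_to_soup helper is not shown in this post, so the following is only a guess at what such a helper might look like. It assumes the standard library's urlopen for the download and the bundled "html.parser" for parsing; the markup_to_soup name and the timeout parameter are hypothetical additions.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def markup_to_soup(markup):
    """Parse markup (str or bytes) that has already been downloaded."""
    # Assumption: the bundled HTML parser is used; the real helper may differ.
    return BeautifulSoup(markup, "html.parser")


def url_to_soup(url, timeout=10):
    """Download `url` and parse the response body (hypothetical sketch)."""
    with urlopen(url, timeout=timeout) as response:
        return markup_to_soup(response.read())
```

Splitting the download from the parsing makes it possible to test the parsing step on a local string without touching the network.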

But this fails with an error, so something went wrong.

What’s wrong?

To find out what the error is, we need to catch it and log it:

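import logging

try:
    ...  # the code that may fail goes here
except (AttributeError, KeyError) as ex:
    # logging.exception() logs the message together with the full traceback
    logging.exception("message")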

And now we see the error:

for result_row in soup.find('ResultsList').find_all('Result'):
AttributeError: 'NoneType' object has no attribute 'find_all'
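This kind of traceback is typical for chained lookups: find() returns None when nothing matches, and the next call in the chain then fails. A minimal sketch on a made-up fragment (tag names are illustrative only):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<a><b>x</b></a>", "html.parser")

# find() returns None when nothing matches; it does not raise by itself.
missing = soup.find("no_such_tag")
print(missing)  # None

# Chaining another call onto that None is what raises the AttributeError.
try:
    missing.find_all("b")
except AttributeError as ex:
    print(type(ex).__name__)  # AttributeError

# Checking the intermediate result keeps the failure explicit.
rows = missing.find_all("b") if missing is not None else []
print(rows)  # []
```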

So soup cannot find the ResultsList node. Interesting. To see what the BeautifulSoup object actually looks like, we can print it:

print(soup.prettify())

And what do we see?

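Every tag is in lowercase! HTML tag and attribute names are case-insensitive, so Beautiful Soup’s HTML parsers convert them to lowercase.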
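The lowercase effect can be reproduced offline. The sketch below assumes the bundled "html.parser" and a made-up fragment that only mimics the shape of the response ("driver_a" is a placeholder, not real data).

```python
from bs4 import BeautifulSoup

# Made-up fragment that only mimics the shape of the API response.
markup = (
    '<ResultsList><Result number="1">'
    '<Driver driverId="driver_a"></Driver>'
    "</Result></ResultsList>"
)

soup = BeautifulSoup(markup, "html.parser")

# The mixed-case name no longer exists in the parsed tree...
print(soup.find("ResultsList"))  # None

# ...but the lowercase name does.
print(soup.find("resultslist").name)  # resultslist

# Attribute names are lowercased as well.
driver = soup.find("driver")
print(sorted(driver.attrs))  # ['driverid']
```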

Let’s change our code accordingly:

for result_row in soup.find('resultslist').find_all('result'):
    # the tag name must be lowercase here too, otherwise result_row.Driver is None
    driver_id = result_row.driver['driverid']

And now it works!
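An alternative worth knowing about: when the document is parsed as XML instead of HTML, Beautiful Soup keeps the original case, so the mixed-case names can be used as they appear in the source. The sketch below assumes the lxml package is installed (the "xml" feature depends on it) and falls back to None otherwise. The driver_ids function and the sample fragment are hypothetical.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up fragment; "driver_a" and "driver_b" are placeholders, not real data.
markup = (
    '<ResultsList><Result><Driver driverId="driver_a"></Driver></Result>'
    '<Result><Driver driverId="driver_b"></Driver></Result></ResultsList>'
)


def driver_ids(markup):
    """Return driver ids using mixed-case names, or None if lxml is missing."""
    try:
        soup = BeautifulSoup(markup, "xml")
    except FeatureNotFound:
        return None  # no XML parser available (lxml is not installed)
    results = soup.find("ResultsList").find_all("Result")
    return [row.Driver["driverId"] for row in results]


print(driver_ids(markup))
```

Which approach fits better depends on the helper that builds the soup: lowercase names work with any of the HTML parsers, while the XML route needs the extra dependency.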
