Soup with bugs (debugging Beautiful Soup)
For web scraping with Python, the best library (and the only one I know) is Beautiful Soup.
How does it work?
According to the documentation, when we parse an HTML (or XML) document with Beautiful Soup, it gives us a BeautifulSoup
object, which represents the document as a nested data structure.
In other words, it is a tree that mirrors the hierarchy of the document.
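To make the "nested data structure" idea concrete, here is a minimal sketch. The HTML string is made up purely for illustration, and I am assuming the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document used only for illustration.
html = "<html><body><div id='main'><p>First</p><p>Second</p></div></body></html>"

soup = BeautifulSoup(html, "html.parser")

div = soup.find("div")                      # a Tag object nested inside <body>
parent_name = div.parent.name               # name of the enclosing tag
paragraphs = [p.get_text() for p in div.find_all("p")]  # text of the nested <p> tags

print(parent_name)
print(paragraphs)
```

Every tag we get back is itself searchable, so we can walk down (find, find_all) or up (parent) the tree.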
At first sight everything looks good… until you try it in practice.
Let’s say we are trying to parse the document from this web API: http://ergast.com/api/f1/2019/14/results
It’s clearly visible that the results are in the <ResultsList> node, with the <Result> nodes inside it.
So we should write something like this:
soup = url_to_soup('http://ergast.com/api/f1/' + str(year) + '/' + str(gp_round) + '/results')
for result_row in soup.find('ResultsList').find_all('Result'):
    ...  # process each result row
But it fails with an error.
What’s wrong?
To find out what the error is, we need to catch it and log it:
import logging
….
except (AttributeError, KeyError) as ex:
    logging.exception("message")
And now we see the error:
    for result_row in soup.find('ResultsList').find_all('Result'):
AttributeError: 'NoneType' object has no attribute 'find_all'
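The same failure can be reproduced without touching the network. This is only a sketch: the sample string is a made-up, heavily simplified stand-in for the API response, and I am assuming the "html.parser" backend, since I don't show here which parser url_to_soup uses.

```python
import logging
from bs4 import BeautifulSoup

# Made-up, simplified stand-in for the API response (not real data).
sample = "<ResultsList><Result position='1'></Result></ResultsList>"
soup = BeautifulSoup(sample, "html.parser")  # assumption: an HTML parser is in use

try:
    # find() returns None when nothing matches, so .find_all() raises AttributeError
    rows = soup.find("ResultsList").find_all("Result")
except AttributeError:
    # logging.exception logs the message together with the full traceback
    logging.exception("Could not read the results list")
    rows = []
```

logging.exception is meant to be called from inside an except block; it logs at ERROR level and appends the traceback, which is exactly what we need here.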
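So soup cannot find the ResultsList node. Interesting. To see what the BeautifulSoup object actually looks like, we can print it:
print(soup.prettify())
And what do we see?
Every tag is in lowercase! Beautiful Soup’s HTML parsers convert tag and attribute names to lowercase, because HTML is case-insensitive.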
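Instead of reading through the whole prettify() output, we can also list the names directly. Again a sketch on a made-up snippet with mixed-case names, assuming the "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Made-up, simplified snippet with mixed-case names (not real API data).
sample = "<ResultsList><Result><Driver driverId='driver_a'></Driver></Result></ResultsList>"
soup = BeautifulSoup(sample, "html.parser")

# find_all(True) matches every tag in the document
tag_names = [tag.name for tag in soup.find_all(True)]
driver_attrs = soup.find("driver").attrs

print(tag_names)      # tag names as the parser stored them
print(driver_attrs)   # attribute names are stored the same way
```

Both the tag names and the attribute names come out in lowercase. According to the Beautiful Soup documentation, parsing the document as XML (the "xml" feature, which needs lxml installed) keeps the original case; I don't use that option in this post.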
Let’s change our code accordingly:
for result_row in soup.find('resultslist').find_all('result'):
    driver_Id = result_row.driver['driverid']
And now it works!
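Putting it together, here is a slightly more defensive version as a sketch. The driver_ids helper and the sample data are hypothetical (they are not part of my real script), and the "html.parser" backend is again an assumption:

```python
from bs4 import BeautifulSoup

# Made-up, simplified stand-in for the API response (not real data).
sample = """
<ResultsList>
  <Result position="1"><Driver driverId="driver_a"></Driver></Result>
  <Result position="2"><Driver driverId="driver_b"></Driver></Result>
</ResultsList>
"""

def driver_ids(document):
    """Hypothetical helper: return the driverid of every result row, or [] if the list is missing."""
    results_list = document.find("resultslist")   # lowercase, as stored by the HTML parser
    if results_list is None:                       # find() returns None when nothing matches
        return []
    return [row.driver["driverid"] for row in results_list.find_all("result")]

soup = BeautifulSoup(sample, "html.parser")
ids = driver_ids(soup)
print(ids)
```

Checking for None before calling find_all means a missing node gives us an empty list instead of the AttributeError from before.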