A Python Web Scraper and Database Updater
Today, I’m keeping my blog post simple. This morning, I wrote a Python script that does the following:
- Scrapes data from my blog
- Updates my Firebase Realtime Database
It’s similar to code I’ve written in the past but not identical to anything. So today, I’ll break down every function within the script and explain:
- What the function does
- Why the function exists
Let’s do it…
initialize_app
This function gets permission to use my Firebase Realtime Database. It’s crucial because, without it, I wouldn’t be able to pull data from my database or update it.
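Here’s a rough sketch of what that authorization step can look like. It assumes the script uses the firebase-admin package, which this post doesn’t state; the wrapper name connect_to_firebase and both parameters are hypothetical placeholders.

```python
def connect_to_firebase(key_path, database_url):
    """Authorize this script against a Firebase Realtime Database.

    key_path and database_url are hypothetical placeholders: the path to a
    service-account JSON key and the database's URL.
    """
    # Imported inside the function so the sketch can be loaded and read
    # without the firebase-admin package installed (an assumption made
    # purely for illustration).
    import firebase_admin
    from firebase_admin import credentials

    cred = credentials.Certificate(key_path)
    return firebase_admin.initialize_app(cred, {"databaseURL": database_url})
```

Nothing else in the script can read or write the database until a call like this succeeds.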
get_last_item
This looks at every item index in my database and returns the most recent item I’ve added. I need this for the highest_id_num and recent_date functions.
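As a sketch of the idea, the version below works on a snapshot that has already been pulled from the database, so it runs without a connection. It assumes each item is a dictionary with an ID key and that the highest ID marks the most recent item; the sample records are made up.

```python
def get_last_item(items):
    """Return the item with the highest ID from a database snapshot."""
    # A snapshot may arrive as a dict keyed by index or as a list,
    # and a list may contain None for missing indexes.
    if isinstance(items, dict):
        records = list(items.values())
    else:
        records = list(items or [])
    records = [record for record in records if record is not None]
    if not records:
        return None
    return max(records, key=lambda record: record["ID"])


# Made-up sample data for illustration.
snapshot = {
    "1": {"ID": 1, "date": "2024-01-01"},
    "2": {"ID": 2, "date": "2024-02-15"},
}
last_item = get_last_item(snapshot)
```

Returning None for an empty database lets the caller decide what a first run should do.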
highest_id_num
This uses the dictionary returned from get_last_item() and returns the value from the ID key. I need this so that I know what index to start at when I add data to the database.
recent_date
This also uses the dictionary returned from get_last_item() and returns the value from the date key. This date value is useful because when I’m scraping data from my blog, the recent_date() function tells me where to start looking.
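Both highest_id_num and recent_date boil down to a dictionary lookup. In this sketch the dictionary is passed in as an argument to keep the example self-contained, which is an assumption; the sample item and its title field are made up.

```python
def highest_id_num(last_item):
    """Return the ID of the most recent item as an integer."""
    return int(last_item["ID"])


def recent_date(last_item):
    """Return the date stored on the most recent item."""
    return last_item["date"]


# Made-up stand-in for the dictionary that get_last_item() returns.
last_item = {"ID": 41, "date": "2024-02-15", "title": "Example post"}

next_index = highest_id_num(last_item) + 1  # where new items would start
start_date = recent_date(last_item)         # where scraping would start
```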
is_date_greater
This function takes two dates as arguments and compares them. It returns True if the first date is later than the second one and False otherwise.
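A minimal sketch of that comparison, assuming the dates are stored as strings in YYYY-MM-DD format (the real format isn’t shown in this post):

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # assumed format; adjust to match the stored dates


def is_date_greater(first, second, fmt=DATE_FORMAT):
    """Return True if first is strictly later than second."""
    return datetime.strptime(first, fmt) > datetime.strptime(second, fmt)
```

Parsing the strings before comparing matters because a format like "March 1, 2024" doesn’t sort correctly as plain text.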
update_url
This function takes the URL for my blog posts page and shortens it. I need this because when I scrape data for each blog post, one div block gives the path and page for the post’s link. Using that info with this function allows me to build the link that leads to a post.
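One way to sketch this is with the standard library’s urllib.parse. The example assumes "shortening" means trimming the posts page URL down to its scheme and domain; build_post_link is a hypothetical helper, and example.com stands in for the real blog address.

```python
from urllib.parse import urljoin, urlsplit


def update_url(posts_page_url):
    """Trim a posts page URL down to its scheme and domain."""
    parts = urlsplit(posts_page_url)
    return f"{parts.scheme}://{parts.netloc}"


def build_post_link(posts_page_url, post_path):
    """Join the shortened URL with the path scraped for one post."""
    base = update_url(posts_page_url) + "/"
    return urljoin(base, post_path.lstrip("/"))


link = build_post_link("https://example.com/blog", "/posts/my-first-post")
```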
scraper
This function scrapes the data from my blog posts page and returns a dictionary with all items that have dates later than the most recent date in my database. Why do I need this function?
It’s obvious…
So that I have data to add to my database.
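To show the filtering idea without a network request, this sketch parses a made-up HTML snippet with the standard library’s html.parser. The markup, the class name post, and the PostParser class are all assumptions, since the blog’s real HTML isn’t shown here.

```python
from datetime import date
from html.parser import HTMLParser

# Hypothetical markup standing in for the real posts page.
SAMPLE_HTML = """
<div class="post"><a href="/posts/third">Third post</a><time datetime="2024-03-01"></time></div>
<div class="post"><a href="/posts/second">Second post</a><time datetime="2024-02-01"></time></div>
<div class="post"><a href="/posts/first">First post</a><time datetime="2024-01-01"></time></div>
"""


class PostParser(HTMLParser):
    """Collect one title/path/date record per <div class="post">."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "post":
            self._current = {"title": "", "path": None, "date": None}
        elif self._current is not None and tag == "a":
            self._current["path"] = attrs.get("href")
            self._in_link = True
        elif self._current is not None and tag == "time":
            self._current["date"] = attrs.get("datetime")

    def handle_data(self, data):
        if self._in_link:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "div" and self._current is not None:
            self.posts.append(self._current)
            self._current = None


def scraper(html, most_recent_date):
    """Return posts dated after most_recent_date, keyed 0, 1, 2, ..."""
    parser = PostParser()
    parser.feed(html)
    cutoff = date.fromisoformat(most_recent_date)
    new_posts = [
        post for post in parser.posts
        if post["date"] and date.fromisoformat(post["date"]) > cutoff
    ]
    new_posts.sort(key=lambda post: post["date"])  # oldest first
    return dict(enumerate(new_posts))


new_posts = scraper(SAMPLE_HTML, "2024-01-15")
```

Sorting oldest first means the items can later be numbered in the order they were published.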
update_firebase
This is perhaps the most important function in this script. It uses a combination of all previous functions to update my Firebase Realtime Database and insert the values I want to add.
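The part of this step that can be shown offline is building the payload. In this sketch, build_updates is a hypothetical helper that numbers each new item starting right after the highest existing ID; the sample posts are made up.

```python
def build_updates(new_posts, highest_id):
    """Map each new post to the next free index after highest_id."""
    updates = {}
    for offset, post in enumerate(new_posts.values(), start=1):
        new_id = highest_id + offset
        # Copy the post so the scraped dictionary is left untouched.
        updates[str(new_id)] = {**post, "ID": new_id}
    return updates


# Made-up sample data for illustration.
scraped = {
    0: {"title": "Second post", "date": "2024-02-01"},
    1: {"title": "Third post", "date": "2024-03-01"},
}
updates = build_updates(scraped, highest_id=41)
```

If the script uses the firebase-admin package (an assumption), a payload shaped like this could be passed to the update method of a database reference to write all the new items in one call.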