Creating a Python Algorithm to Score My Taste in Music

Have you ever wondered how “trendy” your taste in music is?

I think about it all the time, but Spotify makes it impossible to compare my playlists to other people’s.

That’s where my Spotify Scoring Algorithm comes into play.

Using Python, a lil bit of SQL, and my magical math skills, I wrote a program that uses the following equation to score a user’s taste in music…

Here are the questions I’m answering for you in this project report:

What is the Spotify Scoring Algorithm?
What made me crazy enough to do this project?
What the heck is that mess of letters you call an equation?
What are some cool things I learned?
What challenges almost blew up this project?
What would I do differently next time?
How did the algorithm score my taste in music?

Click here if you wanna see the code I wrote on GitHub

What is the Spotify Scoring Algorithm?

I used Spotify’s API to retrieve data from my favorite personal playlist: Canciones en Español. Using the various popularity metrics Spotify gives you, I created an algorithm that scores my taste in music on a scale from 0 to 100.

What made me crazy enough to do this project?

I’m a massive fan of Latin music and songs in Spanish. But most people I know don’t share my musical passions. While browsing through Spotify’s API documentation (you know, just some light, fun reading), I discovered the popularity metric that Spotify’s own algorithm calculates for every track and artist.

I realized I could use this data to build my own equation that answers the question for me.

What the heck is that mess of letters you call an equation?

I could easily have just taken the 439 track popularities on my playlist, found their average, and called it a day.

Boom!

That would only take five minutes. Nice and easy.
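For comparison, here’s roughly what that five-minute version looks like. It’s a minimal sketch that assumes the popularities have already been pulled into a Python list. The numbers below are made up, not taken from my playlist.

```python
from statistics import mean

# Made-up track popularities (Spotify scores each track from 0 to 100).
# The real playlist has 439 of these.
track_popularities = [72, 85, 64, 90, 58]


def naive_score(popularities):
    """Unweighted baseline: every track counts exactly the same."""
    return round(mean(popularities), 1)


print(naive_score(track_popularities))  # 73.8
```

In this version every track counts equally, no matter who’s on it.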

But some songs and artists deserve to hold more or less weight in the scoring algorithm based on specific factors I found important.

Here they are:

X = a weighting multiplier that makes the left side of the numerator worth more than the right side. For my final algorithm, I used X = 2.
Y = X + 1, used to average the scores
t = track popularity
p = artist popularity with a scoring multiplier based on the artist’s position on a track (Artist 1 is favored most, Artist 6 least)
s = a scoring multiplier for p based on whether the artist’s popularity on a given track is greater or less than the track’s popularity
w = a scoring multiplier for p based on whether the song is one of the artist’s top tracks
r = artist popularity with a scoring multiplier based on the ratio of the number of an artist’s top tracks in the playlist to the artist’s frequency in the playlist
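Here’s a rough Python sketch of how these pieces could fit together. It isn’t the exact formula from my repo. I’m assuming the left side of the numerator is X·t, the right side is a single artist-based term, and dividing by Y = X + 1 turns the sum into a weighted average. The position weights, the s/w/r multiplier values, and the way r is applied are all hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import mean

X = 2       # weight on the left (track) side of the numerator; the value I used
Y = X + 1   # dividing by X + 1 turns the sum into a weighted average

# Hypothetical placeholder values, not the ones from my real code.
POSITION_WEIGHTS = [1.0, 0.8, 0.6, 0.4, 0.3, 0.2]  # Artist 1 ... Artist 6
S_ABOVE, S_BELOW = 1.1, 0.9   # s: artist popularity >= / < track popularity
W_TOP, W_OTHER = 1.1, 1.0     # w: song is / isn't one of the artist's top tracks
R_BONUS = 0.2                 # r: how strongly the top-track ratio boosts p (assumed)


@dataclass
class ArtistCredit:
    popularity: int         # artist popularity, 0-100
    in_top_tracks: bool     # is this song among the artist's top tracks?
    top_track_ratio: float  # artist's top tracks in playlist / artist's appearances in playlist


def artist_term(track_popularity, credits):
    """Right side of the numerator: position-weighted average of adjusted artist popularity."""
    if not credits:
        return float(track_popularity)  # no artist data: fall back to the track itself
    total = weight_sum = 0.0
    for position, credit in enumerate(credits[:len(POSITION_WEIGHTS)]):
        s = S_ABOVE if credit.popularity >= track_popularity else S_BELOW
        w = W_TOP if credit.in_top_tracks else W_OTHER
        r = 1 + R_BONUS * credit.top_track_ratio  # hypothetical way to apply the ratio
        weight = POSITION_WEIGHTS[position]
        total += weight * credit.popularity * s * w * r
        weight_sum += weight
    return total / weight_sum


def track_score(track_popularity, credits):
    """(X * t + artist term) / Y, capped at 100 because the multipliers can push it higher."""
    raw = (X * track_popularity + artist_term(track_popularity, credits)) / Y
    return min(raw, 100.0)


def playlist_score(tracks):
    """Average the per-track scores; tracks is a list of (track_popularity, credits) pairs."""
    return mean(track_score(t, credits) for t, credits in tracks)
```

For example, `track_score(70, [ArtistCredit(60, False, 0.0)])` works out to (2·70 + 60·0.9) / 3 ≈ 64.7. The track side counts twice as much as the artist side, which is the whole point of X.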

What are some cool things I learned?

This project took me around two weeks to complete, and in case you’ve never read my other blog posts that talk about this experiment, this is my first big Python project.

Yup, that’s right.

I’ve only been learning the language for about one month. But hey, I’m a quick learner.

Anyway, here are some of the valuable skills I picked up while creating this Spotify algorithm:

How to create and manipulate JSON/CSV files
How to analyze data within Jupyter Notebooks
How to use pandas for all things data analytics
Why it’s important to write code documentation
How to work with APIs to retrieve data
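The JSON/CSV and API skills came together in one step: flattening Spotify’s nested track data into rows I could analyze. Below is a minimal sketch of that idea. It assumes a response shaped like the `items` array from Spotify’s Get Playlist Items endpoint (`track.id`, `track.name`, `track.popularity`, `track.artists`). The sample payload is made up and heavily trimmed, and the helper names are hypothetical.

```python
import csv
import io
import json

# Made-up, heavily trimmed payload shaped like a playlist-items response.
sample_json = """
{"items": [
  {"track": {"id": "t1", "name": "Song A", "popularity": 81,
             "artists": [{"id": "a1", "name": "Artist One"},
                         {"id": "a2", "name": "Artist Two"}]}},
  {"track": {"id": "t2", "name": "Song B", "popularity": 67,
             "artists": [{"id": "a3", "name": "Artist Three"}]}}
]}
"""


def items_to_rows(payload):
    """Flatten each playlist item into one row per (track, artist position)."""
    rows = []
    for item in payload["items"]:
        track = item.get("track")
        if track is None:  # skip items with no track data
            continue
        for position, artist in enumerate(track["artists"], start=1):
            rows.append({
                "track_id": track["id"],
                "track_name": track["name"],
                "track_popularity": track["popularity"],
                "artist_position": position,
                "artist_id": artist["id"],
                "artist_name": artist["name"],
            })
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text (in practice you'd write to a file instead)."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = items_to_rows(json.loads(sample_json))
print(rows_to_csv(rows))
```

Each artist on a track gets its own row with its position, so a table like this can feed a position-based multiplier such as p.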

What challenges almost blew up this project?

Since this was my first big Python project, it was inevitable that challenges would pop up out of nowhere and slap me silly across the face.

Even though I have a long way to go before becoming a true Python master, I lied a bit when I said I’d only been learning it for one month. In college, I took a few programming classes with Python and learned a lot.

But that was 3+ years ago.

I forgot a lot over time, but the fundamentals stuck with me and made it easier to understand concepts.

One of the biggest challenges I faced came in the form of greed.

I started having fun creating the scoring equation and tried doing too much. My original plan was to standardize the track/artist popularities to make the scoring system fair and account for outliers.

But one big mistake made me its slave…

The standardization formula I used created a bias: popularities were compared by how many standard deviations they sat from the average instead of by their actual size.

Once I realized this, I scrapped the idea and focused on comparing the raw popularity scores.
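To make that bias concrete, here’s a small sketch. It assumes the standardization was the usual z-score, (value - mean) / standard deviation. This post doesn’t spell out the exact formula, and the popularity numbers are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize: (value - mean) / standard deviation, rounded to 3 places."""
    mu, sigma = mean(values), pstdev(values)
    return [round((v - mu) / sigma, 3) for v in values]


hits = [80, 82, 84]       # made-up popularities for very popular tracks
deep_cuts = [20, 22, 24]  # made-up popularities for obscure tracks

# Both lists get identical z-scores, even though the hits are 60 points higher.
print(z_scores(hits) == z_scores(deep_cuts))  # True
```

After standardizing, a list of hits and a list of deep cuts look identical, and that’s exactly the information a “trendiness” score needs to keep.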

What would I do differently next time?

Now that I’m free and finished with this project, part of me wants to run away and never think about Spotify again.

But that’s not happening.

Anyway, part of me also wants to dive back into the project and optimize my code.

Here are a few things I would do differently if I started the project today:

Flexibility

Right now, the algorithm only works on my one playlist. I haven’t built something that everyone can easily use for their own playlists. Part of the reason is that the average person doesn’t know how to access Spotify’s API to scrape data from their account.

Use Artist IDs instead of Artist Names

Most functions in my algorithm use “Artist Name” values as keys in the dictionaries they return. But I realized some artists share the same name. Since dictionary keys must be unique, two different artists with the same name would collapse into a single entry. Artist IDs would have been safer because each one is unique to a single artist.
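Here’s a tiny sketch of that collision, using made-up credits where two different artists share the display name “Sol”. The artist IDs are hypothetical placeholders, not real Spotify IDs.

```python
from collections import Counter

# Made-up credits: two different artists are both called "Sol",
# but each has its own (hypothetical) artist ID.
credits = [
    {"artist_id": "id_sol_mx", "artist_name": "Sol"},
    {"artist_id": "id_sol_es", "artist_name": "Sol"},
    {"artist_id": "id_sol_mx", "artist_name": "Sol"},
    {"artist_id": "id_luna", "artist_name": "Luna"},
]

by_name = Counter(c["artist_name"] for c in credits)  # both "Sol" artists merge into one key
by_id = Counter(c["artist_id"] for c in credits)      # each artist keeps its own count

# Keep a separate ID -> name lookup for display purposes.
names = {c["artist_id"]: c["artist_name"] for c in credits}

print(by_name)
print(by_id)
```

Keying by ID and looking names up only for display keeps each artist’s frequency (and anything built on it, like the ratio behind r) separate.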

Learn front-end development

I wouldn’t go crazy becoming a full-on front-end developer, but I’d expand on my HTML/CSS skills and pick up some JavaScript to create a simple, interactive webpage that allows anyone to use my algorithm for their playlist.

Wrapping Up

This project is just one of many that’ll put my programming and data science skills to the test. Starting off with a project this grand seemed impossible at first.

But finishing the project is like running a sub-four-minute mile…

Anything’s possible now.

And I almost forgot the most important part of this project…

How did the algorithm score my taste in music?

From 0 to 100, how does my algorithm score my taste in music?

Well, drumroll, please……….

74/100

Not bad?

I’m not sure. I’ve gotta run some more tests and experiment with other playlists to find where the true mean and median sit.


Once more, click here to see my code on GitHub