Automating YouTube Tag Collection: A Web Scraping Tool for Content Creators

Tags: Web Scraping, Data Collection, Python, BeautifulSoup

Published: July 5, 2024

Introduction

This was a very interesting project because it grew out of a real-life problem a friend of mine was facing. He runs a few YouTube channels and, to improve their SEO, he had been visiting channels with similar content, opening the source code of each video page, and piecing together the tags each video used. These tags are hidden from the website’s UI, so he had to go to the page source to find them.

At that moment, I realized I could use my web scraping skills to streamline and automate this process.



Creating the script

In previous projects, I had already used the BeautifulSoup library for similar web scraping tasks, so it was the natural choice here. Our goal was to gather the tags from a set of video URLs and group them in a single spreadsheet so that they could be easily analyzed. We were especially interested in seeing the tags from all the selected videos together and checking which ones appeared most often. This could be done manually, but I decided to use my data manipulation knowledge to take the new process a step further.
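To make the scraping step concrete, here is a minimal sketch of how tags might be pulled out of a page's HTML with BeautifulSoup. It is not the project's actual code. The function name `extract_tags` and the sample HTML are hypothetical, and the sketch assumes the tags are exposed either as repeated `og:video:tag` meta elements or as one comma-separated `keywords` meta element. That markup can change at any time, so check the page source before relying on it.

```python
from bs4 import BeautifulSoup


def extract_tags(html: str) -> list[str]:
    """Return the tags found in a video page's HTML, in page order."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: each tag appears as <meta property="og:video:tag" content="...">.
    metas = soup.find_all("meta", attrs={"property": "og:video:tag"})
    tags = [m["content"].strip() for m in metas if m.get("content")]
    if tags:
        return tags

    # Fallback assumption: a single comma-separated <meta name="keywords"> element.
    keywords = soup.find("meta", attrs={"name": "keywords"})
    if keywords and keywords.get("content"):
        return [t.strip() for t in keywords["content"].split(",") if t.strip()]

    return []


# Hypothetical page source standing in for a downloaded video page.
sample_html = """
<html><head>
<meta name="keywords" content="python, web scraping, tutorial">
</head><body></body></html>
"""

print(extract_tags(sample_html))  # ['python', 'web scraping', 'tutorial']
```

In a real run, the HTML string would come from an HTTP request to each video URL. Keeping the parsing in its own function means it can be tried on saved page sources without touching the network.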

Using pandas, I created an additional column that concatenates all the other ones, and another column that keeps track of how many times each tag in it appears. That way, the number of times each tag showed up across the videos could be checked instantly. However, that raised another concern on my side.
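The combined column and its counts could be built along these lines. This is a sketch under assumptions, not the script's exact layout: the sample data, the one-column-per-video structure, and the column names `all_tags` and `count` are all made up for illustration.

```python
import pandas as pd

# Hypothetical scraped data: one list of tags per video.
tags_by_video = {
    "video_a": ["python", "tutorial", "web scraping"],
    "video_b": ["python", "pandas"],
    "video_c": ["tutorial", "python"],
}

# One column per video; shorter lists are padded with NaN.
table = pd.DataFrame({name: pd.Series(tags) for name, tags in tags_by_video.items()})

# Stack every video column into one long Series of tags, dropping the padding.
all_tags = pd.concat([table[col] for col in table.columns], ignore_index=True).dropna()

# Count how often each tag appears, most frequent first.
counts = all_tags.value_counts()
summary = pd.DataFrame({"all_tags": counts.index, "count": counts.to_numpy()})

# Place the two summary columns next to the per-video columns.
result = pd.concat([table, summary], axis=1)
print(result)
```

Here `python` ends up at the top of the `all_tags` column with a count of 3, which is the kind of at-a-glance answer the spreadsheet was meant to give.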

Viewing everything in a spreadsheet editor was very simple, but if someone wanted to use the data in other ways, these additional columns made it awkward to manipulate in something like pandas. With that in mind, I decided to let the user choose whether to add these columns.
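One way such an option might look is a boolean parameter wired to a command-line flag. The flag name `--no-summary`, the function names, and the column names below are hypothetical, and the original script may expose the choice differently (for example, through an interactive prompt).

```python
import argparse

import pandas as pd


def build_table(tags_by_video, include_summary=True):
    """Build one column per video, optionally followed by summary columns."""
    table = pd.DataFrame({name: pd.Series(tags) for name, tags in tags_by_video.items()})
    if not include_summary:
        # Plain per-video columns only, easier to load back into pandas.
        return table

    # Flatten every tag list into one Series and count the occurrences.
    flat = pd.Series([tag for tags in tags_by_video.values() for tag in tags])
    counts = flat.value_counts()
    summary = pd.DataFrame({"all_tags": counts.index, "count": counts.to_numpy()})
    return pd.concat([table, summary], axis=1)


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options."""
    parser = argparse.ArgumentParser(description="Collect video tags into a table.")
    parser.add_argument(
        "--no-summary",
        action="store_true",
        help="leave out the combined tag column and its counts",
    )
    return parser.parse_args(argv)


# Demo with an explicit argument list instead of the real command line.
args = parse_args(["--no-summary"])
demo = {"video_a": ["python", "tutorial"], "video_b": ["python"]}
print(build_table(demo, include_summary=not args.no_summary))
```

Defaulting to the summary columns keeps the spreadsheet-friendly output as the standard behavior, while the flag gives a clean table to anyone who wants to process the data further.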

Conclusion

This project was a very nice challenge: using everything I’ve studied to help someone with a real-life problem. From here, it could be expanded to extract more information from each page and produce more insightful data. If anyone wants to use the script, experiment with the source code, or fork the project to create another version, it is available on GitHub.