Github Repo: AI-powered YouTube Analysis Tool - Backend
In this post, I want to dive into how I set up the backend infrastructure for my AI-powered YouTube Analysis Tool using Flask, Celery, and Redis. If you've ever wondered how to efficiently handle long-running tasks in a web application, this reflection will walk you through my journey and share some insights I learned along the way.
When building my app, I needed a way to handle time-consuming tasks like downloading YouTube videos, transcribing audio, and analyzing transcripts without slowing down the user experience. This is why I chose Celery.
Celery is like a task manager for your app – it lets you run tasks in the background, so your web app doesn't get stuck waiting for them to finish. It integrates well with Flask, the lightweight web framework I used to build the backend. This combination kept my app responsive even when dealing with heavy processing. When moving to the cloud, I used Google App Engine to host my Flask application and Celery workers, which let me scale instances dynamically depending on traffic.
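To make that concrete, here's a minimal sketch of how a Celery app can be wired up alongside a Flask app. The `process_video` task name and the Redis URL are placeholders, not my actual module layout.

```python
# tasks.py - minimal Flask + Celery wiring (illustrative; names are hypothetical)
from celery import Celery
from flask import Flask

def make_celery(app: Flask) -> Celery:
    """Create a Celery instance that shares the Flask app's config."""
    celery = Celery(
        app.import_name,
        broker=app.config["CELERY_BROKER_URL"],
        backend=app.config["CELERY_RESULT_BACKEND"],
    )
    celery.conf.update(app.config)
    return celery

flask_app = Flask(__name__)
flask_app.config.update(
    CELERY_BROKER_URL="redis://localhost:6379/0",      # placeholder Redis URL
    CELERY_RESULT_BACKEND="redis://localhost:6379/0",
)
celery_app = make_celery(flask_app)

@celery_app.task(bind=True)
def process_video(self, youtube_url: str) -> dict:
    """Long-running pipeline: download, transcribe, analyze (stubbed here)."""
    # ... download the audio, run transcription, analyze the transcript ...
    return {"url": youtube_url, "status": "done"}
```

The key point is that the Flask route only enqueues work; the worker process picks it up from the broker and does the heavy lifting.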
To make Celery work, you need something called a message broker. Think of it as a post office that takes your task requests (letters) and delivers them to the right worker (the person processing the letter). For my project, I chose Redis as the message broker, and here's why:
Other message brokers like RabbitMQ and Amazon SQS are also popular choices, but I went with Redis for its speed and simplicity, and because I wanted to gain hands-on experience with it.
Instead of managing Redis on my own in the cloud, I opted for Google Cloud Memory Store, a managed service that lets you use Redis without worrying about the details of setup and maintenance.
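A Memory Store instance exposes a private IP address and port, so pointing Celery at it is mostly a matter of building the broker URL from those values. The snippet below is a sketch with placeholder environment variable names; depending on the App Engine environment, you may also need a Serverless VPC Access connector so the app can reach that private IP.

```python
import os

# Memory Store gives you a host and port; read them from the environment
# (variable names here are placeholders, not my real config keys).
REDIS_HOST = os.environ.get("REDIS_HOST", "localhost")
REDIS_PORT = os.environ.get("REDIS_PORT", "6379")

CELERY_BROKER_URL = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
CELERY_RESULT_BACKEND = f"redis://{REDIS_HOST}:{REDIS_PORT}/0"
```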
Initially, I was hit with a surprise – running the default setup for Google Cloud Memory Store was going to cost me about $70 per month. That was pretty steep for a new project, so I made some adjustments.
Reduced the Instance Size: I dropped the Redis instance down to a 1 GB capacity tier instead of something larger. That alone lowered the monthly cost to about $35.77, a considerable saving for a small project, and the capacity matched my app's current demands, so I wasn't paying for unused resources. Just for reference, raising the maximum capacity to 300 GB would cost $3,504.00/month.
Tweaked Celery Settings: By optimizing Celery's configuration, I reduced the load on Redis (a sketch of the kinds of settings involved is below).
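I haven't reproduced my exact changes here, but the snippet below shows the kind of Celery settings that typically cut down on broker chatter and result storage. It assumes the `celery_app` from the earlier sketch; treat the values as illustrative rather than my actual configuration.

```python
# Illustrative Celery settings that reduce pressure on a small Redis instance.
# These are typical knobs, not necessarily the exact values I used.
celery_app.conf.update(
    task_ignore_result=True,          # skip storing results if you never read them
    result_expires=3600,              # expire any stored results after an hour
    broker_pool_limit=1,              # keep the broker connection pool small
    worker_prefetch_multiplier=1,     # don't pre-claim a batch of tasks per worker
    worker_send_task_events=False,    # disable event chatter unless you monitor it
    broker_transport_options={"visibility_timeout": 3600},  # Redis-specific option
)
```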
While Google Cloud Memory Store with Redis is powerful, it can be pricey if you’re not careful. Starting with a smaller instance and adjusting settings based on actual usage made the most sense.
I needed a database to store user information for logging in and to track the minutes each user has used. I chose PostgreSQL because it is familiar, widely used, and easy to set up. With SQLAlchemy, I was able to quickly define a user model that fit my needs.
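As a sketch, the model boiled down to something like the following; the column names and types are my best guess at what logging in and tracking minutes requires, not a copy of my real model.

```python
# models.py - a minimal user model (field names are illustrative)
from sqlalchemy import Column, Float, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    email = Column(String(255), unique=True, nullable=False)
    password_hash = Column(String(255), nullable=False)  # store a hash, never the plain password
    minutes_used = Column(Float, default=0.0)             # running total of processed audio minutes
```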
Once I moved to the cloud, I used Google Cloud SQL to host my PostgreSQL database. This allowed me to easily scale my database as needed and keep it separate from my application server. I opted for the smallest instance size to keep costs low.
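For reference, the usual way to reach Cloud SQL from App Engine is over a Unix socket under /cloudsql/. The connection string below is a sketch with placeholder environment variables, not my actual instance details.

```python
import os

from sqlalchemy import create_engine

# On App Engine, Cloud SQL is typically reached over a Unix socket at
# /cloudsql/<PROJECT>:<REGION>:<INSTANCE>. All values here are placeholders.
db_user = os.environ["DB_USER"]
db_pass = os.environ["DB_PASS"]
db_name = os.environ["DB_NAME"]
instance = os.environ["INSTANCE_CONNECTION_NAME"]  # e.g. my-project:us-central1:my-instance

engine = create_engine(
    f"postgresql+psycopg2://{db_user}:{db_pass}@/{db_name}"
    f"?host=/cloudsql/{instance}"
)
```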
Here’s a simple rundown of how everything ties together:
1. A user submits a YouTube URL to the Flask app running on Google App Engine.
2. The Flask app enqueues a background task by sending a message to Redis (Google Cloud Memory Store).
3. A Celery worker picks up the task, downloads the video, transcribes the audio, and analyzes the transcript.
4. User accounts and the minutes each user has consumed live in PostgreSQL on Google Cloud SQL.
5. Because the heavy lifting happens in the background, the web app stays responsive throughout.
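And as a rough sketch of the request side of that flow (route names and payload fields are hypothetical, and it reuses the `process_video` task, `flask_app`, and `celery_app` from the earlier sketch):

```python
from celery.result import AsyncResult
from flask import jsonify, request

# Hypothetical endpoints showing the enqueue/poll pattern; the real routes differ.
@flask_app.route("/analyze", methods=["POST"])
def start_analysis():
    youtube_url = request.json["url"]
    task = process_video.delay(youtube_url)   # enqueue via Redis and return immediately
    return jsonify({"task_id": task.id}), 202

@flask_app.route("/status/<task_id>")
def task_status(task_id: str):
    result = AsyncResult(task_id, app=celery_app)
    return jsonify({
        "state": result.state,
        "result": result.result if result.ready() else None,
    })
```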
Using Celery and Flask deployed to Google App Engine, together with Google Cloud Memory Store for Redis, was an interesting experience. I would have loved to test the system under heavy load to see how it scaled (a task for another day).
Overall, I learned a lot about managing background tasks in a scalable way. I also learned a lot about how to manage costs when using managed services like App Engine, Google Cloud Memory Store, and Cloud SQL.
I hope this reflection gives you some insight into how you might approach building your next application with background tasks in mind.
Will