
Behind The Scenes

Architecture & Implementation

The application is written in Python using the Flask framework, with pandas for data processing and SQLite for local storage. The codebase is open source and available on GitHub, and it runs on a standard Vultr VPS. The architecture is intentionally simple and follows standard web development practices.
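As a rough illustration of the storage layer, here is a minimal sketch using only the standard library's sqlite3 module; the table name and columns are our assumptions, modeled on the channel CSV shown further down, not the project's actual schema:

```python
import sqlite3

# The app would use a file on disk; an in-memory DB keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS channels (
        channel_id TEXT PRIMARY KEY,
        name TEXT,
        subscriber_count INTEGER
    )
""")

# Upsert a channel row, then read it back.
conn.execute(
    "INSERT OR REPLACE INTO channels VALUES (?, ?, ?)",
    ("UCsBjURrPoezykLs9EqgamOA", "Fireship", 2750000),
)
row = conn.execute(
    "SELECT name, subscriber_count FROM channels WHERE channel_id = ?",
    ("UCsBjURrPoezykLs9EqgamOA",),
).fetchone()
```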

Data Collection

We fetch our data directly from the YouTube Data API using the googleapiclient library. Here is how we initialize the service:


import os

from googleapiclient.discovery import build

if __name__ == "__main__":
    # The API key is kept out of the codebase and read from the environment.
    API_KEY = os.environ.get("YT_API_KEY")
    service = build("youtube", "v3", developerKey=API_KEY)


Source Discovery

We use the YouTube API's `search().list()` method with keywords like "Tech", "Programming", and "Computer Science" to find channels. We extract the channel IDs and use the `channels().list()` method to get detailed metadata. Here is what an entry looks like in our CSV file:


ChannelID,ChannelName,ChannelIcon,ChannelUrl,ExistedSince,SubscriberCount,VideoCount,ViewCount,Country
UCsBjURrPoezykLs9EqgamOA,Fireship,https://yt3.ggpht.com/ytc/AIf8zZTUVa5AeFd3m5-4fdY2hEaKof3Byp8VruZ0f0FNEA,https://www.youtube.com/@fireship,2017-04-07,2750000,601,364500037,US


We append this data to our existing list, using pandas to remove duplicates and rows with missing data. We also manually add channels and periodically prune inactive ones. The process looks like this:


request = service.search().list(
    q="Tech | Programming | Computer Science",
    type="channel", part="id",
    maxResults=50, order="relevance",
    relevanceLanguage="en", regionCode="US"
)

response = request.execute()
for item in response.get("items", []):
    searched_channels.append(item["id"]["channelId"])

# Other code...

for channel in searched_channels:
    request = service.channels().list(part=["snippet", "statistics", ... ], id=channel)
    response = request.execute()

    channel_info = {
        "ChannelID": response["items"][0]["id"],
        "ChannelUrl": response["items"][0]["snippet"]["customUrl"],
        "ChannelName": response["items"][0]["snippet"]["title"],
        "ChannelIcon": response["items"][0]["snippet"]["thumbnails"]["medium"]["url"],
        # Additional information about that channel
    }

    channels.append(channel_info)

# Merge with the existing channel list, then drop duplicates and
# rows with missing data before writing the CSV back out.
df = pd.DataFrame(channels)
df = pd.concat([channel_df, df], ignore_index=True)
df.drop_duplicates(inplace=True)
df.dropna(inplace=True)
df.to_csv("channels.csv", index=False)


To find channels the API might miss, we also built a simple web scraper using Selenium WebDriver. It searches a random tech topic on YouTube, extracts channel URLs from the results, and follows recommended videos to find additional channels. Here is a snippet of the scraper:


driver = webdriver.Firefox(options=options)
chosen_topic = choice(search_terms)
search_terms.remove(chosen_topic)
driver.get(f"https://www.youtube.com/results?search_query={chosen_topic}")

all_recommended_channels = driver.find_elements(By.ID, "channel-thumbnail")
channels = [channel.get_attribute("href").split("@")[1] for channel in all_recommended_channels]

video_links = driver.find_elements(By.CSS_SELECTOR, "a#video-title")
choice(video_links[5:]).click()

# Follow the recommendation chain a few hops deep, recording the
# uploader of each page before clicking away from it (clicking first
# would leave a stale element reference).
for i in range(7):
    recommended_channel = driver.find_element(By.CSS_SELECTOR, "a.ytd-video-owner-renderer")
    channels.append(recommended_channel.get_attribute("href").split("@")[1])
    recommended_video = driver.find_elements(By.TAG_NAME, "ytd-compact-video-renderer")
    choice(recommended_video[:5]).click()
    # Converting URLs to channelIDs

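The scraper yields @handles, while the rest of the pipeline keys on channel IDs. The conversion step the comment above alludes to could look roughly like this, using the `channels().list(forHandle=...)` lookup from the YouTube Data API v3; the helper name is ours, not the project's:

```python
def handle_to_channel_id(service, handle):
    """Resolve a YouTube @handle to its UC... channel ID, or None if not found."""
    response = service.channels().list(part="id", forHandle=handle).execute()
    items = response.get("items", [])
    return items[0]["id"] if items else None
```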

You can download our full database of YouTube channels below. It contains all the channel metadata we use. If you have channels you'd like to see added or removed, please contact us.

Video Indexing

We iterate through our channel list and use the `activities().list()` method to fetch recent uploads. We then use the `videos().list()` method to retrieve detailed statistics like view counts, likes, and duration. Here is a snippet:


for channel in channel_df["ChannelID"]:
    request = service.activities().list(
        part=["snippet", "id", "contentDetails"],
        publishedAfter=yesterday.isoformat() + "T00:00:00Z",
        channelId=channel, maxResults=50, fields=FIELDS,
    )

    response = request.execute()
    for item in response["items"]:
        channel_name = item["snippet"]["channelTitle"]
        channel_id = item["snippet"]["channelId"]
        video_id = item["contentDetails"]["upload"]["videoId"]
        # Additional information...

        request = service.videos().list(id=video_id, part=["statistics", "snippet", "contentDetails"])
        response = request.execute()

        view_count = int(response["items"][0]["statistics"]["viewCount"])
        like_count = int(response["items"][0]["statistics"]["likeCount"])
        content_rating = response["items"][0]["contentDetails"]["contentRating"]
        video_duration = isodate.parse_duration(response["items"][0]["contentDetails"]["duration"])
        # Again, remaining additional information...


We apply a few filters to the raw data. We ignore videos that are less than 30 seconds long, have fewer than 500 views, or are not in English. We also verify they are categorized under "Science & Technology" or "Education". The filtered list is stored in a JSON file. Here is an example payload:


"ChannelName": "Fireship",
"ChannelId": "UCsBjURrPoezykLs9EqgamOA",
"ChannelIcon": "https://yt3.ggpht.com/ytc/AIf8zZTUVa5AeFd3m5-4fdY2hEaKof3Byp8VruZ0f0FNEA",
"ChannelUrl": "https://www.youtube.com/@fireship",
"VideoUrl": "https://www.youtube.com/watch?v=ky5ZB-mqZKM",
"VideoTitle": "AI influencers are getting filthy rich... let's build one",
"VideoId": "ky5ZB-mqZKM",
"PublishedDate": "2023-11-29 21:06",
"Thumbnail": "https://i.ytimg.com/vi/gGWQfV1FCis/mqdefault.jpg",
"Duration": "0:04:25",
"Definition": "HD",
"Language": "EN",
"Caption": false,
"ContentRating": false,
"ViewCount":  4091018,
"LikeCount": 156078,
"CommentCount": 5052,
"CategoryId": 28

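The filters above can be sketched as a single predicate; field names follow the payload shown, and the category IDs (27 = Education, 28 = Science &amp; Technology) come from the YouTube API's video category list. The helper name and thresholds' exact encoding are ours.

```python
from datetime import timedelta

MIN_DURATION = timedelta(seconds=30)
MIN_VIEWS = 500
ALLOWED_CATEGORIES = {27, 28}  # Education, Science & Technology

def keep_video(video):
    """Apply the filters described above to one parsed video record."""
    return (
        video["Duration"] >= MIN_DURATION
        and video["ViewCount"] >= MIN_VIEWS
        and video["Language"].upper().startswith("EN")
        and video["CategoryId"] in ALLOWED_CATEGORIES
    )
```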

Scoring Algorithm

Videos are ranked using a custom scoring function. We calculate a base multiplier from metadata (duration, definition, captions) and apply it to normalized engagement metrics (views, likes, comments). This helps surface highly engaged content regardless of the channel's total subscriber count.

QualityMultiplier = SubscriberBalance × DefinitionQuality × CaptionQuality × RatingQuality × DurationQuality

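The individual quality terms are not spelled out here, so the factor values in this sketch are illustrative assumptions; only the overall shape, a product of per-attribute multipliers, comes from the formula above.

```python
import math

def quality_multiplier(subscribers, definition, has_captions, duration_s):
    """Illustrative product of quality factors; all weights are assumed."""
    # Damp the advantage of very large channels (SubscriberBalance).
    subscriber_balance = 1 / math.log10(max(subscribers, 10))
    definition_quality = 1.1 if definition == "HD" else 1.0
    caption_quality = 1.05 if has_captions else 1.0
    rating_quality = 1.0   # assume no restrictive content rating
    duration_quality = 1.0 if duration_s >= 120 else 0.8  # favour non-shorts
    return (subscriber_balance * definition_quality * caption_quality
            * rating_quality * duration_quality)
```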

We normalize the engagement metrics using a logarithmic function to prevent massive view counts from skewing the results. We weight comments and likes more heavily than views, as they are stronger indicators of engagement. The final rating is calculated as follows:

Rating = (ViewRate + LikeRate + CommentRate) × QualityMultiplier

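Assuming log10 normalization and heavier weights on likes and comments as described, the final rating could be sketched as follows; the exact weights are our assumption, not published values.

```python
import math

def rate(views, likes, comments, quality_multiplier):
    """Combine log-normalized engagement rates, then scale by the multiplier."""
    view_rate = math.log10(views + 1)
    like_rate = 2 * math.log10(likes + 1)       # weighted above views
    comment_rate = 3 * math.log10(comments + 1)  # strongest engagement signal
    return (view_rate + like_rate + comment_rate) * quality_multiplier
```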

Once every video is scored, we sort the index. For weekly and monthly lists, we aggregate the daily videos, re-sort, and truncate to the top N results. The sorted lists are saved as JSON and served via the API and dashboard.


for lang, all_videos in videos.items():
    for time in ["daily", "weekly", "monthly", "yearly"]:
        with open(f"{time}.json", "r") as f:
            data = json.load(f)

        if time == "daily":
            top_day = OrderedDict(sorted(all_videos.items(), key=lambda item: item[0], reverse=True))
            data[lang] = OrderedDict(list(top_day.items()))

        elif time == "weekly":
            top_week = update_videos(data[lang], time)
            top_week.update(OrderedDict(list(top_day.items())[:50]))
            top_week = sort_videos(top_week)
            data[lang] = top_week

        # Same thing for monthly & yearly videos

        with open(f"{time}.json", "w") as f:
            json.dump(data, f, indent=4)
