Snarf YouTube videos off

Published: Sunday, Mar 21, 2021
Last modified: Friday, Jul 29, 2022 (3892cd0)
Tags: Computing Web

This is a very cool virtual conference platform. You pick an avatar, can voice chat, video chat, and watch presentations, all while standing around in a virtual 2D space. A conference I attended recently had all the talks pre-recorded and played back at their scheduled times, as if live. This worked very well because the presenters could take their time and provide the best possible content. Within the virtual conference space, you play a video by moving your avatar into a specific location, which is different for each video.

Figure 1: Screenshot of a YouTube Embed in credit

§Why download the videos?

While I could view the videos online, it’s a lot nicer to use a local video player with simpler controls and no required internet connection. Also, while I did say the platform is pretty cool, I am unsure what promises are made about the lifetime of the event, how exactly it identifies my user, or how long my user stays valid. It would be nice to download the videos for local consumption.

§Grabbing the video URLs

The video appears as a YouTube embed in the corner of the screen. Poking around in Chrome DevTools, I found an <iframe> for the embed. According to MDN, a MutationObserver can watch the DOM for element insertions, so I tried it out. In particular, I wanted the childList mutation records. With the observer in place, I could fire a callback on each change, check for the presence of an <iframe>, extract its src attribute, regex out the YouTube video ID, and print the YouTube video URLs for the (unlisted) content:

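// Called on every DOM mutation; logs a watch URL if an embed <iframe> exists.
let cb1 = function(mutationsList, observer) {
    const es = document.getElementsByTagName('iframe');
    if (es.length == 0) return;
    const a = es[0];
    const src = a.src;
    if (!src) return;  // src is an empty string when unset
    // Pull the video ID out of the embed URL (.../embed/<id>?...).
    const m = src.match(/\/embed\/([^/?]+)/);
    if (m === null) return;
    // Print the video URL (the exact output format is an assumption).
    console.log('https://www.youtube.com/watch?v=' + m[1]);
};
// Assumption: `root` is not defined in the original snippet; watch the whole document.
const root = document.documentElement;
let observer2 = new MutationObserver(cb1);
observer2.observe(root, {attributes: false, childList: true, subtree: true });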
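
The ID extraction is the one piece of the callback that can be tried outside the browser. Here is a minimal sketch of it as a standalone function; the name embedToWatchUrl and the video ID in the example are made up for illustration, and the regex is the one from the callback.

```javascript
// Hypothetical helper: turn a YouTube embed URL into a watch URL.
// Returns null when the input is empty or is not an embed URL.
function embedToWatchUrl(src) {
    if (!src) return null;
    const m = src.match(/\/embed\/([^/?]+)/);
    return m === null ? null : 'https://www.youtube.com/watch?v=' + m[1];
}

// 'abcdefghijk' is a made-up video ID.
console.log(embedToWatchUrl('https://www.youtube.com/embed/abcdefghijk?autoplay=1'));
// prints https://www.youtube.com/watch?v=abcdefghijk
console.log(embedToWatchUrl('https://example.com/player'));
// prints null
```

Returning null instead of indexing the match result directly keeps the callback from throwing when some unrelated <iframe> shows up in the page.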

That is the mechanism. To trigger it, get the YouTube embeds added to the DOM, which in this case means walking into the spaces in the virtual conference room that trigger video playback.

§Getting the URLs out of developer console

This is kind of goofy on my part: I just copy-pasted the entire console content, so all the other log data came along too, such as file names and line numbers, as shown in this text snippet:

instrument.ts:129 initializing wss://
3XuSvZIqx04:1 GET 404
instrument.ts:129 consume-set-spatial response error: no such consumer
(anonymous) @ instrument.ts:129
console.error @ 11.bundle.js:1
(anonymous) @ 11.bundle.js:1
Promise.then (async)
(anonymous) @ 11.bundle.js:1
step @ 11.bundle.js:1
(anonymous) @ 11.bundle.js:1
(anonymous) @ 11.bundle.js:1
oe @ 11.bundle.js:1
SFUClient.setMaxSpatialLayerConsume @ 11.bundle.js:1
4instrument.ts:129 consume-set-spatial response error: no such consumer
(anonymous) @ instrument.ts:129
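
One way to sidestep this noise (a sketch of an alternative, not what was done here) would be to collect the URLs in a Set rather than printing them, and pull them out in one go at the end. The names seenUrls, recordUrl and urlsAsText are hypothetical; the observer callback would call recordUrl in place of console.log.

```javascript
// Hypothetical collector: remembers each URL once, in the order first seen.
const seenUrls = new Set();

// Returns true if the URL was new, false if it had already been recorded.
function recordUrl(url) {
    if (seenUrls.has(url)) return false;
    seenUrls.add(url);
    return true;
}

// One URL per line, ready to be saved as a list file.
function urlsAsText() {
    return [...seenUrls].join('\n');
}
```

In the Chrome DevTools console, copy(urlsAsText()) should then put the list on the clipboard; copy() is a console-only utility and is not available to regular page scripts.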

I first ran M-x keep-lines RET RET in Emacs to delete all the lines that do not contain a YouTube video URL. Then I manually removed the remaining noise using rectangle selection, because whole rows and columns could be deleted quickly this way (enter rectangular selection with C-x SPC). Finally I sorted the buffer using C-x h M-x sort-lines RET, then deleted duplicate lines using M-x delete-duplicate-lines RET. I now had a list of YouTube video URLs.
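
For anyone without Emacs at hand, the same filter, sort, and dedupe steps can be sketched in a few lines of JavaScript. This assumes the URLs were logged in the form https://www.youtube.com/watch?v=<id>; the function name extractVideoUrls, the VM42:11 prefixes, and the video IDs in the sample are all made up.

```javascript
// Hypothetical helper: pull unique, sorted watch URLs out of pasted console text.
// Assumes URLs were logged as https://www.youtube.com/watch?v=<id>.
function extractVideoUrls(logText) {
    const re = /https:\/\/www\.youtube\.com\/watch\?v=[\w-]+/g;
    // Keep only the URLs themselves (the keep-lines and rectangle steps).
    const found = logText.match(re) || [];
    // Drop duplicates, then sort (delete-duplicate-lines and sort-lines).
    return [...new Set(found)].sort();
}

// Made-up sample of pasted console output.
const sample = [
    'VM42:11 https://www.youtube.com/watch?v=bbbbbbbbbbb',
    'instrument.ts:129 consume-set-spatial response error: no such consumer',
    'VM42:11 https://www.youtube.com/watch?v=aaaaaaaaaaa',
    'VM42:11 https://www.youtube.com/watch?v=bbbbbbbbbbb',
].join('\n');

console.log(extractVideoUrls(sample).join('\n'));
// prints:
// https://www.youtube.com/watch?v=aaaaaaaaaaa
// https://www.youtube.com/watch?v=bbbbbbbbbbb
```

Matching the URL itself rather than whole lines means the file-name and line-number prefixes never need to be trimmed by hand.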

All that was left to do is run: xargs youtube-dl -f best < list.txt.

Pretty simple, eh? I guess browser extensions and plugins are optional.