Case Study

BOP Hearing Recorder

A server-side recording system for Do Moore Good's Pardon Me Campaign, built to capture Pennsylvania Board of Pardons public hearings reliably, without putting that responsibility on any individual's personal computer or internet connection.

Django · ffmpeg · HLS Stream Capture · Redundant Recording · Google Drive · Whisper Transcription

The Problem

Pennsylvania Board of Pardons hearings are public and livestreamed, but staff and legal volunteers at Do Moore Good needed a reliable recording of each one for case review and documentation. The previous approach involved people logging into the PAcast stream from their own computers and using screen recording software.

That created two problems. First, multiple people joining the same limited-capacity stream put load on the broadcast, causing buffering and quality issues for everyone watching. Second, screen recording is fragile: it requires a personal computer to stay on, unlocked, and connected for the full 4-to-6-hour hearing. A Windows update, a dead battery, or a dropped Wi-Fi connection ends the recording silently, with no warning and no backup.

Before and After

Before

Screen recording on personal laptops

  • Volunteer joins the PAcast stream as a viewer, adding to broadcast load
  • Screen recording software captures whatever is visible on-screen
  • Computer must stay on and connected for the full 4 to 6 hour hearing
  • Any interruption (update, sleep, disconnect) silently ends the recording
  • No notification if something goes wrong during the hearing
  • Files stored locally with no consistent archive

After

Server-side HLS capture, automated

  • Server pulls raw video data directly from the stream source, not as a viewer
  • Recording starts automatically at the scheduled time; no one needs to be present
  • Two parallel processes run simultaneously: if one fails, the other keeps going
  • Email alerts on start, every hour, and immediately if anything goes wrong
  • Files upload directly to a shared Google Drive folder after the hearing ends
  • Retries automatically for 30 minutes if the stream is late going live

How It Works

The recording process is fundamentally different from screen recording. Instead of capturing pixels off a display, the server pulls the raw HLS video segments directly from the broadcast source. This is the same data a browser receives when watching the stream, but saved straight to disk rather than rendered on screen.

Direct HLS Capture

yt-dlp extracts the raw HLS manifest URL from the PAcast broadcast page, then ffmpeg records the video and audio streams directly to a fragmented MP4 file. No browser, no display, no screen involved. The server is not a viewer on the stream.
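The two steps described here can be sketched as a pair of command builders. This is a minimal sketch, not the project's actual code: the function names are hypothetical, and the specific flags (`--get-url` for yt-dlp, `-c copy` and `-movflags +frag_keyframe+empty_moov` for ffmpeg) are one common way to get a stream URL and write fragmented MP4, assumed here rather than confirmed by the source.

```python
import subprocess


def stream_url_command(page_url: str) -> list[str]:
    """yt-dlp invocation that prints the media URL instead of downloading."""
    return ["yt-dlp", "--get-url", page_url]


def capture_command(manifest_url: str, output_path: str) -> list[str]:
    """ffmpeg invocation that copies the HLS stream into a fragmented MP4."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-i", manifest_url,
        "-c", "copy",  # no re-encoding: store the packets as received
        "-movflags", "+frag_keyframe+empty_moov",  # fragmented MP4 output
        output_path,
    ]


def resolve_stream_url(page_url: str) -> str:
    """Run yt-dlp and return the first URL it prints (assumes yt-dlp is on PATH)."""
    result = subprocess.run(
        stream_url_command(page_url),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()[0].strip()
```

Because the stream is copied rather than re-encoded, CPU use stays low for the full length of a hearing.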

Redundant Recording

Two independent ffmpeg processes run simultaneously against the same stream URL. Each writes to its own file. If one process crashes mid-hearing, the other continues uninterrupted. Both files are available for download after the hearing ends.
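A rough sketch of the redundancy idea, assuming the recorder launches each ffmpeg command as a separate OS process and polls them. The helper names are hypothetical, and the demo uses short Python child processes as stand-ins for the two ffmpeg commands so it can run anywhere.

```python
import subprocess
import sys


def start_redundant(commands: list[list[str]]) -> list[subprocess.Popen]:
    """Launch each command as its own OS process; none depends on the others."""
    return [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in commands
    ]


def still_running(procs: list[subprocess.Popen]) -> list[subprocess.Popen]:
    """Return the processes that have not exited yet."""
    return [p for p in procs if p.poll() is None]


if __name__ == "__main__":
    # Stand-ins for the two ffmpeg commands: one crashes at once, one keeps going.
    crash = [sys.executable, "-c", "raise SystemExit(1)"]
    steady = [sys.executable, "-c", "import time; time.sleep(5)"]
    procs = start_redundant([crash, steady])
    procs[0].wait()
    print(len(still_running(procs)), "process still recording")
    for p in still_running(procs):
        p.kill()
```

Since the processes share nothing but the stream URL, a crash in one cannot corrupt the other's output file.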

Scheduled and Self-Starting

Hearings are scheduled via the web UI. A cron job checks every minute for recordings due to start. If the stream is not yet live at the scheduled time, the system retries automatically every two minutes for up to 30 minutes before marking the job as failed and sending an alert.
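The retry rule above (every two minutes, for up to 30 minutes) can be expressed as two small pure functions that a once-a-minute cron tick might call. This is an illustrative sketch: the function names and return values are hypothetical, and how the real system checks whether the stream is live is not shown in the source.

```python
from datetime import datetime, timedelta
from typing import Optional

RETRY_INTERVAL = timedelta(minutes=2)   # gap between attempts
RETRY_WINDOW = timedelta(minutes=30)    # give up this long after the scheduled time


def should_probe(scheduled: datetime, now: datetime,
                 last_attempt: Optional[datetime]) -> bool:
    """Decide whether this tick should try to start the recording."""
    if now < scheduled:
        return False  # not due yet
    return last_attempt is None or now - last_attempt >= RETRY_INTERVAL


def after_probe(scheduled: datetime, now: datetime, stream_live: bool) -> str:
    """Map the result of one attempt to 'start', 'retry', or 'failed'."""
    if stream_live:
        return "start"
    if now - scheduled >= RETRY_WINDOW:
        return "failed"  # window exhausted: mark the job failed and alert
    return "retry"
```

Keeping the decision separate from the cron entry point makes the timing rules easy to test without waiting for real minutes to pass.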

Playable Mid-Recording

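Files are saved as fragmented MP4, which writes its header at the start and then appends the media in small, self-contained fragments instead of relying on a single index written at the end. This means the recording is a valid, playable file at any moment during capture. A standard MP4 writes its index only when the recording is finalized, so the file is unplayable if the process is interrupted before that point.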
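One way to see the difference is to list a file's top-level MP4 boxes and check whether the `moov` box (the header) comes before any media data. The scanner below is a minimal sketch written for illustration, not part of the project; it reads only box headers and handles the standard 32-bit and 64-bit size fields.

```python
import struct
from typing import BinaryIO, Iterator


def top_level_boxes(stream: BinaryIO) -> Iterator[str]:
    """Yield the four-character type of each top-level MP4 box, in file order."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of file or a truncated header
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # a 64-bit size follows the type field
            ext = stream.read(8)
            if len(ext) < 8:
                return
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        yield box_type.decode("latin-1")
        if size == 0 or size < header_len:
            return  # box runs to end of file, or the size field is corrupt
        stream.seek(size - header_len, 1)  # skip the payload


def header_precedes_media(stream: BinaryIO) -> bool:
    """True if the moov box appears before the first moof or mdat box."""
    for box in top_level_boxes(stream):
        if box == "moov":
            return True
        if box in ("moof", "mdat"):
            return False
    return False
```

Opening a recording with `open(path, "rb")` and passing it to `header_precedes_media` gives a quick check that an in-progress file already carries its header.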

Google Drive Archive

After the hearing ends, files upload to a shared Google Drive folder via service account. The upload supports both personal Drive and Google Workspace Shared Drives. After a successful upload, local files can be deleted to free server disk space.
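A sketch of what the upload step could look like with the official Google API client for Python. The function names, the `key_file` and `folder_id` parameters, and the choice of scope are assumptions for illustration; the third-party imports sit inside the upload function so the metadata helper can be used without them installed.

```python
import os


def drive_metadata(local_path: str, folder_id: str) -> dict:
    """File metadata for a Drive upload: keep the local name, file it under one folder."""
    return {"name": os.path.basename(local_path), "parents": [folder_id]}


def upload_recording(local_path: str, folder_id: str, key_file: str) -> str:
    """Upload one recording and return its Drive file ID.

    Requires the google-api-python-client and google-auth packages.
    """
    from google.oauth2 import service_account
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    creds = service_account.Credentials.from_service_account_file(
        key_file, scopes=["https://www.googleapis.com/auth/drive"]
    )
    service = build("drive", "v3", credentials=creds)
    # Resumable upload: multi-gigabyte files are sent in chunks, not one request.
    media = MediaFileUpload(local_path, mimetype="video/mp4", resumable=True)
    created = service.files().create(
        body=drive_metadata(local_path, folder_id),
        media_body=media,
        fields="id",
        supportsAllDrives=True,  # needed when the folder lives on a Shared Drive
    ).execute()
    return created["id"]
```

Deleting the local file only after `upload_recording` returns an ID keeps the "delete after successful upload" rule easy to enforce.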

Whisper Transcription

Recordings can be transcribed using faster-whisper running locally on the server (large-v3 model). This produces a searchable text transcript of the full hearing without sending audio to any third-party service.
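A minimal sketch of local transcription with faster-whisper, producing one timestamped line per segment. The helper names are hypothetical, and `device="cpu"`, `compute_type="int8"`, and `vad_filter=True` are assumed settings, not the project's confirmed configuration.

```python
def format_timestamp(seconds: float) -> str:
    """Render a position in seconds as H:MM:SS."""
    total = int(seconds)
    return f"{total // 3600}:{total % 3600 // 60:02d}:{total % 60:02d}"


def transcribe(path: str, model_size: str = "large-v3") -> list[str]:
    """Transcribe a recording locally and return timestamped lines of text.

    Requires the faster-whisper package; the model runs on this machine,
    so no audio leaves the server.
    """
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus run metadata.
    segments, _info = model.transcribe(path, vad_filter=True)
    return [f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments]
```

Timestamps in the transcript let a reviewer jump from a search hit to the matching point in the video.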

System Architecture

  • PAcast HLS: Public livestream
  • yt-dlp: Extracts stream URL
  • Django App: Scheduler + controls
  • ffmpeg: Two parallel processes
  • MP4 Files: Local server storage
  • Google Drive: Shared archive
  • Email Alerts: Start / hourly / error
  • Whisper: Local transcription

A cron job fires every minute to start any scheduled recordings whose time has arrived. Auth uses magic-link email: no passwords, no OAuth flow required.
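The idea behind a magic link is a signed, time-limited token embedded in an emailed URL. The sketch below uses only the standard library to show the principle; the function names and the 15-minute lifetime are assumptions, and a Django app could get the same effect from `django.core.signing`. It does not make tokens single-use, which a production system may also want.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def make_token(email: str, secret: bytes, now: Optional[float] = None) -> str:
    """Create a signed token carrying the email address and issue time."""
    issued = int(time.time() if now is None else now)
    payload = f"{email}|{issued}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(sig).decode())


def verify_token(token: str, secret: bytes, max_age: int = 900,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the email address if the token is authentic and fresh, else None."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None  # wrong shape or invalid base64
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # signature mismatch: tampered or signed with another key
    email, _, issued = payload.decode().rpartition("|")
    current = time.time() if now is None else now
    if current - int(issued) > max_age:
        return None  # link has expired
    return email
```

The emailed link would carry the token as a query parameter, and the login view would call `verify_token` before creating a session.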

Notification Pipeline

A hearing can run for four to six hours with no one watching it. The notification system is designed so that if something goes wrong, there is always an email that explains what happened and includes a direct link to retry.
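As an illustration of that design, the sketch below builds an alert message with the standard library's `email` package. The function name, subject format, and event names are hypothetical; the retry link is included on error alerts, following the description above.

```python
from email.message import EmailMessage


def build_alert(event: str, label: str, detail: str, retry_url: str,
                sender: str, recipients: list[str]) -> EmailMessage:
    """Compose a start, hourly, or error notification for one recording."""
    msg = EmailMessage()
    msg["Subject"] = f"[BOP Recorder] {label}: {event}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    body = [f"Recording: {label}", f"Event: {event}", "", detail]
    if event == "error":
        # Error alerts carry a direct link so the recipient can retry at once.
        body += ["", f"Retry: {retry_url}"]
    msg.set_content("\n".join(body))
    return msg
```

The message can then be handed to `smtplib.SMTP(...).send_message(msg)`, or the same fields passed to Django's mail helpers.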

Application Mockups

Dashboard

Disk free: 48.3 GB
Test recording · + New recording

|      | Label | Status | Started | Duration | File size | Drive |
| View | Jun 10 AM Session | Recording | 9:00 AM | 2h 14m | 4.2 GB | Pending upload |
| View | Jun 10 AM Session (backup) | Recording | 9:00 AM | 2h 14m | 4.1 GB | Pending upload |
| View | May 13 Full Day | Uploaded | May 13 | 5h 48m | deleted | Drive ↗ |
| View | Apr 8 PM Session | Uploaded | Apr 8 | 3h 22m | deleted | Drive ↗ |

Recording Detail

Jun 10 AM Session
Recording · Started 9:00 AM · running 2h 14m

Primary process

Status: Active (PID 18432)
File size: 4.2 GB
Output: recording_12_jun10_am.mp4

Backup process

Status: Active (PID 18433)
File size: 4.1 GB
Output: recording_12_jun10_am_backup.mp4

Spot-check (last 30s) · View log · Stop recording

Pipeline Test

Run a short test recording against a known-good public stream before a hearing to verify the full pipeline is working.

| Stream | Description | Duration |  |
| BBC News HD | Video + audio. Best all-around pipeline test. |  | Run test |
| BBC World Service | Audio only. Tests audio recording path. |  | Run test |
| PAcast (BOP) | The real stream. Only live during hearings. |  | Run test |

After a test completes, use the Spot-check button on the recording detail page to download the last 30 seconds and confirm both audio and video are present before the hearing starts.
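A spot-check like this can be approximated with ffmpeg's `-sseof` input option, which seeks relative to the end of the file, followed by an ffprobe call that lists stream types. The sketch below only builds the commands and interprets ffprobe's output; the function names are hypothetical, and whether the real Spot-check button works this way is an assumption.

```python
def spot_check_command(source: str, clip: str, seconds: int = 30) -> list[str]:
    """ffmpeg invocation that copies the last `seconds` of `source` into `clip`."""
    return [
        "ffmpeg", "-hide_banner", "-y",
        "-sseof", f"-{seconds}",  # input option: seek relative to end of file
        "-i", source,
        "-c", "copy",
        clip,
    ]


def probe_command(path: str) -> list[str]:
    """ffprobe invocation that prints one codec type per stream, one per line."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "stream=codec_type",
        "-of", "csv=p=0",
        path,
    ]


def has_audio_and_video(ffprobe_output: str) -> bool:
    """True if the ffprobe output lists at least one audio and one video stream."""
    kinds = {line.split(",")[0].strip() for line in ffprobe_output.splitlines()}
    return {"audio", "video"} <= kinds
```

Running both commands with `subprocess.run` and passing the ffprobe output to `has_audio_and_video` turns the manual check into a pass/fail result.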

Impact

Tech Stack

Django · Python · ffmpeg · yt-dlp · faster-whisper · SQLite · Google Drive API · Magic Link Auth · Apache + Gunicorn · systemd + cron

Need a system like this?

I build practical, custom tools that remove manual dependencies and replace fragile workarounds with reliable infrastructure.
