B-roll Assistant

Project: March 2025 — November 2025

Role: AI / Backend Engineer

Client: PAP (Polish Press Agency)

Project description

B-roll Assistant is an intelligent AI assistant that automatically catalogs video materials and enables their instant search.

In newsrooms and video production environments, time is critical. Editors and video editors must quickly find the right B-roll, often under deadline pressure. In practice:

video materials are poorly described or not at all,
file names are ambiguous,
manual tagging is costly and unrealistic,
classic folder-based search doesn't scale.

The system:

analyzes video without user intervention,
generates snapshots and shot descriptions using AI,
combines classic search with semantic search,
learns from editor ratings.

Thanks to this, the editor doesn't need to know how the file was named — just describe what they're looking for.

How It Works

1. Smart Catalog Setup

System adapts to existing file structure
No manual organization required
Handles large B-roll archives

2. AI Shot Descriptions

Automatic snapshot generation (FFmpeg)
Visual shot analysis (AI Vision)
Text descriptions for each clip: objects, scenes, context, atmosphere
Each clip receives a rich semantic description without manual tagging

3. Dual Search & Continuous Learning

Keyword search (fast, precise queries)
Semantic search (search by meaning and concepts)

Example queries:

"glass"
"transparent container"
"tense political rally"
"crowd waiting nervously"

The system enables rating results, which:

improves accuracy,
adjusts ranking,
increases search effectiveness over time.

Architecture Overview

B-roll Assistant is a multimodal pipeline:

Video ingestion — file structure analysis, frame extraction
AI processing — vision → image description, text → embeddings
Search layer — classic indexes, vector database
Feedback loop — user ratings, result ranking correction

Designed for performance and scaling in media environments.

What I did

Designed the B-roll analysis system architecture
Implemented video frame extraction pipeline
Integrated AI Vision for shot analysis
Created automatic clip description system
Built dual search engine (keyword + semantic)
Deployed embedding-based search
Designed user feedback mechanism
Prepared backend API and data structure
Containerized the system (Docker)

Skills

Python
OpenAI (Vision + LLM)
Embeddings
Vector Database
FastAPI
PostgreSQL
FFmpeg
Docker
Semantic Search

Results

Find the right B-roll in seconds instead of minutes
No need for manual tagging
Better utilization of existing video archives
System that learns along with the editorial team
Solid foundation for: newsrooms, media houses, content platforms

Sample photos

Murawa search Zabytki search File management Drone footage Pisanie