Index Video and Audio for AI Retrieval

Build a complete pipeline to make video and audio content searchable: transcribe with Whisper, chunk the transcript by timestamp, embed it, and retrieve matching clips by semantic query.

June 26, 2026
Aki Wijesundara
#RAG #Whisper #Search

The Problem with Video and Audio Content

Video and audio are the dark matter of enterprise knowledge. A company might have hundreds of recorded team meetings, customer calls, training sessions, and product demos, with no way to find what was said in any of them. Standard search does not work on media files. You need to extract the text, index it, and retrieve it semantically.

The good news: the full pipeline covering transcription, chunking, embedding, and retrieval can be built in Python in an afternoon.

Transcribing with Whisper and Chunking by Time

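OpenAI's Whisper model runs locally and returns a timestamped transcript: every segment (roughly a phrase or sentence) carries its start and end time, and passing word_timestamps=True adds per-word timing on top. Those timestamps are what let you retrieve a specific clip rather than just a document.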

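import json

import whisper


def transcribe_with_timestamps(audio_path, chunk_duration=30):
    """Transcribe a media file and group Whisper segments into fixed time windows."""
    model = whisper.load_model("base")
    # Segment timestamps are returned by default. word_timestamps=True is only
    # needed for per-word timing, which this chunker does not use.
    result = model.transcribe(audio_path)

    chunks = []
    for segment in result["segments"]:
        # Assign each segment to the window that contains its start time.
        chunk_idx = int(segment["start"] // chunk_duration)
        # Create every window up to chunk_idx; silent windows stay empty for now.
        while len(chunks) <= chunk_idx:
            start = len(chunks) * chunk_duration
            chunks.append({
                "chunk_id": len(chunks),
                "source": audio_path,  # lets search results name the file
                "start_sec": start,
                "end_sec": start + chunk_duration,
                "text": "",
            })
        chunks[chunk_idx]["text"] += " " + segment["text"].strip()

    # Trim the leading space and drop windows that contain no speech.
    for c in chunks:
        c["text"] = c["text"].strip()
    return [c for c in chunks if c["text"]]


chunks = transcribe_with_timestamps("meeting_recording.mp4")
with open("chunks.json", "w") as f:
    json.dump(chunks, f, indent=2)
print(f"Produced {len(chunks)} chunks with timestamps")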

Each chunk records its start and end time so search results link back to the exact moment in the recording, not just the file.
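
To see what the bucketing step produces without running Whisper, here is a minimal sketch that applies the same idea to a few hand-written segments. The segment data and the bucket_segments name are made up for illustration; only the start and text keys mirror what the function above reads from Whisper's output.

```python
def bucket_segments(segments, chunk_duration=30):
    """Group segments into fixed windows keyed by each segment's start time."""
    buckets = {}
    for seg in segments:
        idx = int(seg["start"] // chunk_duration)
        buckets.setdefault(idx, []).append(seg["text"].strip())
    return [
        {
            "chunk_id": idx,
            "start_sec": idx * chunk_duration,
            "end_sec": (idx + 1) * chunk_duration,
            "text": " ".join(texts),
        }
        for idx, texts in sorted(buckets.items())
    ]


# Hypothetical segments, shaped like entries of Whisper's result["segments"].
segments = [
    {"start": 2.0, "end": 6.5, "text": " Welcome everyone."},
    {"start": 28.5, "end": 33.0, "text": " Let's review the roadmap."},
    {"start": 65.0, "end": 70.0, "text": " Pricing changes ship next month."},
]

for chunk in bucket_segments(segments):
    print(chunk["chunk_id"], chunk["start_sec"], chunk["end_sec"], chunk["text"])
# 0 0 30 Welcome everyone. Let's review the roadmap.
# 2 60 90 Pricing changes ship next month.
```

Two things are worth noticing. The silent 30 to 60 second window is dropped, so chunk IDs are not contiguous. And the second segment starts at 28.5 s but ends at 33.0 s, so a chunk's end_sec is a nominal window boundary: speech can run slightly past it.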

Embedding and Building the Index

Once you have chunks, embed each one and store it in a vector index. For a library of up to a few thousand videos, a FAISS flat index is simple and fast enough to get started.

import json
import pickle

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

with open("chunks.json") as f:
    chunks = json.load(f)

texts = [c["text"] for c in chunks]
# FAISS expects float32 vectors.
embeddings = model.encode(texts, convert_to_numpy=True).astype("float32")
# L2-normalize so that inner product equals cosine similarity.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Exact (brute-force) inner-product index; row i corresponds to chunks[i].
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

faiss.write_index(index, "video_index.faiss")
# Save the chunk metadata next to the index so hits map back to timestamps.
with open("chunks_meta.pkl", "wb") as f:
    pickle.dump(chunks, f)
print(f"Indexed {len(chunks)} chunks")
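
The normalization line is what makes IndexFlatIP behave like a cosine-similarity search: for unit-length vectors, the inner product and the cosine of the angle between them coincide. A minimal pure-Python check of that claim, using made-up two-dimensional vectors and hypothetical helper names:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Made-up vectors standing in for chunk embeddings.
a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude
c = [4.0, 3.0]

# Raw inner product grows with magnitude: a.b is twice a.a
# even though a and b point in exactly the same direction.
print(dot(a, a), dot(a, b))

# After normalization, the inner product agrees with cosine similarity.
print(dot(normalize(a), normalize(c)), cosine(a, c))
```

Without the normalization step, longer vectors could outrank better-aligned ones, which is rarely what a semantic search wants.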

Querying for Timestamped Clips

With the index built, a semantic query returns the matching moments with timestamps you can use to deep-link into a video player or extract a clip with ffmpeg.
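
A minimal sketch of that query step follows, assuming the video_index.faiss and chunks_meta.pkl files written earlier. The function names are illustrative, and the base_url default is a placeholder borrowed from an example player URL. A chunk may not carry a source key, so the sketch falls back to "unknown". The ffmpeg helper only builds an argument list, and its use of -c copy (stream copy, no re-encoding) is an assumption that trades frame-accurate cuts for speed.

```python
def format_mmss(seconds):
    """Render a second count as MM:SS (minutes may exceed 59 on long recordings)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def search_clips(question, model, index, chunks, k=5,
                 base_url="https://example.com/player"):
    """Return up to k chunks ranked by similarity to the question.

    model: object with a SentenceTransformer-style encode() method.
    index: FAISS index whose row i corresponds to chunks[i].
    """
    # normalize_embeddings=True puts the query on the same unit-length scale
    # as the indexed vectors, so inner product acts as cosine similarity.
    query_vec = model.encode([question], convert_to_numpy=True,
                             normalize_embeddings=True)
    scores, ids = index.search(query_vec, k)

    results = []
    for score, idx in zip(scores[0], ids[0]):
        if idx < 0:  # FAISS pads with -1 when it has fewer than k vectors
            continue
        chunk = chunks[int(idx)]
        start = int(chunk["start_sec"])
        results.append({
            "source": chunk.get("source", "unknown"),  # assumed optional key
            "start": format_mmss(start),
            "start_sec": start,
            "end_sec": int(chunk["end_sec"]),
            "url": f"{base_url}?t={start}",
            "score": float(score),
            "text": chunk["text"],
        })
    return results


def ffmpeg_clip_command(source, start_sec, end_sec, out_path):
    """Build (but do not run) an ffmpeg command that cuts one clip."""
    return [
        "ffmpeg",
        "-ss", str(start_sec),           # seek to the clip start
        "-i", source,
        "-t", str(end_sec - start_sec),  # clip duration in seconds
        "-c", "copy",                    # stream copy: fast, cuts land on keyframes
        out_path,
    ]


def load_artifacts(index_path="video_index.faiss", meta_path="chunks_meta.pkl"):
    """Load the embedding model, the saved index, and the chunk metadata."""
    # Imported lazily so the helpers above work without these packages installed.
    import pickle

    import faiss
    from sentence_transformers import SentenceTransformer

    index = faiss.read_index(index_path)
    with open(meta_path, "rb") as f:
        chunks = pickle.load(f)
    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model, index, chunks
```

Typical use would be model, index, chunks = load_artifacts(), then search_clips("What did we decide about pricing?", model, index, chunks). Any hit can be handed to ffmpeg_clip_command and run with subprocess.run(cmd, check=True) if ffmpeg is installed. The query must be embedded with the same model that built the index, otherwise the scores are meaningless.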

Prompt

"Build a query function that takes a question string, embeds it, searches the FAISS index for the top 5 matches, and returns each result with its source file name, start time in MM:SS format, and a direct timestamp URL like: https://example.com/player?t=240"

The full pipeline turns any collection of recorded meetings or training sessions into a searchable knowledge base. Teams stop re-explaining what was covered in a call three months ago, because anyone can now find the exact moment in seconds.
