YouTube Reader

The YouTube Reader extracts transcripts from YouTube videos synchronously so they can be converted into vector embeddings for your knowledge base.

Code

from kern.agent import Agent
from kern.knowledge.knowledge import Knowledge
from kern.knowledge.reader.youtube_reader import YouTubeReader
from kern.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Create Knowledge Instance
knowledge = Knowledge(
    name="YouTube Knowledge Base",
    description="Knowledge base from YouTube video transcripts",
    vector_db=PgVector(
        table_name="youtube_vectors",
        db_url=db_url,
    ),
)

# Add YouTube video content synchronously
knowledge.insert(
    metadata={"source": "youtube", "type": "educational"},
    urls=[
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # Replace with actual educational video
        "https://www.youtube.com/watch?v=example123",  # Replace with actual video URL
    ],
    reader=YouTubeReader(),
)

# Create an agent with the knowledge
agent = Agent(
    knowledge=knowledge,
    search_knowledge=True,
)

# Query the knowledge base
agent.print_response(
    "What are the main topics discussed in the videos?",
    markdown=True,
)

Usage

Set up your virtual environment

macOS/Linux:

uv venv --python 3.12
source .venv/bin/activate

Windows:

uv venv --python 3.12
.venv\Scripts\activate

Install dependencies

uv pip install -U youtube-transcript-api pytube sqlalchemy psycopg pgvector kern-ai openai

Set environment variables

export OPENAI_API_KEY=xxx
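
Since the example expects OPENAI_API_KEY to be present, a small pre-flight check can catch a missing variable before the script starts. The following is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper introduced here for illustration and is not part of kern.

```python
import os


def missing_env_vars(names, environ=os.environ):
    """Return the names that are unset or empty in the given environment."""
    return [name for name in names if not environ.get(name)]


# Assumption: OPENAI_API_KEY is the only variable this example needs.
missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing the environment as a parameter keeps the helper easy to test with a plain dictionary.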

Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  kern/pgvector:16
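
Once the container is started, it may be worth confirming that the port mapped above (5532) is reachable before running the agent. The sketch below uses only the standard library and reuses the `db_url` from the example; `db_endpoint` and `is_port_open` are hypothetical helpers, not kern functions. A successful TCP connection only shows that something is listening on the port, not that PostgreSQL has finished initializing.

```python
import socket
from urllib.parse import urlsplit


def db_endpoint(db_url):
    """Extract (host, port) from a SQLAlchemy-style database URL."""
    parts = urlsplit(db_url)
    return parts.hostname, parts.port


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"
host, port = db_endpoint(db_url)
print(f"{host}:{port} reachable:", is_port_open(host, port))
```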

Run Agent

python examples/basics/knowledge/concepts/readers/overview/youtube_reader_sync.py

Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| video_url | str | Required | URL of the YouTube video to extract transcript from |
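
Because the reader works from a video URL, it can be useful to check that each URL actually carries a video ID before inserting it. The sketch below uses only the standard library and handles the two common URL shapes (`watch?v=...` and `youtu.be/...`); `extract_video_id` is a hypothetical helper for illustration and does not reflect how YouTubeReader parses URLs internally.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper for illustration; not part of YouTubeReader.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parts.path.lstrip("/").split("/")[0]
        return video_id or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Standard watch links carry the ID in the `v` query parameter.
        values = parse_qs(parts.query).get("v")
        return values[0] if values else None
    return None


urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://www.youtube.com/watch?v=example123",
]
for url in urls:
    print(url, "->", extract_video_id(url))
```

URLs for which the helper returns None could be skipped or reported before calling `knowledge.insert`.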