Knowledge Content Types

In Kern Knowledge, content is the basic building block of a knowledge base. Content can be added from several origins.

Content Origin	Description
Path	Local files or directories containing files
Url	Direct links to files or other sites
Text	Raw text content
Topic	Search topics from repositories like arXiv or Wikipedia
Remote Content	Content from cloud storage providers like S3, GCS, SharePoint, GitHub, and Azure Blob

Knowledge content must be read and chunked before it can be passed to a VectorDB for embedding, storage, and ultimately retrieval. When content is added to Knowledge, a default reader is selected for it. A reader parses content from its origin and splits it into smaller chunks, which the VectorDB then embeds.
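To make the chunking step concrete, here is a minimal sketch of fixed-size chunking in plain Python. It is not Kern's implementation: the function name `chunk_text` and the character-based splitting are assumptions made for illustration only.

```python
def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Hypothetical helper for illustration; real readers may split on
    tokens, sentences, or document structure instead of raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Stand-in for text that a reader has already parsed from its origin.
document = "Kern Knowledge reads content and chunks it before embedding. " * 3

chunks = chunk_text(document, chunk_size=50)

# Each chunk would then be handed to the vector database for embedding.
for index, chunk in enumerate(chunks):
    print(index, len(chunk))
```

A smaller chunk size yields more, shorter chunks, while a larger one keeps more context in each chunk.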

You can pass a custom reader, or override the default reader or its settings, when adding content. The example below creates an instance of the standard PDFReader class with a custom chunk_size. The chunking_strategy and other parameters that influence how content is ingested and processed can be set the same way.

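import asyncio

from kern.knowledge.reader.pdf_reader import PDFReader

# Knowledge and vector_db are assumed to be imported and configured elsewhere.

# Standard PDF reader with a custom chunk size.
reader = PDFReader(
    chunk_size=1000,
)

knowledge_base = Knowledge(
    vector_db=vector_db,
)

# Add the content at data/pdf, using the custom reader instead of the default.
asyncio.run(
    knowledge_base.ainsert(
        path="data/pdf",
        reader=reader,
    )
)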
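The idea of swapping a chunking strategy can be sketched independently of Kern. The snippet below is a hypothetical illustration, not the library's API: `ChunkingStrategy`, `fixed_size`, `by_paragraph`, and `ingest` are invented names that only show how a different strategy changes the chunks produced from the same text.

```python
from typing import Callable

# Hypothetical type: a strategy is any function that maps text to chunks.
ChunkingStrategy = Callable[[str], list[str]]


def fixed_size(chunk_size: int) -> ChunkingStrategy:
    """Build a strategy that cuts text every chunk_size characters."""
    def split(text: str) -> list[str]:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return split


def by_paragraph(text: str) -> list[str]:
    """Strategy that treats each blank-line-separated paragraph as one chunk."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def ingest(text: str, strategy: ChunkingStrategy) -> list[str]:
    """Stand-in for a reader: apply the chosen strategy to parsed text."""
    return strategy(text)


sample = "First paragraph about readers.\n\nSecond paragraph about chunking."

paragraph_chunks = ingest(sample, by_paragraph)
fixed_chunks = ingest(sample, fixed_size(20))

print(paragraph_chunks)
print(fixed_chunks)
```

Paragraph-based splitting keeps each idea intact, whereas fixed-size splitting gives predictable chunk lengths but may cut sentences in the middle. Which trade-off suits a knowledge base depends on the content being ingested.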

For more information about the different readers and their capabilities, check out the Readers page.

Next Steps

Search & Retrieval

Learn how agents search and find information in your knowledge base

Readers

Explore content parsing and ingestion options in detail

Chunking Strategies

Optimize how content is broken down for better search results

Vector Databases

Choose the right storage solution for your knowledge base