Knowledge Content Types
Kern Knowledge uses content as the building block of any piece of knowledge.
Content can be added to knowledge from different sources.
| Content Origin | Description |
|---|---|
| Path | Local files or directories containing files |
| Url | Direct links to files or other sites |
| Text | Raw text content |
| Topic | Search topics from repositories like Arxiv or Wikipedia |
| Remote Content | Content from cloud storage providers like S3, GCS, SharePoint, GitHub, and Azure Blob |
Knowledge content must be read and chunked before it can be passed to a VectorDB for embedding, storage, and ultimately retrieval. When content is added to Knowledge, a default reader is selected. Readers parse content from its origin and chunk it into smaller pieces, which the VectorDB then embeds.
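To make the read-then-chunk step concrete, here is a minimal sketch in plain Python. It does not use Kern's readers: `read_text` and `chunk_text` are hypothetical helpers, and fixed-size character chunking is assumed only for illustration. Real readers may split on different boundaries.

```python
def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Parse raw bytes from a content origin into plain text (hypothetical helper)."""
    return raw.decode(encoding)


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Illustrative input; in practice this would come from a file, URL, or other origin.
raw = b"Knowledge content is read, then chunked, then embedded."
chunks = chunk_text(read_text(raw), chunk_size=20)

for piece in chunks:
    print(repr(piece))
```

Each resulting chunk is what would then be handed to the vector database for embedding. A smaller `chunk_size` yields more, shorter chunks.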
Custom readers, or an override of the default reader and/or its settings, can be passed when adding content. The example below creates an instance of the standard PDFReader class with an updated chunk_size. The chunking_strategy and other parameters that influence how content is ingested and processed can be updated in the same way.
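```python
import asyncio

from kern.knowledge.reader.pdf_reader import PDFReader

# Knowledge and vector_db are assumed to be imported and configured elsewhere.

# Use the standard PDF reader, overriding only the chunk size.
reader = PDFReader(
    chunk_size=1000,
)

knowledge_base = Knowledge(
    vector_db=vector_db,
)

# Insert every PDF under data/pdf using the customized reader.
asyncio.run(
    knowledge_base.ainsert(
        path="data/pdf",
        reader=reader,
    )
)
```

For more information about the different readers and their capabilities, check out the Readers page.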
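As a rough illustration of why the chunking strategy matters, the sketch below passes two different strategies to the same reader and compares the output. Everything here is hypothetical and independent of Kern's API: `SketchReader`, `fixed_size`, and `by_paragraph` are made-up names, and the only assumption is that a strategy is a function from text to a list of chunks.

```python
from typing import Callable

# Assumption for this sketch: a strategy maps a document's text to its chunks.
ChunkingStrategy = Callable[[str], list[str]]


def fixed_size(chunk_size: int) -> ChunkingStrategy:
    """Build a strategy that cuts text every chunk_size characters."""
    def split(text: str) -> list[str]:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return split


def by_paragraph(text: str) -> list[str]:
    """Strategy that splits on blank lines and drops empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class SketchReader:
    """Hypothetical reader that delegates chunking to the given strategy."""

    def __init__(self, chunking_strategy: ChunkingStrategy):
        self.chunking_strategy = chunking_strategy

    def read(self, text: str) -> list[str]:
        return self.chunking_strategy(text)


document = "First paragraph.\n\nSecond paragraph, a little longer."

fixed_chunks = SketchReader(fixed_size(16)).read(document)
paragraph_chunks = SketchReader(by_paragraph).read(document)

print(fixed_chunks)
print(paragraph_chunks)
```

The fixed-size strategy cuts through the middle of the second paragraph, while the paragraph strategy keeps each paragraph whole. Differences like this affect what a later search can match.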
Next Steps
Search & Retrieval
Learn how agents search and find information in your knowledge base
Readers
Explore content parsing and ingestion options in detail
Chunking Strategies
Optimize how content is broken down for better search results
Vector Databases
Choose the right storage solution for your knowledge base