mdBook Instructions
This project uses mdBook to generate documentation.
Prerequisites
To build the documentation, you need to install mdbook and the mdbook-mermaid preprocessor for diagrams.
1. Install mdBook
You can install mdbook using Cargo (Rust’s package manager):
cargo install mdbook
Alternatively, you can download binaries from the mdBook releases page.
2. Install mdbook-mermaid
This project uses Mermaid for diagrams. Install the preprocessor:
cargo install mdbook-mermaid
Building the Book
The book is configured to build its output into the /docs directory at the project root.
Build Command
To generate the HTML version of the book, run the following command from the md-book directory:
mdbook build
The generated files will be placed in ../docs.
Live Development
To preview changes as you edit, you can use the serve command:
mdbook serve --open
This will start a local web server and open the book in your default browser. It automatically reloads when you save changes to the markdown files.
Project Structure
- md-book/book.toml: Configuration file for the book.
- md-book/src/: Contains the markdown source files.
- md-book/src/SUMMARY.md: The table of contents for the book.
- docs/: The generated HTML documentation (output directory).
Rust Models
Strand-Rust-Coder-14B
- URL: https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF
- Model: hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:latest
Ollama Sample: Introduction to Local LLM Integration
Welcome!
This introduction guide walks you through ollama_sample, a beginner-friendly Rust application that demonstrates how to integrate a local Ollama LLM into your Rust projects. By the end of this guide, you’ll understand how to communicate with local AI models and build interactive applications.
What You’ll Build
A command-line Rust application that:
- Prompts you to enter 5 keywords
- Sends those keywords to a local Ollama AI model
- Generates a creative joke based on those keywords
- Displays the result in your terminal
This is a great starting point for learning:
- Async Rust programming with tokio
- HTTP API calls using reqwest
- JSON serialization with serde
- Prompt engineering for AI models
System Requirements
Before you begin, ensure you have:
- Rust toolchain - Install from rustup.rs
- Ollama - Download from ollama.ai
- A language model - Download one with:
ollama pull llama3.1 (or llama2)
Quick Start
Step 1: Start Ollama
Open a terminal and keep it running:
ollama serve
Step 2: Run the Application
In a new terminal, navigate to your workspace and run:
cargo run -p ollama_sample
Step 3: Enter Keywords
When prompted, type 5 keywords (one per line), for example:
Keyword 1: coffee
Keyword 2: robots
Keyword 3: pizza
Keyword 4: astronauts
Keyword 5: socks
Step 4: See the Magic
The application will generate and display a funny joke using your keywords!
How It Works Under the Hood
- User Input Collection: The app reads 5 keywords from your terminal
- Prompt Construction: Keywords are formatted into a natural language instruction
- API Request: An async HTTP POST request is sent to Ollama’s /api/generate endpoint
- Model Processing: Your local LLM processes the prompt
- Response Handling: The generated joke is displayed in your terminal
Project Structure
ollama_sample/
├── Cargo.toml # Package manifest with dependencies
└── src/
└── main.rs # Main application (async Rust code)
Key Technologies
| Technology | Purpose |
|---|---|
| reqwest | Making HTTP requests to Ollama |
| tokio | Async runtime for concurrent operations |
| serde & serde_json | JSON serialization and deserialization |
Tips for Success
- Model Selection: llama3.1 provides better joke quality than llama2; experiment with different models
- Ollama Server: Keep the ollama serve command running in a separate terminal
- Customization: Edit the system prompt in main.rs to change the AI’s personality (e.g., “Act as a standup comedian”)
- Error Messages: If you see connection errors, verify Ollama is running on http://localhost:11434
What’s Next?
Once comfortable with this example, try:
- Adding more sophisticated prompts
- Experimenting with different models
- Building a web interface using the same Ollama API
- Creating multi-turn conversations
RAG Sample: Introduction to PDF Retrieval with Qdrant
Overview
This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:
- loads PDF text and generates vector embeddings for each page using Ollama,
- stores the extracted page content and embeddings in Qdrant,
- performs semantic vector search to retrieve relevant chunks for a question,
- synthesizes a final answer using an LLM.
The Qdrant server and Ollama service should be running locally. The Rust workspace includes the rag_qdrant module for loading and querying data.
What This Project Does
- Starts a local Qdrant service with Docker Compose
- Uses Ollama (llama3.1) to generate embeddings for document chunks
- Inserts PDF page chunks and their embeddings into Qdrant
- Performs semantic search using Qdrant’s vector search
- Generates a natural language answer using an LLM via Ollama
Plan
- Add a new Rust workspace member named rag_qdrant.
- Add pdf-extract and uuid to the workspace dependencies.
- Create a Docker Compose configuration for Qdrant.
- Implement rag_qdrant/src/main.rs with two modes:
  - load <path-to-pdf>: read the PDF and insert page chunks into Qdrant
  - ask "<question>": retrieve relevant PDF chunks from Qdrant
- Update documentation with the instructions and sample commands.
How It Works
Loading Process
When you run the load command, the application performs the following steps:
- Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
- Indexing: Ensures a collection is defined in Qdrant with the appropriate vector size and distance metric (Cosine).
- Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
- Storage: Stores the page text, metadata (source file, page number), and the embedding as a point in Qdrant.
sequenceDiagram
participant CLI as rag_qdrant load
participant PDF as PDF File
participant O as Ollama (llama3.1)
participant Q as Qdrant
CLI->>PDF: Extract text by pages
CLI->>Q: PUT /collections/pdf_chunks
loop For each page
CLI->>O: Get embedding for page text
O-->>CLI: 4096-D Vector
CLI->>Q: PUT /collections/pdf_chunks/points (text + embedding)
end
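As a rough illustration of the collection setup in the diagram above, the sketch below builds the URL and JSON body for Qdrant's create-collection call (PUT /collections/pdf_chunks) with a 4096-dimensional Cosine vector configuration. The helper names and the base URL parameter are hypothetical; the sample may build the request differently.

```rust
/// Collection name used by the sample.
const COLLECTION: &str = "pdf_chunks";

/// URL of the collection resource; `base` is e.g. "http://localhost:6333".
fn collection_url(base: &str) -> String {
    format!("{}/collections/{}", base.trim_end_matches('/'), COLLECTION)
}

/// JSON body declaring the vector size and the Cosine distance metric.
fn collection_config_body(vector_size: usize) -> String {
    // Doubled braces are literal braces inside format!.
    format!(
        r#"{{"vectors":{{"size":{},"distance":"Cosine"}}}}"#,
        vector_size
    )
}
```

The vector size must match the embedding model's output length (4096 for the embeddings described here); a mismatch is typically rejected when points are inserted.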
Querying Process (RAG)
When you run the ask command, the application executes the RAG workflow:
- Question Embedding: Generates a vector embedding for your question using Ollama.
- Semantic Search: Performs a search in Qdrant to find the top 3 most relevant text chunks based on vector distance.
- Context Construction: Combines the retrieved text chunks into a single context block.
- Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
participant CLI as rag_qdrant ask
participant O as Ollama (llama3.1)
participant Q as Qdrant
CLI->>O: Get embedding for question
O-->>CLI: Question Vector
CLI->>Q: POST /collections/pdf_chunks/points/search
Q-->>CLI: Top 3 relevant chunks
CLI->>O: Generate answer (Context + Question)
O-->>CLI: Synthesized Answer
CLI->>User: Display Answer
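To make the semantic search step concrete, here is a brute-force sketch of what "top 3 by vector distance" means: score every stored vector against the question vector with cosine similarity and keep the best k. Qdrant itself uses an index rather than a linear scan, so this is only a conceptual model; the function names are assumptions.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns (index, score) for the `k` stored vectors most similar to `query`.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Highest similarity first; total_cmp gives a total order on floats.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

For a 14-page document a linear scan like this would be fast enough; an index only starts to matter as the number of stored chunks grows.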
Setup
Start Services
- Start Qdrant from the repository root:
docker compose up -d qdrant
- Access the Qdrant Web UI: Open http://localhost:6333/dashboard in your browser.
- Ensure Ollama is running and has the llama3.1 model:
ollama run llama3.1
Load a PDF
cargo run -p rag_qdrant -- load path/to/document.pdf
Sample PDF
Use the included sample file:
cargo run -p rag_qdrant -- load data/the-tale-of-peter-rabbit.pdf
Ask a Question
cargo run -p rag_qdrant -- ask "Who is Peter?"
Notes
- The sample uses vector embeddings (4096-D) for semantic retrieval.
- It leverages Qdrant’s efficient similarity search.
- The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.
- The collection name is pdf_chunks.
File Structure
rag_qdrant/
├── Cargo.toml
└── src/
└── main.rs
Next Steps
- Implement chunking strategies (e.g., fixed-size with overlap) instead of page-level chunks.
- Add support for multiple PDF documents and filtered search.
- Explore hybrid search (combining full-text and vector search) for better accuracy.
RAG Sample: Introduction to PDF Retrieval with PostgreSQL (pgvector)
Overview
This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:
- loads PDF text and generates vector embeddings for each page using Ollama,
- stores extracted page content and embeddings in PostgreSQL with the pgvector extension,
- performs semantic vector similarity search to retrieve relevant chunks for a question,
- synthesizes a final answer using an LLM.
The PostgreSQL server and Ollama service should be running locally. The Rust workspace includes the rag_postgres module for loading and querying data.
What This Project Does
- Starts a local PostgreSQL service with Docker Compose
- Uses Ollama (llama3.1) to generate embeddings for document chunks
- Creates a pdf_chunks table with a vector(4096) column
- Inserts PDF page chunks and their embeddings into PostgreSQL
- Performs semantic search using pgvector distance operators
- Generates a natural language answer using an LLM via Ollama
Plan
- Add a new Rust workspace member named rag_postgres.
- Add necessary dependencies, including tokio-postgres.
- Add a Docker Compose configuration for PostgreSQL with pgvector.
- Implement rag_postgres/src/main.rs with two modes:
  - load <path-to-pdf>: read the PDF and insert page chunks into PostgreSQL
  - ask "<question>": retrieve relevant PDF chunks from PostgreSQL
- Update documentation with instructions and sample commands.
How It Works
Loading Process
When you run the load command, the application performs the following steps:
- Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
- Schema Bootstrap: Connects to PostgreSQL, enables the vector extension, and ensures pdf_chunks exists.
- Embedding Generation: For each page, sends page text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
- Storage: Inserts source metadata, page content, and embeddings into PostgreSQL.
sequenceDiagram
participant CLI as rag_postgres load
participant PDF as PDF File
participant O as Ollama (llama3.1)
participant PG as PostgreSQL (pgvector)
CLI->>PDF: Extract text by pages
CLI->>PG: CREATE EXTENSION vector
CLI->>PG: CREATE TABLE pdf_chunks
loop For each page
CLI->>O: Get embedding for page text
O-->>CLI: 4096-D Vector
CLI->>PG: INSERT row (text + embedding)
end
Querying Process (RAG)
When you run the ask command, the application executes the RAG workflow:
- Question Embedding: Generates a vector embedding for your question using Ollama.
- Semantic Search: Queries PostgreSQL with ORDER BY embedding <=> $1::vector LIMIT 3 to get the three most relevant chunks.
- Context Construction: Combines the retrieved text chunks into one context block.
- Answer Synthesis: Sends context and question to Ollama for grounded answer generation.
sequenceDiagram
participant CLI as rag_postgres ask
participant O as Ollama (llama3.1)
participant PG as PostgreSQL (pgvector)
CLI->>O: Get embedding for question
O-->>CLI: Question Vector
CLI->>PG: Similarity search (embedding <=> query)
PG-->>CLI: Top 3 relevant chunks
CLI->>O: Generate answer (Context + Question)
O-->>CLI: Synthesized Answer
CLI->>User: Display Answer
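The similarity query needs the question embedding in a form pgvector can parse. A minimal sketch, assuming the embedding is passed in pgvector's text representation ("[x,y,z]"): the column names in the query (source, page, content) are assumptions, and how the value is bound to $1 depends on the client library.

```rust
/// Search query matching the ordering described above.
/// Column names other than `embedding` are hypothetical.
const SEARCH_SQL: &str =
    "SELECT source, page, content FROM pdf_chunks ORDER BY embedding <=> $1::vector LIMIT 3";

/// Formats an embedding as pgvector's text representation, e.g. "[0.5,-1,2]".
fn to_pgvector_literal(embedding: &[f32]) -> String {
    let parts: Vec<String> = embedding.iter().map(|v| v.to_string()).collect();
    format!("[{}]", parts.join(","))
}
```

The <=> operator is pgvector's cosine distance, so ascending order returns the most similar rows first.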
Setup
Start Services
- Start PostgreSQL from the repository root:
docker compose up -d postgres
- Ensure Ollama is running and has the llama3.1 model:
ollama run llama3.1
Load a PDF
cargo run -p rag_postgres -- load path/to/document.pdf
Sample PDF
Use the included sample file:
cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf
Ask a Question
cargo run -p rag_postgres -- ask "Who is Peter?"
Notes
- The sample uses PostgreSQL with the pgvector extension and vector(4096) embeddings.
- Similarity search is done with pgvector distance ordering (<=>).
- The final answer is generated by an LLM (llama3.1 via Ollama) using retrieved context.
File Structure
rag_postgres/
├── Cargo.toml
└── src/
└── main.rs
RAG Sample: Introduction to PDF Retrieval with Elasticsearch
Overview
This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:
- loads PDF text and generates vector embeddings for each page using Ollama,
- stores the extracted page content and embeddings in Elasticsearch,
- performs semantic vector search (k-NN) to retrieve relevant chunks for a question,
- synthesizes a final answer using an LLM.
The Elasticsearch server and Ollama service should be running locally. The Rust workspace includes the rag_elasticsearch module for loading and querying data.
What This Project Does
- Starts a local Elasticsearch service with Docker Compose
- Uses Ollama (llama3.1) to generate embeddings for document chunks
- Inserts PDF page chunks and their embeddings into Elasticsearch
- Performs semantic search using Elasticsearch’s dense_vector type and k-NN search
- Generates a natural language answer using an LLM via Ollama
Plan
- Add a new Rust workspace member named rag_elasticsearch.
- Add necessary dependencies to the workspace.
- Create a Docker Compose configuration for Elasticsearch.
- Implement rag_elasticsearch/src/main.rs with two modes:
  - load <path-to-pdf>: read the PDF and insert page chunks into Elasticsearch
  - ask "<question>": retrieve relevant PDF chunks from Elasticsearch
- Update documentation with the instructions and sample commands.
How It Works
Loading Process
When you run the load command, the application performs the following steps:
- Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
- Indexing: Ensures an index is defined in Elasticsearch with a dense_vector mapping for the embedding field.
- Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
- Storage: Stores the page text, metadata (source file, page number), and the embedding as a document in Elasticsearch.
sequenceDiagram
participant CLI as rag_elasticsearch load
participant PDF as PDF File
participant O as Ollama (llama3.1)
participant ES as Elasticsearch
CLI->>PDF: Extract text by pages
CLI->>ES: CREATE INDEX (mappings)
loop For each page
CLI->>O: Get embedding for page text
O-->>CLI: 4096-D Vector
CLI->>ES: INDEX _doc (text + embedding)
end
Querying Process (RAG)
When you run the ask command, the application executes the RAG workflow:
- Question Embedding: Generates a vector embedding for your question using Ollama.
- Semantic Search: Performs a k-Nearest Neighbors (k-NN) search in Elasticsearch to find the top 3 most relevant text chunks based on cosine similarity.
- Context Construction: Combines the retrieved text chunks into a single context block.
- Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
participant CLI as rag_elasticsearch ask
participant O as Ollama (llama3.1)
participant ES as Elasticsearch
CLI->>O: Get embedding for question
O-->>CLI: Question Vector
CLI->>ES: k-NN Search
ES-->>CLI: Top 3 relevant chunks
CLI->>O: Generate answer (Context + Question)
O-->>CLI: Synthesized Answer
CLI->>User: Display Answer
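The context construction and answer synthesis steps are independent of the database, so they can be sketched with plain string handling. The struct, its field names, and the prompt wording below are assumptions for illustration; the sample's actual prompt may differ.

```rust
/// One retrieved chunk; the field names are hypothetical.
struct RetrievedChunk {
    source: String,
    page: u32,
    text: String,
}

/// Joins the retrieved chunks into a single context block,
/// labelling each one with its source file and page number.
fn build_context(chunks: &[RetrievedChunk]) -> String {
    chunks
        .iter()
        .map(|c| format!("[{} p.{}]\n{}", c.source, c.page, c.text.trim()))
        .collect::<Vec<_>>()
        .join("\n\n")
}

/// Wraps the context and the question in an instruction for the LLM.
fn build_rag_prompt(context: &str, question: &str) -> String {
    format!(
        "Answer the question using only the context below.\n\nContext:\n{}\n\nQuestion: {}\nAnswer:",
        context, question
    )
}
```

Labelling each chunk with its source and page is a design choice that lets the model (and the reader of its answer) trace a claim back to a page.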
Setup
Start Services
- Start Elasticsearch from the repository root:
docker compose up -d elasticsearch
- Ensure Ollama is running and has the llama3.1 model:
ollama run llama3.1
Load a PDF
cargo run -p rag_elasticsearch -- load path/to/document.pdf
Sample PDF
Use the included sample file:
cargo run -p rag_elasticsearch -- load data/the-tale-of-peter-rabbit.pdf
Ask a Question
cargo run -p rag_elasticsearch -- ask "Who is Peter?"
Notes
- The sample uses dense_vector (4096-D) for semantic retrieval in Elasticsearch.
- It leverages Elasticsearch’s k-NN search capabilities.
- The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.
File Structure
rag_elasticsearch/
├── Cargo.toml
└── src/
└── main.rs
RAG Sample: Introduction to PDF Retrieval with SurrealDB
Overview
This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:
- loads PDF text and generates vector embeddings for each page using Ollama,
- stores the extracted page content and embeddings in SurrealDB,
- performs semantic vector search to retrieve relevant chunks for a question,
- synthesizes a final answer using an LLM.
The SurrealDB server and Ollama service should be running locally. The Rust workspace includes the rag_surrealdb module for loading and querying data.
What This Project Does
- Starts a local SurrealDB service with Docker Compose
- Uses Ollama (llama3.1) to generate embeddings for document chunks
- Inserts PDF page chunks and their embeddings into SurrealDB
- Performs semantic search using SurrealDB’s vector search (HNSW index)
- Generates a natural language answer using an LLM via Ollama
Plan
- Add a new Rust workspace member named rag_surrealdb.
- Add pdf-extract to the workspace dependencies.
- Create a Docker Compose configuration for SurrealDB.
- Implement rag_surrealdb/src/main.rs with two modes:
  - load <path-to-pdf>: read the PDF and insert page chunks into SurrealDB
  - ask "<question>": retrieve relevant PDF chunks from SurrealDB
- Update documentation with the instructions and sample commands.
How It Works
Loading Process
When you run the load command, the application performs the following steps:
- Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
- Indexing: Ensures a vector index (HNSW) is defined in SurrealDB for the embedding field.
- Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
- Storage: Stores the page text, metadata (source file, page number), and the embedding as a record in the chunk table in SurrealDB.
sequenceDiagram
participant CLI as rag_surrealdb load
participant PDF as PDF File
participant O as Ollama (llama3.1)
participant S as SurrealDB
CLI->>PDF: Extract text by pages
CLI->>S: DEFINE INDEX (HNSW)
loop For each page
CLI->>O: Get embedding for page text
O-->>CLI: 4096-D Vector
CLI->>S: CREATE chunk (text + embedding)
end
Querying Process (RAG)
When you run the ask command, the application executes the RAG workflow:
- Question Embedding: Generates a vector embedding for your question using Ollama.
- Semantic Search: Performs a K-Nearest Neighbors (KNN) search in SurrealDB to find the top 3 most relevant text chunks based on vector distance.
- Context Construction: Combines the retrieved text chunks into a single context block.
- Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
participant CLI as rag_surrealdb ask
participant O as Ollama (llama3.1)
participant S as SurrealDB
CLI->>O: Get embedding for question
O-->>CLI: Question Vector
CLI->>S: Vector Search (KNN)
S-->>CLI: Top 3 relevant chunks
CLI->>O: Generate answer (Context + Question)
O-->>CLI: Synthesized Answer
CLI->>User: Display Answer
Setup
Start Services
- Start SurrealDB from the repository root:
docker compose up -d surrealdb
- Ensure Ollama is running and has the llama3.1 model:
ollama run llama3.1
Load a PDF
cargo run -p rag_surrealdb -- load path/to/document.pdf
Sample PDF
Use the included sample file:
cargo run -p rag_surrealdb -- load data/the-tale-of-peter-rabbit.pdf
Ask a Question
cargo run -p rag_surrealdb -- ask "Who is Peter?"
Notes
- The sample uses vector embeddings (4096-D) for semantic retrieval.
- It leverages SurrealDB’s HNSW index for efficient similarity search.
- The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.
- The database namespace is rag and the database name is sample.
File Structure
rag_surrealdb/
├── Cargo.toml
└── src/
└── main.rs
Next Steps
- Implement chunking strategies (e.g., fixed-size with overlap) instead of page-level chunks.
- Add support for multiple PDF documents and filtered search.
- Explore hybrid search (combining full-text and vector search) for better accuracy.
Compare
Elasticsearch
cargo run -p rag_elasticsearch -- load data/the-tale-of-peter-rabbit.pdf
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
Running `target/debug/rag_elasticsearch load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating index 'pdf_chunks' with dense_vector mapping...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Elasticsearch.
cargo run -p rag_elasticsearch -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
Running `target/debug/rag_elasticsearch ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Elasticsearch...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.7578453):
And then, feeling rather sick, he went to look for some parsley. But round the end of a cucumber frame, whom should he meet but Mr. McGregor!
Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.7525861):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden underneath a flower-pot. He began to turn them over carefully, looking under each. Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,
Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.713193):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in a pie by Mrs. McGregor.”
Generating answer with Ollama...
Answer:
According to the context, Peter refers to Peter Rabbit, a young rabbit who lives with his mother, old Mrs. Rabbit. He is the main character in the story and is known for getting into trouble by visiting Mr. McGregor's garden despite his mother's warnings.
Qdrant
cargo run -p rag_qdrant -- load data/the-tale-of-peter-rabbit.pdf
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
Running `target/debug/rag_qdrant load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating collection 'pdf_chunks'...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Qdrant.
Ask questions with: cargo run -p rag_sample -- ask "your question"
cargo run -p rag_qdrant -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
Running `target/debug/rag_qdrant ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Qdrant...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903):
And then, feeling rather sick, he went to look for some parsley. But round the end of a cucumber frame, whom should he meet but Mr. McGregor!
Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.5051725):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden underneath a flower-pot. He began to turn them over carefully, looking under each. Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,
Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.42638594):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in a pie by Mrs. McGregor.”
Generating answer with Ollama...
Answer:
Peter is the rabbit who is the main character of the story. He is a mischievous young rabbit who has escaped from his mother's warnings to stay away from Mr. McGregor's garden, where he has gone in search of parsley.
SurrealDB
cargo run -p rag_surrealdb -- load data/the-tale-of-peter-rabbit.pdf
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
Running `target/debug/rag_surrealdb load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Loaded page 1...
Loaded page 2...
Loaded page 3...
Loaded page 4...
Loaded page 5...
Loaded page 6...
Loaded page 7...
Loaded page 8...
Loaded page 9...
Loaded page 10...
Loaded page 11...
Loaded page 12...
Loaded page 13...
Loaded page 14...
Loaded all page chunks into SurrealDB.
Ask questions with: cargo run -p rag_sample -- ask "your question"
cargo run -p rag_surrealdb -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
Running `target/debug/rag_surrealdb ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5):
And then, feeling rather sick, he went to look for some parsley. But round the end of a cucumber frame, whom should he meet but Mr. McGregor!
Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden underneath a flower-pot. He began to turn them over carefully, looking under each. Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,
Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in a pie by Mrs. McGregor.”
Generating answer with Ollama...
Answer:
Peter is the main character of the story, a rabbit, likely the son or child of old Mrs. Rabbit.
PostgreSQL
cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
Running `target/debug/rag_postgres load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into PostgreSQL.
Ask questions with: cargo run -p rag_postgres -- ask "your question"
cargo run -p rag_postgres -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
Running `target/debug/rag_postgres ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in PostgreSQL...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903691976475):
And then, feeling rather sick, he went to look for some parsley. But round the end of a cucumber frame, whom should he meet but Mr. McGregor!
Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.50517247648173):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden underneath a flower-pot. He began to turn them over carefully, looking under each. Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,
Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.4263860820724614):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in a pie by Mrs. McGregor.”
Generating answer with Ollama...
Answer:
Peter is the rabbit who is the main subject of the story.
Reflections and Key Considerations
1. Vector Engine Scoring (Normalization)
One of the most significant “attention points” when comparing vector databases is how they report similarity:
- Elasticsearch (0.71 – 0.76): Uses a normalized score where 1.0 is a perfect match and 0.0 is no match. For cosine similarity, it applies (1 + cosine) / 2.
- Qdrant (0.43 – 0.52): Reports raw cosine similarity, which ranges from -1 to 1. Applying (1 + cosine) / 2 to these values reproduces the Elasticsearch scores above, for example (1 + 0.5157) / 2 ≈ 0.7578.
- PostgreSQL (0.43 – 0.52): Reports the same values as Qdrant, at higher precision.
- SurrealDB: Does not display scores in our current implementation, but returns the same ranking.
Takeaway: You cannot compare absolute scores across different engines; only the relative ranking within a single engine matters.
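The relationship between the two score scales can be checked with a few lines of arithmetic. The sketch below applies the (1 + cosine) / 2 mapping and its inverse; the function names are illustrative. Feeding in the Qdrant scores from the logs above (0.5156903, 0.5051725, 0.42638594) yields values that agree with the Elasticsearch scores (0.7578453, 0.7525861, 0.713193) to within rounding.

```rust
/// Maps a raw cosine similarity in [-1, 1] onto [0, 1].
fn normalized_score(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}

/// Inverse mapping: recovers the raw cosine similarity from a normalized score.
fn raw_cosine(normalized: f64) -> f64 {
    2.0 * normalized - 1.0
}
```

Converting scores to a common scale like this is only useful for inspection; ranking within one engine is unaffected because the mapping is monotonic.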
2. Data State and Idempotency
Previously, the results contained duplicates (e.g., Page 5 appearing twice in Qdrant). After deleting the databases and rerunning the scripts, the duplicates disappeared.
- Cause: The current scripts use random UUIDs for each load. Without an “upsert” mechanism (checking if a page’s content already exists), running the load script multiple times pollutes the database.
- Influence: Duplicates consume the limited “top-k” slots, preventing the LLM from seeing other relevant context pages.
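One way to make loads idempotent is to derive each record's ID from its source file and page number instead of generating a random UUID, so that reloading the same page overwrites the existing record. The sketch below is an assumption about how this could be done, not what the current scripts do; it uses a hand-written FNV-1a hash because the hash must be stable across runs. The function names are hypothetical.

```rust
/// 64-bit FNV-1a hash. Unlike a randomly seeded hasher, it returns the
/// same value for the same input on every run.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Deterministic record ID for a page of a given source file.
fn chunk_id(source: &str, page: u32) -> u64 {
    fnv1a_64(format!("{}#{}", source, page).as_bytes())
}
```

With IDs like these, a second load of the-tale-of-peter-rabbit.pdf would target the same 14 IDs, provided the database write is an upsert rather than a plain insert.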
3. Consistency of Embeddings
Across all four engines, the top three results were the same pages in the same order (5, 9, and 2). This is consistent with the embedding model (llama3.1) producing the same vector for the same text regardless of which database stores it. The quality of your RAG system is primarily bounded by the quality of these embeddings.
4. LLM Non-Determinism
Even with identical context, the final answers vary:
- One engine describes Peter as “mischievous”.
- Another emphasizes his relationship with “old Mrs. Rabbit”.
- A third focuses on his “trouble in the garden”.
This variability is inherent to LLMs (unless temperature is set to 0). It highlights that the “Answer” is a synthesis, not a direct retrieval.
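To reduce this variability, the generation request can set the temperature to 0 through the options field of Ollama's /api/generate request. The sketch below builds such a body by hand with only the standard library; the helper names are hypothetical, and a real project would more likely use serde_json. Even at temperature 0, identical output across runs is not guaranteed.

```rust
/// Escapes a string so it can be embedded in a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Request body for /api/generate with streaming off and temperature 0.
fn generate_body(model: &str, prompt: &str) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false,"options":{{"temperature":0}}}}"#,
        json_escape(model),
        json_escape(prompt)
    )
}
```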
5. Document Preprocessing
Small differences in how text is trimmed or chunked (e.g., handling of newlines or page headers) can shift the embedding vector slightly, which might change the score or even the ranking order in more complex datasets. Consistent extraction is key to reproducible RAG.
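A simple way to keep extraction consistent is to normalize whitespace before embedding, so that stray newlines or double spaces from the PDF do not change the input text. A minimal sketch, with a hypothetical function name:

```rust
/// Collapses every run of whitespace (spaces, tabs, newlines) into a
/// single space and trims both ends.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Because this also removes blank lines, it should be applied to each chunk after splitting, not to the whole document, if paragraph boundaries are used for chunking.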
RAG Sample: SurrealDB with Advanced Chunking
Overview
This module, rag_surrealdb_extends, builds upon the basic SurrealDB RAG sample by implementing Advanced Chunking Strategies.
While the basic version splits documents by page, this version supports:
- Fixed-Size Chunking with Overlap
- Paragraph Splitting
Key Features
- Global Text Extraction: Combines all pages of a PDF into a single text stream.
- Fixed-Size Chunks: Uses a window of 1000 characters with a 200-character overlap.
- Paragraph Chunks: Splits text by double newlines (
\n\n), preserving natural structural boundaries. - Improved Context: Different strategies allow for balancing between precise retrieval and context preservation.
How It Works
The Chunking Algorithms
1. Fixed-Size Sliding Window
The application implements a sliding window approach:
fn chunk_text_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    // ... logic to create chunks of 'size' with 'overlap' ...
}
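The body of the sliding window is elided above. One possible implementation is sketched below; it is an assumption, not the module's actual code. It counts characters, not bytes, so multi-byte UTF-8 characters are never split, and it advances the window by size - overlap each step.

```rust
/// Splits `text` into chunks of at most `size` characters, where each
/// chunk repeats the last `overlap` characters of the previous one.
fn chunk_text_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0, "chunk size must be positive");
    assert!(overlap < size, "overlap must be smaller than chunk size");

    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap; // how far the window advances each time
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With size=1000 and overlap=200, each new chunk starts 800 characters after the previous one, so a sentence cut at one chunk boundary is likely to appear whole in the next chunk.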
2. Paragraph Split
Splits the text into segments based on double newlines:
fn chunk_text_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
    // ... trim and filter ...
}
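A complete version of the paragraph split might look like the following sketch. The trim and filter steps follow the comment above; normalizing Windows line endings first is an added assumption, not something the module is known to do.

```rust
/// Splits `text` on blank lines and returns the non-empty, trimmed paragraphs.
fn chunk_text_paragraphs(text: &str) -> Vec<String> {
    // Assumption: treat "\r\n" like "\n" so that Windows-style blank
    // lines are also recognized as paragraph breaks.
    let normalized = text.replace("\r\n", "\n");
    normalized
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .map(str::to_string)
        .collect()
}
```

Filtering out empty segments matters because three or more consecutive newlines would otherwise produce empty chunks, which would be embedded and stored like any other.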
Comparison: Page-level vs. Advanced Strategies
| Feature | Page-Level (Basic) | Fixed-Size (Extended) | Paragraph (Extended) |
|---|---|---|---|
| Granularity | Coarse (Entire Page) | Fine (1000 chars) | Variable (Paragraph) |
| Context Preservation | Poor (breaks at page end) | Good (Overlap) | Excellent (Natural) |
| DB Storage | One record per page | Multiple records | Multiple records |
| Best For | Simple docs | Dense, flat text | Structured narratives |
Setup
1. Start SurrealDB
Ensure SurrealDB is running:
docker compose up -d surrealdb
2. Load a PDF (Fixed-Size Strategy)
cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf fixed
3. Load a PDF (Paragraph Strategy)
cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf paragraph
Note: This will use the extended database in the rag namespace.
4. Ask a Question
cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"
Execution
Fixed
cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf fixed
Compiling rag_surrealdb_extends v0.1.0 (/Users/qdart/projects/rust-ai/rag_surrealdb_extends)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 10.25s
Running `target/debug/rag_surrealdb_extends load data/the-tale-of-peter-rabbit.pdf fixed`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Using Fixed-size chunking strategy (size=1000, overlap=200)...
Split PDF into 7 chunks
Loaded chunk 5/7...
Loaded chunk 7/7...
Loaded all chunks into SurrealDB (Extended).
cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.77s
Running `target/debug/rag_surrealdb_extends ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, chunk=6):
e wondered what he had done with his clothes. It was the second little jacket and pair of shoes that Peter had lost in a fortnight! I am sorry to say that Peter was not very well during the evening. His mother put him to bed, and made some camomile tea; and she gave a dose of it to Peter! “One t...
Result 2 (source=the-tale-of-peter-rabbit.pdf, chunk=5):
a wheelbarrow, and peeped over. The first thing he saw was Mr. McGregor hoeing onions. His back was turned towards Peter, and beyond him was the gate! Peter got down very quietly off the wheelbarrow, and started running as fast as he could go, along a straight walk behind some black-currant bu...
Result 3 (source=the-tale-of-peter-rabbit.pdf, chunk=0):
The Tale of Peter Rabbit Beatrix Potter Once upon a time there were four little Rabbits, and their names were— Flopsy, Mopsy, Cotton- tail, and Peter. They lived with their Mother in a sand-bank, underneath the root of a very big fir tree. “Now, my dears,” said old Mrs. Rabbit one morning, “you...
Generating answer with Ollama...
Answer:
Peter is one of the four little rabbits who live with their mother in a sand-bank under the root of a big fir tree. He is described as "very naughty" and has a tendency to get into trouble by disobeying his mother's instructions not to go into Mr. McGregor's garden.
Paragraph
cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf paragraph
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.45s
Running `target/debug/rag_surrealdb_extends load data/the-tale-of-peter-rabbit.pdf paragraph`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Using Paragraph splitting strategy...
Split PDF into 38 chunks
Loaded chunk 5/38...
Loaded chunk 10/38...
Loaded chunk 15/38...
Loaded chunk 20/38...
Loaded chunk 25/38...
Loaded chunk 30/38...
Loaded chunk 35/38...
Loaded chunk 38/38...
Loaded all chunks into SurrealDB (Extended).
cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.75s
Running `target/debug/rag_surrealdb_extends ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"
Result 1 (source=the-tale-of-peter-rabbit.pdf, chunk=34):
“One table-spoonful to be taken at bed-time.”...
Result 2 (source=the-tale-of-peter-rabbit.pdf, chunk=4):
“Now run along, and don’t get into mischief. I am going out.”...
Result 3 (source=the-tale-of-peter-rabbit.pdf, chunk=9):
And then, feeling rather sick, he went to look for some parsley....
Generating answer with Ollama...
Answer:
The question doesn't ask about Peter's characteristics or actions, but rather who Peter is. Based on the context, it appears that Peter is the subject of "the-tale-of-peter-rabbit.pdf", which suggests that Peter is likely the main character in The Tale of Peter Rabbit.
Next Steps
- Relational & Hybrid Search: Leverage SurrealDB’s relational and full-text search capabilities.
- Recursive Character Splitting: Improve chunking by splitting at natural boundaries like paragraphs and sentences first.
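To make the recursive splitting idea concrete, here is a minimal sketch. It is an illustration only: the name `split_recursive`, the separator list, and the greedy packing of small pieces are assumptions, not part of the existing modules. The function tries the coarsest separator first, packs adjacent pieces together while they fit in `max_chars`, and recurses with finer separators only for pieces that are still too long.

```rust
/// Sketch: split `text` into chunks of at most `max_chars` characters,
/// preferring coarse boundaries (e.g. "\n\n") over fine ones (e.g. " ").
fn split_recursive(text: &str, max_chars: usize, separators: &[&str]) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let text = text.trim();
    if text.is_empty() {
        return Vec::new();
    }
    if text.chars().count() <= max_chars {
        return vec![text.to_string()];
    }
    // No separators left: fall back to a hard cut every `max_chars` characters.
    let Some((sep, finer)) = separators.split_first() else {
        let chars: Vec<char> = text.chars().collect();
        return chars
            .chunks(max_chars)
            .map(|window| window.iter().collect::<String>())
            .collect();
    };

    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut current_len = 0;
    // `split_inclusive` keeps each separator attached to the piece before it,
    // so concatenating pieces reproduces the original text.
    for piece in text.split_inclusive(*sep) {
        let piece_len = piece.chars().count();
        if current_len + piece_len <= max_chars {
            current.push_str(piece);
            current_len += piece_len;
            continue;
        }
        // The piece does not fit: emit what has been packed so far.
        if !current.trim().is_empty() {
            chunks.push(current.trim().to_string());
        }
        current.clear();
        current_len = 0;
        if piece_len <= max_chars {
            current.push_str(piece);
            current_len = piece_len;
        } else {
            // Still too long: retry this piece with the finer separators.
            chunks.extend(split_recursive(piece, max_chars, finer));
        }
    }
    if !current.trim().is_empty() {
        chunks.push(current.trim().to_string());
    }
    chunks
}
```

Called as `split_recursive(&text, 1000, &["\n\n", ". ", " "])`, short paragraphs are merged into one chunk instead of being stored as separate records, and only oversized paragraphs are broken at sentence or word boundaries.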
RAG Sample: SurrealDB with Relational & Hybrid Search
Overview
This module, rag_surrealdb_with_db, takes our SurrealDB integration to the next level by utilizing the database’s Relational (Graph) and Full-Text Search (FTS) capabilities.
Instead of just storing chunks as flat records, we now:
- Track Documents as first-class entities.
- Create Relationships (RELATE) between documents and their chunks.
- Use Hybrid Search (Vector + Full-Text) to improve retrieval accuracy.
Key Features
- Relational Data Model: Uses SCHEMAFULL tables and graph relations to link chunks to their parent document.
- Full-Text Search Indexing: Implements FULLTEXT ANALYZER for keyword-based retrieval.
- Hybrid Retrieval: Combines results from both K-Nearest Neighbor (KNN) vector search and Full-Text Search.
- Context Labeling: The AI model is informed whether a piece of context was found via vector similarity or keyword match.
How It Works
1. Relational Schema & Indexing
We define a formal schema for documents, plus the indexes needed for relational data and hybrid search. The document table is SCHEMAFULL to enforce data integrity, while the chunk table stays SCHEMALESS so chunk records remain flexible:
-- Schema for documents
DEFINE TABLE document SCHEMAFULL;
DEFINE FIELD name ON document TYPE string;
DEFINE FIELD created_at ON document TYPE datetime DEFAULT time::now();
-- Schema for chunks (records will be related to documents)
DEFINE TABLE chunk SCHEMALESS;
-- Define an analyzer for Full-Text Search
DEFINE ANALYZER ascii TOKENIZERS blank FILTERS ascii, lowercase;
-- Full-Text index on chunk text
DEFINE INDEX chunk_text ON chunk FIELDS text FULLTEXT ANALYZER ascii BM25 HIGHLIGHTS;
-- Vector index on chunk embedding
DEFINE INDEX chunk_embedding ON chunk FIELDS embedding HNSW DIMENSION 4096 DISTANCE COSINE;
-- Define the relationship (graph)
-- RELATE document:id->contains->chunk:id;
2. Hybrid Search Query
The ask command executes two types of searches in a single request to leverage both semantic and keyword matching:
-- Full-Text Search (Exact keyword matches)
SELECT *, search::score(1) AS score FROM chunk WHERE text @1@ 'question' ORDER BY score DESC;
-- Vector Search (Semantic similarity)
SELECT *, vector::distance::knn() AS distance FROM chunk WHERE embedding <|3,40|> [vector] ORDER BY distance ASC;
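Once both queries return, their results have to be combined and labeled before they reach the model. The sketch below shows one way this could look. The types `Match` and `FoundBy`, the function names, the label strings, and the choice to list full-text hits first and drop duplicate chunks are all assumptions for illustration; they are not taken from the module's source.

```rust
use std::collections::HashSet;

/// Which search produced a match (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum FoundBy {
    FullText,
    Vector,
}

/// One retrieved chunk (hypothetical shape of a query result row).
#[derive(Debug, Clone)]
struct Match {
    chunk_id: String,
    text: String,
    found_by: FoundBy,
}

/// Combine both result lists: full-text hits first, then vector hits,
/// keeping only the first occurrence of each chunk id.
fn merge_matches(full_text: Vec<Match>, vector: Vec<Match>) -> Vec<Match> {
    let mut seen = HashSet::new();
    full_text
        .into_iter()
        .chain(vector)
        .filter(|m| seen.insert(m.chunk_id.clone())) // false for duplicates
        .collect()
}

/// Render the merged matches as prompt context, one labeled line per chunk,
/// so the model can tell how each piece of context was found.
fn build_context(matches: &[Match]) -> String {
    matches
        .iter()
        .map(|m| {
            let label = match m.found_by {
                FoundBy::FullText => "keyword match",
                FoundBy::Vector => "vector similarity",
            };
            format!("[{label}] {}", m.text)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Deduplicating by chunk id keeps a chunk that both searches returned from appearing twice in the prompt; which label such a chunk should carry is a design choice, and this sketch simply keeps the first one seen.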
Setup
1. Start SurrealDB
Ensure SurrealDB is running:
docker compose up -d surrealdb
2. Initialize Database with Sample Data
You can load a sample description of Peter to test the search immediately:
cargo run -p rag_surrealdb_with_db -- init_db
3. Load a PDF
The load command now creates a document record and relates all chunks to it:
cargo run -p rag_surrealdb_with_db -- load data/the-tale-of-peter-rabbit.pdf
4. Ask a Question
The ask command will show you matches from both search methods:
cargo run -p rag_surrealdb_with_db -- ask "Who is Peter?"
Execution
cargo run -p rag_surrealdb_with_db -- init_db
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
Running `target/debug/rag_surrealdb_with_db init_db`
Generating embedding for sample text...
Loaded sample data for Peter the rabbit into SurrealDB.
cargo run -p rag_surrealdb_with_db -- load data/the-tale-of-peter-rabbit.pdf
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/rag_surrealdb_with_db load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Loaded page 1...
Loaded page 2...
Loaded page 3...
Loaded page 4...
Loaded page 5...
Loaded page 6...
Loaded page 7...
Loaded page 8...
Loaded page 9...
Loaded page 10...
Loaded page 11...
Loaded page 12...
Loaded page 13...
Loaded page 14...
Loaded all page chunks into SurrealDB.
Ask questions with: cargo run -p rag_surrealdb_with_db -- ask "your question"
cargo run -p rag_surrealdb_with_db -- ask "Who is Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
Running `target/debug/rag_surrealdb_with_db ask 'Who is Peter?'`
Searching for matches using full-text search...
Generating embedding for the question...
--- Full-Text Search Matches ---
--- Vector Search Matches ---
Vector Match 1: Peter is a small, adventurous rabbit who wears a blue jacket and lives in a sand-bank under a fir tr (source: manual_entry)
Vector Match 2: And then, feeling rather sick, he went to look for some parsley.
But round the end of a cucumber fr (source: the-tale-of-peter-rabbit.pdf)
Vector Match 3: Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden
underneath a f (source: the-tale-of-peter-rabbit.pdf)
Generating answer with Ollama...
Answer:
Peter is a rabbit.
cargo run -p rag_surrealdb_with_db -- ask "Describe Peter?"
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.24s
Running `target/debug/rag_surrealdb_with_db ask 'Describe Peter?'`
Searching for matches using full-text search...
Generating embedding for the question...
--- Full-Text Search Matches ---
--- Vector Search Matches ---
Vector Match 1: Peter is a small, adventurous rabbit who wears a blue jacket and lives in a sand-bank under a fir tr (source: manual_entry)
Vector Match 2: And then, feeling rather sick, he went to look for some parsley.
But round the end of a cucumber fr (source: the-tale-of-peter-rabbit.pdf)
Vector Match 3: Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden
underneath a f (source: the-tale-of-peter-rabbit.pdf)
Generating answer with Ollama...
Answer:
According to the context, Peter is described as a "small, adventurous rabbit" who wears a blue jacket and lives with his family in a sand-bank under a fir tree.
Why Hybrid Search?
Vector search is good at finding “meaning” but sometimes misses specific keywords (like “fortnight” or “jacket” if they aren’t weighted heavily in the embedding). Full-text search excels at exact matches. Combining the two lets keyword hits cover the cases where semantic similarity alone ranks the relevant chunk too low.