mdBook Instructions

This project uses mdBook to generate documentation.

Prerequisites

To build the documentation, you need to install mdbook and the mdbook-mermaid preprocessor for diagrams.

1. Install mdBook

You can install mdbook using Cargo (Rust’s package manager):

cargo install mdbook

Alternatively, you can download binaries from the mdBook releases page.

2. Install mdbook-mermaid

This project uses Mermaid for diagrams. Install the preprocessor:

cargo install mdbook-mermaid

Building the Book

The book is configured to build its output into the /docs directory at the project root.

Build Command

To generate the HTML version of the book, run the following command from the md-book directory:

mdbook build

The generated files will be placed in ../docs.

Live Development

To preview changes as you edit, you can use the serve command:

mdbook serve --open

This will start a local web server and open the book in your default browser. It automatically reloads when you save changes to the markdown files.

Project Structure

  • md-book/book.toml: Configuration file for the book.
  • md-book/src/: Contains the markdown source files.
  • md-book/src/SUMMARY.md: The table of contents for the book.
  • docs/: The generated HTML documentation (output directory).

Rust Models

Strand-Rust-Coder-14B

  • URL: https://huggingface.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF
  • Model: hf.co/Fortytwo-Network/Strand-Rust-Coder-14B-v1-GGUF:latest

Ollama Sample: Introduction to Local LLM Integration

Welcome!

This introductory guide walks you through ollama_sample, a beginner-friendly Rust application that demonstrates how to integrate a local Ollama LLM into your Rust projects. By the end of this guide, you’ll understand how to communicate with local AI models and build interactive applications.

What You’ll Build

A command-line Rust application that:

  • Prompts you to enter 5 keywords
  • Sends those keywords to a local Ollama AI model
  • Generates a creative joke based on those keywords
  • Displays the result in your terminal

This is a great starting point for learning:

  • Async Rust programming with tokio
  • HTTP API calls using reqwest
  • JSON serialization with serde
  • Prompt engineering for AI models

System Requirements

Before you begin, ensure you have:

  1. Rust toolchain - Install from rustup.rs
  2. Ollama - Download from ollama.ai
  3. A language model - Download one with: ollama pull llama3.1 (or llama2)

Quick Start

Step 1: Start Ollama

Open a terminal and keep it running:

ollama serve

Step 2: Run the Application

In a new terminal, navigate to your workspace and run:

cargo run -p ollama_sample

Step 3: Enter Keywords

When prompted, type 5 keywords (one per line), for example:

Keyword 1: coffee
Keyword 2: robots
Keyword 3: pizza
Keyword 4: astronauts
Keyword 5: socks

Step 4: See the Magic

The application will generate and display a funny joke using your keywords!

How It Works Under the Hood

  1. User Input Collection: The app reads 5 keywords from your terminal
  2. Prompt Construction: Keywords are formatted into a natural language instruction
  3. API Request: An async HTTP POST request is sent to Ollama’s /api/generate endpoint
  4. Model Processing: Your local LLM processes the prompt
  5. Response Handling: The generated joke is displayed in your terminal
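
Steps 2 and 3 can be sketched without any HTTP dependency. The snippet below is a minimal sketch, not the code in main.rs: build_prompt, generate_request_body, and json_escape are hypothetical helper names, the prompt wording is invented for illustration, and the JSON is assembled by hand only to keep the example free of external crates (the sample itself uses serde). The model, prompt, and stream fields are ones that Ollama's /api/generate endpoint accepts.

```rust
/// Escape a string for inclusion inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: turn the keywords into a natural-language instruction.
fn build_prompt(keywords: &[&str]) -> String {
    format!(
        "Write one short, funny joke that uses all of these keywords: {}.",
        keywords.join(", ")
    )
}

/// Hypothetical helper: JSON body for POST http://localhost:11434/api/generate.
/// `stream: false` asks Ollama for a single JSON response instead of a stream.
fn generate_request_body(model: &str, prompt: &str) -> String {
    format!(
        "{{\"model\":\"{}\",\"prompt\":\"{}\",\"stream\":false}}",
        json_escape(model),
        json_escape(prompt)
    )
}

fn main() {
    let keywords = ["coffee", "robots", "pizza", "astronauts", "socks"];
    let prompt = build_prompt(&keywords);
    println!("{}", generate_request_body("llama3.1", &prompt));
}
```

In a real application the body would be sent with reqwest and the reply parsed with serde_json; the point here is only the shape of the request.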

Project Structure

ollama_sample/
├── Cargo.toml          # Package manifest with dependencies
└── src/
    └── main.rs         # Main application (async Rust code)

Key Technologies

Technology           Purpose
reqwest              Making HTTP requests to Ollama
tokio                Async runtime for concurrent operations
serde & serde_json   JSON serialization and deserialization

Tips for Success

  • Model Selection: llama3.1 provides better joke quality than llama2; experiment with different models
  • Ollama Server: Keep the ollama serve command running in a separate terminal
  • Customization: Edit the system prompt in main.rs to change the AI’s personality (e.g., “Act as a standup comedian”)
  • Error Messages: If you see connection errors, verify Ollama is running on http://localhost:11434

What’s Next?

Once comfortable with this example, try:

  • Adding more sophisticated prompts
  • Experimenting with different models
  • Building a web interface using the same Ollama API
  • Creating multi-turn conversations

RAG Sample: Introduction to PDF Retrieval with Qdrant

Overview

This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:

  • loads PDF text and generates vector embeddings for each page using Ollama,
  • stores the extracted page content and embeddings in Qdrant,
  • performs semantic vector search to retrieve relevant chunks for a question,
  • synthesizes a final answer using an LLM.

The Qdrant server and Ollama service should be running locally. The Rust workspace includes the rag_qdrant module for loading and querying data.

What This Project Does

  • Starts a local Qdrant service with Docker Compose
  • Uses Ollama (llama3.1) to generate embeddings for document chunks
  • Inserts PDF page chunks and their embeddings into Qdrant
  • Performs semantic search using Qdrant’s vector search
  • Generates a natural language answer using an LLM via Ollama

Plan

  1. Add a new Rust workspace member named rag_qdrant.
  2. Add pdf-extract and uuid to the workspace dependencies.
  3. Create a Docker Compose configuration for Qdrant.
  4. Implement rag_qdrant/src/main.rs with two modes:
    • load <path-to-pdf>: read the PDF and insert page chunks into Qdrant
    • ask "<question>": retrieve relevant PDF chunks from Qdrant
  5. Update documentation with the instructions and sample commands.

How It Works

Loading Process

When you run the load command, the application performs the following steps:

  1. Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
  2. Indexing: Ensures a collection is defined in Qdrant with the appropriate vector size and distance metric (Cosine).
  3. Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
  4. Storage: Stores the page text, metadata (source file, page number), and the embedding as a point in Qdrant.
sequenceDiagram
    participant CLI as rag_qdrant load
    participant PDF as PDF File
    participant O as Ollama (llama3.1)
    participant Q as Qdrant

    CLI->>PDF: Extract text by pages
    CLI->>Q: PUT /collections/pdf_chunks
    loop For each page
        CLI->>O: Get embedding for page text
        O-->>CLI: 4096-D Vector
        CLI->>Q: PUT /collections/pdf_chunks/points (text + embedding)
    end

Querying Process (RAG)

When you run the ask command, the application executes the RAG workflow:

  1. Question Embedding: Generates a vector embedding for your question using Ollama.
  2. Semantic Search: Performs a search in Qdrant to find the top 3 most relevant text chunks based on vector distance.
  3. Context Construction: Combines the retrieved text chunks into a single context block.
  4. Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
    participant CLI as rag_qdrant ask
    participant O as Ollama (llama3.1)
    participant Q as Qdrant

    CLI->>O: Get embedding for question
    O-->>CLI: Question Vector
    CLI->>Q: POST /collections/pdf_chunks/points/search
    Q-->>CLI: Top 3 relevant chunks
    CLI->>O: Generate answer (Context + Question)
    O-->>CLI: Synthesized Answer
    CLI->>User: Display Answer
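
Qdrant performs this search on the server, but what “top 3 by vector distance” means can be shown with a brute-force sketch. Everything below is illustrative rather than taken from rag_qdrant: cosine_similarity and top_k are hypothetical helpers, and the 3-dimensional vectors stand in for the 4096-dimensional embeddings.

```rust
/// Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return (index, score) of the `k` stored vectors most similar to `query`,
/// best match first.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .map(|(i, v)| (i, cosine_similarity(query, v)))
        .collect();
    // Sort by score, highest first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}

fn main() {
    let stored = vec![
        vec![1.0, 0.0, 0.0],
        vec![0.0, 1.0, 0.0],
        vec![0.9, 0.1, 0.0],
        vec![-1.0, 0.0, 0.0],
    ];
    let query = [1.0, 0.0, 0.0];
    for (index, score) in top_k(&query, &stored, 3) {
        println!("chunk {index}: score {score:.4}");
    }
}
```

A vector database replaces the linear scan with an index, but the ranking criterion is the same.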

Setup

Start Services

  1. Start Qdrant from the repository root:

docker compose up -d qdrant

  2. Access the Qdrant Web UI: Open http://localhost:6333/dashboard in your browser.

  3. Ensure Ollama is running and has the llama3.1 model:

ollama run llama3.1

Load a PDF

cargo run -p rag_qdrant -- load path/to/document.pdf

Sample PDF

Use the included sample file:

cargo run -p rag_qdrant -- load data/the-tale-of-peter-rabbit.pdf

Ask a Question

cargo run -p rag_qdrant -- ask "Who is Peter?"

Notes

  • The sample uses vector embeddings (4096-D) for semantic retrieval.
  • It leverages Qdrant’s efficient similarity search.
  • The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.
  • The collection name is pdf_chunks.

File Structure

rag_qdrant/
├── Cargo.toml
└── src/
    └── main.rs

Next Steps

  • Implement chunking strategies (e.g., fixed-size with overlap) instead of page-level chunks.
  • Add support for multiple PDF documents and filtered search.
  • Explore hybrid search (combining full-text and vector search) for better accuracy.

RAG Sample: Introduction to PDF Retrieval with PostgreSQL (pgvector)

Overview

This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:

  • loads PDF text and generates vector embeddings for each page using Ollama,
  • stores extracted page content and embeddings in PostgreSQL with the pgvector extension,
  • performs semantic vector similarity search to retrieve relevant chunks for a question,
  • synthesizes a final answer using an LLM.

The PostgreSQL server and Ollama service should be running locally. The Rust workspace includes the rag_postgres module for loading and querying data.

What This Project Does

  • Starts a local PostgreSQL service with Docker Compose
  • Uses Ollama (llama3.1) to generate embeddings for document chunks
  • Creates a pdf_chunks table with a vector(4096) column
  • Inserts PDF page chunks and their embeddings into PostgreSQL
  • Performs semantic search using pgvector distance operators
  • Generates a natural language answer using an LLM via Ollama

Plan

  1. Add a new Rust workspace member named rag_postgres.
  2. Add necessary dependencies, including tokio-postgres.
  3. Add a Docker Compose configuration for PostgreSQL with pgvector.
  4. Implement rag_postgres/src/main.rs with two modes:
    • load <path-to-pdf>: read the PDF and insert page chunks into PostgreSQL
    • ask "<question>": retrieve relevant PDF chunks from PostgreSQL
  5. Update documentation with instructions and sample commands.

How It Works

Loading Process

When you run the load command, the application performs the following steps:

  1. Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
  2. Schema Bootstrap: Connects to PostgreSQL, enables the vector extension, and ensures pdf_chunks exists.
  3. Embedding Generation: For each page, sends page text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
  4. Storage: Inserts source metadata, page content, and embeddings into PostgreSQL.
sequenceDiagram
    participant CLI as rag_postgres load
    participant PDF as PDF File
    participant O as Ollama (llama3.1)
    participant PG as PostgreSQL (pgvector)

    CLI->>PDF: Extract text by pages
    CLI->>PG: CREATE EXTENSION vector
    CLI->>PG: CREATE TABLE pdf_chunks
    loop For each page
        CLI->>O: Get embedding for page text
        O-->>CLI: 4096-D Vector
        CLI->>PG: INSERT row (text + embedding)
    end
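
The schema bootstrap step could issue statements along these lines. This is a sketch under assumptions: only the pdf_chunks table name, the vector extension, and the vector(4096) column type come from this page; schema_statements is a hypothetical helper, and the id, source, page, and content columns are guesses at a reasonable layout.

```rust
/// Hypothetical schema bootstrap statements. Apart from `pdf_chunks` and the
/// `vector(N)` type, the column names and types are assumptions.
fn schema_statements(dimensions: usize) -> Vec<String> {
    vec![
        // Enable pgvector; harmless if it is already enabled.
        "CREATE EXTENSION IF NOT EXISTS vector".to_string(),
        format!(
            "CREATE TABLE IF NOT EXISTS pdf_chunks (\
             id UUID PRIMARY KEY, \
             source TEXT NOT NULL, \
             page INTEGER NOT NULL, \
             content TEXT NOT NULL, \
             embedding vector({dimensions}) NOT NULL\
             )"
        ),
    ]
}

fn main() {
    for statement in schema_statements(4096) {
        println!("{statement};");
    }
}
```

pgvector can index the vector type only up to 2,000 dimensions, so at 4096 dimensions the search is an exact sequential scan, which is fine for a 14-page sample.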

Querying Process (RAG)

When you run the ask command, the application executes the RAG workflow:

  1. Question Embedding: Generates a vector embedding for your question using Ollama.
  2. Semantic Search: Queries PostgreSQL with ORDER BY embedding <=> $1::vector LIMIT 3 to get top relevant chunks.
  3. Context Construction: Combines the retrieved text chunks into one context block.
  4. Answer Synthesis: Sends context and question to Ollama for grounded answer generation.
sequenceDiagram
    participant CLI as rag_postgres ask
    participant O as Ollama (llama3.1)
    participant PG as PostgreSQL (pgvector)

    CLI->>O: Get embedding for question
    O-->>CLI: Question Vector
    CLI->>PG: Similarity search (embedding <=> query)
    PG-->>CLI: Top 3 relevant chunks
    CLI->>O: Generate answer (Context + Question)
    O-->>CLI: Synthesized Answer
    CLI->>User: Display Answer
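
The pieces around the <=> query can be sketched as follows. pgvector's <=> operator returns cosine distance, so a similarity score is one minus that value. The helper names and the source, page, and content column names are assumptions for illustration; how the literal is bound to $1 depends on the client library and is not shown.

```rust
/// Format an embedding in pgvector's text form, e.g. "[0.5,-2.25,1]".
fn to_pgvector_literal(embedding: &[f32]) -> String {
    let parts: Vec<String> = embedding.iter().map(|v| v.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// `<=>` yields cosine distance (1 - cosine similarity); convert it back.
fn distance_to_similarity(distance: f64) -> f64 {
    1.0 - distance
}

/// Hypothetical query text; `source`, `page`, and `content` are assumed column names.
const SEARCH_SQL: &str = "SELECT source, page, content, embedding <=> $1::vector AS distance \
     FROM pdf_chunks ORDER BY embedding <=> $1::vector LIMIT 3";

fn main() {
    // A 3-dimensional stand-in for the 4096-dimensional question embedding.
    let literal = to_pgvector_literal(&[0.5, -2.25, 1.0]);
    println!("{SEARCH_SQL}");
    println!("$1 = {literal}");
    println!("distance 0.25 -> similarity {}", distance_to_similarity(0.25));
}
```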

Setup

Start Services

  1. Start PostgreSQL from the repository root:

docker compose up -d postgres

  2. Ensure Ollama is running and has the llama3.1 model:

ollama run llama3.1

Load a PDF

cargo run -p rag_postgres -- load path/to/document.pdf

Sample PDF

Use the included sample file:

cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf

Ask a Question

cargo run -p rag_postgres -- ask "Who is Peter?"

Notes

  • The sample uses PostgreSQL with the pgvector extension and vector(4096) embeddings.
  • Similarity search is done with pgvector distance ordering (<=>).
  • The final answer is generated by an LLM (llama3.1 via Ollama) using retrieved context.

File Structure

rag_postgres/
├── Cargo.toml
└── src/
    └── main.rs

RAG Sample: Introduction to PDF Retrieval with Elasticsearch

Overview

This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:

  • loads PDF text and generates vector embeddings for each page using Ollama,
  • stores the extracted page content and embeddings in Elasticsearch,
  • performs semantic vector search (k-NN) to retrieve relevant chunks for a question,
  • synthesizes a final answer using an LLM.

The Elasticsearch server and Ollama service should be running locally. The Rust workspace includes the rag_elasticsearch module for loading and querying data.

What This Project Does

  • Starts a local Elasticsearch service with Docker Compose
  • Uses Ollama (llama3.1) to generate embeddings for document chunks
  • Inserts PDF page chunks and their embeddings into Elasticsearch
  • Performs semantic search using Elasticsearch’s dense_vector type and k-NN search
  • Generates a natural language answer using an LLM via Ollama

Plan

  1. Add a new Rust workspace member named rag_elasticsearch.
  2. Add necessary dependencies to the workspace.
  3. Create a Docker Compose configuration for Elasticsearch.
  4. Implement rag_elasticsearch/src/main.rs with two modes:
    • load <path-to-pdf>: read the PDF and insert page chunks into Elasticsearch
    • ask "<question>": retrieve relevant PDF chunks from Elasticsearch
  5. Update documentation with the instructions and sample commands.

How It Works

Loading Process

When you run the load command, the application performs the following steps:

  1. Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
  2. Indexing: Ensures an index is defined in Elasticsearch with a dense_vector mapping for the embedding field.
  3. Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
  4. Storage: Stores the page text, metadata (source file, page number), and the embedding as a document in Elasticsearch.
sequenceDiagram
    participant CLI as rag_elasticsearch load
    participant PDF as PDF File
    participant O as Ollama (llama3.1)
    participant ES as Elasticsearch

    CLI->>PDF: Extract text by pages
    CLI->>ES: CREATE INDEX (mappings)
    loop For each page
        CLI->>O: Get embedding for page text
        O-->>CLI: 4096-D Vector
        CLI->>ES: INDEX _doc (text + embedding)
    end

Querying Process (RAG)

When you run the ask command, the application executes the RAG workflow:

  1. Question Embedding: Generates a vector embedding for your question using Ollama.
  2. Semantic Search: Performs a k-Nearest Neighbors (k-NN) search in Elasticsearch to find the top 3 most relevant text chunks based on cosine similarity.
  3. Context Construction: Combines the retrieved text chunks into a single context block.
  4. Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
    participant CLI as rag_elasticsearch ask
    participant O as Ollama (llama3.1)
    participant ES as Elasticsearch

    CLI->>O: Get embedding for question
    O-->>CLI: Question Vector
    CLI->>ES: k-NN Search
    ES-->>CLI: Top 3 relevant chunks
    CLI->>O: Generate answer (Context + Question)
    O-->>CLI: Synthesized Answer
    CLI->>User: Display Answer
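
A request body for the k-NN step might look like the following. It is a sketch under assumptions, not the code in rag_elasticsearch: knn_search_body is a hypothetical helper, the field name embedding and the num_candidates value are guesses, and the JSON is built by hand only to avoid external crates. The top-level knn option with field, query_vector, k, and num_candidates is part of the Elasticsearch 8.x search API.

```rust
/// Hypothetical helper: body for POST /pdf_chunks/_search using the top-level
/// `knn` option. The field name `embedding` is an assumption.
/// Note: NaN or infinite values would not produce valid JSON.
fn knn_search_body(query_vector: &[f32], k: usize, num_candidates: usize) -> String {
    let vector: Vec<String> = query_vector.iter().map(|v| v.to_string()).collect();
    format!(
        "{{\"knn\":{{\"field\":\"embedding\",\"query_vector\":[{}],\"k\":{},\"num_candidates\":{}}}}}",
        vector.join(","),
        k,
        num_candidates
    )
}

fn main() {
    // A 3-dimensional stand-in for the 4096-dimensional question embedding.
    let body = knn_search_body(&[0.12, -0.5, 0.33], 3, 50);
    println!("{body}");
}
```

Here k is the number of hits returned (3 in this sample), while num_candidates controls how many candidates each shard considers before picking the best k.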

Setup

Start Services

  1. Start Elasticsearch from the repository root:

docker compose up -d elasticsearch

  2. Ensure Ollama is running and has the llama3.1 model:

ollama run llama3.1

Load a PDF

cargo run -p rag_elasticsearch -- load path/to/document.pdf

Sample PDF

Use the included sample file:

cargo run -p rag_elasticsearch -- load data/the-tale-of-peter-rabbit.pdf

Ask a Question

cargo run -p rag_elasticsearch -- ask "Who is Peter?"

Notes

  • The sample uses dense_vector (4096-D) for semantic retrieval in Elasticsearch.
  • It leverages Elasticsearch’s k-NN search capabilities.
  • The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.

File Structure

rag_elasticsearch/
├── Cargo.toml
└── src/
    └── main.rs

RAG Sample: Introduction to PDF Retrieval with SurrealDB

Overview

This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:

  • loads PDF text and generates vector embeddings for each page using Ollama,
  • stores the extracted page content and embeddings in SurrealDB,
  • performs semantic vector search to retrieve relevant chunks for a question,
  • synthesizes a final answer using an LLM.

The SurrealDB server and Ollama service should be running locally. The Rust workspace includes the rag_surrealdb module for loading and querying data.

What This Project Does

  • Starts a local SurrealDB service with Docker Compose
  • Uses Ollama (llama3.1) to generate embeddings for document chunks
  • Inserts PDF page chunks and their embeddings into SurrealDB
  • Performs semantic search using SurrealDB’s vector search (HNSW index)
  • Generates a natural language answer using an LLM via Ollama

Plan

  1. Add a new Rust workspace member named rag_surrealdb.
  2. Add pdf-extract to the workspace dependencies.
  3. Create a Docker Compose configuration for SurrealDB.
  4. Implement rag_surrealdb/src/main.rs with two modes:
    • load <path-to-pdf>: read the PDF and insert page chunks into SurrealDB
    • ask "<question>": retrieve relevant PDF chunks from SurrealDB
  5. Update documentation with the instructions and sample commands.

How It Works

Loading Process

When you run the load command, the application performs the following steps:

  1. Text Extraction: Uses pdf-extract to read the PDF file and split it into individual pages.
  2. Indexing: Ensures a vector index (HNSW) is defined in SurrealDB for the embedding field.
  3. Embedding Generation: For each page, it sends the text to Ollama (llama3.1) to generate a 4096-dimensional vector embedding.
  4. Storage: Stores the page text, metadata (source file, page number), and the embedding as a record in the chunk table in SurrealDB.
sequenceDiagram
    participant CLI as rag_surrealdb load
    participant PDF as PDF File
    participant O as Ollama (llama3.1)
    participant S as SurrealDB

    CLI->>PDF: Extract text by pages
    CLI->>S: DEFINE INDEX (HNSW)
    loop For each page
        CLI->>O: Get embedding for page text
        O-->>CLI: 4096-D Vector
        CLI->>S: CREATE chunk (text + embedding)
    end

Querying Process (RAG)

When you run the ask command, the application executes the RAG workflow:

  1. Question Embedding: Generates a vector embedding for your question using Ollama.
  2. Semantic Search: Performs a K-Nearest Neighbors (KNN) search in SurrealDB to find the top 3 most relevant text chunks based on vector distance.
  3. Context Construction: Combines the retrieved text chunks into a single context block.
  4. Answer Synthesis: Sends the context and your question to Ollama. The LLM uses the provided context to generate a factual answer.
sequenceDiagram
    participant CLI as rag_surrealdb ask
    participant O as Ollama (llama3.1)
    participant S as SurrealDB

    CLI->>O: Get embedding for question
    O-->>CLI: Question Vector
    CLI->>S: Vector Search (KNN)
    S-->>CLI: Top 3 relevant chunks
    CLI->>O: Generate answer (Context + Question)
    O-->>CLI: Synthesized Answer
    CLI->>User: Display Answer
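
Steps 3 and 4 do not depend on the storage backend. A minimal sketch of context construction and prompt assembly is shown below; the Chunk struct, both helper names, and the prompt wording are assumptions for illustration, not the text used in the sample.

```rust
/// A retrieved chunk; the fields mirror what the sample prints for each result.
struct Chunk {
    source: String,
    page: u32,
    content: String,
}

/// Join retrieved chunks into one context block, labelling each with its origin.
fn build_context(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .map(|c| format!("[{} p.{}]\n{}", c.source, c.page, c.content.trim()))
        .collect::<Vec<_>>()
        .join("\n\n")
}

/// Hypothetical prompt template asking the model to stay within the context.
fn build_rag_prompt(context: &str, question: &str) -> String {
    format!(
        "Answer the question using only the context below. \
         If the context does not contain the answer, say so.\n\n\
         Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
}

fn main() {
    let chunks = vec![
        Chunk {
            source: "the-tale-of-peter-rabbit.pdf".to_string(),
            page: 5,
            content: "And then, feeling rather sick, he went to look for some parsley.".to_string(),
        },
        Chunk {
            source: "the-tale-of-peter-rabbit.pdf".to_string(),
            page: 9,
            content: "Presently Peter sneezed.".to_string(),
        },
    ];
    let context = build_context(&chunks);
    println!("{}", build_rag_prompt(&context, "Who is Peter?"));
}
```

Labelling each chunk with its source and page costs a few tokens but lets the model, and the reader, trace an answer back to the text it came from.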

Setup

Start Services

  1. Start SurrealDB from the repository root:

docker compose up -d surrealdb

  2. Ensure Ollama is running and has the llama3.1 model:

ollama run llama3.1

Load a PDF

cargo run -p rag_surrealdb -- load path/to/document.pdf

Sample PDF

Use the included sample file:

cargo run -p rag_surrealdb -- load data/the-tale-of-peter-rabbit.pdf

Ask a Question

cargo run -p rag_surrealdb -- ask "Who is Peter?"

Notes

  • The sample uses vector embeddings (4096-D) for semantic retrieval.
  • It leverages SurrealDB’s HNSW index for efficient similarity search.
  • The final answer is generated by an LLM (llama3.1 via Ollama) using the retrieved context.
  • The database namespace is rag and the database name is sample.

File Structure

rag_surrealdb/
├── Cargo.toml
└── src/
    └── main.rs

Next Steps

  • Implement chunking strategies (e.g., fixed-size with overlap) instead of page-level chunks.
  • Add support for multiple PDF documents and filtered search.
  • Explore hybrid search (combining full-text and vector search) for better accuracy.

Compare

Elasticsearch

cargo run -p rag_elasticsearch -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
     Running `target/debug/rag_elasticsearch load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating index 'pdf_chunks' with dense_vector mapping...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Elasticsearch.

cargo run -p rag_elasticsearch -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
     Running `target/debug/rag_elasticsearch ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Elasticsearch...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.7578453):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.7525861):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.713193):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
According to the context, Peter refers to Peter Rabbit, a young rabbit who lives with his mother, old Mrs. Rabbit. He is the main character in the story and is known for getting into trouble by visiting Mr. McGregor's garden despite his mother's warnings.

Qdrant

cargo run -p rag_qdrant -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/rag_qdrant load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating collection 'pdf_chunks'...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Qdrant.
Ask questions with: cargo run -p rag_sample -- ask "your question"

cargo run -p rag_qdrant -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
     Running `target/debug/rag_qdrant ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Qdrant...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.5051725):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.42638594):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the rabbit who is the main character of the story. He is a mischievous young rabbit who has escaped from his mother's warnings to stay away from Mr. McGregor's garden, where he has gone in search of parsley.

SurrealDB

cargo run -p rag_surrealdb -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
     Running `target/debug/rag_surrealdb load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Loaded page 1...
Loaded page 2...
Loaded page 3...
Loaded page 4...
Loaded page 5...
Loaded page 6...
Loaded page 7...
Loaded page 8...
Loaded page 9...
Loaded page 10...
Loaded page 11...
Loaded page 12...
Loaded page 13...
Loaded page 14...
Loaded all page chunks into SurrealDB.
Ask questions with: cargo run -p rag_sample -- ask "your question"

cargo run -p rag_surrealdb -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
     Running `target/debug/rag_surrealdb ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the main character of the story, a rabbit, likely the son or child of old Mrs. Rabbit.

PostgreSQL

cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/rag_postgres load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into PostgreSQL.
Ask questions with: cargo run -p rag_postgres -- ask "your question"

cargo run -p rag_postgres -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/rag_postgres ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in PostgreSQL...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903691976475):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.50517247648173):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.4263860820724614):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the rabbit who is the main subject of the story.

Reflections and Key Considerations

1. Vector Engine Scoring (Normalization)

One of the most important points to watch when comparing vector databases is how they report similarity:

  • Elasticsearch (0.71 – 0.76): Reports a normalized score between 0.0 and 1.0, where 1.0 is a perfect match. For cosine similarity in k-NN search, it applies (1 + cosine) / 2.
  • Qdrant (0.43 – 0.52): Reports raw cosine similarity, which ranges from -1 to 1. Applying (1 + cosine) / 2 to these values reproduces the Elasticsearch scores, for example (1 + 0.5157) / 2 ≈ 0.7578.
  • PostgreSQL (0.43 – 0.52): Shows the same values as Qdrant at higher precision, which is consistent with converting pgvector's cosine distance (<=>) back to a similarity.
  • SurrealDB: Does not display scores in our current implementation, but returns the same ranking.

Takeaway: You cannot compare absolute scores across different engines; only the relative ranking within a single engine matters.
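
The relationship between the two scales can be checked against the logged runs. The sketch below assumes the (1 + cosine) / 2 normalization for Elasticsearch and treats the Qdrant scores as raw cosine similarity; the function names are hypothetical, and the numbers are copied from the output above.

```rust
/// Elasticsearch-style normalization of a cosine similarity into [0, 1].
fn cosine_to_es_score(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}

/// Inverse mapping: recover the cosine similarity from a normalized score.
fn es_score_to_cosine(score: f64) -> f64 {
    2.0 * score - 1.0
}

fn main() {
    // (Qdrant score, Elasticsearch score) for pages 5, 9, and 2 in the runs above.
    let pairs = [
        (0.5156903, 0.7578453),
        (0.5051725, 0.7525861),
        (0.42638594, 0.713193),
    ];
    for (qdrant, elastic) in pairs {
        let predicted = cosine_to_es_score(qdrant);
        println!("qdrant={qdrant:.7} predicted={predicted:.7} logged={elastic:.7}");
    }
}
```

All three predicted values agree with the logged Elasticsearch scores to about six decimal places, so the two engines are reporting the same similarity on different scales.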

2. Data State and Idempotency

Previously, the results contained duplicates (e.g., Page 5 appearing twice in Qdrant). After deleting the databases and rerunning the scripts, the duplicates disappeared.

  • Cause: The current scripts use random UUIDs for each load. Without an “upsert” mechanism (checking if a page’s content already exists), running the load script multiple times pollutes the database.
  • Influence: Duplicates consume the limited “top-k” slots, preventing the LLM from seeing other relevant context pages.

3. Consistency of Embeddings

Across all four backends, the top three results were the same pages in the same order: 5, 9, and 2. This indicates that the embedding model (llama3.1) produces the same vector for the same text regardless of which database stores it, and that all four engines rank by the same cosine metric. The quality of your RAG system is primarily bounded by the quality of these embeddings.

4. LLM Non-Determinism

Even with identical context, the final answers vary:

  • The Qdrant run describes Peter as “mischievous”.
  • The SurrealDB run emphasizes his relationship with “old Mrs. Rabbit”.
  • The Elasticsearch run focuses on him “getting into trouble” in Mr. McGregor's garden.
  • The PostgreSQL run gives only a one-sentence summary.

This variability is inherent to sampled LLM output; setting temperature to 0 reduces it. It highlights that the “Answer” is a synthesis, not a direct retrieval.

5. Document Preprocessing

Small differences in how text is trimmed or chunked (e.g., handling of newlines or page headers) can shift the embedding vector slightly, which might change the score or even the ranking order in more complex datasets. Consistent extraction is key to reproducible RAG.

RAG Sample: SurrealDB with Advanced Chunking

Overview

This module, rag_surrealdb_extends, builds upon the basic SurrealDB RAG sample by implementing Advanced Chunking Strategies.

While the basic version splits documents by page, this version supports:

  1. Fixed-Size Chunking with Overlap
  2. Paragraph Splitting

Key Features

  • Global Text Extraction: Combines all pages of a PDF into a single text stream.
  • Fixed-Size Chunks: Uses a window of 1000 characters with a 200-character overlap.
  • Paragraph Chunks: Splits text by double newlines (\n\n), preserving natural structural boundaries.
  • Improved Context: The two strategies let you trade off precise retrieval against context preservation.

How It Works

The Chunking Algorithms

1. Fixed-Size Sliding Window

The application implements a sliding window approach:

fn chunk_text_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    // ... logic to create chunks of 'size' with 'overlap' ...
}
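The elided body could be filled in along the following lines. This is a sketch under assumptions, not the module's actual implementation: it counts Unicode characters rather than bytes, and it advances the window by `size - overlap` characters each step.

```rust
/// Splits `text` into chunks of at most `size` characters. Each chunk starts
/// `size - overlap` characters after the previous one, so neighbouring chunks
/// share `overlap` characters.
fn chunk_text_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    // Work on chars so a window never cuts a multi-byte character in half.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last window reached the end of the text
        }
        start += step;
    }
    chunks
}
```

For example, `chunk_text_fixed("abcdefghij", 4, 1)` returns `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the one before it.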

2. Paragraph Split

Splits the text into segments based on double newlines:

fn chunk_text_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
        // ... trim and filter ...
}
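A complete version of this function might look like the sketch below. The trim and filter steps are an assumption based on the comment in the snippet above: each segment is trimmed and empty segments are dropped.

```rust
/// Splits `text` on blank lines ("\n\n"), trims each paragraph, and drops
/// paragraphs that are empty after trimming.
fn chunk_text_paragraphs(text: &str) -> Vec<String> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|para| !para.is_empty())
        .map(str::to_string)
        .collect()
}
```

Note that very short paragraphs, such as a single line of dialogue, become chunks of their own with this approach.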

Comparison: Page-level vs. Advanced Strategies

| Feature | Page-Level (Basic) | Fixed-Size (Extended) | Paragraph (Extended) |
|---|---|---|---|
| Granularity | Coarse (Entire Page) | Fine (1000 chars) | Variable (Paragraph) |
| Context Preservation | Poor (breaks at page end) | Good (Overlap) | Excellent (Natural) |
| DB Storage | One record per page | Multiple records | Multiple records |
| Best For | Simple docs | Dense, flat text | Structured narratives |

Setup

1. Start SurrealDB

Ensure SurrealDB is running:

docker compose up -d surrealdb

2. Load a PDF (Fixed-Size Strategy)

cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf fixed

3. Load a PDF (Paragraph Strategy)

cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf paragraph

Note: This will use the extended database in the rag namespace.

4. Ask a Question

cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"

Execution

Fixed

cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf fixed
   Compiling rag_surrealdb_extends v0.1.0 (/Users/qdart/projects/rust-ai/rag_surrealdb_extends)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 10.25s
     Running `target/debug/rag_surrealdb_extends load data/the-tale-of-peter-rabbit.pdf fixed`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Using Fixed-size chunking strategy (size=1000, overlap=200)...
Split PDF into 7 chunks
Loaded chunk 5/7...
Loaded chunk 7/7...
Loaded all chunks into SurrealDB (Extended).

cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.77s
     Running `target/debug/rag_surrealdb_extends ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, chunk=6):
e wondered what he had done with his clothes. It  was the second little jacket and pair of shoes that Peter had lost in a fortnight!  I am sorry to say that Peter was not very well during the evening.  His mother put him to bed, and made some camomile tea; and she gave a dose of it to Peter!  “One t...

Result 2 (source=the-tale-of-peter-rabbit.pdf, chunk=5):
a wheelbarrow, and peeped over. The first  thing he saw was Mr. McGregor hoeing onions. His back was turned towards Peter, and beyond  him was the gate!    Peter got down very quietly off the wheelbarrow, and started running as fast as he could go,  along a straight walk behind some black-currant bu...

Result 3 (source=the-tale-of-peter-rabbit.pdf, chunk=0):
The Tale of Peter Rabbit Beatrix Potter  Once upon a time there were four little Rabbits, and their names were— Flopsy, Mopsy, Cotton- tail, and Peter.  They lived with their Mother in a sand-bank, underneath the root of a very big fir tree.    “Now, my dears,” said old Mrs. Rabbit one morning, “you...

Generating answer with Ollama...

Answer:
Peter is one of the four little rabbits who live with their mother in a sand-bank under the root of a big fir tree. He is described as "very naughty" and has a tendency to get into trouble by disobeying his mother's instructions not to go into Mr. McGregor's garden.

Paragraph

cargo run -p rag_surrealdb_extends -- load data/the-tale-of-peter-rabbit.pdf paragraph
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.45s
     Running `target/debug/rag_surrealdb_extends load data/the-tale-of-peter-rabbit.pdf paragraph`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Using Paragraph splitting strategy...
Split PDF into 38 chunks
Loaded chunk 5/38...
Loaded chunk 10/38...
Loaded chunk 15/38...
Loaded chunk 20/38...
Loaded chunk 25/38...
Loaded chunk 30/38...
Loaded chunk 35/38...
Loaded chunk 38/38...
Loaded all chunks into SurrealDB (Extended).

cargo run -p rag_surrealdb_extends -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.75s
     Running `target/debug/rag_surrealdb_extends ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, chunk=34):
“One table-spoonful to be taken at bed-time.”...

Result 2 (source=the-tale-of-peter-rabbit.pdf, chunk=4):
“Now run along, and don’t get into mischief. I am going out.”...

Result 3 (source=the-tale-of-peter-rabbit.pdf, chunk=9):
And then, feeling rather sick, he went to look for some parsley....

Generating answer with Ollama...

Answer:
The question doesn't ask about Peter's characteristics or actions, but rather who Peter is. Based on the context, it appears that Peter is the subject of "the-tale-of-peter-rabbit.pdf", which suggests that Peter is likely the main character in The Tale of Peter Rabbit.

Next Steps

  • Relational & Hybrid Search: Leverage SurrealDB’s relational and full-text search capabilities.
  • Recursive Character Splitting: Improve chunking by splitting at natural boundaries like paragraphs and sentences first.
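As a rough idea of what recursive character splitting could look like, the sketch below tries coarse separators first and only falls back to finer ones when a piece is still too long. `split_recursive` and its parameters are hypothetical. Unlike fuller implementations, this version discards the separators and does not merge small pieces back together.

```rust
/// Recursively splits `text` so every piece has at most `max_chars` characters.
/// Separators are tried in order (e.g. paragraphs, then sentences, then words);
/// when none are left, the text is cut at fixed character counts.
fn split_recursive(text: &str, max_chars: usize, separators: &[&str]) -> Vec<String> {
    assert!(max_chars > 0, "max_chars must be positive");
    let text = text.trim();
    if text.is_empty() {
        return Vec::new();
    }
    if text.chars().count() <= max_chars {
        return vec![text.to_string()];
    }
    match separators.split_first() {
        // No separators left: hard cut on character boundaries.
        None => {
            let chars: Vec<char> = text.chars().collect();
            chars
                .chunks(max_chars)
                .map(|piece| piece.iter().collect())
                .collect()
        }
        // Split on the coarsest separator and recurse with the finer ones.
        Some((sep, finer)) => text
            .split(*sep)
            .flat_map(|part| split_recursive(part, max_chars, finer))
            .collect(),
    }
}
```

A call such as `split_recursive(text, 1000, &["\n\n", ". ", " "])` would keep whole paragraphs where they fit and break only the oversized ones into sentences or words.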

RAG Sample: SurrealDB with Relational & Hybrid Search

Overview

This module, rag_surrealdb_with_db, extends the SurrealDB integration with the database’s Relational (Graph) and Full-Text Search (FTS) capabilities.

Instead of just storing chunks as flat records, we now:

  1. Track Documents as first-class entities.
  2. Create Relationships (RELATE) between documents and their chunks.
  3. Use Hybrid Search (Vector + Full-Text) to improve retrieval accuracy.

Key Features

  • Relational Data Model: Uses a SCHEMAFULL document table and graph relations to link chunks to their parent document.
  • Full-Text Search Indexing: Implements FULLTEXT ANALYZER for keyword-based retrieval.
  • Hybrid Retrieval: Combines results from both K-Nearest Neighbor (KNN) vector search and Full-Text Search.
  • Context Labeling: The AI model is informed whether a piece of context was found via vector similarity or keyword match.
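The hybrid retrieval and context labeling features can be sketched as follows. The types and functions here (`Match`, `MatchKind`, `merge_matches`, `build_context`) and the label strings are hypothetical, chosen for illustration only. The module's real data structures and prompt format may differ.

```rust
use std::collections::HashSet;

/// How a chunk was retrieved.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MatchKind {
    Vector,
    FullText,
}

/// One retrieved chunk (hypothetical shape).
#[derive(Debug, Clone)]
struct Match {
    id: String,
    text: String,
    kind: MatchKind,
}

/// Concatenates both result lists, keeping the first occurrence of each chunk ID.
fn merge_matches(full_text: Vec<Match>, vector: Vec<Match>) -> Vec<Match> {
    let mut seen = HashSet::new();
    full_text
        .into_iter()
        .chain(vector)
        .filter(|m| seen.insert(m.id.clone()))
        .collect()
}

/// Builds the prompt context, prefixing each chunk with how it was found.
fn build_context(matches: &[Match]) -> String {
    matches
        .iter()
        .map(|m| {
            let label = match m.kind {
                MatchKind::Vector => "vector similarity",
                MatchKind::FullText => "keyword match",
            };
            format!("[{label}] {}", m.text)
        })
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Deduplicating by ID keeps a chunk found by both searches from taking two slots in the context.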

How It Works

1. Relational Schema & Indexing

We define a formal schema for documents, plus indexes to support relational data and hybrid search. The document table is SCHEMAFULL to ensure data integrity, while the chunk table stays SCHEMALESS for flexibility:

-- Schema for documents
DEFINE TABLE document SCHEMAFULL;
DEFINE FIELD name ON document TYPE string;
DEFINE FIELD created_at ON document TYPE datetime DEFAULT time::now();

-- Schema for chunks (records will be related to documents)
DEFINE TABLE chunk SCHEMALESS;

-- Define an analyzer for Full-Text Search
DEFINE ANALYZER ascii TOKENIZERS blank FILTERS ascii, lowercase;

-- Full-Text index on chunk text
DEFINE INDEX chunk_text ON chunk FIELDS text FULLTEXT ANALYZER ascii BM25 HIGHLIGHTS;

-- Vector index on chunk embedding
DEFINE INDEX chunk_embedding ON chunk FIELDS embedding HNSW DIMENSION 4096 DISTANCE COSINE;

-- Define the relationship (graph)
-- RELATE document:id->contains->chunk:id;

2. Hybrid Search Query

The ask command executes two types of searches in a single request to leverage both semantic and keyword matching:

-- Full-Text Search (Exact keyword matches)
SELECT *, search::score(1) AS score FROM chunk WHERE text @1@ 'question' ORDER BY score DESC;

-- Vector Search (Semantic similarity)
SELECT *, vector::distance::knn() AS distance FROM chunk WHERE embedding <|3,40|> [vector] ORDER BY distance ASC;
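The two queries return values on different scales: a BM25 score where higher is better, and a distance where lower is better. One common way to combine such lists by rank alone is Reciprocal Rank Fusion (RRF). The sketch below is an illustration of that technique, not something rag_surrealdb_with_db is known to do, and the function name and the constant `k` are assumptions.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per item,
/// with rank starting at 1. Items near the top of several lists score highest.
/// Returns (id, fused score) pairs sorted best-first.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (index, id) in list.iter().enumerate() {
            let rank = index as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by ID so the order is stable.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Each input list holds chunk IDs already ordered best-first, for example the full-text results by score descending and the vector results by distance ascending. A value of `k` around 60 is a frequently used default.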

Setup

1. Start SurrealDB

Ensure SurrealDB is running:

docker compose up -d surrealdb

2. Initialize Database with Sample Data

You can load a sample description of Peter to test the search immediately:

cargo run -p rag_surrealdb_with_db -- init_db

3. Load a PDF

The load command now creates a document record and relates all chunks to it:

cargo run -p rag_surrealdb_with_db -- load data/the-tale-of-peter-rabbit.pdf

4. Ask a Question

The ask command will show you matches from both search methods:

cargo run -p rag_surrealdb_with_db -- ask "Who is Peter?"

Execution

cargo run -p rag_surrealdb_with_db -- init_db
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.25s
     Running `target/debug/rag_surrealdb_with_db init_db`
Generating embedding for sample text...
Loaded sample data for Peter the rabbit into SurrealDB.

cargo run -p rag_surrealdb_with_db -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/rag_surrealdb_with_db load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Loaded page 1...
Loaded page 2...
Loaded page 3...
Loaded page 4...
Loaded page 5...
Loaded page 6...
Loaded page 7...
Loaded page 8...
Loaded page 9...
Loaded page 10...
Loaded page 11...
Loaded page 12...
Loaded page 13...
Loaded page 14...
Loaded all page chunks into SurrealDB.
Ask questions with: cargo run -p rag_surrealdb_with_db -- ask "your question"

cargo run -p rag_surrealdb_with_db -- ask "Who is Peter?"

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running `target/debug/rag_surrealdb_with_db ask 'Who is Peter?'`
Searching for matches using full-text search...
Generating embedding for the question...

--- Full-Text Search Matches ---

--- Vector Search Matches ---
Vector Match 1: Peter is a small, adventurous rabbit who wears a blue jacket and lives in a sand-bank under a fir tr (source: manual_entry)
Vector Match 2: And then, feeling rather sick, he went to look for some parsley.

But round the end of a cucumber fr (source: the-tale-of-peter-rabbit.pdf)
Vector Match 3: Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden 
underneath a f (source: the-tale-of-peter-rabbit.pdf)

Generating answer with Ollama...

Answer:
Peter is a rabbit.

cargo run -p rag_surrealdb_with_db -- ask "Describe Peter?"

    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.24s
     Running `target/debug/rag_surrealdb_with_db ask 'Describe Peter?'`
Searching for matches using full-text search...
Generating embedding for the question...

--- Full-Text Search Matches ---

--- Vector Search Matches ---
Vector Match 1: Peter is a small, adventurous rabbit who wears a blue jacket and lives in a sand-bank under a fir tr (source: manual_entry)
Vector Match 2: And then, feeling rather sick, he went to look for some parsley.

But round the end of a cucumber fr (source: the-tale-of-peter-rabbit.pdf)
Vector Match 3: Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden 
underneath a f (source: the-tale-of-peter-rabbit.pdf)

Generating answer with Ollama...

Answer:
According to the context, Peter is described as a "small, adventurous rabbit" who wears a blue jacket and lives with his family in a sand-bank under a fir tree.

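Vector search is good at finding “meaning” but can miss specific keywords (like “fortnight” or “jacket”) when they carry little weight in the embedding, while full-text search excels at exact matches. Combining them is meant to give the best of both worlds. In the runs above, however, the Full-Text Search section is empty for both questions, so the answers rest entirely on the vector matches. A likely cause is that the whole question is passed as the search phrase: with the blank tokenizer, “Peter?” keeps its question mark, and words such as “Who” or “Describe” would also have to appear in a chunk for it to match.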