RAG Sample: Introduction to PDF Retrieval with PostgreSQL (pgvector)
Overview
This sample demonstrates a full Rust-based Retrieval-Augmented Generation (RAG) workflow that:
- loads PDF text and generates vector embeddings for each page using Ollama,
- stores extracted page content and embeddings in PostgreSQL with the `pgvector` extension,
- performs semantic vector similarity search to retrieve relevant chunks for a question,
- synthesizes a final answer using an LLM.
Both the PostgreSQL server and the Ollama service must be running locally. The Rust workspace includes the rag_postgres crate, which loads and queries the data.
What This Project Does
- Starts a local PostgreSQL service with Docker Compose
- Uses Ollama (`llama3.1`) to generate embeddings for document chunks
- Creates a `pdf_chunks` table with a `vector(4096)` column
- Inserts PDF page chunks and their embeddings into PostgreSQL
- Performs semantic search using `pgvector` distance operators
- Generates a natural language answer using an LLM via Ollama
Plan
- Add a new Rust workspace member named `rag_postgres`.
- Add necessary dependencies, including `tokio-postgres`.
- Add a Docker Compose configuration for PostgreSQL with `pgvector`.
- Implement `rag_postgres/src/main.rs` with two modes:
  - `load <path-to-pdf>`: read the PDF and insert page chunks into PostgreSQL
  - `ask "<question>"`: retrieve relevant PDF chunks from PostgreSQL and generate an answer from them
- Update documentation with instructions and sample commands.
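The two modes in the plan could be dispatched from the command-line arguments roughly as follows. This is a minimal sketch using only the standard library; the `Command` enum and `parse_command` function are hypothetical names chosen for illustration, not taken from the sample's source.

```rust
/// The two modes described in the plan (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Command {
    /// `load <path-to-pdf>`
    Load { pdf_path: String },
    /// `ask "<question>"`
    Ask { question: String },
}

/// Parses the arguments that follow the program name.
/// Any input that matches neither mode yields a usage message.
fn parse_command(args: &[String]) -> Result<Command, String> {
    const USAGE: &str = "usage: rag_postgres load <path-to-pdf> | ask \"<question>\"";
    match args {
        [mode, value] if mode == "load" && !value.is_empty() => Ok(Command::Load {
            pdf_path: value.clone(),
        }),
        [mode, value] if mode == "ask" && !value.trim().is_empty() => Ok(Command::Ask {
            question: value.clone(),
        }),
        _ => Err(USAGE.to_string()),
    }
}
```

In a binary, the input slice would typically come from `std::env::args().skip(1).collect::<Vec<String>>()`, which drops the program name before parsing.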
How It Works
Loading Process
When you run the load command, the application performs the following steps:
- Text Extraction: Uses `pdf-extract` to read the PDF file and split it into individual pages.
- Schema Bootstrap: Connects to PostgreSQL, enables the `vector` extension, and ensures `pdf_chunks` exists.
- Embedding Generation: For each page, sends the page text to Ollama (`llama3.1`) to generate a 4096-dimensional vector embedding.
- Storage: Inserts source metadata, page content, and embeddings into PostgreSQL.
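The schema bootstrap step can be pictured as two idempotent SQL statements. The sketch below only builds the statement strings. The table name and the `vector(4096)` column come from this README, while the other column names (`id`, `source`, `page`, `content`) and the `schema_statements` helper are assumptions made for illustration.

```rust
/// Embedding width stated in this README for llama3.1.
const EMBEDDING_DIM: usize = 4096;

/// Builds idempotent bootstrap statements for the pgvector extension and
/// the pdf_chunks table. Column names other than `embedding` are assumed.
fn schema_statements(dim: usize) -> Vec<String> {
    vec![
        // Enables the pgvector extension if it is not enabled yet.
        "CREATE EXTENSION IF NOT EXISTS vector".to_string(),
        // Creates the chunk table only when it does not exist.
        format!(
            "CREATE TABLE IF NOT EXISTS pdf_chunks (\
             id BIGSERIAL PRIMARY KEY, \
             source TEXT NOT NULL, \
             page INTEGER NOT NULL, \
             content TEXT NOT NULL, \
             embedding vector({}) NOT NULL)",
            dim
        ),
    ]
}
```

Because both statements use `IF NOT EXISTS`, running the load command repeatedly would not fail on an already prepared database. Each string could then be passed to the database client's execute call.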
sequenceDiagram
participant CLI as rag_postgres load
participant PDF as PDF File
participant O as Ollama (llama3.1)
participant PG as PostgreSQL (pgvector)
CLI->>PDF: Extract text by pages
CLI->>PG: CREATE EXTENSION vector
CLI->>PG: CREATE TABLE pdf_chunks
loop For each page
CLI->>O: Get embedding for page text
O-->>CLI: 4096-D Vector
CLI->>PG: INSERT row (text + embedding)
end
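Before a row is inserted, the embedding has to be in a form PostgreSQL can parse. pgvector accepts a text representation such as `[0.1,0.2,0.3]`, so one option is to serialize the vector to that form. The helper below is a hypothetical sketch, not the sample's actual code; how the value is bound to the query depends on the driver and on any pgvector helper crate in use.

```rust
/// Serializes an embedding into pgvector's text form, e.g. "[0.5,-2.25,1]".
/// Rejects vectors of the wrong length and non-finite values, which
/// pgvector would refuse on insert.
fn to_vector_literal(embedding: &[f32], expected_dim: usize) -> Result<String, String> {
    if embedding.len() != expected_dim {
        return Err(format!(
            "expected {} dimensions, got {}",
            expected_dim,
            embedding.len()
        ));
    }
    if embedding.iter().any(|v| !v.is_finite()) {
        return Err("embedding contains NaN or infinity".to_string());
    }
    let parts: Vec<String> = embedding.iter().map(|v| v.to_string()).collect();
    Ok(format!("[{}]", parts.join(",")))
}
```

Checking the length up front gives a clearer error than letting the database reject a vector that does not match the `vector(4096)` column.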
Querying Process (RAG)
When you run the ask command, the application executes the RAG workflow:
- Question Embedding: Generates a vector embedding for your question using Ollama.
- Semantic Search: Queries PostgreSQL with `ORDER BY embedding <=> $1::vector LIMIT 3` to get the three most relevant chunks.
- Context Construction: Combines the retrieved text chunks into one context block.
- Answer Synthesis: Sends context and question to Ollama for grounded answer generation.
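The context construction and answer synthesis steps amount to string assembly before the LLM call. The following is a minimal sketch under stated assumptions: the `Chunk` struct, the `[page N]` labels, and the prompt wording are all hypothetical and may differ from what the sample sends to Ollama.

```rust
/// A retrieved row (hypothetical shape: page number plus page text).
struct Chunk {
    page: i32,
    content: String,
}

/// Joins the retrieved chunks into one context block, labelling each
/// chunk with its page number and separating chunks by a blank line.
fn build_context(chunks: &[Chunk]) -> String {
    chunks
        .iter()
        .map(|c| format!("[page {}]\n{}", c.page, c.content.trim()))
        .collect::<Vec<_>>()
        .join("\n\n")
}

/// Wraps the context and the question in a grounding instruction.
/// The exact wording is an assumption, not the sample's prompt.
fn build_prompt(context: &str, question: &str) -> String {
    format!(
        "Answer the question using only the context below. \
         If the context does not contain the answer, say so.\n\n\
         Context:\n{}\n\nQuestion: {}\nAnswer:",
        context, question
    )
}
```

Telling the model to rely only on the supplied context is what keeps the answer grounded in the retrieved pages rather than in the model's general knowledge.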
sequenceDiagram
participant CLI as rag_postgres ask
participant O as Ollama (llama3.1)
participant PG as PostgreSQL (pgvector)
CLI->>O: Get embedding for question
O-->>CLI: Question Vector
CLI->>PG: Similarity search (embedding <=> query)
PG-->>CLI: Top 3 relevant chunks
CLI->>O: Generate answer (Context + Question)
O-->>CLI: Synthesized Answer
CLI->>User: Display Answer
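To make the similarity search step concrete: in pgvector, `<=>` is the cosine distance operator, so the query orders rows by `1 - cosine similarity` and keeps the closest three. The sketch below reproduces that ranking in memory for illustration only. The function names are hypothetical, and the sample itself delegates this work to PostgreSQL.

```rust
/// Cosine distance: 1 - (a . b) / (|a| * |b|).
/// Returns None for mismatched lengths, empty input, or zero-length
/// vectors (a choice made for this sketch).
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// In-memory analogue of `ORDER BY embedding <=> query LIMIT k`:
/// returns the texts of the k rows with the smallest cosine distance.
fn top_k<'a>(query: &[f32], rows: &'a [(String, Vec<f32>)], k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(f32, &str)> = rows
        .iter()
        .filter_map(|(text, emb)| cosine_distance(query, emb).map(|d| (d, text.as_str())))
        .collect();
    // Smallest distance first, i.e. most similar first.
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, text)| text).collect()
}
```

A distance of 0 means the vectors point in the same direction, 1 means they are orthogonal, and 2 means they point in opposite directions.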
Setup
Start Services
- Start PostgreSQL from the repository root:
docker compose up -d postgres
- Ensure Ollama is running and has the `llama3.1` model:
ollama run llama3.1
Load a PDF
cargo run -p rag_postgres -- load path/to/document.pdf
Sample PDF
Use the included sample file:
cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf
Ask a Question
cargo run -p rag_postgres -- ask "Who is Peter?"
Notes
- The sample uses PostgreSQL with the `pgvector` extension and `vector(4096)` embeddings.
- Similarity search is done with `pgvector` distance ordering (`<=>`, the cosine distance operator).
- The final answer is generated by an LLM (`llama3.1` via Ollama) using retrieved context.
File Structure
rag_postgres/
├── Cargo.toml
└── src/
    └── main.rs