Compare

Elasticsearch

cargo run -p rag_elasticsearch -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
     Running `target/debug/rag_elasticsearch load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating index 'pdf_chunks' with dense_vector mapping...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Elasticsearch.

cargo run -p rag_elasticsearch -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.55s
     Running `target/debug/rag_elasticsearch ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Elasticsearch...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.7578453):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.7525861):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.713193):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
According to the context, Peter refers to Peter Rabbit, a young rabbit who lives with his mother, old Mrs. Rabbit. He is the main character in the story and is known for getting into trouble by visiting Mr. McGregor's garden despite his mother's warnings.

Qdrant

cargo run -p rag_qdrant -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/rag_qdrant load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Creating collection 'pdf_chunks'...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into Qdrant.
Ask questions with: cargo run -p rag_sample -- ask "your question"

cargo run -p rag_qdrant -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
     Running `target/debug/rag_qdrant ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in Qdrant...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.5051725):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.42638594):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the rabbit who is the main character of the story. He is a mischievous young rabbit who has escaped from his mother's warnings to stay away from Mr. McGregor's garden, where he has gone in search of parsley.

SurrealDB

cargo run -p rag_surrealdb -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
     Running `target/debug/rag_surrealdb load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Loaded page 1...
Loaded page 2...
Loaded page 3...
Loaded page 4...
Loaded page 5...
Loaded page 6...
Loaded page 7...
Loaded page 8...
Loaded page 9...
Loaded page 10...
Loaded page 11...
Loaded page 12...
Loaded page 13...
Loaded page 14...
Loaded all page chunks into SurrealDB.
Ask questions with: cargo run -p rag_sample -- ask "your question"

cargo run -p rag_surrealdb -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.62s
     Running `target/debug/rag_surrealdb ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the main character of the story, a rabbit, likely the son or child of old Mrs. Rabbit.

PostgreSQL

cargo run -p rag_postgres -- load data/the-tale-of-peter-rabbit.pdf
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.17s
     Running `target/debug/rag_postgres load data/the-tale-of-peter-rabbit.pdf`
Extracting text from data/the-tale-of-peter-rabbit.pdf...
Processed page 1...
Processed page 2...
Processed page 3...
Processed page 4...
Processed page 5...
Processed page 6...
Processed page 7...
Processed page 8...
Processed page 9...
Processed page 10...
Processed page 11...
Processed page 12...
Processed page 13...
Processed page 14...
Loaded all page chunks into PostgreSQL.
Ask questions with: cargo run -p rag_postgres -- ask "your question"

cargo run -p rag_postgres -- ask "Who is Peter?"
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/rag_postgres ask 'Who is Peter?'`
Generating embedding for the question...
Searching for relevant chunks in PostgreSQL...
Top matches for: "Who is Peter?"

Result 1 (source=the-tale-of-peter-rabbit.pdf, page=5, score=0.5156903691976475):
And then, feeling rather sick, he went to look for some parsley.  But round the end of a cucumber frame, whom should he meet but Mr. McGregor!

Result 2 (source=the-tale-of-peter-rabbit.pdf, page=9, score=0.50517247648173):
Mr. McGregor was quite sure that Peter was somewhere in the toolshed, perhaps hidden  underneath a flower-pot. He began to turn them over carefully, looking under each.  Presently Peter sneezed— “Kertyschoo!” Mr. McGregor was after him in no time,

Result 3 (source=the-tale-of-peter-rabbit.pdf, page=2, score=0.4263860820724614):
“Now, my dears,” said old Mrs. Rabbit one morning, “you may go into the fields or down the  lane, but don’t go into Mr. McGregor’s garden: your Father had an accident there; he was put in  a pie by Mrs. McGregor.” 

Generating answer with Ollama...

Answer:
Peter is the rabbit who is the main subject of the story.

Reflections and Key Considerations

1. Vector Engine Scoring (Normalization)

One of the most important things to watch when comparing vector databases is how they report similarity:

  • Elasticsearch (0.71 – 0.76): Uses a normalized score where 1.0 is a perfect match and 0.0 is no match. For cosine similarity, it applies (1 + cosine) / 2. The numbers in this run fit that formula: (1 + 0.5156903) / 2 ≈ 0.7578453.
  • Qdrant (0.43 – 0.52): These values are consistent with raw cosine similarity, which ranges from -1 to 1.
  • PostgreSQL (0.43 – 0.52): Reports the same values as Qdrant, printed with more decimal places.
  • SurrealDB: Does not display scores in our current implementation, but returns the same ranking.

Takeaway: You cannot compare absolute scores across different engines; only the relative ranking within a single engine matters.
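
To make the scale difference concrete, here is a minimal sketch that converts between the two scales using the scores printed in the runs above. The function names `normalize_cosine` and `denormalize_score` are illustrative and not part of the project code.

```rust
/// Map a raw cosine similarity in [-1, 1] to [0, 1] using (1 + cosine) / 2.
fn normalize_cosine(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}

/// Inverse mapping: recover the raw cosine from a normalized score.
fn denormalize_score(score: f64) -> f64 {
    2.0 * score - 1.0
}

fn main() {
    // Scores copied from the Qdrant and Elasticsearch runs (pages 5, 9, 2).
    let qdrant: [f64; 3] = [0.5156903, 0.5051725, 0.42638594];
    let elasticsearch: [f64; 3] = [0.7578453, 0.7525861, 0.713193];

    for (raw, reported) in qdrant.iter().zip(elasticsearch.iter()) {
        let normalized = normalize_cosine(*raw);
        let recovered = denormalize_score(*reported);
        println!("qdrant={raw:.7} -> normalized={normalized:.7} (elasticsearch={reported:.7}, back to raw={recovered:.7})");
        // Once both are on the same scale, the two engines agree closely.
        assert!((normalized - *reported).abs() < 1e-6);
    }
}
```

If scores from several engines need to be shown side by side, converting them to one scale first keeps the numbers comparable. The ranking itself is unaffected because the mapping is monotonic.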

2. Data State and Idempotency

Previously, the results contained duplicates (e.g., Page 5 appearing twice in Qdrant). After deleting the databases and rerunning the scripts, the duplicates disappeared.

  • Cause: The current scripts use random UUIDs for each load. Without an “upsert” mechanism (checking if a page’s content already exists), running the load script multiple times pollutes the database.
  • Impact: Duplicates consume the limited “top-k” slots, preventing the LLM from seeing other relevant context pages.
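
One possible way to make the load step idempotent is to derive each chunk ID from the source file name and page number instead of generating a random UUID. The sketch below is an assumption, not the project's actual code: `chunk_id` and `ChunkStore` are hypothetical names, the store is an in-memory stand-in for a real engine, and FNV-1a is used only because it needs no external crate (a name-based UUID would serve the same purpose).

```rust
use std::collections::HashMap;

/// 64-bit FNV-1a hash. Unlike a random UUID, it yields the same value
/// for the same input on every run.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Hypothetical helper: derive a stable chunk ID from the source file and page.
fn chunk_id(source: &str, page: u32) -> u64 {
    fnv1a_64(format!("{source}#{page}").as_bytes())
}

/// In-memory stand-in for a vector store (assumption for illustration only).
#[derive(Default)]
struct ChunkStore {
    chunks: HashMap<u64, String>,
}

impl ChunkStore {
    /// Insert the chunk, or overwrite the existing entry with the same ID.
    fn upsert(&mut self, source: &str, page: u32, text: &str) {
        self.chunks.insert(chunk_id(source, page), text.to_string());
    }
}

fn main() {
    let mut store = ChunkStore::default();
    // Simulate running the load command twice on the same two pages.
    for _ in 0..2 {
        store.upsert("the-tale-of-peter-rabbit.pdf", 5, "text of page 5");
        store.upsert("the-tale-of-peter-rabbit.pdf", 9, "text of page 9");
    }
    // The second load overwrites the first instead of adding duplicates.
    println!("stored chunks: {}", store.chunks.len());
}
```

With stable IDs, a second load overwrites the existing entries rather than adding new ones, so the top-k slots stay free for distinct pages.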

3. Consistency of Embeddings

Across all four engines, the top three results were the same pages in the same order: Page 5, Page 9, and Page 2. This suggests that the embedding model (llama3.1) generates the same vector for the same text on every run, and that each database ranks those vectors the same way. The quality of your RAG system is primarily bounded by the quality of these embeddings.
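
A check like this can be automated. The following sketch compares only the page order and ignores the engine-specific scores; the `Hit` struct and the `same_ranking` function are hypothetical names, and the values are copied from the runs above.

```rust
/// One retrieved chunk. The score is optional because the SurrealDB run
/// does not display one.
struct Hit {
    page: u32,
    score: Option<f64>,
}

/// Extract the page order from a result list, ignoring the scores.
fn ranking(hits: &[Hit]) -> Vec<u32> {
    hits.iter().map(|hit| hit.page).collect()
}

/// True when every engine returned the same pages in the same order.
fn same_ranking(results: &[(&str, Vec<Hit>)]) -> bool {
    let mut rankings = results.iter().map(|(_, hits)| ranking(hits));
    match rankings.next() {
        Some(first) => rankings.all(|other| other == first),
        None => true,
    }
}

fn main() {
    let results = vec![
        ("elasticsearch", vec![
            Hit { page: 5, score: Some(0.7578453) },
            Hit { page: 9, score: Some(0.7525861) },
            Hit { page: 2, score: Some(0.713193) },
        ]),
        ("qdrant", vec![
            Hit { page: 5, score: Some(0.5156903) },
            Hit { page: 9, score: Some(0.5051725) },
            Hit { page: 2, score: Some(0.42638594) },
        ]),
        ("surrealdb", vec![
            Hit { page: 5, score: None },
            Hit { page: 9, score: None },
            Hit { page: 2, score: None },
        ]),
    ];

    for (engine, hits) in &results {
        let scores: Vec<Option<f64>> = hits.iter().map(|hit| hit.score).collect();
        println!("{engine}: pages={:?} scores={:?}", ranking(hits), scores);
    }
    println!("same ranking: {}", same_ranking(&results));
}
```

Comparing page order rather than scores sidesteps the normalization differences between engines.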

4. LLM Non-Determinism

Even with identical context, the final answers vary:

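  • The Qdrant run describes Peter as “mischievous”.
  • The Elasticsearch and SurrealDB runs mention his relationship with “old Mrs. Rabbit”.
  • The Elasticsearch run also notes that he is “known for getting into trouble” in Mr. McGregor's garden.
  • The PostgreSQL run says only that he is “the main subject of the story”.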

This variability comes from sampling during generation and is inherent to LLMs; setting the temperature to 0 reduces it, though it may not remove it entirely. It highlights that the “Answer” is a synthesis, not a direct retrieval.
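
As a rough sketch of how the temperature could be pinned, the function below builds a JSON body for Ollama's generate endpoint with `temperature` set to 0 and a fixed `seed` in the `options` object. The function names, the seed value, and the use of `llama3.1` as the generation model are assumptions for illustration. The hand-written escaping is only there to keep the example free of external crates, and real code would more likely use a JSON library.

```rust
/// Escape characters that must not appear raw inside a JSON string literal.
fn json_escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build a non-streaming generate request with
/// sampling pinned down (temperature 0, fixed seed).
fn generate_request_body(model: &str, prompt: &str, seed: u32) -> String {
    format!(
        r#"{{"model":"{}","prompt":"{}","stream":false,"options":{{"temperature":0,"seed":{}}}}}"#,
        json_escape(model),
        json_escape(prompt),
        seed
    )
}

fn main() {
    // Model name and seed are assumptions, not values taken from the project.
    let body = generate_request_body("llama3.1", "Who is Peter?", 42);
    println!("{body}");
}
```

Even with these settings, identical output across runs is not guaranteed on every setup, so the answers are still best treated as a synthesis.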

5. Document Preprocessing

Small differences in how text is trimmed or chunked (e.g., handling of newlines or page headers) can shift the embedding vector slightly, which might change the score or even the ranking order in more complex datasets. Consistent extraction is key to reproducible RAG.
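
One simple way to keep extraction consistent is to normalize whitespace before embedding, so that the same page always produces the same input string. This is a minimal sketch; `normalize_whitespace` is an illustrative name, not a function from the project.

```rust
/// Collapse every run of whitespace (spaces, tabs, newlines) into a single
/// space and trim both ends.
fn normalize_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    // Double spaces and stray newlines are common in text extracted from PDFs.
    let raw = "And then, feeling rather sick, he went to look for some parsley.  But round\nthe end of a cucumber frame,   ";
    let clean = normalize_whitespace(raw);
    println!("{clean}");
}
```

Applying the same normalization to every engine's load path means any remaining ranking difference comes from the engine rather than from the input text.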