Mar 18, 2026

Part 3.2: Understanding Vector Stores for RAG

Series: AI Agents & Applications with LangChain, LangGraph and MCP
Part: 3.2 — Understanding Vector Stores for RAG


In Part 3.1, you learned about the RAG design pattern and its two stages: content ingestion (indexing) and question-answering (retrieval and generation). Both stages rely heavily on one critical component: the vector store.

Understanding how vector stores work is essential for building effective RAG systems. In this part, you’ll explore:

What vector stores are and why they’re different from traditional databases
How vector stores enable semantic similarity search
The evolution of vector stores from image recognition to text-based RAG
Popular vector store options and how to choose the right one
Setting up your first vector store with ChromaDB

By the end of this part, you’ll have a solid foundation for selecting, configuring, and using vector stores in your RAG implementations.

Let’s start by understanding what vector stores are and why they’re so important for semantic search.

What’s a Vector Store?

A vector store (also called a vector database) is a storage system designed to efficiently store and query high-dimensional vectors. While that sounds technical, the concept is straightforward once you understand the role of vectors in AI applications.

Why Vectors Matter in AI

Vectors are fundamental to AI because embeddings—numerical representations of text, images, sounds, or videos—are built using them. Here’s the key insight:

Embeddings are vectors that capture the meaning of content in their dimensions.

When you convert text into an embedding using a model like OpenAI’s text-embedding-ada-002, you get a vector with 1,536 dimensions. Individual dimensions are not directly interpretable, but taken together they encode the text’s semantic meaning, so similar texts produce vectors that lie close together.

What Vector Stores Do

The main use of vector stores in LLM and machine learning applications is to store embeddings that act as indexes for text chunks (or chunks of video, image, or audio).

Unlike traditional databases that store rows and columns or documents and fields, vector stores:

Store high-dimensional vectors (embeddings) alongside the original content
Enable similarity searches that find semantically related content
Use specialized algorithms (like ANN - Approximate Nearest Neighbor) for fast retrieval
Scale efficiently to millions or billions of vectors

How Searches Work in Vector Stores

Searches in vector stores are similarity searches, which measure the distance between:

The embedding of the query (the user’s question)
The embeddings of stored chunks (indexed document fragments)

The result is either:

The closest vector (single result)
A list of the closest vectors (top-k results, ranked by similarity)

This semantic similarity reflects how close the meanings of the text chunks are—not just whether they share keywords.

Example: Semantic vs. Keyword Search

Keyword Search (Traditional Database):

Query: “feline animals”
Matches: Documents containing exactly “feline” or “animals”
Misses: Documents about “cats,” “lions,” “tigers” if they don’t use the word “feline”

Semantic Search (Vector Store):

Query: “feline animals” (converted to embedding)
Matches: Documents about cats, lions, tigers, leopards—even without the word “feline”
How: The embedding of “feline animals” is mathematically close to embeddings of cat-related content
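
To make the contrast concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-assigned stand-ins for real embeddings (an assumption for illustration; a real system would obtain them from an embedding model), and `keyword_search` and `semantic_search` are hypothetical helper names, not part of any library.

```python
import math

# Hypothetical 3-dimensional "embeddings", hand-assigned for illustration only.
# Real embedding models produce hundreds or thousands of dimensions.
DOCS = {
    "Lions hunt in prides on the savanna.": [0.9, 0.1, 0.0],
    "House cats sleep most of the day.": [0.8, 0.2, 0.1],
    "Feline animals have retractable claws.": [0.85, 0.15, 0.05],
    "Sourdough bread needs a long proof.": [0.0, 0.1, 0.9],
}
QUERY_TEXT = "feline animals"
QUERY_VECTOR = [0.88, 0.12, 0.02]  # assumed embedding of the query


def keyword_search(query, docs):
    """Return documents that contain at least one query word verbatim."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().strip(".").split())]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, docs, k=3):
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_vector, docs[d]),
        reverse=True,
    )
    return ranked[:k]


keyword_hits = keyword_search(QUERY_TEXT, DOCS)
semantic_hits = semantic_search(QUERY_VECTOR, DOCS, k=3)
print("keyword:", keyword_hits)
print("semantic:", semantic_hits)
```

With these toy vectors, the keyword search returns only the document that literally contains the query words, while the vector search also surfaces the lion and house-cat documents and leaves the bread document out.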

Key Characteristics of Vector Stores

Vector stores are optimized for a specific use case that traditional databases don’t handle well:

Feature	Traditional Database	Vector Store
Data Type	Structured rows/columns or documents	High-dimensional vectors + metadata
Search Method	Exact match, keyword search, SQL queries	Similarity search, nearest neighbor
Primary Use	CRUD operations, transactions	Semantic search, recommendation systems
Speed Optimization	Indexes on fields, query optimization	ANN algorithms (HNSW, IVF), vector indexes
Scalability Focus	Rows/documents	Vector dimensions and volume

How Do Vector Stores Work?

Understanding how vector stores perform similarity searches helps you appreciate their power and limitations. Let’s explore the mechanics.

Vector Distance Calculations

Vector stores use distance metrics to measure how similar two vectors are. The most common functions include:

1. Euclidean Distance

Measures the straight-line distance between two vectors in high-dimensional space.

Formula: sqrt( sum( (a_i - b_i)^2 ) ) for i = 1 to n
Range: 0 to infinity (0 = identical, larger = more different)
Use Case: When magnitude matters (e.g., comparing absolute differences)

2. Cosine Similarity

Measures the angle between two vectors, ignoring magnitude.

Formula: (a · b) / (||a|| × ||b||)
Range: -1 to 1 (1 = identical direction, -1 = opposite, 0 = orthogonal)
Use Case: Text embeddings (direction matters more than magnitude)
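Why Popular: Many embedding models return unit-length (normalized) vectors, in which case cosine similarity equals the dot product and ranks results the same way as Euclidean distance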

3. Dot Product

Measures alignment between vectors, combining magnitude and direction.

Formula: sum( a_i × b_i ) for i = 1 to n
Range: -infinity to +infinity
Use Case: When both magnitude and direction matter

4. Hamming Distance (for binary vectors)

Counts the number of positions where vectors differ.

Use Case: Binary embeddings, specialized applications

TIP: Most text-based RAG systems use cosine similarity because it works well with the normalized embeddings returned by models such as OpenAI’s embedding models.
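
The four metrics above can be sketched in a few lines of plain Python. This is an illustrative implementation with small example vectors chosen for easy arithmetic; production vector stores use optimized native code instead.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot_product(v, v))
    return [x / norm for x in v]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

print(euclidean_distance(a, b))  # 5.0 (magnitude difference is visible)
print(dot_product(a, b))         # 50.0
print(cosine_similarity(a, b))   # 1.0 (same direction, magnitude ignored)
print(hamming_distance([1, 0, 1, 1], [1, 1, 0, 1]))  # 2

# For unit-length vectors, squared Euclidean distance equals 2 - 2 * cosine,
# so both metrics rank neighbors identically.
u, w = normalize([1.0, 2.0, 2.0]), normalize([2.0, 0.0, 1.0])
squared_distance = euclidean_distance(u, w) ** 2
print(math.isclose(squared_distance, 2 - 2 * cosine_similarity(u, w)))
```

The last check shows why the choice between cosine and Euclidean distance matters less once embeddings are normalized.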

Search Algorithms: KNN vs. ANN

Vector stores use nearest-neighbor search algorithms to find the most similar vectors:

k-Nearest Neighbors (KNN)

How it works: Compares the query vector to every stored vector
Accuracy: Perfect—always finds the exact k nearest neighbors
Speed: Slow for large datasets (linear time complexity: O(n))
Use Case: Small datasets, when perfect accuracy is required

Approximate Nearest Neighbor (ANN)

How it works: Uses clever data structures (indexes) to skip most comparisons
Accuracy: Very high but not guaranteed (well-tuned indexes typically return 95%+ of the true nearest neighbors; the exact recall depends on the index parameters)
Speed: Fast even for billions of vectors (sub-linear time)
Use Case: Production RAG systems with large knowledge bases
Popular Algorithms:
- HNSW (Hierarchical Navigable Small World)—Graph-based, excellent recall/speed trade-off
- IVF (Inverted File Index)—Clusters vectors, searches only relevant clusters

Why ANN Matters: Without ANN, searching through millions of vectors for every query would be too slow for real-time applications. ANN makes RAG practical at scale.
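
The trade-off can be illustrated with a toy comparison between exact search and an IVF-style index. This is a simplified sketch, not how any particular vector store implements IVF: the centroids are fixed by hand (real indexes typically learn them with k-means), and `ToyIVFIndex` and `nprobe` are illustrative names.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(query, vectors, k):
    """Exact search: compare the query against every stored vector, O(n)."""
    ranked = sorted(range(len(vectors)), key=lambda i: euclidean(query, vectors[i]))
    return ranked[:k]


class ToyIVFIndex:
    """IVF-style index: bucket vectors by nearest centroid, probe few buckets."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.buckets = [[] for _ in centroids]
        self.vectors = []

    def _nearest_centroids(self, vector, n):
        order = sorted(
            range(len(self.centroids)),
            key=lambda c: euclidean(vector, self.centroids[c]),
        )
        return order[:n]

    def add(self, vector):
        vector_id = len(self.vectors)
        self.vectors.append(vector)
        self.buckets[self._nearest_centroids(vector, 1)[0]].append(vector_id)
        return vector_id

    def search(self, query, k, nprobe=1):
        """Search only the nprobe buckets closest to the query."""
        candidates = [
            i for c in self._nearest_centroids(query, nprobe) for i in self.buckets[c]
        ]
        candidates.sort(key=lambda i: euclidean(query, self.vectors[i]))
        return candidates[:k], len(candidates)


vectors = [[1, 1], [0, 2], [9, 9], [11, 10], [1, 9], [2, 11], [4, 4]]
index = ToyIVFIndex(centroids=[[0, 0], [10, 10], [0, 10]])  # hand-picked centroids
for v in vectors:
    index.add(v)

easy_query = [1, 1.5]      # sits well inside one cluster
boundary_query = [4, 7]    # sits between clusters

print(knn_search(easy_query, vectors, k=2))          # [0, 1]
print(index.search(easy_query, k=2, nprobe=1))       # ([0, 1], 3)
print(knn_search(boundary_query, vectors, k=2))      # [6, 4]
print(index.search(boundary_query, k=2, nprobe=1))   # ([4, 5], 2)
print(index.search(boundary_query, k=2, nprobe=3))   # ([6, 4], 7)
```

For the easy query the index returns the exact answer after comparing 3 of the 7 vectors. For the boundary query, probing a single bucket misses the true nearest neighbor; probing more buckets recovers it at the cost of more comparisons, which is the recall-versus-speed dial that ANN indexes expose.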

The Evolution of Vector Stores

Understanding where vector stores came from helps explain their design and capabilities today.

Early Days: Image Recognition (2019)

The first purpose-built vector databases emerged around 2019 to support dense vector search, primarily for image recognition. Milvus was one of the pioneering systems. (Similarity-search libraries such as Faiss, released in 2017, predate them.)

What Problem Did They Solve?

Before vector stores, storing and comparing millions of image embeddings efficiently was challenging. Traditional databases couldn’t handle:

High-dimensional vectors (1,000+ dimensions)
Similarity searches across millions of images
Real-time query performance

Dense Vector Search

These early vector stores specialized in dense vectors, where:

Most dimensions have nonzero values
Each dimension contributes to the meaning
Examples: Image embeddings, deep learning features

Initial Use Cases:

Image similarity search (find similar photos)
Facial recognition
Product visual search (e.g., “find similar shoes”)

Earlier Search: Sparse Vectors

Before dense vector search, earlier search techniques used sparse vectors for lexical search:

What Are Sparse Vectors?

In a sparse vector:

Most values are zero
Only a few values are nonzero
Each dimension represents a specific word or feature

Examples of Sparse Vector Search:

1. TF-IDF (Term Frequency-Inverse Document Frequency)

Each dimension represents a word in the vocabulary
Values indicate how important a word is to a document
Search: Find documents with high TF-IDF scores for query terms

2. BM25 (Best Matching 25)

Improved version of TF-IDF
Better handling of document length and term saturation
Still widely used for keyword search

3. Lucene

Search engine library (powers Elasticsearch, Solr)
Uses inverted indexes for fast keyword matching
Optimized for sparse vectors

Key Limitation: These methods focus on exact word matches and can’t understand semantic similarity. “car” and “automobile” are treated as completely different.
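
The limitation is easy to reproduce with a small TF-IDF sketch in plain Python. The smoothed IDF formula used here is one common variant (an assumption; libraries differ in the details), and `build_tfidf` is a hypothetical helper name.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_tfidf(corpus):
    """Return the vocabulary and a function that maps text to a TF-IDF vector."""
    tokenized = [tokenize(doc) for doc in corpus]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    # Smoothed inverse document frequency (one of several common variants).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    def vectorize(text):
        counts = Counter(tokenize(text))
        # One dimension per vocabulary word; words outside it are ignored.
        return [counts[t] * idf[t] for t in vocabulary]

    return vocabulary, vectorize


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


corpus = [
    "the car dealer sold a used car",
    "the automobile industry builds electric vehicles",
    "fresh bread from the bakery",
]
vocabulary, vectorize = build_tfidf(corpus)

doc_vector = vectorize(corpus[0])
nonzero = sum(1 for value in doc_vector if value != 0)
print(f"{nonzero} of {len(vocabulary)} dimensions are nonzero")

# "car" and "automobile" occupy different dimensions, so they never overlap.
print(cosine_similarity(vectorize("car"), vectorize("automobile")))  # 0.0
print(cosine_similarity(vectorize("car"), doc_vector) > 0)           # True
```

Because each word owns its own dimension, a query for "car" scores zero against a document that only says "automobile", no matter how the weights are tuned.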

Modern Era: Text-Based Semantic Search

With the rise of Large Language Models (LLMs), vector stores evolved dramatically:

What Changed?

Embedding Quality: Models like OpenAI’s embeddings, Sentence-BERT, and others produce high-quality semantic vectors for text
New Use Cases: RAG, question answering, document search, conversational AI
Specialized Vector Stores: Purpose-built for text embeddings and LLM applications

New Vector Stores Emerged:

ChromaDB (2022)—Python-native, easy to use, great for prototyping
Pinecone (2021)—Fully managed cloud service, scales automatically
Weaviate (2019)—GraphQL API, built-in vectorization modules
Qdrant (2021)—Rust-based, high performance, filtering support
Faiss (2017, Meta)—Library (not a full database), extremely fast for pure similarity search

Modern Capabilities:

Hybrid search: Combine semantic (dense) and keyword (sparse) search
Metadata filtering: Filter by source, date, category before similarity search
Multi-tenancy: Isolate data for different users or organizations
Real-time updates: Add/update/delete vectors on the fly
Scalability: Handle billions of vectors
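
As an illustration of hybrid search, the sketch below merges a dense (semantic) ranking and a sparse (keyword) ranking with reciprocal rank fusion (RRF). RRF is one common fusion technique, not necessarily what any given vector store uses internally; the document ids are made up, and `k=60` is a commonly used constant rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in, so ids that
    rank well in more than one list rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "feline animals".
dense_ranking = ["doc-cats", "doc-lions", "doc-feline"]   # from vector search
sparse_ranking = ["doc-feline", "doc-zoo-hours"]          # from keyword search

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
# ['doc-feline', 'doc-cats', 'doc-lions', 'doc-zoo-hours']
```

The document that appears in both lists ranks first, even though neither search placed it at the top on both counts. Rank-based fusion also sidesteps the problem that dense distances and keyword scores live on different scales.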

Why This Evolution Matters for RAG

Today’s vector stores are purpose-built for RAG systems:

Optimized for text embeddings from LLM embedding models
Fast semantic similarity search using ANN algorithms
Easy integration with LangChain, LlamaIndex, and other RAG frameworks
Metadata support for source tracking and filtering
Production-ready with features like persistence, backup, monitoring

In the next section, we’ll explore the most popular vector stores for RAG applications and help you choose the right one for your use case.

Most Popular Vector Stores

Compiling a definitive comparison of vector stores is challenging due to their rapid evolution and convergence. New features are added frequently, and the landscape shifts quickly. However, the following table gives a solid overview of what’s available in the market, providing a starting point for your exploration.

Vector Store Landscape

Vector Store	Type	Website
FAISS	Vector library	github.com/facebookresearch/faiss
Milvus	Vector database	milvus.io
Qdrant	Vector database	qdrant.tech
Chroma	Vector database	trychroma.com
Weaviate	Vector database	weaviate.io
Pinecone	Vector database	pinecone.io
Vald	Vector database	vald.vdaas.org
ScaNN	Vector library	github.com/google-research/scann
KDB.AI	Time series database	kdb.ai
Elasticsearch	Search engine	elastic.co
OpenSearch	Fork of Elasticsearch	opensearch.org
PgVector	PostgreSQL extension	github.com/pgvector
MongoDB Atlas	MongoDB extension	mongodb.com

Understanding the Types

Notice that these tools fall into several distinct categories:

Vector Libraries (FAISS, ScaNN):

Lightweight, in-memory similarity search
No built-in persistence, networking, or management features
Best for: Prototyping, research, embedding into larger applications
Trade-off: Fastest raw performance, but you manage everything else yourself

Purpose-Built Vector Databases (Milvus, Qdrant, Chroma, Weaviate, Pinecone, Vald):

Full database features: persistence, CRUD operations, filtering, scaling
Built specifically for vector similarity search
Best for: Production RAG systems, enterprise applications
Trade-off: More features but additional infrastructure to manage (except managed services like Pinecone)

Extensions to Existing Databases (PgVector, MongoDB Atlas):

Add vector search capabilities to databases you may already use
Leverage existing infrastructure, tooling, and expertise
Best for: Teams already invested in PostgreSQL or MongoDB
Trade-off: Not as optimized as purpose-built solutions, but easier to adopt

Search Engines with Vector Support (Elasticsearch, OpenSearch):

Traditional search engines that added dense vector search
Excellent for hybrid search (combining keyword + semantic search)
Best for: Applications that need both keyword and semantic search
Trade-off: More complex setup, but extremely versatile

Specialized Databases (KDB.AI):

Vector search added to domain-specific databases (e.g., time series)
Best for: Niche use cases that combine vector search with specialized data types

Which One Should You Choose?

For the RAG implementations in this series, we’ll use ChromaDB because it offers the best balance of:

Simplicity—Python-native, minimal setup, runs locally
Features—Persistence, metadata filtering, multiple distance metrics
Learning curve—Easy to get started, great documentation
LangChain integration—First-class support in the LangChain ecosystem

Once you understand RAG with ChromaDB, switching to another vector store (like Pinecone for production or PgVector for PostgreSQL-based stacks) is straightforward—the concepts are the same, only the API differs.

Practical Example: Querying Paestum Tourism Data with ChromaDB

Below is a short, runnable example that shows how to create an in-memory ChromaDB collection, insert three text chunks about Paestum (from Britannica), and run semantic queries against the collection.

Import the chromadb module and create an in-memory client. Note the in-memory client loses data when the session ends:

import chromadb
chroma_client = chromadb.Client()

Create a collection (a “bucket” for documents + embeddings) and insert the Paestum chunks. We include simple metadatas and stable ids so results are traceable:

tourism_collection = chroma_client.create_collection(name="tourism_collection")
tourism_collection.add(
  documents=[
    "Paestum, Greek Poseidonia, ancient city in southern Italy near the west coast, 22 miles (35 km) southeast of modern Salerno and 5 miles (8 km) south of the Sele (ancient Silarus) River. Paestum is noted for its splendidly preserved Greek temples.",
    "Poseidonia was probably founded about 600 BC by Greek colonists from Sybaris, along the Gulf of Taranto, and it had become a flourishing town by 540, judging from its temples. After many years’ resistance the city came under the domination of the Lucanians (an indigenous Italic people) sometime before 400 BC, after which its name was changed to Paestum. Alexander, the king of Epirus, defeated the Lucanians at Paestum about 332 BC, but the city remained Lucanian until 273, when it came under Roman rule and a Latin colony was founded there. The city supported Rome during the Second Punic War. The locality was still prosperous during the early years of the Roman Empire, but the gradual silting up of the mouth of the Silarus River eventually created a malarial swamp, and Paestum was finally deserted after being sacked by Muslim raiders in AD 871. The abandoned site’s remains were rediscovered in the 18th century.",
    "The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July 1969 a farmer uncovered an ancient Lucanian tomb that contained Greek frescoes painted in the early classical style. Paestum’s archaeological museum contains these and other treasures from the site."
  ],
  metadatas=[
    {"source": "https://www.britannica.com/place/Paestum"},
    {"source": "https://www.britannica.com/place/Paestum"},
    {"source": "https://www.britannica.com/place/Paestum"}
  ],
  ids=["paestum-br-01", "paestum-br-02", "paestum-br-03"]
)

Perform semantic queries. First request the single best match, then ask for the top 3 to inspect relative distances:

# Ask for the single closest chunk.
results = tourism_collection.query(
  query_texts=["How many Doric temples are in Paestum"],
  n_results=1
)
print(results)

# Ask for the three closest chunks to compare their distances.
results = tourism_collection.query(
  query_texts=["How many Doric temples are in Paestum"],
  n_results=3
)
print(results)

Explanation of the results:

ids: the collection id(s) of the matching chunk(s). In the example the top match is paestum-br-03 (the chunk describing the three Doric temples).
documents: the actual text chunk(s) returned by ChromaDB.
metadatas: the metadata you stored with each chunk (useful to show source URLs or provenance).
distances: the embedding distance between the query and each returned chunk (smaller = more similar). Unless the collection is configured otherwise, Chroma uses squared L2 (Euclidean) distance by default.
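
Each of these fields is a list of lists, with one inner list per query text. The sketch below flattens that shape into one row per hit. The `sample_results` dictionary only mimics the structure of a query result: its distance values and shortened texts are placeholders, not real output, and `rows_from_query_result` is a hypothetical helper.

```python
def rows_from_query_result(results, query_index=0):
    """Turn the parallel lists of a query result into one dict per hit."""
    return [
        {
            "id": chunk_id,
            "distance": distance,
            "source": (metadata or {}).get("source"),
            "text": document,
        }
        for chunk_id, distance, metadata, document in zip(
            results["ids"][query_index],
            results["distances"][query_index],
            results["metadatas"][query_index],
            results["documents"][query_index],
        )
    ]


# Placeholder data shaped like a query result (values are illustrative).
sample_results = {
    "ids": [["paestum-br-03", "paestum-br-01"]],
    "distances": [[0.7, 1.1]],
    "metadatas": [[
        {"source": "https://www.britannica.com/place/Paestum"},
        {"source": "https://www.britannica.com/place/Paestum"},
    ]],
    "documents": [[
        "The ancient Greek part of Paestum consists of two sacred areas...",
        "Paestum, Greek Poseidonia, ancient city in southern Italy...",
    ]],
}

rows = rows_from_query_result(sample_results)
for row in rows:
    print(f"{row['id']}  distance={row['distance']}  {row['text'][:40]}")
```

Working with rows like these makes it easier to apply a distance threshold or to show sources next to each passage.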

Important notes:

ChromaDB does not generate natural-language answers; it returns the most semantically similar passages. To produce a final answer (e.g., “Paestum has three Doric temples”), feed the retrieved passages plus the original question into an LLM for synthesis.
Use stable ids and metadatas so you can trace which source a snippet came from when building explanations or citations.
For production, consider a persistent ChromaDB setup (chromadb.PersistentClient for on-disk storage, or client-server mode via chromadb.HttpClient) or a managed vector DB (Pinecone, Qdrant, etc.), and choose an explicit embedding model rather than relying on the default.
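
The first note can be sketched as a small prompt-building step that sits between retrieval and the LLM call. The prompt wording and the `build_rag_prompt` helper are assumptions for illustration; the LLM call itself is left out, and the sample data only mimics the shape of a query result.

```python
def build_rag_prompt(question, results, query_index=0):
    """Combine retrieved chunks and the user's question into one prompt."""
    ids = results["ids"][query_index]
    metadatas = results["metadatas"][query_index]
    documents = results["documents"][query_index]

    # Label every chunk with its id and source so the answer can cite them.
    context_blocks = [
        f"[{chunk_id}] (source: {meta.get('source', 'unknown')})\n{text}"
        for chunk_id, meta, text in zip(ids, metadatas, documents)
    ]
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite chunk ids in square brackets.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder data shaped like a query result (text shortened).
sample_results = {
    "ids": [["paestum-br-03"]],
    "metadatas": [[{"source": "https://www.britannica.com/place/Paestum"}]],
    "documents": [[
        "The ancient Greek part of Paestum consists of two sacred areas "
        "containing three Doric temples in a remarkable state of preservation."
    ]],
}

prompt = build_rag_prompt("How many Doric temples are in Paestum?", sample_results)
print(prompt)
```

The resulting string is what you would send to the LLM, which can then answer "three" and point back to `paestum-br-03` as its source.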

Further Reading on Distance Metrics

We won’t go deeper into distance metrics and similarity search algorithms here, but if you’re interested in the mathematical foundations:

Academic Paper: “Vector Database Management Systems: Fundamental Concepts, Current Trends, and Future Directions” by Yikun Han et al.
Practical Guide: “Distance Metrics in Vector Search” by Erika Shorten


Hands-on example: Querying Paestum data with ChromaDB

Below is a short Python example that shows how to initialize ChromaDB (in-memory), add three passages about Paestum, and run a semantic query.

Initialize the ChromaDB client (in-memory):

import chromadb
chroma_client = chromadb.Client()

Create a collection and insert content (documents, metadatas, ids):

tourism_collection = chroma_client.create_collection(name="tourism_collection")
tourism_collection.add(
  documents=[
    "Paestum, Greek Poseidonia, ancient city in southern Italy near the west coast, 22 miles (35 km) southeast of modern Salerno and 5 miles (8 km) south of the Sele (ancient Silarus) River. Paestum is noted for its splendidly preserved Greek temples.",
    "Poseidonia was probably founded about 600 BC by Greek colonists from Sybaris, along the Gulf of Taranto, and it had become a flourishing town by 540, judging from its temples. After many years’ resistance the city came under the domination of the Lucanians (an indigenous Italic people) sometime before 400 BC, after which its name was changed to Paestum. Alexander, the king of Epirus, defeated the Lucanians at Paestum about 332 BC, but the city remained Lucanian until 273, when it came under Roman rule and a Latin colony was founded there. The city supported Rome during the Second Punic War. The locality was still prosperous during the early years of the Roman Empire, but the gradual silting up of the mouth of the Silarus River eventually created a malarial swamp, and Paestum was finally deserted after being sacked by Muslim raiders in AD 871. The abandoned site’s remains were rediscovered in the 18th century.",
    "The ancient Greek part of Paestum consists of two sacred areas containing three Doric temples in a remarkable state of preservation. During the ensuing Roman period a typical forum and town layout grew up between the two ancient Greek sanctuaries. Of the three temples, the Temple of Athena (the so-called Temple of Ceres) and the Temple of Hera I (the so-called Basilica) date from the 6th century BC, while the Temple of Hera II (the so-called Temple of Neptune) was probably built about 460 BC and is the best preserved of the three. The Temple of Peace in the forum is a Corinthian-Doric building begun perhaps in the 2nd century BC. Traces of a Roman amphitheatre and other buildings, as well as intersecting main streets, have also been found. The circuit of the town walls, which are built of travertine blocks and are 15–20 feet (5–6 m) thick, is about 3 miles (5 km) in circumference. In July 1969 a farmer uncovered an ancient Lucanian tomb that contained Greek frescoes painted in the early classical style. Paestum’s archaeological museum contains these and other treasures from the site."
  ],
  metadatas=[
    {"source": "https://www.britannica.com/place/Paestum"},
    {"source": "https://www.britannica.com/place/Paestum"},
    {"source": "https://www.britannica.com/place/Paestum"}
  ],
  ids=["paestum-br-01", "paestum-br-02", "paestum-br-03"]
)

Run a semantic query (request 1 result, then 3 results to compare distances):

results = tourism_collection.query(
  query_texts=["How many Doric temples are in Paestum"],
  n_results=1
)
print(results)

results = tourism_collection.query(
  query_texts=["How many Doric temples are in Paestum"],
  n_results=3
)
print(results)

Brief explanation:

ids: the IDs of the best-matching passages (for example, paestum-br-03).
documents: the passages that were returned.
metadatas: the sources you stored.
distances: the embedding distances (smaller = closer in meaning).
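
The printed result is column-oriented: each key holds one inner list per query text. A small helper can turn it into one record per hit, which is easier to read and to pass along. The sketch below assumes that shape; the helper name is hypothetical, and the sample dictionary uses invented distances and truncated texts purely as placeholders, not real ChromaDB output.

```python
def flatten_query_results(results, query_index=0):
    """Turn a column-oriented query result into one dict per hit."""
    hits = zip(
        results["ids"][query_index],
        results["documents"][query_index],
        results["metadatas"][query_index],
        results["distances"][query_index],
    )
    return [
        {
            "id": id_,
            "document": doc,
            "source": (meta or {}).get("source"),  # metadata may be missing
            "distance": dist,
        }
        for id_, doc, meta, dist in hits
    ]


# Placeholder result in the same shape as `tourism_collection.query(...)` output.
# The distances and truncated texts are invented for illustration only.
sample_results = {
    "ids": [["paestum-br-03", "paestum-br-01"]],
    "documents": [["The ancient Greek part of Paestum ...", "Paestum, Greek Poseidonia ..."]],
    "metadatas": [[{"source": "https://www.britannica.com/place/Paestum"}] * 2],
    "distances": [[0.5, 0.9]],
}

for hit in flatten_query_results(sample_results):
    print(f'{hit["id"]}  distance={hit["distance"]:.2f}  source={hit["source"]}')
```

With the real collection, you would pass the `results` dictionary from the queries above instead of `sample_results`.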

Notes:

ChromaDB returns the semantically closest passages; to get a complete answer, pass those results to an LLM for synthesis.
Use stable ids and metadatas so you can trace sources when building explanations or citations.
In production, use client-server mode or a vector database service with persistence.
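
As a rough illustration of the first note, the retrieved passages can be placed into a prompt before calling an LLM. The sketch below only builds the prompt text; the function name and the prompt wording are assumptions for illustration, and the call to the model itself is left out.

```python
def build_rag_prompt(question, passages, sources):
    """Assemble a grounded prompt from retrieved passages and their sources."""
    context_blocks = [
        f"[{i}] {text}\n(source: {src})"
        for i, (text, src) in enumerate(zip(passages, sources), start=1)
    ]
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the context below. "
        "Cite passages by their bracketed number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    question="How many Doric temples are in Paestum",
    passages=[
        "The ancient Greek part of Paestum consists of two sacred areas "
        "containing three Doric temples in a remarkable state of preservation."
    ],
    sources=["https://www.britannica.com/place/Paestum"],
)
print(prompt)
```

Numbering the passages and keeping the source next to each one is what makes stable ids and metadatas useful: the model's answer can point back to a specific stored passage.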

Note — Instantiating ChromaDB in different ways

You can instantiate ChromaDB in three common modes depending on your needs:

In-memory (the default) — quick for experiments and notebooks.

Persistent (on-disk) — use chromadb.PersistentClient when you need data to survive restarts.

HTTP (client-server) — run a ChromaDB server and connect with chromadb.HttpClient for separation, scaling, and persistence.

Examples (code blocks are kept standalone for easy copy/paste):

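# Persistent (on-disk) client
import chromadb

client = chromadb.PersistentClient(path="./chroma_db")
# get_or_create_collection avoids an error if the collection already exists on a re-run
collection = client.get_or_create_collection("my_persistent_collection")

# HTTP client (after running `chroma run --path ./chroma_db --port 8010`)
import chromadb

# Pass the bare hostname; the server must point at the data directory that holds the collection
client = chromadb.HttpClient(host="localhost", port=8010)
collection = client.get_collection("my_persistent_collection")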

The collection API (add, query, update, delete) is the same across modes.

Next, in Part 3.3 (Implementing RAG from scratch), we’ll implement RAG by building a chatbot that uses the GPT-5-nano model and a vector database.
