An embedding model turns each chunk into a vector — numbers positioned so that similar meanings end up close together in space.
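# One vector per chunk, in the same order as the chunks
vectors = embedding_model.embed_documents(chunks)
# Store the vectors, tagging each one with the document it came from
vector_store.add(vectors, metadatas=[{'source': doc_id}] * len(chunks))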
results = vector_store.similarity_search(query, k=4)
k=4 means "give me the 4 closest chunks." Those four get inserted into the prompt alongside the user's question. That's the "retrieval-augmented" part: the model is answering from text it was just handed, not just from memory.
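To make the retrieve-then-prompt step concrete, here is a minimal, self-contained sketch. It is not the pipeline above: `embed` is a toy word-count stand-in for a real embedding model, and `top_k`, `build_prompt`, the prompt wording, and the sample chunks are all hypothetical names and data made up for illustration. The point is only the shape of the flow: rank chunks by closeness to the query, keep the top k, and paste them into the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    A real embedding model captures meaning; this one only counts words.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, retrieved: list[str]) -> str:
    """Place the retrieved chunks in the prompt next to the user's question."""
    context = "\n".join(f"- {c}" for c in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample data, invented for this sketch.
chunks = [
    "refunds are issued within 14 days",
    "shipping takes 3 to 5 business days",
    "refunds require the original receipt",
    "our office is closed on public holidays",
    "gift cards cannot be exchanged for cash",
]
query = "how do refunds work"

retrieved = top_k(query, chunks, k=2)
prompt = build_prompt(query, retrieved)
print(prompt)
```

Note that top-k always returns k chunks, whether or not they are actually relevant. With k=4 on this tiny sample, two of the four results would have nothing to do with refunds, which is one reason k is worth tuning against your own data.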
Where this goes next
The next step beyond this series is usually an agent that decides when to retrieve at all, rather than retrieving on every single query.