
2026-03-20

Building RAG on Cloudflare Workers + Vectorize

I've been building GCC LexAI — a Q&A tool over UAE and Saudi Arabia AI regulation documents — on Cloudflare Workers + Vectorize + D1. Here's what I'd tell myself before starting.

Everything Runs in One Worker

The entire pipeline lives in a single file.

Query comes in → embed via Workers AI → nearest-neighbor search via env.VECTORIZE.query() → structured filter via env.DB.prepare().bind().all() on D1 → LLM generation → response. No separate services, no API keys for a vector DB, no managing another instance.
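
That path is short enough to sketch end to end. This is a minimal illustration rather than the actual GCC LexAI handler: the embedding and generation model names are real Workers AI options, but the `chunks` table, its columns, and the helper functions are my assumptions.

```typescript
// Minimal single-Worker RAG pipeline sketch. Model names are real Workers AI
// models; table/column names and helpers are illustrative assumptions.

interface Env {
  AI: { run(model: string, input: unknown): Promise<any> };
  VECTORIZE: {
    query(
      vector: number[],
      opts: { topK: number },
    ): Promise<{ matches: { id: string; score: number }[] }>;
  };
  DB: {
    prepare(sql: string): {
      bind(...args: unknown[]): { all(): Promise<{ results: { text: string }[] }> };
    };
  };
}

// Pure helper: "?, ?, ?" placeholder list for a SQL IN (...) clause.
export function placeholders(n: number): string {
  return Array.from({ length: n }, () => "?").join(", ");
}

// Pure helper: assemble the generation prompt from retrieved chunks.
export function buildPrompt(chunks: string[], question: string): string {
  return `Answer using only this context:\n${chunks.join("\n---\n")}\n\nQuestion: ${question}`;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { question } = (await request.json()) as { question: string };

    // 1. Embed the query via Workers AI (768-dimension output).
    const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
    const vector: number[] = emb.data[0];

    // 2. Nearest-neighbour search in Vectorize.
    const { matches } = await env.VECTORIZE.query(vector, { topK: 20 });
    const ids = matches.map((m) => m.id);
    if (ids.length === 0) return Response.json({ answer: null });

    // 3. Resolve vector IDs to chunk text (plus any structured filters) in D1.
    const { results } = await env.DB
      .prepare(`SELECT text FROM chunks WHERE vector_id IN (${placeholders(ids.length)})`)
      .bind(...ids)
      .all();

    // 4. Generate the answer.
    const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: buildPrompt(results.map((r) => r.text), question),
    });
    return Response.json(answer);
  },
};
```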

On AWS this would be Lambda + OpenSearch + RDS + API Gateway, wired together. On Cloudflare it's one Worker with three bindings. TypeScript types work across all of them. wrangler deploy and it's live on the edge.
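
The "three bindings" wiring is just configuration. A sketch of what that looks like in `wrangler.toml`, with illustrative names; the binding names must match whatever your Worker code reads off `env`:

```toml
# Illustrative wrangler.toml; project, index, and database names are assumptions.
name = "gcc-lexai"
main = "src/index.ts"

[ai]
binding = "AI"

[[vectorize]]
binding = "VECTORIZE"
index_name = "regulation-chunks"

[[d1_databases]]
binding = "DB"
database_name = "lexai"
database_id = "..."
```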

That's the pitch. It holds up.

D1 + Vectorize: Split the Work Clearly

Early on I tried to do too much with Vectorize's metadata filters. It only supports equality — country = "UAE" works, year >= 2022 or partial string matches don't. Once I hit that wall, the right split became obvious:

Fetch wide from Vectorize, then narrow in D1. This pattern is clean and fast. The one setup cost: Vectorize doesn't assign vector IDs automatically. You generate UUIDs yourself, store them in D1 (chunks.vector_id), and use that as the join key everywhere. Without this mapping, you lose track of which chunk belongs to which document.

Four Gotchas to Build Around

upsert is eventually consistent. Push vectors and query immediately, and you'll get nothing; wait a minute and they appear. Not a bug, just how it works. Budget the lag into your ingestion flow.
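
One way to budget for the lag is to poll a probe query after ingesting, rather than assuming the write is immediately queryable. This is a generic sketch of my own making, not a Vectorize API; the attempt count and delay are arbitrary:

```typescript
// Sketch: after upserting, poll until a probe query reflects the write.
// `probe` is any caller-supplied check, e.g. a Vectorize query that counts
// matches for a just-inserted vector. Attempts/delay are illustrative.

export async function waitForIndex(
  probe: () => Promise<number>, // returns the current match count
  expected: number,             // matches we expect once the write lands
  attempts = 10,
  delayMs = 10_000,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if ((await probe()) >= expected) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // write never became visible within the budget
}
```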

topK disappears after filtering. Set topK=8, apply a D1 WHERE clause, and you can end up with zero results. The vectors are there; the filter is just too narrow for such a small candidate set. Set topK higher than you think you need and filter downstream.
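
The over-fetch-then-narrow step reduces to a small pure function, assuming the `vector_id`s that survived the D1 WHERE clause are collected into a set (the names here are mine):

```typescript
// Sketch: over-fetch from Vectorize, keep only matches whose IDs passed the
// D1 filter, and return the best k survivors. Names are illustrative.

export interface Match {
  id: string;    // vector_id, the Vectorize/D1 join key
  score: number; // similarity score from Vectorize
}

export function filterTopK(
  matches: Match[],        // e.g. a topK=20 Vectorize result
  allowed: Set<string>,    // vector_ids that passed the D1 WHERE clause
  k: number,               // how many you actually wanted, e.g. 8
): Match[] {
  return matches
    .filter((m) => allowed.has(m.id))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```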

You can't test Vectorize locally. wrangler dev doesn't support it. Every integration test requires a deploy to staging. This slows iteration more than you'd expect if you're used to fast local loops.

Vector dimensions are immutable. Configure an index with 768 dimensions, decide to switch embedding models later — you rebuild the entire index from scratch. Pick your embedding model before you index anything.
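
That makes index creation the one decision you can't revisit cheaply. With an illustrative index name, creating a 768-dimension cosine index (768 is what `@cf/baai/bge-base-en-v1.5` outputs) looks like:

```sh
npx wrangler vectorize create regulation-chunks --dimensions=768 --metric=cosine
```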

When This Stack Is the Right Call

Cloudflare's RAG stack is the fastest path from idea to working product if you're building at small-to-medium scale, want less infrastructure to manage, and are already using Cloudflare. Everything is connected through bindings, so it feels like one system rather than separate services wired together.

If you need complex metadata filtering, multi-tenancy, or billions of vectors, look elsewhere. But for a focused RAG product where you control the data model, this is hard to beat.

Built while working on GCC LexAI — AI regulation Q&A for the Gulf region. Getting the source documents had its own complications — Crawling GCC Government Documents: What Blocked Me.
