0xkaz

2026-03-20

Building RAG on Cloudflare Workers + Vectorize: What's Surprisingly Good (and What'll Bite You)

I've been building GCC LexAI — a Q&A tool over UAE and Saudi Arabia AI regulation documents — on Cloudflare Workers + Vectorize + D1. Here's what I'd tell myself before starting.

The thing that genuinely surprised me

Everything runs in one file.

Query comes in → embed via Workers AI → nearest-neighbor search via env.VECTORIZE.query() → structured filter via env.DB.prepare().bind().all() on D1 → LLM generation → response. No separate services, no API keys for a vector DB, no managing another instance.

On AWS this would be Lambda + OpenSearch + RDS + API Gateway, wired together. On Cloudflare it's one Worker with three bindings. TypeScript types work across all of them. wrangler deploy and it's live on the edge.
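To make that concrete, here's a minimal sketch of the whole pipeline as one handler. The binding names (AI, VECTORIZE, DB), model IDs, and schema details are illustrative choices, not necessarily what GCC LexAI ships:

```typescript
// Minimal single-file RAG pipeline sketch for a Cloudflare Worker.
// Interfaces are trimmed to just the methods used here.
interface Env {
  AI: { run(model: string, input: unknown): Promise<any> };
  VECTORIZE: {
    query(vector: number[], opts: { topK: number }): Promise<{ matches: { id: string; score: number }[] }>;
  };
  DB: {
    prepare(sql: string): { bind(...args: unknown[]): { all(): Promise<{ results: any[] }> } };
  };
}

export async function answer(env: Env, question: string): Promise<string> {
  // 1. Embed the question via Workers AI (embedding model is illustrative).
  const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  const embedding: number[] = data[0];

  // 2. Nearest-neighbor search. Fetch wide here; narrow later in D1.
  const { matches } = await env.VECTORIZE.query(embedding, { topK: 20 });
  const ids = matches.map((m) => m.id);
  if (ids.length === 0) return "No relevant documents found.";

  // 3. Resolve matched vector IDs to chunk text via the D1 join key.
  const placeholders = ids.map(() => "?").join(",");
  const { results } = await env.DB
    .prepare(`SELECT text FROM chunks WHERE vector_id IN (${placeholders})`)
    .bind(...ids)
    .all();

  // 4. Generate an answer grounded in the retrieved chunks.
  const context = results.map((r) => r.text).join("\n---\n");
  const completion = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  });
  return completion.response;
}
```

Four awaits, three bindings, zero external services.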

That's the pitch. It holds up.

D1 + Vectorize: split the work clearly

Early on I tried to do too much with Vectorize's metadata filters. It only supports equality — country = "UAE" works, year >= 2022 or partial string matches don't. Once I hit that wall, the right split became obvious:

Fetch wide from Vectorize, then narrow in D1. This pattern is clean and fast. The only setup cost: Vectorize doesn't assign vector IDs automatically. You generate UUIDs yourself, store them in D1 (chunks.vector_id), and use that join key everywhere. Without this mapping, you lose track of which chunk belongs to which document.

What'll bite you

upsert is eventually consistent. Push vectors and query immediately: you'll get nothing. Wait a minute and they appear. Not a bug, just how it works. Budget this into your ingestion flow.
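One way to budget for it: poll after upsert until the vector is queryable. The retry count and delay here are guesses; tune them to what you observe:

```typescript
// Poll a Vectorize index until a freshly upserted vector shows up,
// since indexing is eventually consistent.
interface QueryableIndex {
  query(vector: number[], opts: { topK: number }): Promise<{ matches: { id: string }[] }>;
}

export async function waitForVector(
  index: QueryableIndex,
  id: string,
  vector: number[],
  tries = 12,
  delayMs = 5000,
): Promise<boolean> {
  for (let i = 0; i < tries; i++) {
    const { matches } = await index.query(vector, { topK: 1 });
    if (matches.some((m) => m.id === id)) return true;
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return false; // still not indexed; surface this to your ingestion job
}
```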

topK disappears after filtering. Set topK=8, apply a D1 WHERE clause, end up with 0 results. The vectors are there; the filter is too narrow. Set topK higher than you think you need, filter downstream.
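In practice that means building the overfetch into the retrieval helper. The 4x factor below is a guess, not a tuned number; the chunks/vector_id/country names follow the schema discussed above:

```typescript
// "Fetch wide, filter downstream": topK is applied before your SQL runs,
// so overfetch from Vectorize and let the D1 WHERE clause narrow.
interface RetrievalEnv {
  VECTORIZE: { query(vector: number[], opts: { topK: number }): Promise<{ matches: { id: string }[] }> };
  DB: { prepare(sql: string): { bind(...args: unknown[]): { all(): Promise<{ results: any[] }> } } };
}

export async function retrieve(
  env: RetrievalEnv,
  embedding: number[],
  country: string,
  wanted = 8,
): Promise<any[]> {
  const { matches } = await env.VECTORIZE.query(embedding, { topK: wanted * 4 });
  const ids = matches.map((m) => m.id);
  if (ids.length === 0) return [];

  const placeholders = ids.map(() => "?").join(",");
  const { results } = await env.DB
    .prepare(`SELECT * FROM chunks WHERE vector_id IN (${placeholders}) AND country = ?`)
    .bind(...ids, country)
    .all();

  return results.slice(0, wanted); // trim back down after filtering
}
```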

You can't test Vectorize locally. wrangler dev doesn't support it. Every integration test requires a deploy to staging. This slows iteration more than you'd expect if you're used to fast local loops.

Vector dimensions are immutable. Configure an index with 768 dimensions, decide to switch embedding models later — you rebuild the entire index from scratch. Pick your embedding model before you index anything.
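The dimension count is locked in by the create command, so this decision happens before your first upsert. The index name below is illustrative; 768 matches bge-base-class embedding models:

```shell
# Dimensions and metric are fixed at index creation time.
wrangler vectorize create lexai-index --dimensions=768 --metric=cosine

# Switching embedding models later means delete, recreate, re-upsert everything:
# wrangler vectorize delete lexai-index
```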

Who this is good for

Cloudflare's RAG stack is the fastest path from idea to working product if you're building at small-to-medium scale, want minimal infrastructure overhead, and are already in the Cloudflare ecosystem. The binding model makes the stack feel native in a way that stitching together managed services doesn't.

If you need complex metadata filtering, multi-tenancy, or billions of vectors, look elsewhere. But for a focused RAG product where you control the data model, this is hard to beat.

Built while working on GCC LexAI — AI regulation Q&A for the Gulf region.
