MULTI-TENANT RAG PLATFORM · LIVE

Chatbots that read your docs

A live, multi-tenant platform where anyone can spin up a chatbot grounded in their own documents — a Next.js app on Vercel talks to a Cloudflare Worker that retrieves from a vector index and answers with gpt-5.1, streamed token by token.

role: Solo build
year: 2024 → 2025
frontend: Next.js · Vercel
RAG backend: Cloudflare Worker · Hono
live: askmybot.me
Flow diagram: in the browser you ask a question; the Next.js web app on Vercel (NextAuth login, users stored in Neon Postgres) forwards the question and chat history to a Cloudflare Worker running the RAG engine on Hono and LangChain, which embeds the question with OpenAI, searches the Cloudflare Vectorize index and fetches matching passages from Cloudflare D1, then generates an answer with gpt-5.1 and streams it back token by token. Separately, uploaded documents go to Cloudflare R2 via presigned URLs, are parsed and chunked by Unstructured.io, embedded by OpenAI, and written into Vectorize and D1.
Two paths. To answer (red), you ask in the browser; the Next.js app forwards your question and history to a Cloudflare Worker, which embeds the question, searches the Vectorize index, fetches the matching passages from D1, and streams a gpt-5.1 answer back token by token. To teach it (gold), uploaded files land in R2, are parsed and chunked by Unstructured.io, embedded by OpenAI, and written into the vector index and D1.
01 OVERVIEW

AskMyBot lets anyone build a chatbot that answers from their own material instead of the open internet. You sign in, create a named bot, upload your documents — a policy handbook, product docs, lecture notes — and from then on the bot answers questions using what's in those files, and points back to where it found the answer. Every account is isolated: your bots and your knowledge are yours alone.

Answering is a retrieval-augmented-generation (RAG) loop. When you ask something, the question is turned into a vector and matched against your documents in a vector index; the closest passages are pulled back as context, combined with the recent conversation, and handed to the model, which writes the reply. The answer streams back token by token, so you watch it form in real time rather than waiting for the whole thing.
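
To make the matching step concrete, here is a minimal in-memory sketch of nearest-passage search by cosine similarity. It is an illustration only: the live system queries Cloudflare Vectorize rather than scanning an array, and the names `Passage`, `cosineSimilarity` and `topK` are hypothetical.

```typescript
// Illustrative stand-in for a vector index; not the platform's actual code.
interface Passage {
  id: string;
  text: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every passage against the query and keep the k closest, best first.
function topK(query: number[], passages: Passage[], k: number): Passage[] {
  return passages
    .map((passage) => ({ passage, score: cosineSimilarity(query, passage.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.passage);
}
```

A real index avoids the full scan with an approximate nearest-neighbour structure, but the ranking idea is the same: the passages whose vectors point most nearly in the question's direction become the context.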

Under the hood it's a deliberately serverless, two-part system. A Next.js app on Vercel handles the interface, login and uploads; a separate Cloudflare Worker does the heavy retrieval and generation close to the data. Documents live in Cloudflare R2, their text and your bots' prompts in Cloudflare D1, the searchable vectors in Cloudflare Vectorize, and your account in Neon Postgres — so each concern scales on its own and the whole thing runs without a server to babysit. It is deployed and live at askmybot.me.

02 WHAT I BUILT
01

Multi-tenant chatbots

Each user signs in and creates their own named bots, each with an isolated knowledge base. Retrieval is filtered by user and bot, so one account's documents and answers never leak into another's.
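
One way to picture the isolation rule is as a scope that travels with every query. The sketch below is an assumption about shape, not the platform's code: the metadata field names `userId` and `botId`, and the helpers `tenantFilter` and `enforceScope`, are hypothetical.

```typescript
// Hypothetical shapes; the real metadata field names are not stated in the write-up.
interface TenantScope {
  userId: string;
  botId: string;
}

interface VectorHit {
  id: string;
  score: number;
  metadata: { userId: string; botId: string };
}

// Build the metadata filter attached to a vector query.
// Both fields are mandatory, so a caller cannot issue an unscoped search by omission.
function tenantFilter(scope: TenantScope): Record<string, string> {
  if (!scope.userId || !scope.botId) {
    throw new Error("tenant scope is incomplete");
  }
  return { userId: scope.userId, botId: scope.botId };
}

// Defence in depth: re-check each returned hit against the scope
// before its passage is allowed anywhere near the prompt.
function enforceScope(hits: VectorHit[], scope: TenantScope): VectorHit[] {
  return hits.filter(
    (hit) => hit.metadata.userId === scope.userId && hit.metadata.botId === scope.botId,
  );
}
```

Filtering at query time keeps other tenants' vectors out of the candidate set; the second check guards against a bug or misconfigured index letting one through.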

02

Document ingestion pipeline

Files upload straight to Cloudflare R2 through presigned URLs, then get parsed and chunked by Unstructured.io (by-title, with overlap), embedded with OpenAI, and written into the vector index and D1 — turning raw PDFs and Word docs into searchable knowledge.
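
The idea of overlapping chunks can be shown with a plain sliding window. This is a deliberately simplified sketch: it is not Unstructured.io's by-title strategy (which also respects section boundaries), and the function name and the character-based sizes are assumptions for illustration.

```typescript
// Split text into windows of at most maxChars characters, where each window
// repeats the last `overlap` characters of the previous one.
// Simplified illustration; Unstructured.io's by-title chunking is more involved.
function chunkWithOverlap(text: string, maxChars: number, overlap: number): string[] {
  if (maxChars <= 0) throw new Error("maxChars must be positive");
  if (overlap < 0 || overlap >= maxChars) {
    throw new Error("overlap must be in the range [0, maxChars)");
  }
  const chunks: string[] = [];
  const step = maxChars - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break; // the window reached the end
  }
  return chunks;
}
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which helps retrieval when the answer sits near a split.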

03

RAG retrieval

A question is embedded and matched against Cloudflare Vectorize; the top-k vector hits are resolved to their full passages in D1 by a custom LangChain retriever, then passed to the model as grounded context — with the question first reformulated against the chat history.
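
Resolving vector hits to stored passages has one subtlety worth sketching: a SQL `IN` lookup does not return rows in similarity order, so the ranking has to be restored afterwards. The table and column names (`passages`, `id`, `text`) and both helper names below are hypothetical.

```typescript
interface Hit {
  id: string;
  score: number;
}

interface PassageRow {
  id: string;
  text: string;
}

// Build a parameterised lookup for the hit ids.
// Table and column names are assumptions, not the platform's real schema.
function buildPassageQuery(ids: string[]): { sql: string; params: string[] } {
  if (ids.length === 0) throw new Error("no ids to resolve");
  const placeholders = ids.map(() => "?").join(", ");
  return {
    sql: `SELECT id, text FROM passages WHERE id IN (${placeholders})`,
    params: ids,
  };
}

// Put the fetched rows back into the order of the vector hits (best match first),
// silently skipping any hit whose row is missing.
function orderByRank(hits: Hit[], rows: PassageRow[]): PassageRow[] {
  const byId = new Map<string, PassageRow>(
    rows.map((row): [string, PassageRow] => [row.id, row]),
  );
  const ordered: PassageRow[] = [];
  for (const hit of hits) {
    const row = byId.get(hit.id);
    if (row) ordered.push(row);
  }
  return ordered;
}
```

In a Worker, a statement like this would typically run through D1's prepared-statement interface (`prepare(sql).bind(...params).all()`); placeholders rather than string interpolation keep the ids out of the SQL text.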

04

Streaming answers

A history-aware retriever and QA chain generate the reply, streamed to the browser over Server-Sent Events. The UI shows live status — connecting, retrieving, generating — so the wait is legible, and answers render as formatted markdown.
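
On the wire, Server-Sent Events are plain text frames separated by a blank line. A minimal sketch of writing and reading such frames follows; the event names `status` and `token` and both function names are assumptions, since the write-up does not show the actual protocol.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

// Serialise one event. Multi-line data becomes several "data:" lines,
// and the trailing blank line marks the end of the frame.
function formatSseEvent(event: string, data: string): string {
  const dataLines = data
    .split("\n")
    .map((line) => `data: ${line}`)
    .join("\n");
  return `event: ${event}\n${dataLines}\n\n`;
}

// Parse every complete frame in a buffer and return the unfinished tail,
// so the caller can prepend it to the next network chunk.
function parseSseEvents(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? ""; // last piece is incomplete (or empty)
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // the default event type when none is given
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Strip exactly one leading space, keeping any further whitespace.
        data.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

A client could use the `status` events to drive the connecting, retrieving and generating indicators, and append each `token` payload to the reply as it arrives.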

05

Serverless split architecture

A Next.js frontend on Vercel is cleanly separated from a Cloudflare Worker (Hono) that owns retrieval and generation. The two talk over a bearer-token API, letting the RAG engine sit next to its data and scale independently of the web tier.
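
The bearer-token check between the two tiers can be sketched as a header parse followed by a constant-time comparison. This is a generic illustration under stated assumptions, not the deployed code: the helper names are hypothetical, and the shared secret is simply passed in as a string.

```typescript
// Pull the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(authorization: string | null | undefined): string | null {
  if (!authorization) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  return match?.[1] ?? null;
}

// Compare two strings without exiting early on the first difference,
// so response time does not reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = i < a.length ? a.charCodeAt(i) : 0;
    const y = i < b.length ? b.charCodeAt(i) : 0;
    diff |= x ^ y;
  }
  return diff === 0;
}

// A request is accepted only if it carries exactly the expected secret.
function isAuthorized(authorization: string | null | undefined, expected: string): boolean {
  const token = extractBearerToken(authorization);
  return token !== null && expected.length > 0 && constantTimeEqual(token, expected);
}
```

Hono also ships a bearer-auth middleware that covers the same ground; the hand-rolled version here is only meant to show what such a check involves.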

06

Auth & accounts

NextAuth handles email/password sign-up with verification plus Google and Facebook OAuth; accounts and sessions live in Neon Postgres via Drizzle ORM, and transactional email (verification, password reset) goes out through Resend.

07

Customisable bots

Each bot carries its own system prompt and a default-bot setting, and knowledge can be added or removed by named domain — so owners tune voice and scope, and curate exactly what their bot knows.
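
As a rough picture of the data involved, the sketch below models a bot with its own system prompt and a default flag, plus knowledge grouped by named domain. Every type, field and function name here is hypothetical; the write-up does not show the real schema.

```typescript
// Hypothetical record shapes, for illustration only.
interface Bot {
  id: string;
  name: string;
  systemPrompt: string;
  isDefault: boolean;
}

interface KnowledgeChunk {
  id: string;
  domain: string; // the named domain the chunk was uploaded under
}

// Mark one bot as the default and clear the flag on all others,
// so an account never ends up with two defaults.
function setDefaultBot(bots: Bot[], botId: string): Bot[] {
  if (!bots.some((bot) => bot.id === botId)) {
    throw new Error(`unknown bot: ${botId}`);
  }
  return bots.map((bot) => ({ ...bot, isDefault: bot.id === botId }));
}

// Remove a whole domain of knowledge, returning what stays and which
// chunk ids must also be deleted from the vector index and text store.
function removeDomain(
  chunks: KnowledgeChunk[],
  domain: string,
): { kept: KnowledgeChunk[]; removedIds: string[] } {
  const kept = chunks.filter((chunk) => chunk.domain !== domain);
  const removedIds = chunks
    .filter((chunk) => chunk.domain === domain)
    .map((chunk) => chunk.id);
  return { kept, removedIds };
}
```

Returning the removed ids matters because the same chunk lives in two places, the vector index and the text store, and both need to forget it.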

08

Considered chat UX

Copy and regenerate any reply, listen to answers via browser text-to-speech, and manage files in a drag-and-drop browser built on Atlaskit pragmatic drag-and-drop — consumer-grade polish on top of the AI plumbing.

03 STACK

Frontend

Next.js · React 19 · TypeScript · Tailwind · Radix UI · TanStack Query

RAG backend

Cloudflare Workers · Hono · LangChain · OpenAI · Vectorize · D1

Data · infra

Neon Postgres · Drizzle · R2 · NextAuth · Unstructured.io · Vercel
04 REFERENCES