AI SaaS Product Development · LLM Integration · Multi-Tenant Architecture

ReplyIQ — AI-Powered Customer Support SaaS for SupportScale Inc.

A production-grade, multi-tenant B2B SaaS platform built in 10 weeks — per-customer RAG knowledge bases, Claude 3.5 Sonnet integration, Stripe usage billing, and Zendesk/Intercom channel connectors. $1.2M ARR reached at Month 8.

Claude 3.5 Sonnet · Multi-Tenant RAG · Next.js 14 · Stripe Billing · Zendesk API · AWS ECS

89%

Support Tickets Auto-Resolved

$1.2M

ARR at Month 8 Post-Launch

3 sec

Avg Response Time

10 Weeks

MVP Delivered

Project Overview

About This Project

SupportScale's founders had a validated hypothesis: B2B SaaS companies waste 35–60% of their support budget on repetitive tier-1 tickets that AI could handle. They needed a production-grade, multi-tenant SaaS product — not a proof of concept, not a chatbot widget — a real B2B platform with onboarding flows, usage-based billing, Zendesk/Intercom integrations, and demonstrable 80%+ auto-resolution rates.

We built ReplyIQ from scratch in 10 weeks: a multi-tenant AI support platform powered by Claude 3.5 Sonnet, with per-customer RAG knowledge bases, Slack/email/widget channels, a live resolution dashboard, and Stripe usage billing. Delivered on time for a 12-prospect enterprise demo in Week 11.

Claude 3.5 Sonnet

LangChain

Next.js 14

FastAPI (Python)

PostgreSQL + pgvector

Stripe Billing

Zendesk API

Intercom API

AWS ECS + RDS

Redis Queue

Vercel Edge

89%

Support tickets auto-resolved without human intervention — across 47 B2B customers

$1.2M

ARR reached at Month 8 post-launch, with 47 paying B2B SaaS customers onboarded

3 sec

Average end-to-end response time — first token delivered in 800ms via SSE streaming

47

Paying B2B customers onboarded by Month 3, with the first enterprise deal closed 3 weeks after launch

The Problem

Challenges We Solved

Multi-Tenant Knowledge Isolation

Each customer's knowledge base must be completely isolated. A question to Company A's support bot must never retrieve data from Company B's documents — zero cross-contamination, verifiable by penetration testing.

Real-Time Response at Scale

Support tickets expect near-instant responses. At scale with 100+ concurrent tenants, streaming LLM responses across thousands of simultaneous queries required careful queue architecture — naive implementations collapsed in load testing.

Variable Document Quality

Customers upload support docs ranging from polished help centre articles to internal Slack exports and handwritten FAQs photographed on phones. The RAG pipeline had to handle wildly inconsistent input quality without silently producing garbage outputs.

Escalation Confidence Scoring

The system needs to know when it doesn't know — and escalate gracefully to human agents. A badly calibrated confidence threshold would either over-escalate (defeating the purpose) or under-escalate (delivering wrong answers confidently to customers).

Stripe Usage Billing Complexity

The pricing model was per-resolved-ticket with a monthly minimum floor. This required real-time usage metering, webhook handling for billing events, automated dunning, and a customer-facing usage dashboard, all built alongside core product features.

10-Week MVP Deadline

The founders had a pipeline of 12 enterprise prospects expecting a live demo in Week 11. Every scope decision — every feature, integration, and optimisation — was measured against demo-readiness. There was no room for scope creep.

Our Approach

How We Solved It

Tenant-Scoped Vector Namespaces with RLS

Implemented tenant-scoped vector namespaces in pgvector with row-level security (RLS) enforced at the database layer. Each tenant's embeddings are logically and physically isolated — verified by penetration testing with cross-tenant query attempts returning zero results.
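A minimal sketch of how database-enforced tenant scoping might look, assuming a single shared chunks table with a `tenant_id` column. The table name `kb_chunks`, the setting name `app.tenant_id`, and both helper functions are hypothetical; the actual ReplyIQ schema is not described here. The functions only build SQL and parameters, so the example runs without a database.

```python
# Hypothetical names: the real schema is not given in this case study.
TABLE = "kb_chunks"
TENANT_SETTING = "app.tenant_id"


def rls_ddl(table: str = TABLE, setting: str = TENANT_SETTING) -> list[str]:
    """Return DDL that turns on tenant-scoped row-level security."""
    return [
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # FORCE makes the policy apply to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        f"CREATE POLICY tenant_isolation ON {table} "
        f"USING (tenant_id = current_setting('{setting}')::uuid)",
    ]


def tenant_search(tenant_id: str, query_embedding: list[float], k: int = 5):
    """Build (sql, params) pairs for one tenant-scoped similarity search.

    The pairs are meant to run in order inside a single transaction.
    """
    vector_literal = "[" + ",".join(str(x) for x in query_embedding) + "]"
    return [
        # set_config(..., true) limits the setting to the current transaction.
        ("SELECT set_config(%s, %s, true)", (TENANT_SETTING, tenant_id)),
        # <=> is pgvector's cosine-distance operator. No tenant filter is
        # written in the query itself; the RLS policy supplies it.
        (
            f"SELECT id, content FROM {TABLE} "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (vector_literal, k),
        ),
    ]
```

The point of this design is that application code cannot forget the tenant filter: a query issued without the tenant setting is rejected or returns nothing instead of leaking another tenant's rows.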

Async Queue + SSE Streaming Architecture

A Redis-based job queue handles ticket ingestion at scale, decoupling intake from processing. FastAPI streams Claude 3.5 Sonnet responses to the frontend via Server-Sent Events (SSE), delivering the first token within 800ms, while full responses arrive in 3 seconds on average.
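The streaming half of this design can be sketched with the standard library alone. The example below frames tokens as SSE messages and forwards each one as soon as it arrives. `fake_token_stream` is a hypothetical stand-in for the model's streamed output, and the event names `token` and `done` are assumptions, not ReplyIQ's actual wire format.

```python
import asyncio
import json
from typing import AsyncIterator


def sse_event(data: dict, event: str = "token") -> str:
    """Frame one payload as a Server-Sent Events message."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the LLM's streamed output."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a network read would
        yield word


async def stream_reply(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Forward each token immediately, then signal completion."""
    async for token in tokens:
        yield sse_event({"text": token})
    yield sse_event({}, event="done")


async def collect(text: str) -> list[str]:
    """Drain the stream into a list (for demonstration only)."""
    return [frame async for frame in stream_reply(fake_token_stream(text))]


if __name__ == "__main__":
    for frame in asyncio.run(collect("Reset your password")):
        print(frame, end="")
```

In a FastAPI app, an async generator like `stream_reply` could be returned through `StreamingResponse` with `media_type="text/event-stream"`. Because nothing is buffered, time to first token depends on the model's first output and not on total response length.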

5-Stage Document Preprocessing Pipeline

Built a 5-stage preprocessing pipeline: format normalisation (PDF/DOCX/HTML/Slack JSON) → smart chunking → quality scoring → deduplication → embedding. Low-quality chunks are flagged and deprioritised in retrieval without being discarded — preserving edge-case knowledge.
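As a rough illustration of the flag-but-keep idea, the sketch below walks through simplified versions of the stages. The chunk size, the character-ratio quality heuristic, and the 0.7 threshold are all assumptions chosen for the example; the production scoring logic is not described here, and format parsing and embedding are left out.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    quality: float     # 0.0 (noise) to 1.0 (clean prose); toy heuristic
    low_quality: bool  # flagged for deprioritisation, not discarded


def normalise(raw: str) -> str:
    """Stage 1 stand-in: collapse whitespace. Real format parsing is omitted."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Stage 2: fixed-size word windows (a simple placeholder for smart chunking)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def quality_score(text: str) -> float:
    """Stage 3: share of characters that are letters or spaces."""
    if not text:
        return 0.0
    good = sum(c.isalpha() or c.isspace() for c in text)
    return good / len(text)


def preprocess(raw: str, max_words: int = 50, threshold: float = 0.7) -> list[Chunk]:
    seen: set[str] = set()
    out: list[Chunk] = []
    for piece in chunk(normalise(raw), max_words):
        # Stage 4: skip exact duplicates by case-insensitive content hash.
        digest = hashlib.sha256(piece.lower().encode()).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        score = quality_score(piece)
        out.append(Chunk(piece, score, low_quality=score < threshold))
    # Stage 5 (embedding) would run here on every chunk, flagged or not.
    return out
```

Keeping the `low_quality` flag on the chunk, instead of filtering at ingest, lets retrieval down-rank noisy chunks while still surfacing them when nothing better matches.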

Calibrated Confidence + Escalation Router

Built a two-model system: Claude 3.5 Sonnet for answer generation, and a fine-tuned BERT classifier for confidence scoring. Tickets below the threshold are routed to human agents with the AI draft as a starting point — not discarded — improving agent efficiency even on escalated tickets.
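The routing step itself is simple once a confidence score exists. The sketch below assumes the classifier returns a score in [0, 1]; the `Routing` type, the `route` function, and the 0.8 threshold are hypothetical, since the calibrated production threshold is not stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Routing:
    action: str                 # "auto_reply" or "escalate"
    reply: Optional[str]        # sent to the customer when auto-replying
    agent_draft: Optional[str]  # handed to the human agent when escalating


def route(draft: str, confidence: float, threshold: float = 0.8) -> Routing:
    """Send confident drafts to the customer; pass the rest to an agent."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= threshold:
        return Routing("auto_reply", reply=draft, agent_draft=None)
    # Below threshold: keep the draft so the agent has a starting point.
    return Routing("escalate", reply=None, agent_draft=draft)
```

For example, `route("Try resetting your password.", 0.41)` escalates and carries the draft along in `agent_draft`. Raising the threshold trades auto-resolution rate for fewer confidently wrong answers, which is why the value needs calibration against labelled tickets.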

Real-Time Metering with Stripe Usage API

Implemented Stripe Billing with the usage records API — every resolved ticket triggers a metered billing event in real time. Customer dashboard shows live ticket counts, resolution rates, and estimated monthly bill. Automated dunning and subscription management built in from Week 8.
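The estimated monthly bill on the dashboard follows from the pricing model: per-resolved-ticket usage with a monthly minimum floor. A minimal sketch of that arithmetic is below; the function name and the prices in the usage note are illustrative assumptions, not ReplyIQ's actual rates, and the call to Stripe is not shown.

```python
from decimal import Decimal


def estimated_bill(
    resolved_tickets: int,
    price_per_ticket: Decimal,
    monthly_minimum: Decimal,
) -> Decimal:
    """Usage charge for the period, never less than the monthly floor."""
    if resolved_tickets < 0:
        raise ValueError("resolved_tickets must be non-negative")
    usage = price_per_ticket * resolved_tickets
    return max(usage, monthly_minimum)
```

With a hypothetical price of 0.50 per ticket and a 199.00 floor, 100 resolved tickets bill at the floor (199.00), while 1,000 resolved tickets bill on usage (500.00). `Decimal` is used in place of `float` so currency amounts do not pick up binary rounding error.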

Sprint-Gated MVP Scoping

Used a 10-week sprint plan with hard feature gates: core RAG engine (Weeks 1–3) → multi-tenancy + security (Weeks 4–5) → Zendesk/Intercom channel integrations (Weeks 6–7) → billing + dashboard (Weeks 8–9) → load testing + hardening (Week 10). Non-demo features formally deferred.

Results

The Outcomes

Week 10 — Demo

Delivered On Schedule

Live product demo delivered to all 12 enterprise prospects on schedule at Week 11. The multi-tenant architecture, live resolution dashboard, and Stripe billing were all functional. 3 prospects signed letters of intent within the demo week.

Month 3 — Growth

47 Paying B2B Customers

The first enterprise deal closed 3 weeks after launch. By Month 3, 47 paying B2B SaaS companies had self-onboarded through the product's onboarding flow, with no sales-assisted setup required for the SMB tier.

Month 8 — Scale

$1.2M ARR · 89% Resolution Rate

$1.2M ARR reached with an 89% average auto-resolution rate across the customer base. Average response time held at 3 seconds under production load. SupportScale raised a Series A on the strength of these metrics.

$1.2M ARR in 8 Months. Built in 10 Weeks.

ReplyIQ went from empty repo to production B2B SaaS with 47 paying customers — a product SupportScale's own engineers estimated at 9 months to build.