Cases

Projects that show what we do

Client names are under NDA, so we talk about industry, scale, and measurable results: faithfulness, latency, cost, conversion, uptime. Reference calls with teams we've worked with are available on request.

AI cases

AI integration and LLM systems

Where clients want an "AI capability" and get measurable acceleration, cost reduction, and quality you can prove on a golden set.

Fintech / RAG · compliance · 2.4M documents

RAG copilot for compliance analysts

Challenge. Analysts spent 30–60 minutes on each case review: searching 2.4M internal documents and checking regulatory requirements. Off-the-shelf open-source search returned irrelevant results, and naive RAG hallucinated.

What we did. Hybrid search (BM25 + embeddings) with cross-encoder re-ranking. Chunking aware of document structure. A separate fact-checker pass on Claude. Citation-aware UI: every claim links to its source. Evaluation on a golden set of 1200 pairs.

Outcome. Faithfulness 0.94 (vs 0.71 baseline), case review time dropped from 40 to 6 minutes, token cost 72% lower via smart prefix caching and routing simple questions to cheaper models.

6.5×
faster analysis
0.94
faithfulness
−72%
cost vs naive RAG
Claude · pgvector · BM25 · Ragas · Langfuse
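As a sketch of how the retrieval side of such a pipeline merges its two candidate lists: reciprocal-rank fusion is one common way to combine BM25 and embedding results before a cross-encoder re-ranks the survivors. The doc ids and the k = 60 damping constant below are illustrative, not taken from the project.

```python
from collections import defaultdict

def rrf_fuse(bm25_ranked, vector_ranked, k=60, top_n=5):
    """Merge two best-first lists of doc ids with reciprocal-rank
    fusion: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "d2" ranks high in both lists, so it comes out on top.
fused = rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"])
```

The fused top-N then goes to the cross-encoder, which is the expensive but accurate step.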
E-commerce / agents · 800K tickets / month

AI agent in a marketplace support team

Challenge. High volume of standard requests: "where's my order", "refund me", "change address". First-response time 9 minutes, support staff burned out, NPS −18.

What we did. Built agent pipeline on LangGraph with tool-use: order status check, return initiation, address update, escalation. Human-in-the-loop on refunds above threshold. Separate safety layer against prompt injection.

Outcome. First-response time from 9 minutes down to 12 seconds, 68% of tickets auto-resolved, support NPS +12. The support team shifted to complex cases and proactive outreach.

45×
FRT
68%
auto-resolution
+12 NPS
support
GPT-4o · LangGraph · tool-use · Postgres · Redis
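The tool-routing step of such a pipeline can be sketched as follows. The intent names, tool names, and the refund threshold are hypothetical; in LangGraph this logic would live in a routing node, with the model classifying the intent upstream.

```python
REFUND_REVIEW_THRESHOLD = 100.0  # hypothetical: refunds above this need a human

def route(intent: str, payload: dict) -> str:
    """Map a classified ticket intent to a tool call or a human queue."""
    if intent == "order_status":
        return f"tool:get_order_status({payload['order_id']})"
    if intent == "change_address":
        return f"tool:update_address({payload['order_id']})"
    if intent == "refund":
        if payload["amount"] > REFUND_REVIEW_THRESHOLD:
            return "human:refund_review"   # human-in-the-loop gate
        return f"tool:start_refund({payload['order_id']})"
    return "human:general_queue"           # anything unrecognized escalates
```

The safety layer against prompt injection sits in front of this router, so untrusted ticket text never picks a tool directly.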
SaaS / LLMOps · 10M req/day · multi-team

Production LLM gateway with routing and caching

Challenge. 14 product teams calling OpenAI and Anthropic independently. Cost grew faster than revenue, there was no visibility into who was spending what, rate-limit failures were frequent, and there was no fallback.

What we did. Designed and launched gateway: unified API, Claude / GPT / Llama routing by task type and SLO, semantic caching, per-tenant rate-limiting, retry / fallback on 429 / 5xx, cost attribution via OpenTelemetry. PII filters and per-user token budget.

Outcome. Token cost down 64% in a quarter. Gateway overhead 42 ms p99. Zero "hit rate-limit" incidents in 6 months. Each team sees real-time spend.

10M
requests / day
−64%
cost
42 ms
overhead p99
Go · Redis · vLLM · OpenTelemetry · Langfuse
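The retry / fallback part of such a gateway reduces to a small loop. A minimal sketch, assuming a hypothetical route table and a `call_fn(model, prompt) -> (status, answer)` transport; real model names, retry budgets, and backoff policies would differ.

```python
ROUTES = {  # hypothetical task-type -> (primary, fallback) models
    "simple_qa": ("small-local-model", "mid-tier-model"),
    "reasoning": ("frontier-model-a", "frontier-model-b"),
}

RETRYABLE = {429, 500, 502, 503}

def call_with_fallback(task_type, prompt, call_fn, max_attempts=2):
    """Try the primary model for this task type; on rate-limit or
    server errors, retry, then fall back to the secondary model."""
    for model in ROUTES[task_type]:
        for _ in range(max_attempts):
            status, answer = call_fn(model, prompt)
            if status == 200:
                return model, answer
            if status not in RETRYABLE:
                raise RuntimeError(f"{model}: unrecoverable status {status}")
    raise RuntimeError("all models exhausted")
```

Semantic caching and cost attribution would wrap this call on either side; the loop above is only the availability story.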
DevOps / AI agent · 1.4K incidents / month

AI agent for production-incident auto-triage

Challenge. SRE team drowned in alerts: 50% false positives, MTTA 18 minutes, on-call staff waking 3–4 times per night.

What we did. The AI agent reads the alert, fetches related metrics and logs via MCP, searches past incidents in postmortems (RAG), and formulates a hypothesis plus a runbook suggestion. Below a confidence threshold it escalates to a human with the pre-gathered context.

Outcome. MTTA: 18 min → 2 min (the agent gathers context before a human looks). 41% of alerts closed as known issues without waking on-call. Nighttime wake-ups down 3.5×.

9×
MTTA
41%
auto-triage
−72%
on-call burden
Claude · MCP · Prometheus · Loki · pgvector
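The "known issue or escalate" decision can be sketched without any LLM at all; here a plain string-similarity match over a hypothetical postmortem index stands in for the real RAG lookup, and the 0.6 threshold is illustrative.

```python
import difflib

KNOWN_ISSUES = {  # hypothetical postmortem index: summary -> runbook step
    "disk usage above 90% on kafka broker": "expand volume, purge old segments",
    "p99 latency spike after deploy": "roll back the last release",
}

def triage(alert_text, threshold=0.6):
    """Match an alert against past incidents; below the similarity
    threshold, page on-call with the gathered context attached."""
    best_runbook, best_score = None, 0.0
    for summary, runbook in KNOWN_ISSUES.items():
        score = difflib.SequenceMatcher(None, alert_text.lower(), summary).ratio()
        if score > best_score:
            best_runbook, best_score = runbook, score
    if best_score >= threshold:
        return {"action": "suggest_runbook", "runbook": best_runbook}
    return {"action": "page_oncall", "context": list(KNOWN_ISSUES)}
```

The escalation branch is the important one: even when the agent isn't confident, the human starts with context instead of a bare alert.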
Analytics / Data Lake & DWH · 120K call hours / month

LLM-powered analysis of sales calls

Challenge. Call recordings were unstructured noise: manual listen-through was impossible, legacy BI covered only the CRM, and managers filled fields inconsistently.

What we did. Pipeline: transcription (Whisper) → diarization → structured extraction (LLM to strict JSON schema) → objection and key-question highlights → ClickHouse views → dashboards for team and product.

Outcome. 100% of calls auto-tagged within 5 minutes of call end. The product team sees real customer objections; managers get feedback with quotes. Sales conversion +9% over the quarter.

100%
coverage
5 min
SLA from end-of-call
+9%
conversion
Whisper · Claude · ClickHouse · Kafka
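The "LLM to strict JSON schema" step stands or falls on validation before ingest. A minimal sketch with a hypothetical three-field schema; a production schema would be richer and would likely use a real validator library.

```python
import json

CALL_SCHEMA = {"customer": str, "objections": list, "next_step": str}

def parse_call_summary(llm_output: str) -> dict:
    """Parse the model's JSON answer and enforce field types, so a
    malformed row never reaches the warehouse ingest step."""
    record = json.loads(llm_output)
    for field, expected in CALL_SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return record
```

Rows that fail here go to a retry queue rather than into ClickHouse, which keeps the dashboards trustworthy.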
Highload cases

Architecture and engineering for systems under load

Payments, RTB, search, multi-tenant SaaS, realtime sessions, data platforms. Metrics are what these projects were built for.

Fintech / payments · 35K RPS · 4 regions

Payment platform migration from monolith to event-driven core

Challenge. The payment platform hit a database bottleneck at peak, releases shipped once every two weeks and came with incidents, and the business was planning for 5–6× transaction growth.

What we did. Decomposed monolith into 9 services around bounded contexts. Kafka as backbone, outbox + saga orchestration. Multi-region Kubernetes cluster with active-active payment core.

Outcome. Payment API p99: 850 ms → 110 ms. Daily releases. Infrastructure cost down 40%. Sustained 6× transaction growth.

7.7×
p99 reduction
6×
TPS growth
−40%
cloud cost
Go · Kafka · Postgres · K8s · AWS
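The outbox half of "outbox + saga" can be shown in miniature with SQLite standing in for Postgres; table names are illustrative. The point is that the state change and the event land in one transaction, and a separate relay later ships outbox rows to Kafka.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")
conn.execute("INSERT INTO payments VALUES ('p1', 'authorized')")

def capture_with_outbox(conn, payment_id, amount):
    """Commit the status change and the event atomically; if either
    write fails, neither is visible, so no event is lost or phantom."""
    with conn:  # one transaction: BEGIN ... COMMIT
        conn.execute("UPDATE payments SET status = 'captured' WHERE id = ?",
                     (payment_id,))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("payments.captured",
                      json.dumps({"id": payment_id, "amount": amount})))

capture_with_outbox(conn, "p1", 99.50)
```

The relay then reads the outbox in order and publishes to Kafka, which is what makes the event stream safe to build sagas on.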
AdTech / RTB · 250K QPS · multi-AZ

Real-time bidding pipeline with SLA 95 ms

Challenge. Launch an RTB bidder that fits within the 100 ms auction budget and consistently handles 200K+ QPS at peak.

What we did. Pipeline on Go with in-memory feature store on Redis Cluster. Analytics and model training in ClickHouse, real-time ingestion via Kafka. Multi-AZ deployment with health-aware load balancing.

Outcome. 250K QPS, auctions fit in 95 ms p99. 18 months with zero revenue-loss incidents.

250K
QPS
95 ms
SLA p99
99.99%
uptime
Go · Redis · ClickHouse · Kafka · AWS
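Fitting a bidder into a fixed auction budget is mostly about a deadline, not raw speed. A sketch of the idea in Python (the production path was Go): feature lookups run until the budget is spent, and a no-bid beats blowing the SLA. The 95 ms figure mirrors the case; the pricing formula is a placeholder.

```python
import time

AUCTION_BUDGET_MS = 95  # headroom under the exchange's 100 ms limit

def bid(request, feature_lookups, deadline_ms=AUCTION_BUDGET_MS):
    """Enrich the bid with features until the deadline; whatever
    didn't arrive in time is simply skipped."""
    start = time.monotonic()
    features = {}
    for name, fetch in feature_lookups.items():
        if (time.monotonic() - start) * 1000 >= deadline_ms:
            break  # budget spent: stop enriching
        features[name] = fetch(request)
    if not features:
        return None  # no-bid rather than a late answer
    return {"price_cpm": 0.5 + 0.1 * len(features), "features": features}
```

In the real bidder the same idea applies per lookup, with the in-memory feature store keeping each fetch well under a millisecond.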
E-commerce / search · 12M SKU · 4M MAU

Search and faceted navigation for a marketplace

Challenge. Search across 12M products ran at 800 ms p95 and collapsed under peak load. Search-to-cart conversion was flat.

What we did. Migrated to Elasticsearch with pre-calculated facets and autocomplete, split nodes into data / coordinating / ingest roles, and fed indexing as a stream from Kafka.

Outcome. Search p95: 800 → 80 ms. Search → cart conversion +18%. Full catalog indexing in 25 minutes instead of 6 hours.

10×
faster p95
+18%
conversion
14×
faster indexing
Elasticsearch · Node.js · Kafka · Postgres
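Facet counts come back in the same round-trip as the hits when the query carries `aggs`. A sketch of the request builder; the field names are illustrative, and a real mapping would use `keyword` fields for the facet aggregations.

```python
def search_body(text, filters, facet_fields, size=20):
    """Build one Elasticsearch request that returns both hits and
    facet counts, so the UI needs a single round-trip."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {field: value}}
                           for field, value in filters.items()],
            }
        },
        "aggs": {field: {"terms": {"field": field, "size": 30}}
                 for field in facet_fields},
        "size": size,
    }
```

Filters go in the `filter` clause rather than `must` so they are cacheable and don't affect relevance scoring.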
B2B SaaS / multi-tenant · 1.2K tenants

Multi-tenant platform: isolation and scalability

Challenge. The SaaS outgrew its shared-DB approach: noisy neighbors affected each other, compliance audits took weeks, and tenant provisioning was done manually by engineers.

What we did. Moved to schema-per-tenant with auto-provisioning via Terraform. Separated the critical path from analytics. Per-tenant rate limits and resource quotas in Kubernetes.

Outcome. Query p95 down 3×. Tenant provisioning: 2 days → 90 seconds. Audit and compliance automated.

3×
p95
90 s
provisioning
0
incidents 12 mo
Postgres · Terraform · K8s · Go
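Schema-per-tenant provisioning is, at its core, templated DDL plus strict input validation. A sketch that emits the statements; the table list, the `template` schema, and the slug rules are hypothetical, and in the case itself Terraform drove the actual execution.

```python
import re

TENANT_TABLES = ["accounts", "events"]  # hypothetical per-tenant tables

def provisioning_sql(tenant: str) -> list:
    """Emit DDL for one tenant's isolated schema; the slug check keeps
    arbitrary input out of SQL identifiers."""
    if not re.fullmatch(r"[a-z][a-z0-9_]{2,30}", tenant):
        raise ValueError("invalid tenant slug")
    schema = f"tenant_{tenant}"
    statements = [f'CREATE SCHEMA IF NOT EXISTS "{schema}"']
    statements += [
        f'CREATE TABLE "{schema}"."{table}" (LIKE template."{table}" INCLUDING ALL)'
        for table in TENANT_TABLES
    ]
    return statements
```

Because the whole flow is declarative and validated, provisioning drops from a manual engineering task to a pipeline run.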
Gaming / realtime · 800K MAU · <50 ms p99

Realtime backend for multiplayer sessions

Challenge. Build a backend for realtime sessions with predictable latency that absorbs seasonal audience spikes.

What we did. Matchmaking and state service in Go with Redis Streams. Sessions distributed across region shards, so a single-shard failure doesn't touch the others. Blue/green deployments with warmup.

Outcome. Network p99 under 50 ms. 800K MAU without degradation. 18 months of zero production downtime.

<50 ms
p99
800K
MAU
0
downtime 18 mo
Go · Redis Streams · NATS · K8s
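Pinning a session to one shard inside its region is a hash away. A sketch of the idea in Python (the production service was Go); the region-to-host map is hypothetical, and a real setup would use consistent hashing so resharding moves fewer sessions.

```python
import hashlib

SHARDS = {  # hypothetical region -> shard hosts
    "eu": ["eu-1", "eu-2"],
    "us": ["us-1", "us-2", "us-3"],
}

def shard_for(session_id: str, region: str) -> str:
    """Deterministically pin a session to a shard in its region, so a
    failing shard only affects its own sessions."""
    hosts = SHARDS[region]
    digest = hashlib.sha256(session_id.encode()).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]
```

The same mapping runs on every node, so any instance can route a reconnecting client without a lookup service on the hot path.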

Have a similar project?

Tell us about your system and metrics — we'll show how we'd approach it and what it would cost.

Discuss the project →