Cases

Projects that show what we do

Client names are under NDA, so we talk about industry, scale, and measurable results: faithfulness, latency, cost, conversion, uptime. Reference calls with teams we've worked with are available on request.

AI cases

AI integration and LLM systems

Where clients want an "AI capability" and get measurable acceleration, cost reduction, and quality you can prove on a golden set.

Fintech / RAG · compliance · 2.4M documents

RAG copilot for compliance analysts

Challenge. Analysts spent 30–60 minutes on each case review: searching 2.4M internal documents and checking regulatory requirements. Off-the-shelf open-source search returned irrelevant results, and naive RAG hallucinated.

What we did. Hybrid search (BM25 + embeddings) with cross-encoder re-ranking. Chunking aware of document structure. A separate fact-checker pass on Claude. Citation-aware UI: every claim links to its source. Evaluation on a golden set of 1200 pairs.

Outcome. Faithfulness 0.94 (vs 0.71 baseline), case review time dropped from 40 to 6 minutes, token cost 72% lower via smart prefix caching and routing simple questions to cheaper models.

6.5×
faster analysis
0.94
faithfulness
−72%
cost vs naive RAG
Claude · pgvector · BM25 · Ragas · Langfuse
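As a sketch of how the retrieval side of such a pipeline merges its two candidate lists: reciprocal-rank fusion is one common way to combine BM25 and embedding results before a cross-encoder re-ranks the survivors. The doc ids and the k = 60 damping constant below are illustrative, not taken from the project.

```python
from collections import defaultdict

def rrf_fuse(bm25_ranked, vector_ranked, k=60, top_n=5):
    """Merge two best-first lists of doc ids with reciprocal-rank
    fusion: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# "d2" ranks high in both lists, so it comes out on top.
fused = rrf_fuse(["d1", "d2", "d3"], ["d2", "d4", "d1"])
```

The fused top-N then goes to the cross-encoder, which is the expensive but accurate step.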
E-commerce / agents · 800K tickets / month

AI agent in a marketplace support team

Challenge. High volume of standard requests: "where's my order", "refund me", "change address". First-response time 9 minutes, support staff burned out, NPS −18.

What we did. Built agent pipeline on LangGraph with tool-use: order status check, return initiation, address update, escalation. Human-in-the-loop on refunds above threshold. Separate safety layer against prompt injection.

Outcome. First-response time from 9 minutes down to 12 seconds, 68% of tickets auto-resolved, support NPS +12. The support team shifted to complex cases and proactive outreach.

45×
FRT
68%
auto-resolution
+12 NPS
support
GPT-4o · LangGraph · tool-use · Postgres · Redis
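The tool-routing step of such a pipeline can be sketched as follows. The intent names, tool names, and the refund threshold are hypothetical; in LangGraph this logic would live in a routing node, with the model classifying the intent upstream.

```python
REFUND_REVIEW_THRESHOLD = 100.0  # hypothetical: refunds above this need a human

def route(intent: str, payload: dict) -> str:
    """Map a classified ticket intent to a tool call or a human queue."""
    if intent == "order_status":
        return f"tool:get_order_status({payload['order_id']})"
    if intent == "change_address":
        return f"tool:update_address({payload['order_id']})"
    if intent == "refund":
        if payload["amount"] > REFUND_REVIEW_THRESHOLD:
            return "human:refund_review"   # human-in-the-loop gate
        return f"tool:start_refund({payload['order_id']})"
    return "human:general_queue"           # anything unrecognized escalates
```

The safety layer against prompt injection sits in front of this router, so untrusted ticket text never picks a tool directly.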
SaaS / LLMOps · 10M req/day · multi-team

Production LLM gateway with routing and caching

Challenge. 14 product teams calling OpenAI and Anthropic independently. Cost grew faster than revenue, there was no visibility into who was spending what, rate-limit failures were frequent, and there was no fallback.

What we did. Designed and launched gateway: unified API, Claude / GPT / Llama routing by task type and SLO, semantic caching, per-tenant rate-limiting, retry / fallback on 429 / 5xx, cost attribution via OpenTelemetry. PII filters and per-user token budget.

Outcome. Token cost down 64% in a quarter. Gateway overhead 42 ms p99. Zero "hit rate-limit" incidents in 6 months. Each team sees real-time spend.

10M
requests / day
−64%
cost
42 ms
overhead p99
Go · Redis · vLLM · OpenTelemetry · Langfuse
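The retry / fallback part of such a gateway reduces to a small loop. A minimal sketch, assuming a hypothetical route table and a `call_fn(model, prompt) -> (status, answer)` transport; real model names, retry budgets, and backoff policies would differ.

```python
ROUTES = {  # hypothetical task-type -> (primary, fallback) models
    "simple_qa": ("small-local-model", "mid-tier-model"),
    "reasoning": ("frontier-model-a", "frontier-model-b"),
}

RETRYABLE = {429, 500, 502, 503}

def call_with_fallback(task_type, prompt, call_fn, max_attempts=2):
    """Try the primary model for this task type; on rate-limit or
    server errors, retry, then fall back to the secondary model."""
    for model in ROUTES[task_type]:
        for _ in range(max_attempts):
            status, answer = call_fn(model, prompt)
            if status == 200:
                return model, answer
            if status not in RETRYABLE:
                raise RuntimeError(f"{model}: unrecoverable status {status}")
    raise RuntimeError("all models exhausted")
```

Semantic caching and cost attribution would wrap this call on either side; the loop above is only the availability story.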
DevOps / AI agent · 1.4K incidents / month

AI agent for production-incident auto-triage

Challenge. SRE team drowned in alerts: 50% false positives, MTTA 18 minutes, on-call staff waking 3–4 times per night.

What we did. The AI agent reads the alert, fetches related metrics and logs via MCP, searches past incidents in postmortems (RAG), and formulates a hypothesis plus a runbook suggestion. Below a confidence threshold it escalates to a human with the pre-gathered context.

Outcome. MTTA: 18 min → 2 min (the agent gathers context before a human looks). 41% of alerts closed as known issues without waking on-call. Nighttime wake-ups down 3.5×.

9×
MTTA
41%
auto-triage
−72%
on-call burden
Claude · MCP · Prometheus · Loki · pgvector
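The "known issue or escalate" decision can be sketched without any LLM at all; here a plain string-similarity match over a hypothetical postmortem index stands in for the real RAG lookup, and the 0.6 threshold is illustrative.

```python
import difflib

KNOWN_ISSUES = {  # hypothetical postmortem index: summary -> runbook step
    "disk usage above 90% on kafka broker": "expand volume, purge old segments",
    "p99 latency spike after deploy": "roll back the last release",
}

def triage(alert_text, threshold=0.6):
    """Match an alert against past incidents; below the similarity
    threshold, page on-call with the gathered context attached."""
    best_runbook, best_score = None, 0.0
    for summary, runbook in KNOWN_ISSUES.items():
        score = difflib.SequenceMatcher(None, alert_text.lower(), summary).ratio()
        if score > best_score:
            best_runbook, best_score = runbook, score
    if best_score >= threshold:
        return {"action": "suggest_runbook", "runbook": best_runbook}
    return {"action": "page_oncall", "context": list(KNOWN_ISSUES)}
```

The escalation branch is the important one: even when the agent isn't confident, the human starts with context instead of a bare alert.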
Analytics / Data Lake & DWH · 120K call hours / month

LLM-powered analysis of sales calls

Challenge. Call recordings were unstructured noise: manual listen-through was impossible, legacy BI covered only the CRM, and managers filled fields inconsistently.

What we did. Pipeline: transcription (Whisper) → diarization → structured extraction (LLM to strict JSON schema) → objection and key-question highlights → ClickHouse views → dashboards for team and product.

Outcome. 100% of calls auto-tagged within 5 minutes of call end. The product team sees real customer objections; managers get feedback with quotes. Sales conversion +9% over the quarter.

100%
coverage
5 min
SLA from end-of-call
+9%
conversion
Whisper · Claude · ClickHouse · Kafka
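The "LLM to strict JSON schema" step stands or falls on validation before ingest. A minimal sketch with a hypothetical three-field schema; a production schema would be richer and would likely use a real validator library.

```python
import json

CALL_SCHEMA = {"customer": str, "objections": list, "next_step": str}

def parse_call_summary(llm_output: str) -> dict:
    """Parse the model's JSON answer and enforce field types, so a
    malformed row never reaches the warehouse ingest step."""
    record = json.loads(llm_output)
    for field, expected in CALL_SCHEMA.items():
        if not isinstance(record.get(field), expected):
            raise ValueError(f"field {field!r} missing or not {expected.__name__}")
    return record
```

Rows that fail here go to a retry queue rather than into ClickHouse, which keeps the dashboards trustworthy.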
Highload cases

Architecture and engineering for systems under load

Payments, RTB, search, multi-tenant SaaS, realtime sessions, data platforms. Metrics are what these projects were built for.

Fintech / payments · 35K RPS · 4 regions

Payment platform migration from monolith to event-driven core

Challenge. The payment platform hit a database bottleneck at peak, releases shipped once every two weeks and came with incidents, and the business was planning for 5–6× transaction growth.

What we did. Decomposed monolith into 9 services around bounded contexts. Kafka as backbone, outbox + saga orchestration. Multi-region Kubernetes cluster with active-active payment core.

Outcome. Payment API p99: 850 ms → 110 ms. Daily releases. Infrastructure cost down 40%. Sustained 6× transaction growth.

7.7×
p99 reduction
6×
TPS growth
−40%
cloud cost
Go · Kafka · Postgres · K8s · AWS
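The outbox half of "outbox + saga" can be shown in miniature with SQLite standing in for Postgres; table names are illustrative. The point is that the state change and the event land in one transaction, and a separate relay later ships outbox rows to Kafka.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("CREATE TABLE outbox (id INTEGER PRIMARY KEY, topic TEXT, payload TEXT)")
conn.execute("INSERT INTO payments VALUES ('p1', 'authorized')")

def capture_with_outbox(conn, payment_id, amount):
    """Commit the status change and the event atomically; if either
    write fails, neither is visible, so no event is lost or phantom."""
    with conn:  # one transaction: BEGIN ... COMMIT
        conn.execute("UPDATE payments SET status = 'captured' WHERE id = ?",
                     (payment_id,))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("payments.captured",
                      json.dumps({"id": payment_id, "amount": amount})))

capture_with_outbox(conn, "p1", 99.50)
```

The relay then reads the outbox in order and publishes to Kafka, which is what makes the event stream safe to build sagas on.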
AdTech / RTB · 250K QPS · multi-AZ

Real-time bidding pipeline with SLA 95 ms

Challenge. Launch an RTB bidder that fits within the 100 ms auction budget and consistently handles 200K+ QPS at peak.

What we did. Pipeline on Go with in-memory feature store on Redis Cluster. Analytics and model training in ClickHouse, real-time ingestion via Kafka. Multi-AZ deployment with health-aware load balancing.

Outcome. 250K QPS, auctions fit in 95 ms p99. 18 months with zero revenue-loss incidents.

250K
QPS
95 ms
SLA p99
99.99%
uptime
Go · Redis · ClickHouse · Kafka · AWS
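Fitting a bidder into a fixed auction budget is mostly about a deadline, not raw speed. A sketch of the idea in Python (the production path was Go): feature lookups run until the budget is spent, and a no-bid beats blowing the SLA. The 95 ms figure mirrors the case; the pricing formula is a placeholder.

```python
import time

AUCTION_BUDGET_MS = 95  # headroom under the exchange's 100 ms limit

def bid(request, feature_lookups, deadline_ms=AUCTION_BUDGET_MS):
    """Enrich the bid with features until the deadline; whatever
    didn't arrive in time is simply skipped."""
    start = time.monotonic()
    features = {}
    for name, fetch in feature_lookups.items():
        if (time.monotonic() - start) * 1000 >= deadline_ms:
            break  # budget spent: stop enriching
        features[name] = fetch(request)
    if not features:
        return None  # no-bid rather than a late answer
    return {"price_cpm": 0.5 + 0.1 * len(features), "features": features}
```

In the real bidder the same idea applies per lookup, with the in-memory feature store keeping each fetch well under a millisecond.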
E-commerce / search · 12M SKU · 4M MAU

Search and faceted navigation for a marketplace

Challenge. Search across 12M products ran at 800 ms p95 and collapsed under peak load. Search-to-cart conversion was flat.

What we did. Migrated to Elasticsearch with pre-calculated facets and autocomplete, split nodes into data / coordinating / ingest roles, and fed indexing as a stream from Kafka.

Outcome. Search p95: 800 → 80 ms. Search → cart conversion +18%. Full catalog indexing in 25 minutes instead of 6 hours.

10×
faster p95
+18%
conversion
14×
faster indexing
Elasticsearch · Node.js · Kafka · Postgres
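Facet counts come back in the same round-trip as the hits when the query carries `aggs`. A sketch of the request builder; the field names are illustrative, and a real mapping would use `keyword` fields for the facet aggregations.

```python
def search_body(text, filters, facet_fields, size=20):
    """Build one Elasticsearch request that returns both hits and
    facet counts, so the UI needs a single round-trip."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],
                "filter": [{"term": {field: value}}
                           for field, value in filters.items()],
            }
        },
        "aggs": {field: {"terms": {"field": field, "size": 30}}
                 for field in facet_fields},
        "size": size,
    }
```

Filters go in the `filter` clause rather than `must` so they are cacheable and don't affect relevance scoring.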
B2B SaaS / multi-tenant · 1.2K tenants

Multi-tenant platform: isolation and scalability

Challenge. The SaaS outgrew its shared-DB approach: noisy neighbors affected each other, compliance audits took weeks, and tenant provisioning was done manually by engineers.

What we did. Moved to schema-per-tenant with auto-provisioning via Terraform. Separated the critical path from analytics. Per-tenant rate limits and resource quotas in Kubernetes.

Outcome. Query p95 down 3×. Tenant provisioning: 2 days → 90 seconds. Audit and compliance automated.

3×
p95
90 s
provisioning
0
incidents 12 mo
Postgres · Terraform · K8s · Go
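Schema-per-tenant provisioning is, at its core, templated DDL plus strict input validation. A sketch that emits the statements; the table list, the `template` schema, and the slug rules are hypothetical, and in the case itself Terraform drove the actual execution.

```python
import re

TENANT_TABLES = ["accounts", "events"]  # hypothetical per-tenant tables

def provisioning_sql(tenant: str) -> list:
    """Emit DDL for one tenant's isolated schema; the slug check keeps
    arbitrary input out of SQL identifiers."""
    if not re.fullmatch(r"[a-z][a-z0-9_]{2,30}", tenant):
        raise ValueError("invalid tenant slug")
    schema = f"tenant_{tenant}"
    statements = [f'CREATE SCHEMA IF NOT EXISTS "{schema}"']
    statements += [
        f'CREATE TABLE "{schema}"."{table}" (LIKE template."{table}" INCLUDING ALL)'
        for table in TENANT_TABLES
    ]
    return statements
```

Because the whole flow is declarative and validated, provisioning drops from a manual engineering task to a pipeline run.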
Gaming / realtime · 800K MAU · <50 ms p99

Realtime backend for multiplayer sessions

Challenge. Build a backend for realtime sessions with predictable latency that absorbs seasonal audience spikes.

What we did. Matchmaking and state service in Go with Redis Streams. Sessions distributed across region shards, so a single-shard failure doesn't touch the others. Blue/green deployments with warmup.

Outcome. Network p99 under 50 ms. 800K MAU without degradation. 18 months of zero production downtime.

<50 ms
p99
800K
MAU
0
downtime 18 mo
Go · Redis Streams · NATS · K8s
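Pinning a session to one shard inside its region is a hash away. A sketch of the idea in Python (the production service was Go); the region-to-host map is hypothetical, and a real setup would use consistent hashing so resharding moves fewer sessions.

```python
import hashlib

SHARDS = {  # hypothetical region -> shard hosts
    "eu": ["eu-1", "eu-2"],
    "us": ["us-1", "us-2", "us-3"],
}

def shard_for(session_id: str, region: str) -> str:
    """Deterministically pin a session to a shard in its region, so a
    failing shard only affects its own sessions."""
    hosts = SHARDS[region]
    digest = hashlib.sha256(session_id.encode()).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]
```

The same mapping runs on every node, so any instance can route a reconnecting client without a lookup service on the hot path.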

Have a similar project?

Tell us about your system and metrics — we'll show how we'd approach it and what it would cost.

Discuss the project →