Scaling a patient-management system without buying a bigger server.
Our 2023 piece on scaling a 500K-client patient system on AWS without upgrading the host became one of our most-quoted notes. Three years later: most of the architecture held, and the parts that didn't are the ones AI changed.
In 2023 we wrote up how we got a Singapore medical group’s patient-management system back into reasonable response times without taking AWS’s recommendation to upgrade to a dedicated host and a high-performance database tier. The article ran with the subtitle “cost-effective scalability solutions” because that’s what the client cared about. It is one of our most-cited pieces.
The architecture is still in production, with modifications. Here’s the 2026 rewrite — same skeleton, three changes that matter.
What the original got right
The four moves we made in 2022–2023:
- Split the public patient portal from the employee portal. Two separate compute tiers, each scaled and tuned independently.
- Distribute cold data across cheaper auxiliary servers. Logs older than a week, visit data older than six months, reports older than six months — each on its own modest VM.
- An archive automation service. A scheduled .NET service that swept the primary DB and migrated cold rows to the cheap servers, with metadata in the primary so queries could still find them.
- Web-service APIs for cross-server reads. When a primary-tier query needed a cold row, it called a web service against the appropriate auxiliary, returning real-time results.
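The archive automation described above was a scheduled .NET service; the following is a minimal Python sketch of its sweep logic, with in-memory dicts standing in for the SQL Server stores and with cutoffs, table names, and the `aux-` host naming all hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical cutoffs mirroring the article: logs older than a week,
# visit data older than six months. (The production job was .NET + SQL.)
CUTOFFS = {"logs": timedelta(weeks=1), "visits": timedelta(days=180)}

def sweep(primary, auxiliary, metadata, table, now):
    """Move rows older than the table's cutoff to the auxiliary store,
    leaving a metadata pointer in the primary so queries can find them."""
    cutoff = now - CUTOFFS[table]
    hot, cold = [], []
    for row in primary[table]:
        (cold if row["created_at"] < cutoff else hot).append(row)
    primary[table] = hot
    auxiliary[table].extend(cold)
    # Pointer rows stay in the primary so a lookup can resolve which
    # auxiliary host holds the archived row.
    for row in cold:
        metadata[(table, row["id"])] = "aux-" + table
    return len(cold)
```

The essential design point survives the port: the primary keeps only pointers, so hot-path queries stay small while cold rows remain findable.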
The architecture survived a near-tripling of the user base over 2023–2025. The decision not to upgrade to the dedicated host saved the client roughly 60% in infrastructure cost over that window, relative to the quoted upgrade path.
What we changed in 2024
1. Replaced the bespoke web-service APIs with PolyBase / linked queries
In 2022 we wrote a small .NET web-service layer because cross-server SQL Server queries (linked servers) had a reputation for being unstable. By 2024 we were less ideological about it and consolidated onto SQL Server PolyBase / linked-server reads for the cold-data tiers. The custom .NET layer became one less thing to deploy.
This is a case where the boring built-in tool was, in fact, the right tool.
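The linked-server read is plain T-SQL with four-part naming. A sketch of the routing seam that replaced the .NET layer, with hypothetical server, database, and table names:

```python
# Hypothetical linked-server name; the real definition is created once
# on the primary (e.g. via sp_addlinkedserver) and then referenced by
# four-part naming: [server].[database].[schema].[table].
ARCHIVE_SERVER = "AUX_VISITS"

def visit_lookup_sql(archived: bool) -> str:
    """Return the T-SQL for a visit lookup: the local table for hot
    rows, the linked-server path for archived ones."""
    if archived:
        return (f"SELECT * FROM [{ARCHIVE_SERVER}].[clinic].[dbo].[visits] "
                "WHERE patient_id = ?")
    return "SELECT * FROM [dbo].[visits] WHERE patient_id = ?"
```

The calling code decides hot vs cold from the pointer metadata; the query text is the only thing that changes.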
2. Moved the archive service to AWS Step Functions + Lambda
The 2023 archive service ran as a Windows scheduled task on the primary server. It worked. It also concentrated a fragile job on the primary host. In 2024 we re-implemented it as a Step Functions workflow with Lambda steps and S3 staging. The primary host got slightly faster; the archive workflow became retryable and observable.
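One Lambda step of the reworked workflow might look like the sketch below (handler shape and field names are assumptions, not our production code). Step Functions supplies the loop and the retry policy; each invocation archives one batch and reports whether work remains:

```python
# Sketch of a single batch step. A Choice state in the state machine
# loops back to this Lambda until "done" is True; Retry/Catch on the
# state gives the retryability the scheduled task never had.
BATCH_SIZE = 500  # illustrative batch size

def archive_batch_handler(event, context=None):
    remaining = event.get("remaining_rows", 0)
    moved = min(BATCH_SIZE, remaining)
    # Real steps, in order: read batch from primary, stage to S3,
    # load into the auxiliary server, delete from primary.
    return {
        "remaining_rows": remaining - moved,
        "moved": moved,
        "done": remaining - moved == 0,
    }
```

Keeping each step small and idempotent is what makes the built-in retries safe: a re-run of a failed batch repeats one bounded unit of work.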
3. Read replicas, finally
The 2023 article didn’t mention read replicas because the client wanted to avoid the licence cost of a second SQL Server. By 2024 the client had grown enough that the licence cost was a smaller proportion of the bill, and we added a read replica for the heavy-report queries. Reports moved off the primary entirely.
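Routing was the only code change the replica required: a single seam that sends report reads to the replica endpoint. A sketch, with hypothetical endpoints and tag names:

```python
# Hypothetical endpoints; the point is one routing function so report
# queries never touch the primary.
PRIMARY = "sql-primary.internal:1433"
REPLICA = "sql-replica.internal:1433"

REPORT_QUERY_TAGS = {"report", "export", "analytics"}

def endpoint_for(query_tag: str) -> str:
    """Route heavy report reads to the replica, everything else to
    the primary."""
    return REPLICA if query_tag in REPORT_QUERY_TAGS else PRIMARY
```

Tagging queries at the call site (rather than sniffing SQL text) keeps the routing decision explicit and auditable.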
What AI changed
This is the new section. Two changes worth calling out.
Reports moved to an agent loop
In 2025 we ported the patient-summary report generation from a static-template pipeline to an agent loop. The agent reads the patient record, the latest assays, recent visit notes, and writes a summary report for the attending clinician — with every fact in the summary linked back to the source row. Roughly:
- A clinician requests a report; the request goes onto a queue.
- An agent picks up the request, reads the relevant rows from the read replica, drafts the summary.
- A second agent fact-checks every claim against the underlying rows, flagging anything that doesn’t trace back.
- A clinician reviews and signs.
We removed about 4,000 lines of template code and a few hundred lines of edge-case handling. We also added an eval harness for the agent loop and a dedicated incident-response runbook for it (a separate piece of work — see “IMDA’s Model Governance Framework for Agentic AI, read by builders”).
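The loop above, stripped to its shape, looks like the following Python sketch. The model calls are stubbed out and every field name is an assumption; what matters is that each drafted claim carries a source row id, and the checker refuses anything it cannot trace back:

```python
# Minimal two-agent loop: drafter emits claims tied to source rows,
# checker flags any claim that does not trace back. Sample data is
# invented for illustration.

def draft_report(rows):
    """Stand-in for the drafting agent: one claim per source row."""
    return [{"claim": r["summary"], "source_id": r["id"]} for r in rows]

def fact_check(claims, rows):
    """Stand-in for the checking agent: flag untraceable claims."""
    by_id = {r["id"]: r for r in rows}
    return [c for c in claims
            if c["source_id"] not in by_id
            or c["claim"] != by_id[c["source_id"]]["summary"]]

def generate_report(rows):
    claims = draft_report(rows)
    flagged = fact_check(claims, rows)
    # Only a clean fact-check proceeds to clinician sign-off.
    return {"claims": claims, "flagged": flagged,
            "ready_for_signoff": not flagged}
```

In production the drafter and checker are separate model calls behind the queue; the trace-back contract, not the model, is what makes the pipeline reviewable.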
Cost monitoring became a first-class signal
The original article monitored CPU, memory, IOPS. The agent-loop reports run on inference, which is a new cost line with its own variance. We now monitor:
- Inference cost per report, per model version.
- Latency per report, with separate alerts for tail.
- Eval pass rate on the fact-checker, weekly.
A correct report generated for thirty dollars of inference is, in product terms, a wrong report. Cost is part of the eval.
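Treating cost as part of the eval is easy to wire in. A sketch with illustrative prices and budget (real token prices and thresholds differ):

```python
# Illustrative per-1K-token prices and budget; the model names are
# hypothetical version labels, not real products.
PRICE_PER_1K_TOKENS = {"drafter-v3": 0.010, "checker-v2": 0.002}
COST_BUDGET_PER_REPORT = 0.50  # above this, even a correct report fails

def report_cost(usage):
    """usage: list of (model_version, tokens) pairs for one report."""
    return sum(PRICE_PER_1K_TOKENS[m] * t / 1000 for m, t in usage)

def cost_eval(usage):
    cost = report_cost(usage)
    return {"cost": round(cost, 4),
            "within_budget": cost <= COST_BUDGET_PER_REPORT}
```

Breaking the metric down per model version is deliberate: a model upgrade that improves the fact-check pass rate but doubles the cost line shows up in the same dashboard.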
Architecture today (2026)
public patient portal ──► public-tier ASP.NET ──┐
                                                │
employee portal ──────► internal-tier .NET ─────┤
                                                ▼
                                       primary MS SQL (hot)
                                                │
                                      read replica (reports)
                                                │
                ┌───────────────────────────────┼───────────────────────────────┐
                ▼                               ▼                               ▼
           log archive                    visit archive                   report archive
            (>1 week)                      (>6 months)                     (>6 months)
                ▲                               ▲                               ▲
                └─────────────── Step Functions archive workflow ───────────────┘
Report generation:
clinician request ──► queue ──► drafter.agent ──► fact-checker.agent ──► clinician sign-off
                                                          │
                                                    eval harness
What we cut from the original
- The “AWS Aurora is a future improvement” line. Aurora only offers MySQL- and PostgreSQL-compatible engines, and porting our SQL Server-shaped workloads in 2024 surfaced more quirks than it resolved, so we pulled back. RDS for SQL Server with read replicas covered our needs.
- The implication that bespoke .NET web services were the only safe cross-server path. PolyBase / linked servers, with care, are fine.
What carries over unchanged
The instinct: scale by separating, then by distributing, then by replicating, then by upgrading the host — in that order. Most teams reach for the upgrade first. It is rarely the cheapest move.
— wGrow studio · migrated from Team-Notes #58