In this Article
- Key Takeaways: The Decision in 90 Seconds
- Start With the Pain, Not the Architecture Fashion
- When a Modular Monolith Is the Correct Answer
- When Microservices Actually Earn Their Keep
- Draw Boundaries Around Data and Decisions
- Budget for the Operational Tax Before You Split
- Use This Decision Process Before Rewriting Anything
- Where This Advice Stops Being Universal
Key Takeaways: The Decision in 90 Seconds
Default to a modular monolith unless independent deployment, team autonomy, or failure isolation already hurts enough to measure.
I have watched teams mandate microservices for every new project because someone wanted to “future-proof” the architecture. What they actually future-proofed was a calendar full of API gateway configuration, IAM role arguments, and contract test maintenance. The business logic waited in the corner like an unpaid invoice.
- Microservices solve coordination and deployment bottlenecks. They do not magically fix bad boundaries, weak ownership, or a messy domain model.
- A monolith is not automatically legacy. A tightly coupled distributed system is just legacy with network latency.
- The split has to pay rent. Our experience showed teams spending 30% of sprint capacity on cross-service coordination and contract testing when the boundaries were premature.
- Data bugs get slower to trace when they cross process boundaries. One distributed data-inconsistency bug took roughly 14 to 19 days to trace and resolve.
Summary: If you cannot name the specific pain microservices will reduce, you are not making an architecture decision. You are buying operational debt on credit.
Start With the Pain, Not the Architecture Fashion
We once used deployment frequency as the main reason to split services. It sounded sensible until it became stupid in production. A fast-deploying broken service just breaks production faster.
So we changed the diagnostic. Not the architecture first. The diagnostic.
Look for symptoms you can put on a whiteboard without embarrassment: slow releases, merge conflict hell, unclear ownership, deployment risk, incident blast radius, and database contention. If merge conflict resolution exceeds 5 hours per developer per week, that is a real signal. If database lock contention spikes above 10% during peak load windows, that is another. “We hired more people” is not, by itself, a reason to add a network boundary.
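The two numeric signals above can be written down as an explicit check, so the conversation starts from data rather than preference. This is a minimal sketch: the two thresholds come from the paragraph above, while the class name, field names, and the reading of "contention" as a ratio of peak-window time are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Thresholds taken from the text; how each metric is collected is an assumption.
MERGE_CONFLICT_HOURS_LIMIT = 5.0  # hours per developer per week
LOCK_CONTENTION_LIMIT = 0.10      # assumed: share of peak-window time spent waiting on locks


@dataclass
class WeeklyPainMetrics:
    """Hypothetical weekly snapshot of the two measurable symptoms."""
    merge_conflict_hours_per_dev: float
    peak_lock_contention_ratio: float


def pain_signals(metrics: WeeklyPainMetrics) -> list[str]:
    """Return the names of the signals that exceed their threshold."""
    signals = []
    if metrics.merge_conflict_hours_per_dev > MERGE_CONFLICT_HOURS_LIMIT:
        signals.append("merge_conflicts")
    if metrics.peak_lock_contention_ratio > LOCK_CONTENTION_LIMIT:
        signals.append("db_lock_contention")
    return signals
```

An empty result does not prove the monolith is healthy. It only means these two symptoms are not the ones to argue from.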
The wrong reasons arrive wearing expensive shoes
Kubernetes availability is not a diagnosis. Headcount is not a diagnosis. Conference anxiety is definitely not a diagnosis.
A well-structured monolith with clear modules, explicit interfaces, and disciplined database access can outperform a naive service mesh. I have seen the latter burn CPU cycles mostly proving that YAML can be a lifestyle disease.
Note: Start with the bottleneck you can measure. Architecture fashion has no pager duty, but your team does.
When a Modular Monolith Is the Correct Answer
For early-stage products, small teams, and domains still changing shape, a modular monolith is not a compromise. It is usually the adult choice.
Beginner teams often hear “monolith” and picture one swollen codebase where every controller can touch every table. That is a big ball of mud. A modular monolith is different. It has internal boundaries, package rules, domain-oriented modules, and controlled dependencies. You can enforce those boundaries at compile time, in tests, and in review.
The path from messy to modular
- Name the business capabilities first, not the technical folders.
- Move code behind module interfaces.
- Block direct imports across module boundaries unless the dependency is intentional.
- Restrict database access so modules do not casually reach into each other’s tables.
- Add contract tests inside the codebase before you add a network hop.
In our experience, compile-time boundary enforcement reduced regression bugs by 45%. Local development setup dropped from about 4 days to between 45 and 55 minutes after teams stopped booting half the company’s infrastructure on a laptop.
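One way to block accidental cross-module imports is a test that parses source files and fails on any import that bypasses a module's public interface. The sketch below is the "in tests" variant, since Python has no compile step. The module names, the `api` subpackage convention, and the `ALLOWED_DEPENDENCIES` table are all hypothetical.

```python
from __future__ import annotations

import ast

# Hypothetical layout: each top-level package is one business capability, and
# other modules may only import from its public `api` subpackage.
ALLOWED_DEPENDENCIES = {
    "billing": {"customers"},
    "customers": set(),
    "reporting": {"billing", "customers"},
}


def boundary_violations(owner: str, source: str) -> list[str]:
    """Return imports in `source` (code owned by `owner`) that break the rules."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            targets = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue
        for target in targets:
            parts = target.split(".")
            module = parts[0]
            if module == owner or module not in ALLOWED_DEPENDENCIES:
                continue  # the module's own code, or stdlib / third-party
            allowed = module in ALLOWED_DEPENDENCIES[owner]
            via_api = len(parts) > 1 and parts[1] == "api"
            if not (allowed and via_api):
                violations.append(target)
    return violations
```

Run against every file in a module during CI, a non-empty result fails the build. Statically typed languages can push the same rule into the build tool so the violation never compiles.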
Monoliths simplify local development, debugging, testing, data migrations, and observability. That simplicity matters when the domain is still evolving and every meeting reveals a rule nobody wrote down.
Quick Tip: If your domain language changes every sprint, keep the boundary cheap to move. A module rename is annoying. A service migration is archaeology with alerts.
When Microservices Actually Earn Their Keep
The useful question is not “Should we use microservices?” The useful question is “Has coordination inside the monolith become more expensive than distributed operations?”
Microservices earn their keep when you need independent release cadence, separate scaling profiles, isolated failure domains, regulatory separation, team ownership boundaries, or technology and runtime constraints that do not fit cleanly inside one deployable unit.
A service that deserved extraction
One extraction I still like was a high-throughput PDF generation service pulled out of the core API. It was not a vague “decoupling” exercise. PDF generation consumed memory, starved the main thread, and had a clean functional boundary: request in, document out, status reported back.
After extraction, memory starvation incidents in the core API dropped by 90%. Independent scaling let the PDF service move from 2 to 17 instances within 45 to 90 seconds during end-of-month reporting spikes.
That is what a good split looks like. The service had a bounded capability, a clear owner, and a runtime profile that differed from the core application.
Team topology matters here. A service works best when one team can own build, deploy, monitoring, incidents, and roadmap. If three teams must approve every schema change, you did not create autonomy. You created a distributed committee.
Draw Boundaries Around Data and Decisions
Service boundaries should follow ownership of data, invariants, and business decisions. Not folders. Not nouns. Not whatever looks tidy in an architecture deck.
The classic trap is extracting a “user service” that every other service must query synchronously. Now login, checkout, notifications, reporting, and support tooling all depend on one service’s latency. In one case, a 10% latency spike in that shared dependency was enough to expose it as a single point of failure for the platform.
In our experience, the worst distributed monoliths start with technical-layer splits: a user service here, a notification service there, a reporting service somewhere else. Every user action then needs three synchronous network hops before anything useful happens. We measured about 215 milliseconds added to the critical path from those hops alone.
Own the data or admit the coupling
If every service writes to the same schema, the system is distributed in deployment only. The database remains the real monolith, and it has the worst API in the building: undocumented shared tables.
Better patterns exist, but none are free: owned data stores, published events, stable APIs, asynchronous workflows where delayed consistency is acceptable, and explicit contracts. Schema isolation and data migration took 7 to 9 months of incremental, backward-compatible deployments in one serious cleanup. That sounds slow until you compare it with pretending the coupling is gone.
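As an illustration of owned data plus published events, the sketch below lets one module own customer records while another keeps only the field it needs, updated from events instead of reads against a shared table. Every name here is hypothetical, and the bus is a synchronous in-process stand-in. A real broker would deliver asynchronously, which is where delayed consistency enters.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a message broker (an assumption for this sketch)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)


class Customers:
    """Owns customer records; no other module touches `_rows`."""

    def __init__(self, bus):
        self._rows = {}
        self._bus = bus

    def change_email(self, customer_id, email):
        self._rows[customer_id] = email
        # The published event is the contract, not the table layout.
        self._bus.publish(
            "customer.email_changed",
            {"customer_id": customer_id, "email": email},
        )


class Notifications:
    """Keeps its own copy of the one field it needs, fed by events."""

    def __init__(self, bus):
        self._emails = {}
        bus.subscribe("customer.email_changed", self._on_email_changed)

    def _on_email_changed(self, event):
        self._emails[event["customer_id"]] = event["email"]

    def recipient_for(self, customer_id):
        return self._emails.get(customer_id)
```

The same shape works inside a modular monolith first, so the event contract can stabilise before a network sits in the middle of it.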
Budget for the Operational Tax Before You Split
Microservices require a platform before they require enthusiasm.
At minimum, you need automated CI/CD, service discovery, observability, centralised logging, tracing, secrets management, rollback strategy, and incident ownership. If those words describe aspirations rather than working systems, service extraction will expose every gap at the worst possible hour.
The tax shows up in production
Latency is no longer a function call. Retries can duplicate work. Partial failure becomes normal. Schema compatibility becomes a release concern. Versioning turns into a negotiation. Cascading outages stop being theoretical when one service times out and its callers keep retrying like panic is a protocol.
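The "retries can duplicate work" problem can be sketched with a response that gets lost after the work has already happened. The example below bounds the number of retries and deduplicates on an idempotency key. The payment handler, the lossy transport, and the receipt format are invented for illustration, and the in-memory dictionary stands in for a durable store.

```python
class PaymentHandler:
    """Hypothetical handler that deduplicates retried requests by idempotency key."""

    def __init__(self):
        self._results = {}  # in-memory stand-in for a durable store
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: replay the stored result
        self.charges_made += 1
        receipt = f"receipt-{self.charges_made}-{amount_cents}"
        self._results[idempotency_key] = receipt
        return receipt


class LossyTransport:
    """Simulates a network that delivers the request but drops the first response."""

    def __init__(self, handler):
        self.handler = handler
        self.calls = 0

    def charge(self, idempotency_key, amount_cents):
        self.calls += 1
        receipt = self.handler.charge(idempotency_key, amount_cents)
        if self.calls == 1:
            raise TimeoutError("response lost")
        return receipt


def call_with_retries(operation, max_attempts=3):
    """Retry a callable a bounded number of times, then re-raise the last timeout."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return operation()
        except TimeoutError as error:
            last_error = error
    raise last_error
```

Without the key lookup, the second attempt would charge twice. The attempt cap matters just as much: unbounded retries are how one slow service becomes a cascading outage. A production version would also add backoff between attempts, which this sketch omits.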
In our experience, observability is the cost teams underestimate most. After one severe outage, logs sat scattered across five clusters without correlation IDs. Mean Time To Recovery initially spiked from nearly 40 minutes to between 3 and 5 hours before distributed tracing was fully adopted.
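Correlation IDs are cheap to add before the outage and painful to add after it. The sketch below uses only the Python standard library (`contextvars` and `logging`) to stamp every log line with the ID of the request being handled. The logger name, log format, and `handle_request` function are assumptions, not a description of any particular system.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the correlation ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(name: str, handler: logging.Handler) -> logging.Logger:
    """Attach the filter and a format that leads with the correlation ID."""
    handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID when present, otherwise mint one at the edge."""
    request_id = incoming_id or uuid.uuid4().hex
    token = correlation_id.set(request_id)
    try:
        logger.info("rendering invoice")
        return request_id  # would be forwarded downstream, for example as a header
    finally:
        correlation_id.reset(token)
```

The part that matters across services is the forwarding: each service reuses the incoming ID rather than minting its own, so one search term pulls the whole request path out of centralised logs.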
Infrastructure costs also climbed by 65% due to redundant sidecars, load balancers, and logging agents. Kubernetes, message brokers, and API gateways can add leverage, but only after operational discipline exists. Before that, they mostly add new ways to be confused.
Use This Decision Process Before Rewriting Anything
Do not start with a rewrite. Start with a smaller reversible move.
- Measure current pain. Track merge conflicts, release delays, incident blast radius, database contention, and coordination time.
- Map domain boundaries. Identify where business decisions happen and which data those decisions own.
- Identify ownership. Confirm whether one team can own the capability from roadmap to incident response.
- Check deployment constraints. Decide whether independent deployment solves a real bottleneck or merely creates another pipeline.
- Estimate operational readiness. Verify CI/CD, tracing, logging, rollback, secrets, and on-call responsibility.
- Select the smallest reversible move. Prefer internal modularisation before network extraction.
We developed a scoring matrix for extraction candidates and weighted domain volatility heavily. Capabilities that changed frequently stayed in the monolith because distributed refactoring is where optimism goes to die. Extraction candidates needed a domain stability score of 8 to 9 out of 10 before approval.
The architectural evaluation and boundary-mapping phase took approximately 3 to 5 weeks per proposed service. That felt slow. It was still cheaper than splitting first and discovering the boundary was fiction.
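A scoring matrix of this kind can be sketched as a weighted sum with a hard gate on domain stability. Only the heavy weighting of stability and the 8-out-of-10 gate come from the text. The other criteria, the specific weights, and the candidate names are illustrative assumptions, not the matrix we actually used.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative weights (assumptions); stability is deliberately the heaviest.
WEIGHTS = {
    "domain_stability": 0.4,
    "single_team_ownership": 0.2,
    "deployment_pain": 0.2,
    "operational_readiness": 0.2,
}
MIN_DOMAIN_STABILITY = 8  # gate from the text: 8 to 9 out of 10


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion name -> score from 0 to 10


def evaluate(candidate: Candidate) -> tuple[float, bool]:
    """Return the weighted score and whether the stability gate is passed."""
    total = sum(WEIGHTS[criterion] * candidate.scores[criterion] for criterion in WEIGHTS)
    passes_gate = candidate.scores["domain_stability"] >= MIN_DOMAIN_STABILITY
    return round(total, 2), passes_gate
```

The gate is separate from the total on purpose: a volatile domain should not be able to buy its way into extraction with high scores elsewhere.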
Architecture Decision Matrix: Monolith vs. Microservices
| Constraint / Metric | Modular Monolith | Microservices |
|---|---|---|
| Engineering Team Size | Fewer than 45 engineers | More than 45 engineers across multiple domains |
| Domain Volatility | High; frequent business-rule changes stay cheaper inside process boundaries | Stable enough to score around 8 to 9 out of 10 before extraction |
| Coordination Cost | Acceptable when release and review queues remain manageable | Justified when cross-service coordination is still cheaper than monolith coordination bottlenecks |
| Data Ownership | Useful when shared transactional consistency is central to the workflow | Useful when data stores, events, APIs, and contracts can be owned explicitly |
| Operational Readiness | Better when observability, rollback, and incident ownership are still immature | Viable when platform capabilities already exist and teams can operate what they deploy |
A modular monolith can work beautifully for a 30-person engineering team in a single timezone. The same model can become a bottleneck at 150 engineers spread across three EU member states, where asynchronous PR reviews and congested merge queues turn every shared deploy into a scheduling exercise.
Where This Advice Stops Being Universal
Architecture advice rots when it pretends context does not exist.
Safety-critical systems, hard real-time platforms, high-frequency trading, public cloud infrastructure, and heavily regulated environments may require constraints that override the clean domain model you wanted. In a heavily regulated financial module, compliance requirements overrode standard domain-driven boundaries and forced decisions around audit trails, separation, and evidence rather than code elegance.
EU organisations need to factor in data residency, auditability, vendor risk, and operational staffing. Audit logging overhead added 15% latency to cross-region calls in one regulated module, and compliance reviews extended deployment cycles by 10 to 15 days.
One catch: this modularisation framework falls apart in environments where strict EU data residency laws mandate physical database separation across different member states. That forces a distributed architecture regardless of domain cohesion.
The right answer also ages. What fits a six-person team may punish a sixty-person engineering group two years later. Choose the architecture that reduces today’s measured pain without trapping tomorrow’s team in an expensive migration story.
Summary: Start modular. Split only when the boundary is stable, the owner is clear, and the operational bill has already been budgeted.