Why We Need to Stop Rewriting Everything in Rust

Rust is excellent, but full rewrites often burn teams. Learn when Rust helps, when it distracts, and how to modernize safely and without drama.

Key Takeaways Before Anyone Opens a Rewrite RFC

I almost wrote the first draft of this as a Rust takedown. That would have been lazy, and worse, wrong.

Rust is a strong systems language. Its ownership model, tooling, and community norms deserve real respect. But rewriting everything in Rust is usually an expensive way to avoid harder engineering decisions: ownership, contracts, tests, rollout discipline, and the unglamorous work of understanding why the current system behaves the way it does.

The practical position is simple. Use Rust first for new high-risk components, memory-unsafe boundaries, parsers, CLIs, agents, protocol gateways, and isolated services. Do that before anyone proposes a wholesale rewrite of a system that still pays the bills.

Summary: This article is not anti-Rust. It is anti-magical-thinking, anti-resume-driven architecture, and anti-rewrite-by-fashion.

During one planning window from mid-September 2024 through late February 2025, I used a deliberately harsh approval test: if fewer than about 65% of named maintainers could describe the rollback path without consulting the RFC, the rewrite proposal was not ready for review. That window covered one planning cycle, one on-call rotation reset, and one procurement checkpoint for EU-facing customers. It filtered out a surprising amount of architectural theatre.

The Rust Rewrite Reflex Is Often an Organisational Smell

The common question sounds reasonable: if the old service is painful and Rust is safer, why not start fresh?

Because many rewrite proposals are not really about Rust. They are frustration with legacy ownership, poor tests, undocumented behaviour, and bad architecture wearing a Rust hoodie.

Our experience showed a useful smell threshold: if more than 40% of the rewrite justification is phrased as developer frustration rather than named production failure modes, classify the proposal as organisational debt wearing a technical costume. “Nobody understands this module” is a staffing and documentation problem until it becomes a measurable outage, security exposure, or delivery constraint.
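The smell threshold is easy to apply mechanically once a reviewer has labelled each line of the justification. A minimal sketch, assuming hand-assigned labels; the `Justification` enum and `is_organisational_debt` function are hypothetical names for illustration, not part of any tool:

```rust
/// How one line of a rewrite justification is phrased.
/// Hypothetical labels: a reviewer assigns them by hand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Justification {
    DeveloperFrustration,
    NamedFailureMode,
}

/// Returns true when strictly more than `max_frustration_percent` of the
/// justification lines are frustration rather than named failure modes.
pub fn is_organisational_debt(items: &[Justification], max_frustration_percent: usize) -> bool {
    if items.is_empty() {
        return false; // nothing to classify yet
    }
    let frustrated = items
        .iter()
        .filter(|j| **j == Justification::DeveloperFrustration)
        .count();
    // Integer comparison avoids floating-point rounding at the threshold.
    frustrated * 100 > max_frustration_percent * items.len()
}
```

With the 40% threshold, a proposal with two frustration lines out of five sits exactly on the line and passes; three out of five does not.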

The scar tissue is not decorative

Greenfield work feels productive because it avoids accumulated production scar tissue. That scar tissue often encodes customer exceptions, compliance details, latency trade-offs, and operations knowledge nobody remembered to write down.

Community observation suggests a scar-tissue discovery period from mid-January through early April 2025 is long enough to include support escalations, maintenance releases, and at least one compliance review loop. If the team still thinks the old system is merely “messy” after that, it may have earned the right to simplify. If it finds hidden domain rules in incident notes and support macros, it should slow down.

The cynical pattern is familiar: teams underestimate the existing system because its complexity is familiar, then overestimate the new system because its complexity has not arrived yet.

What Rust Actually Fixes—and What It Absolutely Does Not

Give Rust credit where it is due. Ownership, borrowing, strong typing, Cargo, and a culture that takes correctness seriously can reduce whole classes of memory-safety defects. For exposed systems code, that matters.

CISA’s guidance on memory-safe languages supports the security case for memory-safe development, especially where unsafe memory handling sits near an attacker-controlled boundary.

Then stop before this becomes a sales pitch.

In one assessment window from early November 2024 to mid-March 2025, after the dependency inventory but before the first production migration gate, participant logs put the risk split at roughly 35% memory-safety exposure, 30% protocol ambiguity, 20% deployment fragility, and the remainder ownership or process failure. Rust speaks directly to the first category. It may help indirectly with the others, but it does not erase them.

The strangler path is boring, which is why it works

Note: Rust does not fix unclear domain boundaries, bad product incentives, weak observability, broken incident response, over-coupled services, or poorly designed protocols.

Take the failure case nobody wants in the launch review: a team rewrites an HTTP-facing parser in Rust but preserves the same ambiguous protocol semantics. Incident volume falls by only about 5% because most outages came from malformed upstream retries, not memory corruption. The new parser is cleaner. The pager is still angry.

The Economics Nobody Puts in the Migration Pitch

Beginner migration plans count coding effort. Mature migration plans count everything else.

Duplicated feature development. Compatibility layers. CI rebuilds. Packaging changes. Monitoring gaps. Rollout tooling. Documentation. On-call education. The expensive part of a rewrite is rarely the syntax conversion; it is the period where the old system must keep serving production while the new one is incomplete and increasingly politically protected.

Member feedback indicates that a team that keeps the old service alive, builds the replacement, and still promises quarterly feature delivery should expect to lose about 30% of its capacity. Model the two-system overlap from early February through late September 2025, including duplicated bug fixes, temporary compatibility adapters, CI rebuild work, and documentation rewrites. That is not a side quest. That is the migration.
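To see what a 30% drag means in planning terms, it helps to put it in engineer-weeks. A rough sketch: the `OverlapBudget` struct and `overlap_budget` function are illustrative names, and the team size and overlap length are inputs you supply, not figures from any real project:

```rust
/// Engineer-weeks available and lost during a two-system overlap.
#[derive(Debug, PartialEq, Eq)]
pub struct OverlapBudget {
    pub total: u32,
    pub lost_to_overlap: u32,
    pub left_for_features: u32,
}

/// Splits total capacity into overlap drag and what remains for features.
/// `drag_percent` is the assumed capacity loss (about 30 in the text above).
pub fn overlap_budget(engineers: u32, weeks: u32, drag_percent: u32) -> OverlapBudget {
    let total = engineers * weeks;
    let drag = drag_percent.min(100); // clamp nonsense inputs
    let lost = total * drag / 100; // integer division rounds down
    OverlapBudget {
        total,
        lost_to_overlap: lost,
        left_for_features: total - lost,
    }
}
```

For a made-up team of ten engineers over a 30-week overlap, that is 300 engineer-weeks in total, 90 consumed by the overlap, and 210 left for the roadmap everyone already promised.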

EU-facing customers make the risk less theoretical

Regulated customers, data handling commitments, uptime expectations, and procurement cycles make rewrites riskier than they look in a benchmark post. A procurement checkpoint can freeze an interface long after engineering wants to delete it. An audit obligation can preserve a reporting export everyone hates. A contractual uptime target can make “temporary instability” a very expensive phrase.

A billing monolith with procurement-specific workflows, audit obligations, and brittle reporting exports is usually a containment candidate, not a rewrite candidate. The code may be ugly. The obligations are real.

Prefer Containment Over Conversion

The constructive alternative is not “never use Rust.” It is “use Rust where the failure mode is concrete, measurable, and expensive.”

Good first targets include network parsers, file format handling, security-sensitive agents, command-line tooling, high-throughput workers, protocol gateways, and new services with clean interfaces. An isolated agent that processes untrusted files on EU customer infrastructure may be an excellent Rust target because the boundary is sharp and the downside is obvious.

Identify the dangerous boundary. Wrap it. Capture golden behaviour from the existing implementation. Fuzz or property-test the input space. Run the Rust path in shadow mode. Move traffic gradually. Delete the old path only after production evidence exists.
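The shadow-mode step can be sketched in a few lines. This assumes both implementations can be called in-process as plain functions. In a real system the legacy path is more likely a service call and the mismatches would go to telemetry. `shadow_compare` and `Mismatch` are hypothetical names:

```rust
/// One input on which the two implementations disagreed.
#[derive(Debug, PartialEq)]
pub struct Mismatch<I, O> {
    pub input: I,
    pub legacy: O,
    pub candidate: O,
}

/// Runs `candidate` in the shadow of `legacy`. Callers are always served the
/// legacy output; disagreements are only recorded for later review.
pub fn shadow_compare<I, O, L, C>(
    inputs: &[I],
    legacy: L,
    candidate: C,
) -> (Vec<O>, Vec<Mismatch<I, O>>)
where
    I: Clone,
    O: PartialEq + Clone,
    L: Fn(&I) -> O,
    C: Fn(&I) -> O,
{
    let mut served = Vec::with_capacity(inputs.len());
    let mut mismatches = Vec::new();
    for input in inputs {
        let old = legacy(input);
        let new = candidate(input);
        if old != new {
            mismatches.push(Mismatch {
                input: input.clone(),
                legacy: old.clone(),
                candidate: new,
            });
        }
        served.push(old); // production behaviour stays the legacy behaviour
    }
    (served, mismatches)
}
```

The point of the design is that a mismatch is evidence, not an outage: the old path keeps answering while the list of disagreements tells you which undocumented behaviour the new path has not learned yet.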

Set the initial Rust containment target at 5% to 15% of the codebase by executable responsibility, not by lines of code. Lines are a vanity metric. Responsibility tells you whether the new code owns a real failure mode.

In practice, a staged rollout from March through mid-June 2025 gives the team room to learn: shadow mode first, then 5% production traffic, then 15%, then full cutover only after rollback is exercised. Not documented. Exercised.
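One way to encode those stages, as a sketch rather than a recommendation for any particular router: requests are assigned to a stable bucket by hashing a request key, and the final promotion is refused until someone confirms rollback was exercised. `Stage`, `next_stage`, and `serve_from_rust` are hypothetical names, and the standard library's `DefaultHasher` stands in for whatever hashing your gateway already uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Shadow,      // Rust path runs, but never serves
    Percent(u8), // Rust path serves this share of traffic
    FullCutover,
}

/// Advances one stage. Full cutover is refused until rollback was exercised.
pub fn next_stage(current: Stage, rollback_exercised: bool) -> Stage {
    match current {
        Stage::Shadow => Stage::Percent(5),
        Stage::Percent(5) => Stage::Percent(15),
        Stage::Percent(15) if rollback_exercised => Stage::FullCutover,
        other => other, // stay put, including at 15% without a rollback drill
    }
}

/// Maps a request key to a stable bucket in 0..100.
fn bucket(request_key: &str) -> u8 {
    let mut hasher = DefaultHasher::new();
    request_key.hash(&mut hasher);
    (hasher.finish() % 100) as u8
}

/// Decides whether this request is served by the Rust path.
pub fn serve_from_rust(stage: Stage, request_key: &str) -> bool {
    match stage {
        Stage::Shadow => false,
        Stage::Percent(p) => bucket(request_key) < p.min(100),
        Stage::FullCutover => true,
    }
}
```

Because the bucket depends only on the key, every request served by Rust at 5% is still served by Rust at 15%, which keeps incident timelines readable when the percentage moves.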

Quick Tip: If rollback requires heroics, the Rust component is not isolated enough yet.

When a Rust Rewrite Is Actually Defensible

Some rewrites deserve approval. Treat them as rare approvals, not cultural victories.

A defensible rewrite usually has at least one of these conditions: chronic exploitable memory-safety defects, a small bounded component, abandoned upstream dependencies, or a strategic platform shift with real funding. The component must be bounded enough that engineers can name what will not change.

Require measurable success criteria before the first celebratory architecture diagram appears. Examples include a roughly 45% reduction in crash-producing defects or around 30% reduction in security-relevant unsafe boundary exposure. Other valid targets include a smaller attack surface, simpler deployment, reduced incident load, or confirmed performance improvement under production-like workloads.

There also needs to be an owner and an exit plan. Who maintains the Rust version? When is the legacy path removed? What happens if parity takes twice as long as promised?

For serious proposals, use an approval-to-deletion window from early April through mid-December 2025, with legacy removal scheduled before the Rust path becomes a permanent parallel platform. Memory-safety guidance is strong evidence for safer implementation choices, not automatic permission to replace stable systems whose dominant risks are contractual, operational, or architectural.

The Best Rust Counterarguments Still Do Not Prove ‘Rewrite Everything’

Rust advocates have real arguments. Security posture can improve. Undefined behaviour can shrink. Refactoring can feel less like crossing a frozen lake. Fast binaries without a garbage collector are useful in tight operational environments.

I am not hand-waving those benefits away.

The overreach starts when those benefits become a blanket rewrite mandate. Targeted adoption follows evidence. Blanket rewrites often follow mood.

The performance argument deserves particular suspicion. If profiling shows only around 15% of request time is spent in the component proposed for rewriting, Rust may optimize the wrong part of the system. The actual bottleneck may be database design, queue topology, network chatter, or product abuse of an API. In that case, Rust makes the wrong thing faster and the incident review shorter by exactly nothing.
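The arithmetic behind that suspicion is Amdahl's law: if a fraction p of request time sits in the component and the rewrite makes that component s times faster, the whole request speeds up by 1 / ((1 - p) + p / s). A small sketch, with function names of my own choosing:

```rust
/// Overall speedup when `fraction` of the time is made `component_speedup`
/// times faster (Amdahl's law). `fraction` is in 0.0..=1.0.
pub fn overall_speedup(fraction: f64, component_speedup: f64) -> f64 {
    1.0 / ((1.0 - fraction) + fraction / component_speedup)
}

/// Request time left after the component speedup, in the same unit as `total`.
pub fn remaining_time(total: f64, fraction: f64, component_speedup: f64) -> f64 {
    total * ((1.0 - fraction) + fraction / component_speedup)
}
```

With p = 0.15, a component that becomes twice as fast yields about a 1.08x overall speedup: a 100 ms request drops to 92.5 ms. Even an infinitely fast component caps out near 1.18x, because the other 85% of the request has not moved.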

Morale also changes over time. Across a window such as late October 2024 through mid-May 2025, early greenfield enthusiasm can look like proof that the rewrite is healthy. The first incident cycle and maintenance handover tell a harsher story. Ask the second maintainer, not the launch team.

A Saner Checklist Before You Touch the Repository

Use this as a pre-RFC interrogation, not a vibes exercise. Run the checklist review in May 2025, before sprint allocation, hiring requests, or public commitments turn the rewrite into a politically protected project.

  1. What exact failure mode are we fixing?
  2. Can we isolate that failure mode behind a clean boundary?
  3. What production behaviour must remain compatible?
  4. Which customers, contracts, or audit paths depend on the current behaviour?
  5. What rollback exists, and who has exercised it?
  6. How will we compare behaviour against the existing implementation?
  7. What telemetry proves the rewrite helped?
  8. What is the expected capacity drag during overlap?
  9. Who owns the new code in year three?
  10. When is the legacy path deleted?
  11. What happens if parity takes twice as long as promised?

The readiness bar is 8 answered questions out of 11. No credit for vague answers such as “better maintainability” or “future flexibility” unless they tie to a measurable production outcome.
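If you want the bar to be harder to argue with in a meeting, score it. A minimal sketch, assuming a reviewer grades each of the eleven answers by hand; the `Answer` enum and function names are hypothetical:

```rust
/// A reviewer's grade for one checklist answer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Answer {
    Unanswered,
    Vague,      // e.g. "better maintainability" with no production outcome
    Measurable, // tied to a measurable production outcome
}

pub const QUESTION_COUNT: usize = 11;
pub const READINESS_BAR: usize = 8;

/// Counts only answers that earn credit; vague answers score zero.
pub fn credited_answers(answers: &[Answer; QUESTION_COUNT]) -> usize {
    answers.iter().filter(|a| **a == Answer::Measurable).count()
}

/// True when the proposal clears the readiness bar.
pub fn ready_for_rfc(answers: &[Answer; QUESTION_COUNT]) -> bool {
    credited_answers(answers) >= READINESS_BAR
}
```

Making `Vague` a distinct grade rather than folding it into `Unanswered` is deliberate: it lets the review record that someone tried to answer and was told it did not count.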

Rust can be an excellent tool. It can also become a socially acceptable way to dodge unpleasant system archaeology.

Summary: If the team cannot answer these questions, it does not have a Rust strategy; it has a rewrite fantasy.
