Let me start with something I've seen play out more times than I can count. A team hits a wall with the cloud. Costs creep up, then spike. Performance starts to feel inconsistent. Someone in finance asks a simple question like "why did this double?" and nobody has a clean answer.
Meanwhile, compliance starts asking where data lives and who controls it. At some point the conversation shifts from optimization to something more fundamental: maybe this isn't the right place for everything. That realization feels like a breakthrough, like you've identified the problem.
In reality, you've just identified the starting line.
The Decision to Move Is the Obvious Part
There are usually very rational reasons behind it. Costs that don't behave the way you expected. Infrastructure spend that's hard to forecast quarter to quarter. Workloads that just don't perform the way they should in shared, abstracted environments.
And a growing discomfort with having everything tied tightly to a single provider's ecosystem.
None of that is hypothetical. It's operational reality for a lot of teams. So they decide to move. On paper, it sounds straightforward enough. Pull workloads out, put them somewhere else, regain control. Simple.
Except it's not.
The Move Itself Is Where the Illusion Breaks
Because the moment you actually start planning the move, you realize you're not just relocating infrastructure. You're redesigning how your environment works. There's no clean export and import button. No "flip the switch and we're done" moment.
You start asking questions that didn't matter as much before.
How are systems going to talk to each other now?
What stays in the cloud and what doesn't?
Where do certain workloads actually perform best?
What does your network look like when it's no longer abstracted away?
How do you monitor and support something that isn't sitting inside a single console?
Suddenly, this isn't a migration project. It's an operating model shift. That's where things start to get real.
The Friction Doesn't Show Up Until After You Move
Here's the part that catches people off guard. You get through the migration. The workloads are running. Technically, everything works. That's when the real work begins. Because now you have to operate it.
Running infrastructure outside of a hyperscaler environment introduces a different kind of complexity. You don't have the same guardrails. You don't have the same baked-in tooling. You're responsible for more of the stack, whether you planned to be or not. At the same time, your team might be incredibly capable in the cloud, but that doesn't automatically translate to running hybrid environments or managing bare metal at scale. There's a skills gap that shows up quietly, then all at once.
Most organizations don't move everything in one shot. So now you're operating across environments. Part cloud, part on-prem, maybe part colocation. Each with its own behavior, its own tooling, its own expectations. That's where complexity compounds.
Then comes what I call the "day two problem." The system works. However, it's not optimized. It's not scalable in the way you expected, and it's not fully aligned with why you made the move in the first place.
That's the moment where teams realize the migration wasn't the hard part. Running it well is.
What the Most Successful Teams Do Differently
The teams that navigate this well approach it differently from the start. They don't treat it like a one-time move. They treat it like a transformation. They involve the right stakeholders early. Finance, operations, compliance, not just IT. Everyone who's going to feel the impact of these decisions.
Instead of forcing everything into a single model, they spend time understanding where workloads actually belong. They think through cost, not just at migration, but over time. They don't only think about how the environment will be built; they plan for how it will be run. And increasingly, they don't try to do it alone.
They work with partners who can help them evaluate trade-offs, design architectures that make sense, guide the migration process, and stay involved long enough to make sure the end state actually delivers on the promise.
Because at this point, the line between vendor and partner matters. A lot.
What This Actually Looks Like in Practice
Let's take this out of the theoretical for a minute, because this is where most teams either get disciplined or get into trouble. If you're going to do this right, you need to treat it like a working session, not a strategy deck. Get the right people in a room. Architecture, operations, finance, security. Close the laptops. Start writing things down.
Here's exactly how I would run it.
Start with a whiteboard and list your top 10 workloads. Not everything. Just the ones that matter most to the business.
For each one, force the team to answer, in plain language:
What does this workload actually do for the business?
What breaks if it slows down or goes offline?
What does "good performance" actually look like in numbers?
If you can't quantify that, you're flying blind.
Next, take each workload and document its reality, not what you think it is.
Where is it running today?
What does it cost per month, fully loaded?
What are the peak usage patterns?
What other systems does it depend on, and where are those running?
This is where people start realizing how interconnected everything really is.
Then, and only then, you evaluate placement. For each workload, make the team choose one of four options:
- Belongs with a hyperscaler
- More suitable for bare metal
- Move to colocation/on-prem
- Requires hybridization
However, here's the rule. You don't get to choose based on preference. You have to justify it across four criteria:
- Performance requirements
- Cost predictability over 12–24 months
- Compliance and data control
- Operational complexity
If you can't defend the decision across those four, it's the wrong placement.
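If it helps to make that rule concrete, here is a minimal sketch of how a team might record a placement decision so that a missing justification is impossible to overlook. The class names, criterion keys, and the "order-db" workload are all hypothetical, chosen only for illustration; the four options and four criteria are the ones listed above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Placement(Enum):
    """The four placement options from the working session."""
    HYPERSCALER = "hyperscaler"
    BARE_METAL = "bare metal"
    COLO_ON_PREM = "colocation/on-prem"
    HYBRID = "hybrid"


# The four criteria every placement has to be defended against.
CRITERIA = (
    "performance",
    "cost_predictability",
    "compliance",
    "operational_complexity",
)


@dataclass
class PlacementDecision:
    workload: str
    placement: Placement
    # Maps criterion name -> the written justification from the session.
    justifications: dict[str, str] = field(default_factory=dict)

    def missing_criteria(self) -> list[str]:
        """Return the criteria that have no (or only blank) justification."""
        return [c for c in CRITERIA if not self.justifications.get(c, "").strip()]

    def is_defensible(self) -> bool:
        """A placement stands only if all four criteria are justified."""
        return not self.missing_criteria()


# Hypothetical example: two criteria argued, one left blank, one never raised.
decision = PlacementDecision(
    workload="order-db",
    placement=Placement.BARE_METAL,
    justifications={
        "performance": "p99 query latency target needs dedicated disks",
        "cost_predictability": "fixed monthly rate over 24 months",
        "compliance": "",
    },
)

print(decision.missing_criteria())  # ['compliance', 'operational_complexity']
print(decision.is_defensible())     # False
```

The code does not judge whether a justification is any good. That is still the room's job. It only makes the gaps visible.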
After that, shift the conversation to operations. This is where most teams fall apart. Pick one of the workloads you plan to move and walk through a failure scenario.
It's 2:00 a.m. Something breaks.
Who gets alerted?
What tool tells you there's a problem?
Who is responsible for responding?
What's the first action they take?
How do they know if the fix worked?
If nobody in the room can answer that step by step, you're not ready to run that workload outside the cloud.
Then do the same thing for scaling. Traffic doubles unexpectedly.
What happens?
Is scaling manual or automated?
How long does it take?
Who approves the cost impact?
Again, if the answers are vague, that's a gap you need to close before you move anything.
Now bring finance into it in a real way. Don't ask "is this cheaper." That's the wrong question. Instead ask:
What will this cost every month at baseline?
What causes that number to change?
Can we forecast that within a 10–15 percent range?
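As a rough illustration of that last question, the check below compares forecast and actual monthly spend and flags any month that lands outside the band. It is a sketch under assumptions: the function names and the monthly figures are made up, and the 15 percent default is simply the upper end of the range mentioned above.

```python
def forecast_deviation(forecast: float, actual: float) -> float:
    """Relative deviation of actual spend from forecast (0.12 means 12%)."""
    if forecast <= 0:
        raise ValueError("forecast must be positive")
    return abs(actual - forecast) / forecast


def within_tolerance(forecast: float, actual: float, tolerance: float = 0.15) -> bool:
    """True if actual spend landed within the agreed forecast band."""
    return forecast_deviation(forecast, actual) <= tolerance


# Hypothetical (forecast, actual) pairs per month, in the same currency.
months = [(40_000, 42_500), (40_000, 47_000), (41_000, 39_000)]

results = [within_tolerance(f, a) for f, a in months]
print(results)  # [True, False, True]
```

The second month misses because 47,000 against a 40,000 forecast is a 17.5 percent deviation. A month like that is exactly where you go back to the previous question: what caused the number to change?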
If finance can't model it, you haven't actually solved the problem that got you here in the first place.
Finally, define success in a way that forces accountability.
Pick three metrics per workload. No more.
- Cost per month
- Performance benchmark
- Availability target
Write down what those numbers are today, and what they need to be after the move. If those numbers don't improve, or at least become more predictable, then the move didn't work. It just moved the problem.
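One way to keep yourself honest is to write those numbers into something that can be checked later. The sketch below is a hypothetical scorecard: the field names and every figure in it are invented for illustration, and it only tests "improved" and "hit the target." It does not attempt to measure predictability, which needs more than a single before and after reading.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    before: float   # value today, written down before the move
    target: float   # value the move is supposed to deliver
    after: float    # value measured once the workload is running
    lower_is_better: bool = True

    def improved(self) -> bool:
        """Did the number move in the right direction at all?"""
        if self.lower_is_better:
            return self.after < self.before
        return self.after > self.before

    def met_target(self) -> bool:
        """Did the number reach what was promised?"""
        if self.lower_is_better:
            return self.after <= self.target
        return self.after >= self.target


# Hypothetical scorecard for one workload: exactly three metrics.
scorecard = [
    Metric("cost_per_month_usd", before=52_000, target=45_000, after=44_000),
    Metric("p95_latency_ms", before=180, target=120, after=135),
    Metric("availability_pct", before=99.9, target=99.95, after=99.95,
           lower_is_better=False),
]

missed = [m.name for m in scorecard if not m.met_target()]
print(missed)                                # ['p95_latency_ms']
print(all(m.improved() for m in scorecard))  # True
```

In this made-up case everything improved, but latency still missed its target. That is a more useful conversation than "the migration is done."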
This is the work most teams try to skip. It's detailed. It's uncomfortable. It forces trade-offs. However, this is also where the outcome gets decided, long before anything is migrated.