Most conversations about AI adoption skip straight to capability — what the model can do, how fast it can do it, how much time it saves. Those are the right questions eventually. But for any business handling sensitive data, there's a prior question that rarely gets asked: where does your data actually go, and who can reach it once it's there?
We come at this from three decades in global finance, where the answer to that question was never optional. Out-of-jurisdiction data had to be kept strictly partitioned, never commingled, to satisfy Swiss, Singaporean, and other privacy regimes. The AI era hasn't softened that requirement. It has made it harder to satisfy, because AI workflows pull your most sensitive data into places it has never been before.
"Where it's stored" is no longer "who can reach it"
The intuitive model of data residency is geographic: keep the servers in Frankfurt and the data is protected by EU law. That model is now incomplete.
Under the United States CLOUD Act (Clarifying Lawful Overseas Use of Data Act), a US-headquartered provider can be compelled by US legal process to produce data in its possession, custody, or control, regardless of where that data is physically stored. A server in the EU operated by a US company is still reachable. Add FISA Section 702, which authorizes surveillance targeting non-US persons located outside the United States, and the exposure compounds. Meanwhile, the United States has no comprehensive federal privacy law to push back with, only a patchwork of sectoral rules (HIPAA for health data, GLBA for financial data) and state laws such as California's CCPA. The protection an individual or a business can rely on is thinner than most assume.
The EU takes a different stance. Under the GDPR, data protection is treated as a fundamental right, not a commercial courtesy. That difference in philosophy is the whole reason "keep EU data in the EU" became a board-level concern — and why it isn't satisfied simply by choosing an EU region in a US provider's console.
We're a US company, and we're not here to run down anyone's legal system. The point is narrower and practical: if you have data that must not leave your jurisdiction or your control, the provider's headquarters and legal exposure matter as much as the server's postal code.
The AI-specific risk: your data as someone else's training data
There's a second exposure unique to AI. When you send private data to a third-party frontier model, you are trusting that provider's policy on whether your inputs are retained, reviewed, or used to train the next version of the model.
To be fair: many providers now state that they don't train on business or API data by default, and that's a meaningful improvement. But notice what you're relying on — a policy, which can change, be misread, or be undermined by a misconfigured integration. For genuinely sensitive data, "we trust their policy" is a weaker position than "their policy is irrelevant to us."
That's the real argument for local inference: when the model runs on your own hardware or inside your own jurisdiction-bound environment, there is no third-party policy to trust, because your data never leaves to be governed by one. The model comes to your data, not the other way around.
Sovereignty is a set of choices, not a switch
Here's the honest part most vendors won't tell you: nobody can wave a wand and make your AI "sovereign." Data sovereignty is the result of a series of deliberate decisions, and they have to be made consistently or the weakest link defines your real exposure. In practice, it comes down to three commitments:
1. In-jurisdiction infrastructure — including backups. Compute, storage at rest, and backups all need to stay within your chosen jurisdiction, on local hardware or cloud operated under that jurisdiction's law. This is where most "compliant" setups quietly fail: production lives in-region while backups replicate to a default offshore bucket. Backups are data. They count.
2. In-jurisdiction software. Favor regionally built software, or open source you can self-host on-premises or in an appropriate cloud. Every third-party SaaS tool in the chain is another processor, possibly in another jurisdiction, whose sub-processors and policies you now have to validate. Minimizing SaaS isn't dogma; it's reducing the number of places your data can be reached.
3. Local inference for the data that matters most. Not everything needs to run locally. But the data you genuinely cannot afford to expose should get its AI leverage from a model running inside your boundary, so it never becomes training data or discovery material for anyone else.
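The "weakest link" idea behind these three commitments can be sketched as a small audit over an inventory of components. This is an illustration, not a compliance tool: the Component fields, the findings and is_sovereign helpers, and the sample inventory are all hypothetical, and a jurisdiction is reduced to a single country code, which real regimes rarely allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str         # e.g. "prod-db", "nightly-backup"
    kind: str         # "compute", "storage", "backup", "saas", or "inference"
    region: str       # where the data physically sits (simplified to a country code)
    operator_hq: str  # jurisdiction whose law governs the operator (simplified)


def findings(components, home):
    """Return (component name, reason) pairs for everything outside `home`."""
    issues = []
    for c in components:
        if c.region != home:
            issues.append((c.name, f"{c.kind} located in {c.region}"))
        if c.operator_hq != home:
            issues.append((c.name, f"operator governed by {c.operator_hq}"))
    return issues


def is_sovereign(components, home):
    # Weakest link: one out-of-jurisdiction component fails the whole setup.
    return not findings(components, home)


# Hypothetical inventory for a business whose chosen jurisdiction is Switzerland.
inventory = [
    Component("prod-db", "storage", region="CH", operator_hq="CH"),
    Component("nightly-backup", "backup", region="US", operator_hq="CH"),
    Component("ticketing-tool", "saas", region="CH", operator_hq="US"),
    Component("local-llm", "inference", region="CH", operator_hq="CH"),
]

for name, reason in findings(inventory, home="CH"):
    print(f"{name}: {reason}")
```

In this made-up inventory, production storage and inference pass, while the backup (wrong region) and the SaaS tool (in-region server, out-of-jurisdiction operator) are flagged. The check deliberately treats the operator's governing law as a separate question from the server's location.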
A pragmatic middle path
None of this means going fully air-gapped or rejecting the cloud. Most businesses end up with a tiered approach: routine, low-sensitivity workloads use the best available models for convenience and capability, while a defined class of sensitive data is walled off and served by local or in-jurisdiction inference. The skill is in drawing that line deliberately — knowing exactly which data sits on which side and why — rather than discovering after an incident that everything was on the wrong side.
That tiering is exactly the discipline financial institutions have applied to information barriers for decades. It transfers cleanly to AI.
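One way to make that line explicit is to write it down as a routing rule. The sketch below is a minimal illustration under stated assumptions: the two sensitivity tiers, the data class names, and both endpoint URLs are hypothetical placeholders, and a real deployment would classify data by something more robust than a string label.

```python
from enum import Enum


class Sensitivity(Enum):
    ROUTINE = 1
    RESTRICTED = 2


# Hypothetical endpoints; substitute your own.
LOCAL_ENDPOINT = "http://localhost:8080/v1"     # model running inside your boundary
HOSTED_ENDPOINT = "https://api.example.com/v1"  # third-party hosted model

# The deliberately drawn line: which data classes sit on which side.
DATA_CLASS_TIERS = {
    "marketing_copy": Sensitivity.ROUTINE,
    "public_docs": Sensitivity.ROUTINE,
    "client_records": Sensitivity.RESTRICTED,
    "trading_data": Sensitivity.RESTRICTED,
}


def route(data_class: str) -> str:
    """Pick an inference endpoint for a data class.

    Unknown classes default to RESTRICTED, so unclassified data stays
    local instead of leaving the boundary by accident.
    """
    tier = DATA_CLASS_TIERS.get(data_class, Sensitivity.RESTRICTED)
    return HOSTED_ENDPOINT if tier is Sensitivity.ROUTINE else LOCAL_ENDPOINT
```

Here route("marketing_copy") returns the hosted endpoint, while route("client_records") and any unlisted class return the local one. Defaulting to the restricted tier is a design choice: convenience then has to be chosen explicitly, per data class, and is never the accidental outcome.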
Where to start
If data sovereignty matters to your business, the first step isn't buying anything — it's mapping where your data lives today and where it travels the moment an AI workflow touches it. That map almost always surfaces a few surprises (the offshore backup, the SaaS integration nobody validated, the "EU region" run by a US provider). Once you can see it, you can decide which data needs the stronger standard and architect for it.
And if sovereignty isn't a priority for your business — that's a legitimate choice too. Plenty of work doesn't warrant the extra rigor, and business-as-usual AI will serve you well. The mistake isn't choosing convenience; it's choosing it by accident, for data that deserved better.