Vision · 01 — AI Governance

AI as an accountability question.

The interesting question is not whether a model is impressive. It is whether your ministry, hospital, or supervisory authority can be held to account for a decision in which AI was a step.

  i. The claim

    AI in the public sector is an accountability problem before it is a capability problem. The institution, not the model, carries the consequence.
  ii. Why it matters

    The EU AI Act is in force; the high-risk regime begins in August 2026. Conformity does not absolve the deploying institution. Audit, court, and parliamentary scrutiny land on the institution that put the system in production.
  iii. Operational test

    For every AI-shaped decision: can you produce, on demand, the prompt, model version, evaluation evidence, human-in-the-loop record, rollback plan, and termination criterion? If not, you are not yet ready to deploy. (A minimal sketch of such a record follows this list.)
  iv. What I help with

    Ministerial framing of AI policy. Evaluation and procurement discipline for public-sector AI. Board-level AI governance. Legislative drafting and translation between Brussels English and member-state practice.
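
To make the operational test concrete, here is a minimal sketch of what "producible on demand" could look like as a record type. This is an illustrative assumption, not a standard: every field name, and the idea of serialising to JSON for an auditor, is hypothetical; the point is only that each item in the test has a concrete, retrievable home.

```python
# A hypothetical record for one AI-shaped decision. Field names are
# illustrative assumptions, not a standard.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class DecisionRecord:
    case_id: str                 # the decision this record must explain
    prompt: str                  # the exact prompt sent to the model
    model_version: str           # pinned version, not "latest"
    evaluation_evidence: str     # pointer to the eval run that justified deployment
    human_reviewer: str          # who looked at the output before it took effect
    human_verdict: str           # "approved" / "overridden" / "escalated"
    rollback_plan: str           # pointer to the tested rollback procedure
    termination_criterion: str   # the written threshold for pulling the system
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def on_demand(self) -> str:
        """Serialise the record for an auditor, a court, or a committee."""
        return json.dumps(asdict(self), indent=2)
```

If producing this record for any given decision takes more than a query, the operational test is failed before the first complaint arrives.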

The mistake European institutions keep making with artificial intelligence is treating it as a capability question instead of an accountability question. The capability question is loud — every six months a new model arrives that scores higher on something, and a fresh round of pilot projects begins. The accountability question is the one a minister, a CIO, or an audit board actually has to answer when the regulator, the parliamentary committee, or the citizen complaint arrives: when a decision in our pipeline was AI-shaped, can we explain it, justify it, and take responsibility for it?

That is the only question that survives a parliamentary inquiry, a court case, or a Friday-evening news cycle. Everything else is procurement.

The accountability problem

The EU AI Act has now been in force long enough that the early-bird excitement has worn off. The general-purpose AI provisions began to apply in August 2025; the high-risk system rules begin in August 2026. Implementing acts, harmonised standards, and the early conformity assessments are still being negotiated.

The Act is good legislation. It is also infrastructure that institutions still treat as somebody else's responsibility. The compliance team waits for the regulator's guidance. The procurement team waits for the vendor's certification. The operational team waits for both. None of that waiting eliminates the institution's accountability for the output. A municipality whose social-services chatbot misroutes a vulnerable applicant cannot defend itself by saying the model passed conformity. A bank whose credit decisions silently shift weight to a fine-tuned classifier cannot defend itself by pointing at the model card.

Accountability is the one part of the AI stack that cannot be outsourced. The technical pieces can be procured; the governance posture cannot.

Evaluations are not benchmarks

A model card lists what a system can do on synthetic tests. It does not tell you what it does on your data, in your workflow, with your failure modes. A leaderboard score is a marketing artefact. An evaluation is task-bound, deployment-aware, repeatable, and instrumented to fail loudly.

This is the discipline missing from most public-sector AI procurement. The vendor pitches the benchmark; the buyer should run the eval. A working evaluation answers four questions: How does the system perform on a representative sample of our live cases? How does it fail, and how often? How quickly do we notice the failure? What is the cost of the failure to the person on the other end?

If your procurement process cannot answer those four for any AI component you are about to deploy, the procurement is not yet ready to sign.
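
A minimal sketch of what running the eval could mean in code, under stated assumptions: `system` stands for whatever callable the procurement is about to buy, `cases` for a representative sample of live cases with known-correct outcomes, and `harm_cost` for a price put on getting one case wrong. All names and the pass/fail comparison are illustrative, not a prescribed methodology.

```python
# Illustrative evaluation harness for the four procurement questions.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Case:
    inputs: str          # one representative live case
    expected: str        # the outcome a correct decision would reach
    harm_cost: float     # cost of getting this case wrong, in whatever unit you budget


def evaluate(system: Callable[[str], str], cases: Iterable[Case]) -> dict:
    """Answer the four questions on your data, not the vendor's benchmark."""
    total = failures = 0
    failure_cost = 0.0
    failure_modes: dict[str, int] = {}
    for case in cases:
        total += 1
        got = system(case.inputs)
        if got != case.expected:
            failures += 1
            failure_cost += case.harm_cost
            # Q2: how does it fail? Bucket by wrong answer to surface the modes.
            failure_modes[got] = failure_modes.get(got, 0) + 1
    return {
        "accuracy": 1 - failures / total,      # Q1: performance on our cases
        "failure_rate": failures / total,      # Q2: how often it fails
        "failure_modes": failure_modes,        # Q2: how it fails
        "total_failure_cost": failure_cost,    # Q4: cost to the person affected
        # Q3 (how quickly we notice) is a property of production monitoring,
        # not of this offline run; it needs its own instrumentation.
    }
```

Note what the sketch cannot answer: time-to-detection is a deployment property, which is why evaluation and deployment discipline are two halves of the same obligation.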

Deployment is where the policy actually lives

Policy lives in deployment, not in legislation. The most thoughtful AI Act articles are made meaningless by an unmonitored chatbot, an over-trusted decision-support tool, or an integration that bypasses the oversight committee because it was scoped as "process automation." The legal status of a system shifts the moment its output is read by a caseworker who treats it as an instruction.

Deployment discipline has four parts. A human in the loop where the law and the consequence require it — and that loop has to be load-bearing, not theatrical. Logging that survives an audit, with the actual prompt, model version, and decision recorded together. Rollback plans that have been tested in anger. A defined termination criterion: the metric or threshold at which the system is taken out of production, written down before the system goes in.
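
The fourth part is the one most often left implicit, so here is a sketch of a termination criterion written down as code rather than as a sentiment. The metric (human override rate), the threshold, and the window size are illustrative assumptions; the discipline is that they are fixed before the system goes into production.

```python
# Illustrative written-down termination criterion: stop the system when the
# human override rate over a rolling window crosses a pre-agreed threshold.
from collections import deque


class TerminationCriterion:
    """Trip when reviewers override the system too often. The threshold is
    agreed before the system goes in, not negotiated after it misbehaves."""

    def __init__(self, max_override_rate: float = 0.05, window: int = 500):
        self.max_override_rate = max_override_rate
        self.verdicts: deque[bool] = deque(maxlen=window)  # True = human overrode

    def record(self, human_overrode: bool) -> bool:
        """Log one reviewed decision; return True if the system must come out."""
        self.verdicts.append(human_overrode)
        if len(self.verdicts) < self.verdicts.maxlen:
            return False  # not yet enough evidence to trip the criterion
        rate = sum(self.verdicts) / len(self.verdicts)
        return rate > self.max_override_rate
```

Whatever the metric, the test is the same: could an auditor read the criterion off the deployment record, and could operations execute the rollback when it trips?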

Most current public-sector deployments have one or two of those four. Almost none have all four.

Capability without spectacle

The frontier matters. So do the open-weight models that now run on a 64 GB laptop. The interesting choices for an accountable institution are not between OpenAI and Anthropic and Google. They are between capability that you can govern and capability that you cannot.

That choice is rarely about which model. It is about where the model lives, how its outputs are reviewed, and what happens when it is wrong. A small open-weight model running on infrastructure you control, with an evaluation suite, a human-in-the-loop, and a rollback plan, is a more accountable system than the most capable hosted frontier model with none of those things.

Capability without spectacle is the working principle. The demo is not the deployment. The press release is not the policy.

What I work on

Ministerial-level AI policy framing — including the 2026 Finnish report on new technologies and digital resilience. Evaluation methodology and procurement discipline for public-sector AI deployment, in advisory work with regulators and operators of essential services. AI governance for boards and supervisory authorities. And the technology working group of the Social Democratic Party of Finland, where AI legislation gets translated from Brussels English into the legislative and party-political language that determines how an Act actually lands in member-state practice.

The companion essay The 2026 AI Landscape — Frontier Above, Floor Below sketches what the model market actually looks like right now, and why the most consequential release of 2026 is the one that costs nothing.

For specific engagements — briefings, board work, evaluation reviews, or speaking — please get in touch.