Value realisation for next-gen AI services – field lessons

For more than a decade, enterprise software has been priced by the seat. You bought a licence, you handed it to a user and the cost for use was fixed. Agentic AI quietly breaks that model. When an assistant can run a long, multi-step job on your behalf — reading across files, calling tools, drafting, checking its own work — the cost stops being a flat monthly fee and starts moving with what people actually do. Every task carries a variable cost.

Over the last few months I’ve been leading an innovation motion for exactly this shift (disclosure): taking a new agentic “co-working” capability, Copilot Cowork, from its earliest release through general availability (it was announced in March and has just gone GA), and building the discipline that lets organisations adopt it without their spend running away from them. Here’s what I learned doing it.

What was done

The goal was never “sell the feature.” The technology is genuinely good; people who try it want to keep it. The real problem to solve was everything around the technology: how you turn it on responsibly, who gets it first, how finance keeps sight of the cost, and how you prove the value is worth the meter.

Concretely, over the period we:

  • Ran rapid customer engagements, deliberately spanning two modes — a small number of deep, hands-on “white-glove” rollouts, and a broader one-to-many programme reaching many organisations at once. The two modes teach you different things, and we tried hard not to lose the learning from either.
  • Distilled all of it into a field playbook — an internal guide for the specialists who sit with customers: how to position the service, configure it safely, govern consumption, handle objections, and measure value.
  • Built a customer “runbook” — a repeatable, staged programme you can actually run an organisation through, from align and prioritise to implement, optimise and scale, rather than a slide deck you talk at them.
  • Developed the cost-optimisation guidance that turns “this feels unpredictable” into something a finance team can plan around.

Note that a playbook and a runbook are two different artefacts. The playbook is the broad body of knowledge (positioning, security, competition, governance) and generally has an internal focus. The runbook is the sharp, clean thing you take to a customer. Keeping them separate, and prioritising the customer-facing runbook under deadline, has so far been a good call.

What worked

It’s not a product problem. It’s a rollout problem. This was the single most clarifying insight. Customers weren’t looking for reasons to drop the tool; they were looking for ways to keep using it. The friction lived almost entirely in commercialisation, governance and change management. That reframes the whole job: you’re not defending a product, you’re de-risking an adoption.

“Managed consumption” beats “just turn it on.” The rollouts that went well never opened with “enable it for everyone.” They opened with control: a focused cohort of high-value users, their specific use cases, clear cost ownership, guardrails in place, and early usage data guiding expansion. Leading with control and measurable value consistently lowered customer anxiety and shortened approval cycles. Anxiety, not doubt about the technology, was the thing slowing deals down.

Prompt quality is also a cost lever, not just a quality lever. Because the work is metered, how you ask changes what you pay. In side-by-side tests, a well-specified prompt used roughly half the credits of a vague “you figure it out” prompt for the same deliverable, and sending the assistant off to hunt for files and sources it wasn’t pointed to could more than double the cost again. Defining the output, the structure, the source material and the success criteria up front turned out to be the simplest, most effective cost control available to an ordinary user. That’s a genuinely useful thing to teach, and it may help embed the quality angle too: quality input = quality output = lower cost.
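To make those ratios concrete, here is a minimal sketch of the arithmetic. The multipliers are illustrative values read off the rough figures above, not a pricing formula, and the function name and the assumption that the two effects simply multiply are mine.

```python
# Illustrative multipliers only: the tests reported rough ratios, not exact prices.
VAGUE_PROMPT_BASELINE = 1.0   # credits for a vague prompt, normalised to 1
WELL_SPECIFIED_FACTOR = 0.5   # "roughly half the credits"
SOURCE_HUNT_FACTOR = 2.0      # "more than double", so 2.0 is a floor, not a ceiling


def relative_credits(well_specified: bool, sources_provided: bool) -> float:
    """Credits relative to a vague prompt that was pointed at its sources.

    Assumes (hypothetically) that the two effects compound by multiplication.
    """
    cost = VAGUE_PROMPT_BASELINE
    if well_specified:
        cost *= WELL_SPECIFIED_FACTOR
    if not sources_provided:
        cost *= SOURCE_HUNT_FACTOR
    return cost
```

Under these assumptions the best case (well specified, sources supplied) costs at most a quarter of the worst case (vague, sources left for the assistant to find).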

Model choice is a governance lever. Where the assistant can run on more than one underlying model, the choice has a real cost signature. The automatic router was consistently the most economical option; lighter models cost less than heavier ones; and on an identical task the gap between the cheapest and most expensive model was roughly two-to-one. “Which model, for which job” is now a legitimate cost conversation, not just a capability one.

“When do I use this versus plain chat?” is worth answering early. A lot of unfavourable cost comparisons came from using an expensive, multi-tool agent for a job a quick chat answer would have handled (“what’s my first meeting tomorrow?”). Giving people a simple rule of thumb — chat to think quickly, deep research to investigate, the agent to get compound work done across tools — prevented both wasted spend and unfair “it’s too expensive” verdicts.
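The rule of thumb is simple enough to write down as a tiny decision function. This is only a sketch: the two yes/no questions and every name in it are hypothetical simplifications of the guidance above, not anything built into the product.

```python
from enum import Enum


class Mode(Enum):
    CHAT = "chat"
    DEEP_RESEARCH = "deep research"
    AGENT = "agent"


def suggest_mode(spans_multiple_tools: bool, needs_investigation: bool) -> Mode:
    """Suggest the cheapest mode that fits the job (hypothetical rule of thumb)."""
    if spans_multiple_tools:
        # Compound work across files, email, calendar and so on: use the agent.
        return Mode.AGENT
    if needs_investigation:
        # A question that needs digging through sources: use deep research.
        return Mode.DEEP_RESEARCH
    # Quick thinking or a simple lookup: plain chat is enough.
    return Mode.CHAT
```

For example, “what’s my first meeting tomorrow?” answers no to both questions, so `suggest_mode(False, False)` returns `Mode.CHAT`.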

The capability itself lands the moment someone watches it finish a real job. Set the cost conversation aside for a second — purely as a piece of technology, Cowork as an assistant earns its keep. The “aha” was never a feature tour; it was watching it take a single compound instruction — summarise these files, draft the exec brief, build the deck, and turn it into a web dashboard — and carry it end to end across files, email, calendar, chat and document libraries, then hand back finished work. In enablement sessions it was consistently the standout capability, and customer reaction was immediate.

What was hard

Cost predictability is hard; learning by using works better. Forecasting spend before any consumption has happened is really hard when the factors vary so much per task: the model used, the actions taken, the connections leveraged and so on. People want “cost-before-you-run” visibility, but persona- or workload-based forecasting is seriously hard, and waiting for it holds everything else up.

You cannot reliably translate “we run 500 tasks a week” into a budget number, because task complexity varies enormously. Estimation has to be consultative: sample real usage, gauge intensity, model a single persona, then recalibrate against live data. Setting that expectation early is best.
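The sample, model and recalibrate loop can be sketched in a few lines. This is a simplified illustration, not the method we used in the field: the function names are hypothetical, and the weighted average used for recalibration is my own assumption.

```python
from statistics import mean


def estimate_weekly_credits(sampled_task_credits: list[float], tasks_per_week: int) -> float:
    """Project weekly credits for ONE persona from a sample of real task costs."""
    if not sampled_task_credits:
        raise ValueError("need at least one sampled task")
    return mean(sampled_task_credits) * tasks_per_week


def recalibrate(estimate: float, observed_weekly_credits: float, weight: float = 0.5) -> float:
    """Blend the prior estimate with live usage.

    `weight` is how much trust to place in the live data (0 = none, 1 = all);
    the simple weighted average is an illustrative assumption.
    """
    return (1 - weight) * estimate + weight * observed_weekly_credits


# Hypothetical numbers: three sampled tasks for one persona, five tasks a week.
initial = estimate_weekly_credits([10.0, 20.0, 30.0], tasks_per_week=5)
revised = recalibrate(initial, observed_weekly_credits=140.0)
```

The point of the design is that the first number is only a starting position: it is built from real samples for a single persona, then pulled towards what live usage actually shows.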

Governance maturity gates everything. Customers will accept consumption-based billing — but only once the guardrails exist: budgets by group, delegated administration, approval workflows, charge-back, sensible defaults. Where those controls are still maturing, even keen customers will hold back, understandably. Sometimes, adoption moves at the speed of governance, not the speed of the technology, especially in the enterprise.

It’s hard to separate the service from the cost. Because this was the launch of a new service billed differently from the norm (user-based licensing), a recurring pattern emerged: the appetite to control spend ran ahead of the product/service itself. The cost became the thing. Reporting lagged real-time behaviour; controls arrived at the group level before they arrived per user; the granularity customers wanted wasn’t always there yet. None of this is fatal, but the product and the value it offers were almost overshadowed by the cost controls. The lesson is to clearly distinguish the value of the product/service from the cost controls, however inextricable the two have now become with token-metered AI services.

The product has its own friction, and you ignore it at your own risk. The biggest challenge is the question “when do I actually use this, versus chat, versus research, versus an agent?” That ambiguity is cheap to pre-empt and expensive to ignore: without a crisp rule of thumb, people either reach for the assistant on trivial lookups it is overqualified for, or never reach for it on the compound jobs where it shines. Output quality also proves unusually sensitive to how you ask: a broad “you sort it out” instruction produces weaker results and more back-and-forth than a well-scoped one, so there’s a real learning curve in writing a good task spec. People also want to compare before they trust: with several models on offer, customers want a way to benchmark one model against another.

The conclusion so far

If there’s one meta-lesson, it’s this: the hard part of agentic AI adoption is rarely the AI or the cost. It’s readiness, communication, governance and executive alignment — the same muscles every serious technology change needs, applied to a cost model most organisations have never had to manage before.

It’s still early. The tooling is still catching up to the ambition, the estimation is still consultative, and the governance patterns are still settling. But the shape of the disciplines is clear now, and there are two main ones.

The first is the financial-operations discipline. It is separate from the second, the AI-in-the-enterprise discipline, which is also evolving. We started with knowledge (retrieval, summarisation, Q&A in Chat, etc.), moved on to reasoning (planning, analysis and decision support), and are now at the point where AI can take actions on our behalf (via tools, APIs and workflows). Each stage needs different muscle and attention.
