Engineering · May 14, 2026 · 7 min read

Observability is the first feature you ship, not the last

Every MCP author eventually reinvents logs and metrics. Here's what we learned building it into the platform from day one — and why waterfall traces changed our support flow.

The Cognoverge team

Every MCP server author eventually adds logging. Every one of them adds it too late. This is a note about why we built observability into Cognoverge from day one — and what it changed about our support flow.

The bug you can't reproduce

The single most common support request we get is: "my server works most of the time but sometimes returns 500 and I don't know why." Without observability, this is unresolvable: the author has no logs, no traces, and no way to know which caller hit which path. Half the time they don't even know it's failing until a user complains.

With observability shipped in the runtime, this becomes a thirty-second answer: "here's the failed call, here's the trace, here's the specific downstream API that returned 429." The author fixes it and moves on.

Why waterfall traces changed our support

We didn't ship waterfall traces at first — just structured logs. Six weeks in, we realized most support issues were about latency, not errors. "My server is slow, why?" And logs alone can't answer that: they tell you a request took 812ms, but not where those 812ms went.
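To make that limitation concrete, here is a minimal sketch of a per-request structured log line. The field names (caller, path, status, duration_ms) and the sample values are assumptions for illustration, not the actual Cognoverge log schema.

```python
import json


def log_request(caller, path, status, started_ms, finished_ms):
    """Build one structured log line for a finished request.

    Field names are hypothetical; the point is that the record holds
    a single total duration and nothing about where the time was spent.
    """
    record = {
        "caller": caller,
        "path": path,
        "status": status,
        "duration_ms": finished_ms - started_ms,
    }
    return json.dumps(record, sort_keys=True)


# A request that took 812 ms end to end (timestamps are made up).
line = log_request("user_123", "tools/call", 200, started_ms=1000, finished_ms=1812)
print(line)
```

The line answers "who called what, and how long did it take", which is enough for error triage but says nothing about which step consumed the time.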

Once we added waterfall traces — a hierarchical view of every span inside a request, from auth check to downstream API call to response encode — support times dropped by half. The author can see instantly that 700ms of the 812ms was a slow Notion API call with an automatic retry, not their code.
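A waterfall is a tree of timed spans rendered with indentation. The sketch below assumes a simple Span type and invented span names and timings; only the 812ms total and the 700ms downstream call come from the example above, and none of this is the platform's real trace format.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_ms: int  # offset from the start of the request
    end_ms: int
    children: list = field(default_factory=list)

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def render_waterfall(span, depth=0):
    """Return one text line per span, indented by nesting depth."""
    indent = "  " * depth
    lines = [f"{indent}{span.name}: {span.duration_ms}ms (+{span.start_ms}ms)"]
    # Children are shown in the order they started.
    for child in sorted(span.children, key=lambda s: s.start_ms):
        lines.extend(render_waterfall(child, depth + 1))
    return lines


# Hypothetical trace: every timing except the 812 and 700 totals is invented.
trace = Span("request", 0, 812, [
    Span("auth_check", 0, 15),
    Span("notion_api_call", 20, 720, [
        Span("attempt_1", 20, 320),
        Span("attempt_2_retry", 330, 720),
    ]),
    Span("response_encode", 725, 805),
])

print("\n".join(render_waterfall(trace)))
```

Reading the rendered tree top to bottom shows the downstream call and its retry dominating the request, which is the question a flat log line cannot answer.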

Retention decisions matter more than you'd think

We started with 7-day retention because storage is expensive. Users hated it. The bugs they wanted to investigate were often weeks old: a slow drift that only became obvious after a cohort of new customers reported it.

We now retain 30 days on Pro and 90 days on Team. The storage cost turned out to be trivial (R2 is cheap; MCP payloads are small). The correctness win — being able to look back — was gigantic.
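As a rough sketch of what a per-tier retention rule looks like, the snippet below maps a plan to a retention window and checks whether a trace is still inside it. The 30 and 90 day values are the ones above; the tier keys, function names, and dates are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the post; the dictionary keys are hypothetical.
RETENTION_DAYS = {"pro": 30, "team": 90}


def retention_cutoff(tier, now):
    """Oldest timestamp still retained for the given tier."""
    return now - timedelta(days=RETENTION_DAYS[tier])


def is_expired(recorded_at, tier, now):
    """True if a trace recorded at `recorded_at` is past its retention window."""
    return recorded_at < retention_cutoff(tier, now)


# A trace from 45 days ago: gone on Pro, still available on Team.
now = datetime(2026, 5, 14, tzinfo=timezone.utc)
old_trace = now - timedelta(days=45)
print(is_expired(old_trace, "pro", now), is_expired(old_trace, "team", now))
```

A 45-day-old trace is exactly the "weeks old" case: a 7-day or 30-day window has already discarded it, while the 90-day window still has it.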

The rule we settled on

Observability is not a feature you add for "grown-up" customers. It's the feature that determines whether your product feels serious the first time someone hits an unexpected failure. Ship it before pricing tiers. Ship it before OAuth. Ship it as literally the first piece of infrastructure past "a working request path."

Tagged

observability, logs, traces
