Every MCP author eventually reinvents logs and metrics. Here's what we learned building them into the platform from day one, and why waterfall traces changed our support flow.
Every MCP server author eventually adds logging. Every one of them adds it too late. This is a note about why we built observability into Cognoverge from day one — and what it changed about our support flow.
The single most common support request we get is: "my server works most of the time but sometimes returns 500 and I don't know why." Without observability, this is unresolvable: the author has no logs, no traces, and no way to know which caller hit which path. Half the time they don't even know it's failing until a user complains.
With observability shipped in the runtime, this becomes a thirty-second answer: "here's the failed call, here's the trace, here's the specific downstream API that returned 429." The author fixes it and moves on.
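To make that concrete, here is a minimal sketch of the kind of structured log record that makes the thirty-second answer possible. It is an illustration, not our runtime's actual code: the field names, the `handle_call` helper, and the in-memory `logs` list are all assumptions made up for this example.

```python
import json
import time
import uuid


def log_event(sink, **fields):
    """Append one JSON-encoded log line to `sink` (a list standing in for a log store)."""
    sink.append(json.dumps(fields, sort_keys=True))


def handle_call(sink, caller, tool, downstream):
    """Run one tool call and record who called what and how the downstream API answered."""
    request_id = str(uuid.uuid4())
    started = time.perf_counter()
    status = downstream()  # hypothetical downstream call that returns an HTTP status code
    log_event(
        sink,
        request_id=request_id,
        caller=caller,
        tool=tool,
        downstream_status=status,
        # Assumption for this sketch: any downstream failure surfaces as a 500.
        response_status=200 if status < 400 else 500,
        duration_ms=round((time.perf_counter() - started) * 1000, 2),
    )
    return request_id


logs = []
handle_call(logs, caller="user-a", tool="search_pages", downstream=lambda: 200)
failed_id = handle_call(logs, caller="user-b", tool="search_pages", downstream=lambda: 429)

# Filter for failed calls and read off the caller, the tool, and the downstream cause.
records = [json.loads(line) for line in logs]
failures = [r for r in records if r["response_status"] >= 500]
for f in failures:
    print(f["request_id"], f["caller"], f["tool"], "downstream returned", f["downstream_status"])
```

The point of the design is that every record carries the caller, the path, and the downstream status together, so the question "who hit what, and what broke?" is a filter rather than an investigation.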
We didn't ship waterfall traces at first — just structured logs. Six weeks in, we realized most support issues were about latency, not errors. "My server is slow, why?" And logs alone can't answer that: they tell you a request took 812ms, but not where those 812ms went.
Once we added waterfall traces — a hierarchical view of every span inside a request, from auth check to downstream API call to response encode — support times dropped by half. The author can see instantly that 700ms of the 812ms was a slow Notion API call with an automatic retry, not their code.
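A rough sketch of what a waterfall looks like as data, assuming a simple tree of spans. The `Span` class and the span names are hypothetical, and apart from the 812ms total and the 700ms Notion call, the timings are invented to make the numbers add up.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    start_ms: int      # offset from the start of the request
    duration_ms: int
    children: list = field(default_factory=list)


def render(span, depth=0, lines=None):
    """Return one text line per span, indented by nesting depth (a text waterfall)."""
    lines = [] if lines is None else lines
    label = ("  " * depth + span.name).ljust(28)
    lines.append(f"{label} +{span.start_ms:>3}ms {span.duration_ms:>4}ms")
    for child in span.children:
        render(child, depth + 1, lines)
    return lines


def slowest_child(span):
    """Return the direct child span that took the longest."""
    return max(span.children, key=lambda s: s.duration_ms)


# Illustrative request: only the 812ms total and the 700ms Notion call come from the post.
root = Span("request", 0, 812, [
    Span("auth_check", 0, 12),
    Span("handler_logic", 12, 80),
    Span("notion_api_call", 92, 700, [
        Span("attempt_1", 92, 330),
        Span("attempt_2_retry", 422, 370),
    ]),
    Span("response_encode", 792, 20),
])

print("\n".join(render(root)))

slowest = slowest_child(root)
share = slowest.duration_ms / root.duration_ms
print(f"slowest: {slowest.name} ({share:.0%} of the request)")
```

A flat log line only has the root's duration. The tree is what lets you attribute time to a specific child, and then to the retry underneath it.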
We started with 7-day retention because storage is expensive. Users hated it. The bugs they wanted to investigate were often weeks old: a slow drift that only became obvious after a cohort of new customers reported it.
We now retain 30 days on Pro and 90 days on Team. The storage cost turned out to be trivial (R2 is cheap; MCP payloads are small). The debugging win, being able to look back weeks instead of days, was gigantic.
Observability is not a feature you add for "grown-up" customers. It's the feature that determines whether your product feels serious the first time someone hits an unexpected failure. Ship it before pricing tiers. Ship it before OAuth. Ship it as literally the first piece of infrastructure past "a working request path."
Everything in this post runs in the platform we built. Free forever tier. Deploy in 30 seconds.