AI Engineer · 20m

Lessons from Scaling GitHub's Remote MCP Server — Sam Morrow, GitHub

TL;DR

  • GitHub learned the hard way that 100+ tools made agents worse, not better — after its MCP server took off in April, Sam Morrow says bloated tool surfaces confused agents and blew context windows, echoing LangChain research that “more tools don't make better agents.”

  • Elegant configurability failed because users stuck with defaults — GitHub built tool sets, dynamic tool selection, and even a RAG-style semantic tool search, but most people never touched the JSON, exposing a real product gap between technically correct solutions and actual usage.

  • The team clawed back performance with ruthless context and output trimming — by narrowing the default tool set and grouping CRUD actions, GitHub cut initial context load by about 49%, brought the default setup to roughly 40 tools, and cut output tokens for some tools, such as list pull requests, by more than 75%.

  • Reliability improved when GitHub encoded agent intent into the server instead of expecting agents to orchestrate raw APIs — the MCP server now handles multi-call workflows server-side, pushing tool success above 95% and reducing both round trips and hallucination-driven failures.

  • Security is still the thorniest part of MCP at scale — Morrow calls plaintext PATs sitting where agents can access them a real hazard, argues OAuth 2.1 plus PKCE is the safer default, and explains why GitHub rejected dynamic client registration despite user expectations.

  • GitHub is already operating at serious scale while betting the future will swing back toward many more tools — the remote server handles around 7 million tool calls a week today, the repo has nearly 30,000 stars and 126 contributors, and Morrow expects automatic discovery plus compositional tool use to make “thousands of tools” normal soon.

The Breakdown

From breakout launch to tool overload

Sam Morrow opens with a room check — who’s used MCP, who’s used GitHub’s, who has a hot take — and then jumps back to April, when GitHub open-sourced its local MCP server and briefly became the most-starred repo on GitHub that week. The problem arrived fast: with over 100 tools spanning repos, issues, PRs, actions, and projects, agents started getting worse at using GitHub, not better, and context windows got chewed up almost immediately.

The clever fixes nobody configured

To solve that, the team introduced “tool sets,” dynamic tool discovery, and even a RAG-like semantic tool search prototype. But, as Morrow puts it with some exasperation, everyone just used the default settings. His takeaway is painfully practical: elegant solutions that require users to edit JSON usually lose, and some of this may also reflect gaps in MCP spec proposals and client UX.
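The talk doesn't walk through the semantic tool search prototype's internals, but the idea is standard retrieval: score every tool description against the agent's query and surface only the top matches. Here is a minimal, self-contained sketch of that pattern using toy bag-of-words cosine similarity; the tool names and descriptions are illustrative, not GitHub's actual catalog (a real version would use embeddings).

```python
from collections import Counter
import math

# Hypothetical tool catalog — illustrative names and descriptions only.
TOOLS = {
    "create_issue": "open a new issue in a repository",
    "list_pull_requests": "list open pull requests for a repository",
    "merge_pull_request": "merge an approved pull request",
    "run_workflow": "trigger a GitHub Actions workflow run",
}

def _vec(text):
    # Bag-of-words term counts; a real system would use an embedding model.
    return Counter(text.lower().split())

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search_tools(query, k=2):
    """Return the k tools whose descriptions best match the query."""
    q = _vec(query)
    ranked = sorted(TOOLS, key=lambda t: _cosine(q, _vec(TOOLS[t])), reverse=True)
    return ranked[:k]
```

The appeal is that the agent's context only ever holds the handful of tools the query retrieved, rather than the full hundred-plus definitions — which is exactly the problem if nobody turns the feature on.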

Cutting context load in half

GitHub then turned to usage data and aggressively optimized for the common case, trimming initial context by about 49% and later shrinking it further by grouping CRUD tools. The default config now lands around 40 tools, and the team has also been attacking output verbosity — in one example, tightening list pull requests cut output tokens by more than 75%. His tone here is very “this is a moving target”: if you haven’t tried the server recently, it’s probably materially different.
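To make the CRUD-grouping idea concrete: instead of advertising four standalone issue tools, the server can expose one grouped tool with a method parameter, which shrinks the schema text every agent must load. The schemas below are a sketch under assumed names, not GitHub's real tool definitions.

```python
import json

# Four standalone tools (illustrative schemas, not GitHub's actual ones).
SEPARATE_TOOLS = [
    {"name": "get_issue", "description": "Fetch a single issue"},
    {"name": "create_issue", "description": "Open a new issue"},
    {"name": "update_issue", "description": "Edit an existing issue"},
    {"name": "list_issues", "description": "List issues in a repository"},
]

# One grouped tool covering the same CRUD surface via a "method" parameter.
GROUPED_TOOL = {
    "name": "issues",
    "description": "Read and write issues",
    "inputSchema": {
        "type": "object",
        "properties": {"method": {"enum": ["get", "create", "update", "list"]}},
        "required": ["method"],
    },
}

def schema_size(tools):
    """Rough proxy for context cost: characters of serialized tool schemas."""
    return len(json.dumps(tools))
```

Even in this toy case the grouped definition serializes smaller than the four separate ones, and the savings compound across every repo, PR, and Actions tool family.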

Reliability came from hiding complexity inside the server

Morrow says tool success is now above 95%, though some failures remain unavoidable because agents still hallucinate permissions and repo access. The bigger win was redesigning the tool surface to better match agent intent, even if that means the server quietly makes five API calls under the hood. Rather than obsessing over one perfect tool description, the team also runs evals to make sure tools compete correctly with each other — not overfiring, not disappearing.
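The "five API calls under the hood" pattern can be sketched as a single tool handler that fans out to several REST endpoints and returns one result. The endpoint paths below are real GitHub REST routes, but the `api` transport and the `merge_when_ready` tool are hypothetical stand-ins, not GitHub's actual server code.

```python
def api(method, path, **params):
    # Stand-in for an authenticated GitHub REST call; returns canned data
    # so the sketch runs without a network.
    return {"method": method, "path": path, **params}

def merge_when_ready(owner, repo, pr_number):
    """One tool call that hides several API round trips server-side."""
    calls = []
    calls.append(api("GET", f"/repos/{owner}/{repo}/pulls/{pr_number}"))          # fetch PR state
    calls.append(api("GET", f"/repos/{owner}/{repo}/pulls/{pr_number}/reviews"))  # check reviews
    calls.append(api("PUT", f"/repos/{owner}/{repo}/pulls/{pr_number}/merge"))    # merge
    # The agent sees a single tool result; the server made three calls,
    # so there is no chance to hallucinate an intermediate step.
    return {"merged": True, "api_calls": len(calls)}
```

The design trade is deliberate: every call the server makes is a call the agent cannot get wrong, at the cost of a less general tool surface.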

OAuth, PATs, and the “fun day” of prompt injection headlines

On security, he’s blunt: plaintext access tokens are often long-lived, over-privileged, and sitting where agents can abuse them. That’s why GitHub leaned hard into remote HTTP plus OAuth 2.1, and even helped add PKCE support to GitHub’s authorization server, though Morrow says they rejected dynamic client registration because it creates ugly problems around the app database, identity, and rate limiting. He also addresses Invariant Labs’ GitHub MCP prompt-injection exfiltration demo with a mix of honesty and perspective: yes, the tools can enable it, but the vulnerability pattern applies far beyond GitHub MCP and reflects the broader “lethal trifecta” problem.
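For readers unfamiliar with why PKCE matters here, the mechanism (RFC 7636) is small: the client generates a random code_verifier, sends its SHA-256 hash as the code_challenge with the authorization request, and proves possession of the verifier at the token exchange. A minimal sketch:

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge (RFC 7636)."""
    # 32 random bytes, base64url-encoded without padding → 43-char verifier.
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode()).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

# The challenge travels with the authorization request; the verifier is
# only revealed at the token exchange, so an intercepted authorization
# code is useless on its own.
```

That property is exactly what you want when the "client" is an agent running in an environment you don't fully control — unlike a long-lived plaintext PAT, there is nothing durable to steal.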

Using auth to shape the tool surface

One of the cleaner ideas in the talk is using authorization to reduce both risk and failure. PAT users automatically see tools filtered by token scopes, OAuth users can get step-up scope challenges mid-flow, and non-user contexts like Actions lose user-specific tools entirely. Morrow likes this because it removes whole classes of wasted context and dead-end calls without asking users to manually manage tool lists.
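The scope-filtering idea reduces to a policy check at tool-listing time: advertise a tool only if the token's granted scopes cover what the tool needs. The mapping below is illustrative, not GitHub's actual scope policy.

```python
# Hypothetical tool-to-scope requirements — illustrative only.
TOOL_SCOPES = {
    "list_pull_requests": {"repo"},
    "merge_pull_request": {"repo"},
    "run_workflow": {"workflow"},
    "create_gist": {"gist"},
}

def tools_for_token(granted_scopes):
    """Advertise only the tools this token's scopes can actually use."""
    granted = set(granted_scopes)
    return sorted(tool for tool, needed in TOOL_SCOPES.items() if needed <= granted)
```

A token scoped to `repo` never even sees `run_workflow`, so the agent can't burn context on it or fail calling it — the dead-end is removed before the conversation starts.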

Stateless architecture and shipping experiments in public

GitHub runs the remote server as a stateless system with Redis-backed sessions and no session affinity, and Morrow notes that they actually instantiate a fresh SDK-level server on every request, then attach only the tools a user’s config and policies allow. That setup has scaled to around 7 million tool calls a week. On top of that, GitHub ships an “Insiders” mode for feature-flagged experiments, including MCP apps that let users edit an AI-generated issue before posting — a human-in-the-loop touch Morrow says he’s grown to love.
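The per-request instantiation pattern can be sketched like this — a fresh server object is built for each request and only permitted tools are attached, so no state leaks between callers. The `Server` class and config shape are illustrative, not the MCP SDK's actual API.

```python
# Full tool registry (illustrative names).
ALL_TOOLS = {"list_issues", "create_issue", "merge_pull_request", "run_workflow"}

class Server:
    """Toy stand-in for an SDK-level MCP server instance."""
    def __init__(self):
        self.tools = set()

    def register(self, tool):
        self.tools.add(tool)

def handle_request(user_config):
    # A brand-new server per request: nothing survives between calls.
    # Session data would live in an external store (e.g. Redis), never here.
    server = Server()
    enabled = set(user_config.get("enabled_tools", ALL_TOOLS))
    for tool in ALL_TOOLS & enabled:
        server.register(tool)
    return server
```

Because every instance is disposable and config-driven, any replica can serve any request — which is what makes "no session affinity" at 7 million calls a week workable.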

The future: thousands of tools, invisible MCP

Looking ahead, he expects server discovery to become automatic and tool use to become more compositional — more like bash piping, Cloudflare’s code mode, or Anthropic/OpenAI tool-search APIs. He thinks the industry may soon reverse today’s “fewer tools” bias and normalize thousands of tools, ideally without users even knowing what MCP is. He closes with the scale of the moment: 11 million+ Docker downloads for the stdio server, 126 contributors, nearly 30,000 stars, almost 4,000 forks, and a repo getting more than seven issues or PRs per day for over a year — “everything’s mildly on fire,” but in the exciting way.