
Fish Audio for AI Coding Agents: llms.txt, MCP, and Skills

Fish Audio now ships three native interfaces built for AI agents — llms.txt for navigation, a Docs MCP server for live API lookup, and installable skills for offline-first code generation. Here's what each one does, why it matters, and how to set it up in under five minutes.

May 2026 | Fish Audio agent tooling is now live across llms.txt, MCP, and Skills

Most developer documentation is written for humans. It assumes you open a browser, read a guide, copy a snippet, and switch back to your editor. That workflow is fine when you're working alone. It breaks down the moment your coding agent is the one doing the reading.

AI coding agents — Claude Code, Cursor, Codex, Windsurf, and a growing list of others — need LLM-friendly documentation in a fundamentally different form. They don't browse. They fetch. They don't skim headings; they parse structure. And when a context window fills up, unstructured documentation becomes noise that crowds out the code.

We've seen this firsthand. Developers integrating Fish Audio into LLM pipelines kept running into the same class of errors: coding agents generating authentication code for the wrong endpoint, pulling deprecated model IDs from training data, or constructing WebSocket payloads against an outdated schema. The problem wasn't the API — it was that agents had no reliable way to access current, structured documentation at generation time.

Fish Audio now ships three purpose-built interfaces to solve this: llms.txt for AI agent navigation, a Docs MCP server for live documentation lookup, and Agent Skills for offline-first code generation. Each one is usable on its own, and together they form an agent-native documentation layer for any coding agent workflow.

Already using Fish Audio? Fetch https://docs.fish.audio/llms.txt and point your agent at it now — no additional configuration required. Get started in the Developer Panel →

llms.txt: How AI Agents Navigate Your Docs

What Is llms.txt?

Comparison showing how llms.txt gives AI agents a structured entry point versus crawling an unstructured docs site

llms.txt is an emerging open standard that gives AI agents a clean, structured index of a website's most important content. Defined at llmstxt.org, the format is a Markdown file placed at the root of a domain — a curated list of links with short descriptions, organized into meaningful categories.

Think of it as a robots.txt for LLMs — except instead of telling agents what to avoid, llms.txt tells them exactly where to start. Fish Audio uses llms.txt to give coding agents a structured, low-noise entry point into its API documentation.

Most documentation websites have hundreds of pages. When a coding agent pulls in an entire docs site cold, it wastes context window tokens on content that isn't relevant to the task — changelog entries, deprecated endpoints, marketing copy. A well-crafted llms.txt filters that down to a curated set of high-signal entry points, which means faster responses, lower token costs, and more accurate code generation.

The standard also defines llms-full.txt — a broader variant that includes fuller page content for agents that need deeper context. Both are plain Markdown, which every LLM can parse without any preprocessing.

Fish Audio's llms.txt and llms-full.txt

Fish Audio publishes two versions, both available without authentication:

docs.fish.audio/llms.txt — a curated, low-noise index organized into six categories: Start Here, API Specs, Core REST API, SDKs, Product Guides, and Operational Docs. The file opens with an Agent Quickstart link and a direct path to the AI Coding Agents guide, so any agent can orient itself in a single fetch. Every link points to a .md file — not HTML — so agents parse content directly without stripping markup.

docs.fish.audio/llms-full.txt — a broader version that includes the full emotion reference, all SDK pages, every REST and WebSocket endpoint, and extended guides for voice cloning, real-time streaming, and phoneme control across English, Chinese, and Japanese.

Here's a simplified llms.txt example showing the structure Fish Audio uses:


# Fish Audio

> Canonical documentation index for Fish Audio APIs, SDKs, models,
> voice cloning, real-time streaming, and self-hosting.

## Start Here
- [Agent Quickstart]: Minimal-noise entry point for AI agents
- [Quick Start]: Generate your first AI voice in under 5 minutes
- [AI Coding Agents]: Connect coding assistants via MCP

## Core REST API
- [Text to Speech Endpoint]: Convert text to speech
- [Speech to Text Endpoint]: Transcribe audio to text
- [WebSocket TTS Streaming]: Real-time streaming via WebSocket
...

The llms.txt standard has seen rapid adoption across developer tooling and AI infrastructure, with companies including Anthropic, Perplexity, Cloudflare, Vercel, Cursor, ElevenLabs, and Coinbase already publishing their own implementations. In Fish Audio's implementation, the "Start Here" section is designed to give coding agents a decision tree, not just a link list.

How an Agent Uses It in Practice

When you ask a coding agent to "implement Fish Audio TTS in Python," a well-configured agent fetches llms.txt first, identifies the relevant pages (Python SDK, TTS Endpoint, Authentication), pulls those pages as Markdown, and generates code from the current documentation — not from training data that may be months out of date.
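The "fetch the index, then pick the relevant pages" step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular agent is implemented: it assumes the list-item form described at llmstxt.org (`- [Title](url): description`), and the example.com URLs in the sample are placeholders rather than real Fish Audio paths.

```python
import re
from typing import NamedTuple


class Link(NamedTuple):
    section: str
    title: str
    url: str
    description: str


# Matches the llmstxt.org list-item form: - [Title](url): optional description
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> list:
    """Collect every link in an llms.txt file, tagged with its H2 section."""
    links, section = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_RE.match(line)
        if match:
            links.append(
                Link(section, match["title"], match["url"], match["desc"] or "")
            )
    return links


def select_pages(links: list, keywords: list) -> list:
    """Keep links whose title or description mentions any keyword."""
    wanted = [k.lower() for k in keywords]
    return [
        link
        for link in links
        if any(k in f"{link.title} {link.description}".lower() for k in wanted)
    ]


# Hypothetical sample: the example.com URLs are placeholders.
SAMPLE = """\
# Fish Audio

> Canonical documentation index.

## Start Here
- [Quick Start](https://example.com/quickstart.md): Generate your first AI voice

## Core REST API
- [Text to Speech Endpoint](https://example.com/tts.md): Convert text to speech
- [Speech to Text Endpoint](https://example.com/asr.md): Transcribe audio to text

## SDKs
- [Python SDK](https://example.com/python.md): Install and authenticate
"""

if __name__ == "__main__":
    for link in select_pages(parse_llms_txt(SAMPLE), ["python", "text to speech"]):
        print(f"{link.section}: {link.title} -> {link.url}")
```

A real agent would rank pages with the model itself rather than substring matching; the point is only that a sectioned Markdown index is cheap to parse and filter before any page is fetched.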

This matters more than it sounds. API schemas change. Model IDs get deprecated. Emotion tag syntax evolves between model generations. Without a live documentation fetch, an agent is generating code against a snapshot of the API that may no longer work.

The two-file approach gives agents a natural escalation path: start with llms.txt for a focused, low-token index; escalate to llms-full.txt when a task requires deeper context like the full emotion reference or edge-case streaming behavior.
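One way to picture that escalation path is the sketch below. The two URLs come from this article; the rule for when to escalate (a keyword missing from the small index) is an assumption made for illustration, and the `fetcher` parameter exists only so the logic can be exercised without a network call.

```python
from urllib.request import urlopen

INDEX_URL = "https://docs.fish.audio/llms.txt"
FULL_URL = "https://docs.fish.audio/llms-full.txt"


def covers(text: str, keywords: list) -> bool:
    """True when every keyword appears somewhere in the text (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def fetch(url: str) -> str:
    """Download a plain-text document over HTTPS."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def load_context(keywords: list, fetcher=fetch) -> tuple:
    """Try the small index first; fall back to the full file if a keyword is missing."""
    index = fetcher(INDEX_URL)
    if covers(index, keywords):
        return INDEX_URL, index
    return FULL_URL, fetcher(FULL_URL)
```

Calling `load_context(["emotion"])` would fetch llms.txt first and only download llms-full.txt if the word "emotion" does not appear in the index.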

Already building with Fish Audio? Point your coding agent at docs.fish.audio/llms.txt and stop generating outdated API calls. Get started in the Developer Panel →

Docs MCP: Real-Time API Lookup for Coding Agents

What Is MCP?

Diagram showing how the Fish Audio MCP server connects a coding agent to live documentation

MCP (Model Context Protocol) is an open protocol that allows AI coding agents like Claude Code and Cursor to fetch live documentation and external data during code generation — without leaving the editor.

Fish Audio uses MCP to expose its full API documentation as a real-time retrieval layer inside coding agents. When you connect the Fish Audio MCP server, your agent can answer questions like "what emotion tags does Fish Audio support?" or "what's the rate limit on the TTS endpoint?" by fetching the current answer from published documentation, rather than relying on training data that may be months out of date.
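Under the hood, MCP messages are JSON-RPC 2.0 requests. The sketch below builds the three requests a client typically sends: initialize, tools/list, and tools/call. It is an illustration only. The tool name `search_docs`, its `query` argument, and the client name are hypothetical; the tools the Fish Audio server actually exposes are whatever tools/list returns. The protocol version shown is one published revision and may not be the one a given server negotiates. Transport details and the follow-up initialized notification are omitted.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry a unique id per request


def rpc(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope, the message format MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# 1. Handshake: announce the client and the protocol revision it speaks.
initialize = rpc("initialize", {
    "protocolVersion": "2025-06-18",
    "capabilities": {},
    "clientInfo": {"name": "example-agent", "version": "0.1.0"},
})

# 2. Discover which tools the server offers.
list_tools = rpc("tools/list")

# 3. Call a tool. "search_docs" and "query" are hypothetical placeholders.
call_tool = rpc("tools/call", {
    "name": "search_docs",
    "arguments": {"query": "What emotion tags are supported?"},
})

if __name__ == "__main__":
    for message in (initialize, list_tools, call_tool):
        print(json.dumps(message))
```

In practice the coding agent's MCP client handles all of this; you never write these payloads by hand.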

Setting Up the Fish Audio MCP Server

The Fish Audio Docs MCP server is available at https://docs.fish.audio/mcp. Setup takes one command.

MCP Setup: Step-by-Step Tutorial

The following walkthrough uses Claude Code as an example. Fish Audio's MCP server also supports Cursor and Windsurf — see the editor-specific setup links below.

Step 1 — Run the install command

Open your terminal in your project directory and run:

claude mcp add --transport http fish-audio --scope project https://docs.fish.audio/mcp

This creates a .mcp.json configuration file in your project root. The --scope project flag stores the server in that file rather than in your personal settings, so everyone working in the project can use it.

Step 2 — Verify the connection

claude mcp list

You should see fish-audio in the list of configured servers. If it doesn't appear, check that you're running the command inside a project directory.
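If you want to check the configuration file directly, for example in a CI job, a small script can do it. This is a sketch under one assumption: that .mcp.json keeps its servers under a top-level `mcpServers` object keyed by server name, with a `url` field per entry. Confirm that against the file Claude Code actually wrote before relying on it.

```python
import json
from pathlib import Path


def find_server(config_text: str, name: str):
    """Return the named server's entry from .mcp.json contents, or None."""
    config = json.loads(config_text)
    # Assumed layout: {"mcpServers": {"<name>": {"type": ..., "url": ...}}}
    return config.get("mcpServers", {}).get(name)


if __name__ == "__main__":
    path = Path(".mcp.json")
    if not path.is_file():
        print("No .mcp.json in this directory; run the command from the project root.")
    else:
        entry = find_server(path.read_text(encoding="utf-8"), "fish-audio")
        if entry is None:
            print("fish-audio is not configured in .mcp.json")
        else:
            print("fish-audio ->", entry.get("url"))
```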

Step 3 — Test it

Ask Claude Code directly: "What Fish Audio models are currently available?" or "How do I authenticate with the Fish Audio API?" If the MCP server is connected, Claude Code will fetch the answer from the live documentation rather than relying on training data.

Common issues:

If the server doesn't appear in the claude mcp list output, confirm you have the latest version of Claude Code installed.

If you prefer the server to be available across all your projects rather than just one, replace --scope project with --scope user.

New to the Fish Audio API? Start with the API Introduction → to understand authentication, endpoints, and response formats before connecting the MCP server.

Claude Code (quick reference):

claude mcp add --transport http fish-audio --scope project https://docs.fish.audio/mcp

This creates a .mcp.json file in your project root. Verify the connection:

claude mcp list
# You should see: fish-audio

Cursor: Set up via the command palette. See the Cursor setup guide →

Windsurf: Set up via File > Preferences > Windsurf Settings. See the Windsurf setup guide →

Once connected, your coding agent has real-time access to:

Complete REST API reference with all parameters and response schemas
Python and JavaScript SDK guides and working examples
Best practices for voice cloning and real-time streaming
Model comparison and current pricing and rate limit tables
Troubleshooting guides for common integration issues

What You Can Ask Once It's Connected

The Fish Audio MCP server is designed for natural-language queries inside your editor. A few examples:

Query	What the agent fetches
"How do I authenticate with Fish Audio?"	Authentication guide from the Python or JS SDK docs
"What emotion tags are available?"	Full emotion reference — all 64+ tags across Basic, Advanced, Tone, and Audio Effect categories
"Show me Python code for WebSocket streaming"	WebSocket TTS guide with the current streaming protocol
"What's the difference between S1 and S2?"	Models overview with capability comparison — see also: Fish Audio Open-Sources S2 →
"How do I clone a voice?"	Voice cloning guide with reference audio requirements

Because the MCP server retrieves answers live from the published documentation, they reflect the latest available API reference. When Fish Audio ships a new model or updates an endpoint, your agent sees it on the next query.

Security: The MCP server provides read-only access to public documentation. No API keys are transmitted through the connection. All requests use HTTPS. No queries or usage data are stored.

Not using Fish Audio yet? Start free → — add the MCP server in under 30 seconds and generate working TTS integrations directly from live documentation.

Agent Skills: Offline-First API Instructions for 50+ Coding Agents

What Are Agent Skills?

Diagram showing how a Fish Audio SKILL.md file is installed and used by Claude Code, Codex, and Cursor

Agent Skills are reusable instruction sets for coding agents — structured SKILL.md files that tell an agent exactly how to handle a specific task, without requiring live documentation fetch at generation time.

Each skill contains a name, a description, and step-by-step instructions the agent follows automatically when a matching task comes up.

Skills are installed into an agent's local skill directory. The exact path varies by agent — for example, Claude Code uses ~/.claude/skills/ globally or .claude/skills/ per project. Once installed, the agent reads the skill without any additional prompting. No MCP server required. No network call at generation time.
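To make the file structure concrete, here is a minimal sketch of how a tool might read a SKILL.md file. It assumes the name and description sit in a frontmatter block delimited by `---` lines, with the instructions as Markdown underneath, and it only handles simple `key: value` pairs; a real loader would use a proper YAML parser. The sample content is hypothetical and is not the text of the published Fish Audio skill.

```python
def parse_skill(text: str) -> tuple:
    """Split SKILL.md text into (frontmatter fields, instruction body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter: treat the whole file as instructions
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter: everything after it is the instruction body.
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain Markdown


# Hypothetical sample, for illustration only.
SAMPLE = """\
---
name: fish-audio-api
description: Hypothetical description of when the agent should use this skill
---
# Instructions
1. Read the API key from the environment.
"""

if __name__ == "__main__":
    meta, body = parse_skill(SAMPLE)
    print(meta["name"], "|", meta["description"])
    print(body)
```

The description is what lets an agent decide whether a skill matches the task at hand without reading the full instruction body.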

The open agent skills ecosystem (maintained by Vercel Labs) defines the specification and ships a CLI — npx skills — for installing, updating, and managing skills. It currently supports 50+ agents, including Claude Code, Codex, Cursor, Windsurf, OpenCode, Gemini CLI, and GitHub Copilot.

Installing the Fish Audio Skill

Fish Audio publishes a ready-made Agent Skill that covers the full REST and WebSocket API: authentication, every endpoint in the OpenAPI schema, MessagePack vs JSON vs multipart encoding rules, multi-speaker dialogue setup, and the WebSocket streaming protocol.

npx skills add https://docs.fish.audio --skill fish-audio-api

The skill is installed in your agent's local directory. Once installed, try asking your coding agent:

"Call the Fish Audio TTS API with curl"
"Stream TTS over WebSocket in Python"
"Set up multi-speaker dialogue with emotion tags like [happy] and [sad]"
"Generate speech with S2 using a [whispering] style"

For the full list of supported emotion tags and advanced delivery controls, see the Fish Audio S2 Fine-Grained Control Guide →

Building a multi-character project? See Text to Speech with Multiple Voices → for a practical setup guide.

The skill provides the conventions — the agent follows them without fetching documentation first.

To install for a specific agent:

# Claude Code only
npx skills add https://docs.fish.audio --skill fish-audio-api -a claude-code

# Codex only
npx skills add https://docs.fish.audio --skill fish-audio-api -a codex

# All detected agents at once
npx skills add https://docs.fish.audio --skill fish-audio-api --all

Run npx skills --help for the full list of supported agent flags.

MCP vs. Skills: Which Should You Use?

Both tools make your coding agent more accurate with Fish Audio. They're optimized for different scenarios.

	MCP	Agent Skills
Documentation freshness	Always current — fetches live	Fixed at install time — run npx skills update to refresh
Network required	Yes	No — works fully offline after install
Best for	Open-ended questions, exploring new features, debugging edge cases	Repeatable tasks, standardized code generation, CI/CD environments
Setup	One mcp add command	One npx skills add command
Works in	Claude Code, Cursor, Windsurf	50+ agents including Claude Code, Codex, Cursor, Windsurf, Gemini CLI

The practical rule: use MCP for live documentation search and exploratory queries. Use skills for reliable, offline-first code generation on known patterns.

In most production setups, using both makes sense. The skill handles standard patterns — authentication, basic TTS calls, WebSocket setup — without a network round-trip. MCP handles the questions you didn't anticipate: new model parameters, updated rate limits, edge cases in the streaming protocol.

Why Traditional Docs Fall Short for AI Agents

Comparison of traditional HTML documentation versus agent-ready Markdown docs for AI coding agents

Traditional API documentation is optimized for human browsing. AI coding agents need something different: structured indexes, low-noise Markdown, and live retrieval paths that reduce stale generations and wasted context tokens.

Most API documentation was designed for a specific workflow: a developer opens a browser, searches for the endpoint they need, reads the page, and copies a snippet. That workflow has worked well for years.

The assumption underneath it — that the reader is a human with a browser — is now worth examining. AI coding agents don't use browsers. They fetch raw content, parse it, and generate code from what they retrieve. The infrastructure that makes docs readable for humans — navigation menus, search bars, rendered HTML, embedded media — adds friction for agents rather than removing it.

A few specific patterns cause the most problems:

HTML as the primary format. Agents can technically parse HTML, but it contains a large amount of structural markup that isn't relevant to the task — layout tags, scripts, navigation elements. A page that's 10,000 characters of HTML might contain 2,000 characters of actual documentation. That gap has a real cost when context windows are finite.

No clear entry point. A documentation site with 200 pages gives an agent no signal about where to start. Without a structured index, agents either pull too much content (wasting tokens) or pull the wrong pages (generating incorrect code).

Content that doesn't age well. Model IDs, endpoint paths, and parameter names change. Documentation that doesn't have a clear versioning or deprecation signal causes agents to generate code against specifications that may no longer be accurate.
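The markup overhead described under "HTML as the primary format" is easy to measure. The sketch below uses Python's standard html.parser to strip tags, scripts, and styles, then reports how much of the raw page is readable text. The sample page is made up for illustration, and the figures it produces say nothing about any real documentation site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's text content with markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def signal_ratio(html: str) -> float:
    """Fraction of the raw HTML that is readable text (0.0 for empty input)."""
    return len(visible_text(html)) / len(html) if html else 0.0


# Made-up page for illustration.
PAGE = (
    "<html><head><script>var a = 1;</script></head><body>"
    '<nav><a href="/">Home</a></nav>'
    "<main><p>The endpoint converts text to speech.</p></main>"
    "</body></html>"
)

if __name__ == "__main__":
    print(visible_text(PAGE))
    print(f"signal ratio: {signal_ratio(PAGE):.2f}")
```

Note that navigation text such as "Home" still counts as visible text here, so the real share of useful documentation is lower than the ratio suggests. Serving Markdown avoids the problem rather than working around it.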

None of this is a criticism of how documentation has been built; it was built for the right audience at the time. The practical question now is whether your documentation works for both audiences, as AI coding agents become a significant part of how developers interact with APIs.

Fish Audio's llms.txt, MCP server, and Agent Skills are our answer to that question: three layers that let the same documentation serve human readers and coding agents alike.

The Full Picture: How All Three Work Together

Diagram showing how Fish Audio llms.txt, MCP server, and Agent Skills work together for AI coding agents

Here's what the complete three-layer setup looks like in a real workflow:

Agent opens your project and encounters a Fish Audio task. It fetches llms.txt first — getting a structured map of all available LLM-friendly documentation before pulling individual pages. Token cost: minimal. Orientation time: one fetch.
Agent generates code. If the fish-audio-api skill is installed, it draws on the skill's conventions for authentication, encoding format, and streaming protocol — no documentation fetch required for standard patterns. The output matches the API spec from the first generation.
Agent needs to verify something specific — a current model ID, a rate limit, an emotion tag syntax for S2. It queries the MCP server and gets the answer directly from published documentation — reducing the risk of stale or incorrect generations.
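The three steps above amount to a simple routing rule, sketched below. Real agents make this choice implicitly rather than through an explicit function, so treat the names here (`AgentContext`, `plan_lookup`, the topic strings) as hypothetical and the ordering as one reasonable reading of the workflow.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    # Topics the installed skill covers (empty if no skill is installed).
    skill_topics: set = field(default_factory=set)
    mcp_connected: bool = False


def plan_lookup(topic: str, needs_current_value: bool, ctx: AgentContext) -> str:
    """Pick a documentation source for one sub-task."""
    if needs_current_value and ctx.mcp_connected:
        return "mcp"  # live values: model IDs, rate limits, tag syntax
    if topic in ctx.skill_topics:
        return "skill"  # standard patterns, no network round-trip
    if ctx.mcp_connected:
        return "mcp"  # anything the skill does not cover
    return "llms.txt"  # fall back to the public index


if __name__ == "__main__":
    ctx = AgentContext({"authentication", "websocket"}, mcp_connected=True)
    print(plan_lookup("authentication", False, ctx))
    print(plan_lookup("rate limit", True, ctx))
    print(plan_lookup("voice cloning", False, AgentContext()))
```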

The result is a coding agent that generates accurate Fish Audio integrations on the first attempt, with less back-and-forth correction and no guessing about whether an endpoint or model ID has changed since training.

Ship voice features faster with agent-native docs. Install the Fish Audio skill once and reuse production-safe TTS patterns across every project. Connect the MCP server and let your coding agent read the docs itself.

Set up MCP → · Install the Skill → · Get started in the Developer Panel →

Sabrina Shu

Sabrina is part of Fish Audio's support and marketing team, helping users get the most out of AI voice products while turning launches, updates, and customer insights into clear, practical content.