docset2md: Giving AI Agents Offline Documentation Access

I built a CLI that converts documentation docsets to Markdown files—because Apple's online docs don't work without JavaScript, and I don't want MCPs in my workflow.

Apple’s developer documentation is a bit of a mess for AI agents. Try fetching the UIView docs directly—you get a blank page with a polite “This page requires JavaScript” message. The actual content loads dynamically, which means web scrapers, curl, and AI coding assistants all see… nothing useful.

This is a problem when you’re working with coding agents and need quick access to API documentation. Sure, the AI knows a lot from training data, but training data gets stale. The exact signature of that new Swift 6 API? The deprecation notice on that method you’re about to use? That’s not in the training set.

So I built docset2md.

What It Does

docset2md takes .docset bundles (the same format used by Dash, Zeal, and other documentation browsers) and converts them to plain Markdown files. Structured, searchable, AI-readable Markdown.

docset2md ~/Library/Application\ Support/Dash/DocSets/Apple_API_Reference/Apple_API_Reference.docset -o ./apple-docs --index

Run that, and you get a folder full of organized documentation:

apple-docs/
├── search.db        # SQLite FTS5 index
├── search           # Standalone search CLI
├── swift/
│   └── uikit/
│       ├── _index.md
│       ├── uiview.md
│       └── uiview/
│           ├── addsubview.md
│           └── layoutsubviews.md
└── objective-c/
    └── ...

Every class, method, property, and protocol gets its own Markdown file. Internal links work. The hierarchy is preserved. And there’s a search index so you can find what you need without grepping through thousands of files.
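The layout above suggests a predictable mapping from a symbol to its file. Here is a minimal sketch of that mapping, assuming lowercase names and one directory level per parent symbol. This is inferred from the tree, not taken from docset2md's source, and `docPathFor` is a hypothetical helper:

```typescript
// Hypothetical helper: guesses where a symbol's Markdown file lives,
// based only on the directory tree shown above. The real tool may
// sanitize names differently (for example for overloads or operators).
type Language = "swift" | "objective-c";

function docPathFor(language: Language, framework: string, symbolPath: string[]): string {
  const root = [language, framework.toLowerCase()];
  if (symbolPath.length === 0) {
    // A framework's overview page appears as _index.md in the tree.
    return [...root, "_index.md"].join("/");
  }
  const parts = symbolPath.map((p) => p.toLowerCase());
  const last = parts.pop() as string;
  // Parent symbols become directories, the symbol itself becomes the file.
  return [...root, ...parts, `${last}.md`].join("/");
}
```

Under those assumptions, `docPathFor("swift", "UIKit", ["UIView", "addSubview"])` yields `swift/uikit/uiview/addsubview.md`, which matches the tree. In practice the search CLI is the more reliable way to locate a file.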

Why Not Just Use Dash’s MCP?

Dash actually has an MCP server now. Kapeli—the guy behind Dash—built it, and there are a few third-party implementations too. So why didn’t I just use that?

I’ve already written about why I don’t like MCPs for my workflow. The short version: they eat context tokens, add protocol overhead, and introduce another layer of abstraction between the AI and the data. Every MCP tool definition lives in the context window whether you’re using it or not.

With flat files, there’s no overhead until you actually read something. Every coding agent already knows how to read files and run shell commands. There’s nothing new to learn, no tool definitions to inject, no server to keep running.

The search CLI that docset2md generates? It’s a single binary. Run ./search "security scoped URL" and you get a list of matching files with relevance ranking. That’s it. No daemon, no protocol, no token tax.

The Apple Docs Problem

Here’s the thing that finally pushed me to build this: Apple’s online documentation is hostile to programmatic access.

Visit developer.apple.com/documentation/uikit/uiview in a browser and everything looks fine. Visit it with curl or fetch and you get a loading placeholder that never loads. The content comes from JavaScript-based client-side rendering—the server sends a shell, the browser executes code, the docs appear.

This might be fine for human browsing, but it breaks:

  • Web scrapers
  • wget/curl
  • AI coding assistants trying to fetch current docs
  • Basically any non-browser HTTP client
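To make the failure mode concrete, here is a rough heuristic for spotting a client-side-rendered shell in a fetched response. It is a sketch, not something docset2md does: `looksLikeJsShell` is a made-up name, and the 200-character threshold is an arbitrary assumption.

```typescript
// Heuristic only: flags HTML that is probably an empty shell waiting
// for JavaScript to fill it in. Function name and threshold are assumptions.
function looksLikeJsShell(html: string, minVisibleChars = 200): boolean {
  const visibleText = html
    // Drop blocks whose contents are not page content for our purposes.
    .replace(/<(script|style|noscript)\b[\s\S]*?<\/\1>/gi, " ")
    // Strip the remaining tags, keeping only text nodes.
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  return visibleText.length < minVisibleChars;
}
```

An agent could run this on the body returned by `fetch` and fall back to the local Markdown files whenever it returns `true`.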

The docset format sidesteps this entirely. Apple compiles the same documentation into docsets for Xcode, and those docsets contain the actual content—pre-rendered, ready to read. No JavaScript required.

How I Use It

I’ve converted the full iOS and macOS documentation and put it in my agents directory:

~/Agents/API Documentation/Apple/
├── search
├── search.db
├── swift/
└── objective-c/

My global agent config (AGENTS.md) points to this location and tells the agent there’s a search binary inside each documentation folder. But that’s just the pointer—the real magic is in the skill file.

I’ve got a 200+ line skill file called apple-docs that teaches the agent how to actually use this stuff. It covers:

  • Basic searches: How to find classes, methods, protocols by name
  • Filtering: Narrowing results by type (--type Class), framework (--framework UIKit), or language (--language swift)
  • Boolean queries: Combining terms with AND, OR, NOT
  • Following links: The Markdown files have relative links to related types—the skill teaches the agent to resolve these paths and explore connected APIs

Here’s what a typical lookup looks like:

~/Agents/API\ Documentation/Apple/search "bookmark*" --language swift --type Method

The search returns file paths. The agent reads the relevant Markdown and has accurate, current documentation at its fingertips. If that doc links to related types like BookmarkCreationOptions, the agent can follow the relative path and keep exploring. In the generated Markdown, those types appear as links inside the signature:

func bookmarkData(options: [URL](../url.md).[BookmarkCreationOptions](./bookmarkcreationoptions.md) = [])
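Following those links is plain path arithmetic. The sketch below pulls the links out of a line like the one above and resolves each against the file that contains it. Both helper names are mine, and the location of the containing file in the usage note is a guess for illustration.

```typescript
interface MarkdownLink {
  text: string;
  target: string;
}

// Extracts [text](target) links from a piece of Markdown.
function extractLinks(markdown: string): MarkdownLink[] {
  const links: MarkdownLink[] = [];
  const pattern = /\[([^\]]+)\]\(([^)\s]+)\)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push({ text: match[1], target: match[2] });
  }
  return links;
}

// Resolves a relative link against the file it appears in.
// Paths use forward slashes and are relative to the docs root.
function resolveLink(fromFile: string, target: string): string {
  const segments = fromFile.split("/");
  segments.pop(); // drop the file name, keep its directory
  for (const part of target.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") segments.pop();
    else segments.push(part);
  }
  return segments.join("/");
}
```

If the signature lived in `swift/foundation/url/bookmarkdata.md` (an assumed location), `../url.md` would resolve to `swift/foundation/url.md` and `./bookmarkcreationoptions.md` to `swift/foundation/url/bookmarkcreationoptions.md`.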

The skill file also documents common workflows—“find an API you vaguely remember,” “explore a framework,” “compare Swift vs Objective-C signatures.” It’s essentially a reference manual that lives in the agent’s context only when needed.

For common lookups, this is faster than going to Apple’s website myself. The search uses SQLite FTS5 with BM25 ranking, so it returns relevant results quickly even across hundreds of thousands of entries.
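For anyone curious what BM25 ranking actually computes, here is a small standalone scorer. It is only an illustration of the formula: SQLite FTS5 implements its own variant internally, and neither docset2md nor the search binary contains this code. The `bm25Scores` name, the smoothed IDF term, and the pre-tokenized input are my assumptions.

```typescript
// Illustrative BM25 scorer over pre-tokenized documents.
// Higher score means a better match for the query terms.
function bm25Scores(docs: string[][], query: string[], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / Math.max(n, 1);
  return docs.map((doc) => {
    let score = 0;
    for (const term of query) {
      const tf = doc.filter((t) => t === term).length;
      if (tf === 0) continue;
      // Rare terms count for more than terms found in most documents.
      const docsWithTerm = docs.filter((d) => d.indexOf(term) !== -1).length;
      const idf = Math.log(1 + (n - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
      // Longer-than-average documents are penalized, shorter ones boosted.
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += idf * ((tf * (k1 + 1)) / (tf + k1 * lengthNorm));
    }
    return score;
  });
}
```

The takeaway is that repeated query terms help with diminishing returns, and a short page that is all about the term outranks a long page that mentions it once.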

Supported Formats

docset2md handles three main docset formats:

  • Apple DocC: The modern format Apple uses, with Brotli-compressed content stored as DocC JSON
  • Standard Dash: The traditional format with HTML content, used by most third-party docsets
  • CoreData-style: An older format with different database tables

The tool auto-detects which format it’s dealing with. For Apple docsets, it can also download missing content from Apple’s API if something’s incomplete in the local bundle.
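Decoding the Brotli-compressed JSON is the easy part, since Node ships Brotli support in its zlib module. The sketch below round-trips a made-up page to show the shape of that step. The `DocPage` interface is a hypothetical, heavily simplified stand-in for real DocC JSON, and how docset2md locates each blob inside the bundle is not shown here.

```typescript
import { brotliCompressSync, brotliDecompressSync } from "node:zlib";

// Hypothetical, simplified page shape; real DocC JSON has many more fields.
interface DocPage {
  title: string;
  abstract: string;
}

// Decompresses a Brotli blob and parses it as JSON.
function decodePage(compressed: Uint8Array): DocPage {
  const json = brotliDecompressSync(compressed).toString("utf8");
  return JSON.parse(json) as DocPage;
}

// Round trip with made-up data, standing in for a blob read from a docset.
const samplePage: DocPage = { title: "UIView", abstract: "Manages content for a rectangular area." };
const sampleBlob = brotliCompressSync(Buffer.from(JSON.stringify(samplePage), "utf8"));
const decodedPage = decodePage(sampleBlob);
```

From there, the conversion is a matter of walking the parsed structure and emitting Markdown.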

The Search Index

The --index flag generates two things: a SQLite database and a standalone search binary compiled with Bun.

The search CLI supports:

  • Basic term search: ./search UIWindow
  • Prefix matching: ./search "bookmark*"
  • Phrase matching: ./search "table view"
  • Boolean queries: ./search "view AND controller"
  • Filtering by type, framework, or language

# Find all URL-related methods in Foundation
./search "URL" --type Method --framework Foundation --language swift

The results include the file path, so you can pipe them to other tools or have the agent read them directly.
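If you want to call the search binary from a script rather than a shell, it helps to build the argument list programmatically. This is a small sketch using only the flags mentioned above; `buildSearchArgs` and `SearchOptions` are names I made up for illustration.

```typescript
interface SearchOptions {
  type?: string;      // e.g. "Method" or "Class"
  framework?: string; // e.g. "Foundation"
  language?: string;  // e.g. "swift"
}

// Builds the argument list for the generated search binary.
// The query is always the first argument, followed by optional filters.
function buildSearchArgs(query: string, options: SearchOptions = {}): string[] {
  const args = [query];
  if (options.type) args.push("--type", options.type);
  if (options.framework) args.push("--framework", options.framework);
  if (options.language) args.push("--language", options.language);
  return args;
}
```

Passing the resulting array to `execFile` from `node:child_process` skips the shell entirely, so a query like `bookmark*` reaches the binary as written instead of being expanded as a glob.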

Installing It

docset2md is written in TypeScript and runs on Node.js. Install it globally:

npm install -g docset2md

Or run it with npx:

npx docset2md info ~/docsets/Example.docset

The code is on GitHub at domzilla/docset2md if you want to poke around.

The Bigger Picture

This is part of my ongoing experiment with building CLIs for AI workflows. The tools that work best with coding assistants are the ones that:

  • Read and write files
  • Accept flags instead of interactive prompts
  • Output parseable text
  • Don’t require persistent servers or background processes

docset2md fits that mold. Convert once, search forever, no maintenance required.

If you’re working with AI coding assistants and want offline access to documentation—especially Apple’s JavaScript-heavy docs—give it a shot. It’s solved a real annoyance for me, and maybe it will for you too.