How StackSight Ingests Your Codebase in 47 Seconds

The streaming parser architecture behind StackSight's repo analysis — how we process thousands of files without blocking.

March 26, 2026 · 1 min read

The Challenge

StackSight needs to understand your entire codebase to generate documentation. That means parsing every file — code, configs, dependencies, READMEs — and building a dependency graph. For large repos, that's thousands of files.

The naive approach (clone, walk, parse) blocks the agent loop for minutes. Users won't wait that long.
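To make the blocking concrete, here is a minimal sketch of that naive pipeline (hypothetical helper names, not StackSight's actual code): nothing downstream can start until the entire clone has finished.

```python
import subprocess
from pathlib import Path

def collect_files(root: str) -> list[Path]:
    """Walk the checkout and collect every regular file."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file())

def naive_analyze(repo_url: str, dest: str) -> list[Path]:
    """Naive pipeline: full clone, then walk, then parse.
    Blocks for the whole clone before any parsing can begin --
    minutes of dead time on a large repo."""
    subprocess.run(["git", "clone", repo_url, dest], check=True)
    # Only after the clone completes do we even see the file list.
    return collect_files(dest)
```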

Streaming Architecture

We built a streaming parser that processes files as they're cloned. The git clone runs with --filter=blob:none for a blobless clone, then fetches blobs on demand as the parser requests them.
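The streaming shape can be sketched as a lazy pipeline: each file is parsed as soon as its blob arrives, rather than after the whole clone lands. The `fetch` and `parse` callables below are hypothetical stand-ins for the on-demand blob fetch and the per-language parser.

```python
from collections.abc import Iterable, Iterator
from typing import Callable

def stream_parse(
    paths: Iterable[str],
    fetch: Callable[[str], str],   # on-demand blob fetch (network I/O)
    parse: Callable[[str], object],  # language-specific parser
) -> Iterator[tuple[str, object]]:
    """Yield (path, parse result) as each blob arrives, so parsing
    overlaps with the clone instead of waiting behind it."""
    for path in paths:
        blob = fetch(path)        # lazy: fetched only when requested
        yield path, parse(blob)   # result is available immediately
```

Because the generator is lazy, the agent can start consuming parse results while later blobs are still in flight.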

Each file goes through a language-specific parser that extracts: imports, exports, function signatures, class hierarchies, and doc comments. These feed into a dependency graph that StackSight's agent uses to understand relationships.
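As a rough illustration of the extraction step, here is a regex-based sketch that pulls import targets out of TypeScript-style source and records them as edges in a dependency graph. A real language-specific parser would use an AST; the names here are illustrative, not StackSight's API.

```python
import re
from collections import defaultdict

# Matches `import ... from "module"` lines; side-effect imports
# (`import "./x"`) are skipped -- an AST parser would catch those too.
IMPORT_RE = re.compile(r'^\s*import\s+.*?\bfrom\s+["\']([^"\']+)["\']', re.M)

def extract_imports(source: str) -> list[str]:
    """Return the module specifiers a file imports, in order."""
    return IMPORT_RE.findall(source)

def build_graph(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the modules it imports -- the edges
    of the dependency graph the agent walks."""
    graph: dict[str, list[str]] = defaultdict(list)
    for path, source in files.items():
        graph[path] = extract_imports(source)
    return dict(graph)
```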

The 47-Second Benchmark

Our target: analyze a 10,000-file TypeScript monorepo in under 60 seconds. Current best: 47 seconds on a cold clone. The bottleneck isn't parsing — it's network I/O on the initial clone.

What's Coming

Incremental analysis. After the first ingestion, StackSight re-parses only the files git diff reports as changed. That drops subsequent analyses to single-digit seconds.
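A minimal sketch of that incremental step, assuming a stored ref for the last ingested commit (the wrapper names are hypothetical): ask git which files changed, then feed only those back into the parser.

```python
import subprocess

def changed_files(diff_output: str) -> set[str]:
    """Parse `git diff --name-only` output into the set of paths
    to re-parse. Kept separate from the subprocess call so the
    parsing logic is testable on its own."""
    return {line.strip() for line in diff_output.splitlines() if line.strip()}

def incremental_paths(repo_dir: str, since_ref: str) -> set[str]:
    """Return the files changed since the last ingested ref."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", since_ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return changed_files(out)
```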

architecture · parsing · performance · dotnet
StackSight is a Menoko product — Human-Powered AI