I've been building a universal code formatter in Zig, and I recently discovered something humbling: my formatter will never be as fast as Ruff on large files. But here's the twist - it's actually faster on small files, and the journey to this realization taught me a lot.
The vision was simple: build a blazing-fast, universal code formatter using tree-sitter. One formatter to rule them all - Python, JavaScript, Go, Java - with consistent formatting logic across languages. Like having `gofmt` for every language.
I set myself what I thought were ambitious goals:
And I actually hit those goals. Kind of. Then a coworker told me about Ruff.
After weeks of optimization, profiling, and embracing Zig's data-oriented design principles (think structs of arrays and cache-friendly layouts), here's where I landed:
I felt like I had something really special here. Until I added Ruff to my benchmark and compat testing suite:
That last number stung. What did I miss? Did I have an O(n²) lurking somewhere I hadn't noticed? But all good - I just had to do some more profiling, batching, and optimization! My algorithm is sound, my data structures are reasonably efficient, and tree-sitter is a solid, extremely fast parser. Just a few more tweaks and I'd be back on track, right?
Spoiler: nope.
My first instinct was to profile everything. Zig makes this surprisingly pleasant - I built a simple profiler that tracks function calls and timing:
```zig
fn formatNode(allocator: std.mem.Allocator, output: *std.ArrayList(u8), node: tree_sitter.Node, source: []const u8, depth: u32) !void {
    const start_time = std.time.nanoTimestamp();
    // The deferred block runs on every exit path, so the call gets
    // recorded even when the formatting logic below returns an error.
    defer {
        const end_time = std.time.nanoTimestamp();
        profiler.recordCall("formatNode", @intCast(end_time - start_time));
    }
    // ... formatting logic ...
}
```
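The profiler behind `recordCall` doesn't need to be fancy. A minimal version - a map from function name to call count and accumulated nanoseconds, with averages derived at report time - might look like this (a simplified sketch, not my exact implementation):

```zig
const std = @import("std");

const CallStats = struct { calls: u64 = 0, total_ns: u64 = 0 };

const Profiler = struct {
    stats: std.StringHashMap(CallStats),

    fn init(allocator: std.mem.Allocator) Profiler {
        return .{ .stats = std.StringHashMap(CallStats).init(allocator) };
    }

    fn recordCall(self: *Profiler, name: []const u8, elapsed_ns: u64) void {
        // Accumulate per-function totals; averages are derived when reporting.
        const entry = self.stats.getOrPut(name) catch return; // swallow OOM in the sketch
        if (!entry.found_existing) entry.value_ptr.* = .{};
        entry.value_ptr.calls += 1;
        entry.value_ptr.total_ns += elapsed_ns;
    }
};
```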
The results on a 9,919-line file were eye-opening:
```
Function                    | Calls | Total Time (ms) | Avg Time (μs)
----------------------------|-------|-----------------|--------------
parse                       |     1 |             245 |        245615
formatExpressionWithDepth   |  3099 |              99 |            31
...
```
Wait, what? Parsing takes 245ms out of 308ms total? That's 79% of the time spent just building the AST! I had to double- and triple-check this. I was sure I had an algorithmic issue somewhere in my own code, but no, it was just the parser.
This sent me on a quest to optimize tree-sitter itself following various ideas I found online:
Parser reuse? Measured it - creating a parser only takes 74μs. Not worth the complexity of managing global state. And that wouldn't help when parsing a single large file anyway.
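For the curious, that 74μs came from a measurement along these lines, timing parser creation plus language setup with `std.time.Timer`. The `@cImport` and `tree_sitter_python` declarations below are illustrative bindings to tree-sitter's C API; your build setup may differ:

```zig
const std = @import("std");
const ts = @cImport(@cInclude("tree_sitter/api.h"));

// Each tree-sitter grammar exposes one entry point like this
// (here: the Python grammar).
extern fn tree_sitter_python() ?*const ts.TSLanguage;

fn timeParserCreation() !u64 {
    var timer = try std.time.Timer.start();
    const parser = ts.ts_parser_new();
    defer ts.ts_parser_delete(parser);
    _ = ts.ts_parser_set_language(parser, tree_sitter_python());
    return timer.read(); // elapsed nanoseconds; ~74μs on my machine
}
```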
Custom memory allocators? Tree-sitter allows overriding malloc/free. Could implement an arena allocator or pool allocator for common node sizes. But the complexity didn't seem worth the potential 10-20% gain.
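The hook itself is straightforward: `ts_set_allocator` (declared in tree_sitter/api.h) takes malloc/calloc/realloc/free-shaped function pointers. Here's a sketch that just forwards to libc while counting calls - a real arena or pool would replace the wrapper bodies, and the counting logic is purely illustrative:

```zig
const c = @cImport(@cInclude("stdlib.h")); // requires linking libc

// From tree_sitter/api.h: swaps out the allocator tree-sitter uses internally.
extern fn ts_set_allocator(
    new_malloc: ?*const fn (usize) callconv(.C) ?*anyopaque,
    new_calloc: ?*const fn (usize, usize) callconv(.C) ?*anyopaque,
    new_realloc: ?*const fn (?*anyopaque, usize) callconv(.C) ?*anyopaque,
    new_free: ?*const fn (?*anyopaque) callconv(.C) void,
) void;

var malloc_calls: usize = 0; // not thread-safe; fine for a single-threaded formatter

fn countingMalloc(size: usize) callconv(.C) ?*anyopaque {
    malloc_calls += 1;
    return c.malloc(size);
}

fn passCalloc(n: usize, size: usize) callconv(.C) ?*anyopaque {
    return c.calloc(n, size);
}

fn passRealloc(ptr: ?*anyopaque, size: usize) callconv(.C) ?*anyopaque {
    return c.realloc(ptr, size);
}

fn passFree(ptr: ?*anyopaque) callconv(.C) void {
    c.free(ptr);
}

pub fn installCountingAllocator() void {
    ts_set_allocator(&countingMalloc, &passCalloc, &passRealloc, &passFree);
}
```

Even a pass-through version like this is useful for confirming how much allocation pressure parsing actually generates before committing to a custom allocator.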
Partial parsing? Tree-sitter has `ts_parser_set_included_ranges` for parsing only parts of a file. But it's designed for embedded languages, not for skipping sections of a single-language file.
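To illustrate what that API is actually for: you hand the parser the byte ranges that belong to a language, the way an HTML highlighter would isolate a script block. A sketch (the function name and offsets below are invented):

```zig
const ts = @cImport(@cInclude("tree_sitter/api.h"));

// Restrict parsing to one region of the buffer - e.g. a <script> block
// inside an HTML document. The parser ignores everything outside it.
fn restrictToScriptBlock(parser: ?*ts.TSParser) void {
    const ranges = [_]ts.TSRange{.{
        .start_point = .{ .row = 10, .column = 0 },
        .end_point = .{ .row = 42, .column = 9 },
        .start_byte = 312,
        .end_byte = 1560,
    }};
    _ = ts.ts_parser_set_included_ranges(parser, &ranges, ranges.len);
}
```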
Disable features? Not too helpful: tree-sitter grammars are already minimal by design.
Eventually I found the answer in tree-sitter's GitHub issues. Tree-sitter has a fundamental limitation: it must parse the entire file at once. There's no streaming, no partial parsing for single-language files.
Why? Because tree-sitter builds a complete, lossless syntax tree suitable for any use case - syntax highlighting, code analysis, refactoring, you name it. For formatting, we only need:
We're paying for lots of features we don't use.
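To make that concrete, here's roughly the kind of flat, minimal node record a formatter could get by with - a hypothetical layout, not tree-sitter's actual representation:

```zig
// Everything a formatter really needs per node: what it is, where its text
// lives, and how to walk its children. No parent pointers, no field metadata,
// no error-recovery bookkeeping. (Hypothetical layout.)
const FormatNode = struct {
    kind: u16, // grammar symbol id
    start_byte: u32, // span of the node's text in the source buffer
    end_byte: u32,
    first_child: u32, // indices into one flat, cache-friendly node array
    next_sibling: u32, // (with some sentinel value for "none")
};
```

Compare that to a complete, lossless syntax tree built for every possible use case, and the overhead gap starts to make sense.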
Let me be clear: Ruff is an incredible achievement. The Astral team has built something genuinely impressive - a Python formatter that's not just fast, but ridiculously fast. They've managed to be 10-100x faster than Black while maintaining compatibility and adding a full linter. That's not easy. That's engineering excellence.
Ruff is written in Rust and likely uses a hand-written recursive-descent parser optimized specifically for Python. My approach on this project is to treat the competition as black boxes, to avoid being influenced too much, so I won't review their implementation for a definitive answer (at least not yet) - but I can guess how they achieved this performance. It probably:
The fact that they achieved this performance while maintaining Black compatibility is honestly inspiring. They didn't just make a fast formatter - they made a fast formatter that people actually want to use.
Meanwhile, I'm using a generic parsing framework that builds a complete AST for every possible use case. tree-sitter is fantastic, but it wasn't designed for this one narrow job.
But honestly, I'm okay with this.
The real lesson here is a better understanding of the trade-offs I accepted by going with tree-sitter. And the benefits are still substantial:
Writing a custom parser for each language to match Ruff's performance would be too big of a distraction. And if you need the absolute fastest Python formatter? Use Ruff. Seriously. It's fantastic.
I still have a good amount of work to do: my compatibility with Black is currently close to 90%, according to my (likely limited) test suite. Which is pretty good, to be honest!
So let's see how things evolve when I reach 100% :)