I've been building a universal code formatter in Zig, and I recently discovered something humbling: my formatter will never be as fast as Ruff on large files. But here's the twist - it's actually faster on small files, and the journey to this realization taught me a lot.
The vision was simple: build a blazing-fast, universal code formatter using tree-sitter. One formatter to rule them all - Python, JavaScript, Go, Java - with consistent formatting logic across languages. Like having `gofmt` for every language.
I set myself what I thought were ambitious goals:
And I actually hit those goals. Kind of. Then a coworker told me about Ruff.
After weeks of optimization, profiling, and embracing Zig's data-oriented design principles (think structs of arrays and cache-friendly layouts), here's where I landed:
I felt like I had something really special here. Until I added Ruff to my benchmark and compat testing suite:
That last number stung. What did I miss? Did I have an O(n²) lurking somewhere I hadn't noticed? But all good - I just had to do some more profiling, batching, and optimization! My algorithm is sound, my data structures are reasonably efficient, and tree-sitter is a solid, extremely fast parser. Just a few more tweaks and I'd be back on track, right?
Spoiler: nope.
My first instinct was to profile everything. Zig makes this surprisingly pleasant - I built a simple profiler that tracks function calls and timing:
```zig
fn formatNode(allocator: std.mem.Allocator, output: *std.ArrayList(u8), node: tree_sitter.Node, source: []const u8, depth: u32) !void {
    const start_time = std.time.nanoTimestamp();
    // The deferred block runs on every exit path, so the call gets
    // recorded even when the formatting logic below returns an error.
    defer {
        const end_time = std.time.nanoTimestamp();
        profiler.recordCall("formatNode", @intCast(end_time - start_time));
    }
    // ... formatting logic ...
}
```
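The profiler behind `recordCall` doesn't need to be fancy. A minimal version - a map from function name to call count and accumulated nanoseconds, with averages derived at report time - might look like this (a simplified sketch, not my exact implementation):

```zig
const std = @import("std");

const CallStats = struct { calls: u64 = 0, total_ns: u64 = 0 };

const Profiler = struct {
    stats: std.StringHashMap(CallStats),

    fn init(allocator: std.mem.Allocator) Profiler {
        return .{ .stats = std.StringHashMap(CallStats).init(allocator) };
    }

    fn recordCall(self: *Profiler, name: []const u8, elapsed_ns: u64) void {
        // Accumulate per-function totals; averages are derived when reporting.
        const entry = self.stats.getOrPut(name) catch return; // swallow OOM in the sketch
        if (!entry.found_existing) entry.value_ptr.* = .{};
        entry.value_ptr.calls += 1;
        entry.value_ptr.total_ns += elapsed_ns;
    }
};
```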
The results on a 9,919-line file were eye-opening:
```
Function                    | Calls | Total Time (ms) | Avg Time (μs)
----------------------------|-------|-----------------|--------------
parse                       |     1 |             245 |        245615
formatExpressionWithDepth   |  3099 |              99 |            31
...
```
Wait, what? Parsing takes 245ms out of 308ms total? That's 79% of the time spent just building the AST! I had to double- and triple-check this. I was sure I had an algorithmic issue somewhere in my own code, but no, it was just the parser.
This sent me on a quest to optimize tree-sitter itself following various ideas I found online:
Parser reuse? Measured it - creating a parser only takes 74μs. Not worth the complexity of managing global state. And that wouldn't help when parsing a single large file anyway.
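For the curious, that 74μs came from a measurement along these lines, timing parser creation plus language setup with `std.time.Timer`. The `@cImport` and `tree_sitter_python` declarations below are illustrative bindings to tree-sitter's C API; your build setup may differ:

```zig
const std = @import("std");
const ts = @cImport(@cInclude("tree_sitter/api.h"));

// Each tree-sitter grammar exposes one entry point like this
// (here: the Python grammar).
extern fn tree_sitter_python() ?*const ts.TSLanguage;

fn timeParserCreation() !u64 {
    var timer = try std.time.Timer.start();
    const parser = ts.ts_parser_new();
    defer ts.ts_parser_delete(parser);
    _ = ts.ts_parser_set_language(parser, tree_sitter_python());
    return timer.read(); // elapsed nanoseconds; ~74μs on my machine
}
```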
Custom memory allocators? Tree-sitter allows overriding malloc/free. Could implement an arena allocator or pool allocator for common node sizes. But the complexity didn't seem worth the potential 10-20% gain.
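The hook itself is straightforward: `ts_set_allocator` (declared in tree_sitter/api.h) takes malloc/calloc/realloc/free-shaped function pointers. Here's a sketch that just forwards to libc while counting calls - a real arena or pool would replace the wrapper bodies, and the counting logic is purely illustrative:

```zig
const c = @cImport(@cInclude("stdlib.h")); // requires linking libc

// From tree_sitter/api.h: swaps out the allocator tree-sitter uses internally.
extern fn ts_set_allocator(
    new_malloc: ?*const fn (usize) callconv(.C) ?*anyopaque,
    new_calloc: ?*const fn (usize, usize) callconv(.C) ?*anyopaque,
    new_realloc: ?*const fn (?*anyopaque, usize) callconv(.C) ?*anyopaque,
    new_free: ?*const fn (?*anyopaque) callconv(.C) void,
) void;

var malloc_calls: usize = 0; // not thread-safe; fine for a single-threaded formatter

fn countingMalloc(size: usize) callconv(.C) ?*anyopaque {
    malloc_calls += 1;
    return c.malloc(size);
}

fn passCalloc(n: usize, size: usize) callconv(.C) ?*anyopaque {
    return c.calloc(n, size);
}

fn passRealloc(ptr: ?*anyopaque, size: usize) callconv(.C) ?*anyopaque {
    return c.realloc(ptr, size);
}

fn passFree(ptr: ?*anyopaque) callconv(.C) void {
    c.free(ptr);
}

pub fn installCountingAllocator() void {
    ts_set_allocator(&countingMalloc, &passCalloc, &passRealloc, &passFree);
}
```

Even a pass-through version like this is useful for confirming how much allocation pressure parsing actually generates before committing to a custom allocator.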
Partial parsing? Tree-sitter has `ts_parser_set_included_ranges` for parsing only parts of a file. But it's designed for embedded languages, not for skipping sections of a single-language file.
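To illustrate what that API is actually for: you hand the parser the byte ranges that belong to a language, the way an HTML highlighter would isolate a script block. A sketch (the function name and offsets below are invented):

```zig
const ts = @cImport(@cInclude("tree_sitter/api.h"));

// Restrict parsing to one region of the buffer - e.g. a <script> block
// inside an HTML document. The parser ignores everything outside it.
fn restrictToScriptBlock(parser: ?*ts.TSParser) void {
    const ranges = [_]ts.TSRange{.{
        .start_point = .{ .row = 10, .column = 0 },
        .end_point = .{ .row = 42, .column = 9 },
        .start_byte = 312,
        .end_byte = 1560,
    }};
    _ = ts.ts_parser_set_included_ranges(parser, &ranges, ranges.len);
}
```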
Disable features? Not too helpful: tree-sitter grammars are already minimal by design.
Eventually I found the answer in tree-sitter's GitHub issues. Tree-sitter has a fundamental limitation: it must parse the entire file at once. There's no streaming, no partial parsing for single-language files.
Why? Because tree-sitter builds a complete, lossless syntax tree suitable for any use case - syntax highlighting, code analysis, refactoring, you name it. For formatting, we only need:
We're paying for lots of features we don't use.
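To make that concrete, here's roughly the kind of flat, minimal node record a formatter could get by with - a hypothetical layout, not tree-sitter's actual representation:

```zig
// Everything a formatter really needs per node: what it is, where its text
// lives, and how to walk its children. No parent pointers, no field metadata,
// no error-recovery bookkeeping. (Hypothetical layout.)
const FormatNode = struct {
    kind: u16, // grammar symbol id
    start_byte: u32, // span of the node's text in the source buffer
    end_byte: u32,
    first_child: u32, // indices into one flat, cache-friendly node array
    next_sibling: u32, // (with some sentinel value for "none")
};
```

Compare that to a complete, lossless syntax tree built for every possible use case, and the overhead gap starts to make sense.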
Let me be clear: Ruff is an incredible achievement. The Astral team has built something genuinely impressive - a Python formatter that's not just fast, but ridiculously fast. They've managed to be 10-100x faster than Black while maintaining compatibility and adding a full linter. That's not easy. That's engineering excellence.
Ruff is written in Rust and likely uses a hand-written recursive-descent parser optimized specifically for Python. My approach on this project is to treat the competition as black boxes, to avoid being influenced too much, so I won't review their implementation for a definitive answer (at least not yet) - but I can guess how they achieved this performance. It probably:
The fact that they achieved this performance while maintaining Black compatibility is honestly inspiring. They didn't just make a fast formatter - they made a fast formatter that people actually want to use.
Meanwhile, I'm using a generic parsing framework that builds a complete AST for every possible use case. tree-sitter is fantastic, but it wasn't designed for this one narrow job.
But honestly, I'm okay with this.
The real lesson here is a better understanding of the trade-offs I accepted by going with tree-sitter. And the benefits are still substantial:
Writing a custom parser for each language to match Ruff's performance would be too big of a distraction. And if you need the absolute fastest Python formatter? Use Ruff. Seriously. It's fantastic.
I still have a good amount of work to do: my compatibility with Black is currently close to 90%, according to my (likely limited) test suite. Which is pretty good, to be honest!
So let's see how things evolve when I reach 100% :)