nom 1.0 is here! REJOICE!
By Geoffroy Couprie | @gcouprie | 2015-11-16
nom is a parser combinator library written in Rust that I started about a year ago. Its goal is to let you write parsers that are safe by default and fast, and that abstract away all of the dangerous or annoying details of data consumption.
During that year, more than 50 projects have started using it, from toy parsers to high performance production code. Their feedback has been invaluable in improving the library, including more and more parsing patterns, and testing ideas on what makes a great parser library. The 1.0 version is the result of that feedback: a more stable version, but also a few breaking changes that improve the architecture and make it more flexible and easier to use. We now feel it is reliable enough to be used in production at Clever Cloud. We have a lot of data to manage, coming from trusted and untrusted sources, and this is exactly the kind of tool we need to build a safe infrastructure.
The number of open source projects using nom has been really helpful in developing that stable release. If you maintain one of those projects, you may have received a pull request from me. That's right: I took care of testing the 1.0 branch on every project I could get my hands on, to see what would break, find out which features developers were using, and document the upgrade process. This has been a lot of work, but worth it. I'll probably say more about that in a future blog post, for other library maintainers who want to try the approach.
That's all good, but why would you use nom right now? Let's see!
nom is fast. How fast? A few benchmarks have shown that it is consistently faster than Parsec and attoparsec (Haskell parser combinator libraries), faster than other Rust parser combinator libraries, and even faster than Rust's regular expression library. There is even a benchmark where it beats Joyent's http-parser on parsing HTTP request headers.
Why is it faster? I have a few ideas about this. First, unlike most parser combinator systems, nom does not copy data if it is not needed. It makes heavy use of the slice, a Rust data structure containing a pointer and a length. Since Rust's compiler checks that the original input outlives every slice that refers to it, you can afford to refer to that input from the beginning to the end of the parser, without copying anything.
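To make the zero-copy idea concrete, here is a minimal sketch in plain Rust. It does not use nom's API; `take_token` is a hypothetical helper written only for this illustration. It splits the input at the first space and returns two slices that still point into the original buffer.

```rust
/// Hypothetical helper (not part of nom): splits `input` at the first
/// space and returns `(token, rest)`.
/// Both returned slices borrow from `input`; no bytes are copied.
fn take_token(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Find the first space; give up if there is none.
    let pos = input.iter().position(|&b| b == b' ')?;
    // `split_at` only builds two (pointer, length) pairs into the same buffer.
    let (token, rest) = input.split_at(pos);
    // Skip the space itself. `rest` is never empty here, because `pos`
    // is a valid index into `input`.
    Some((token, &rest[1..]))
}
```

Calling `take_token(b"GET /index.html")` yields `GET` and `/index.html`, and the first slice starts at the same address as the input, which is what "not copying" means in practice.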
Second, nom does not chain parsers at runtime. The macros directly generate the parsing code at compile time. This creates very linear code, something that CPUs find very easy to handle. If you tried to decompile the final binary to C code, you would just see a long list of if-else branches.
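The following sketch shows the general technique of generating straight-line parsing code with a macro. It is not nom's implementation: `tag`, `tags_in_order!` and `parse_http_version` are hypothetical names chosen for this example, and nom's real macros are richer than this.

```rust
/// Hypothetical parser: succeeds if `input` starts with `expected` and
/// returns the remaining input, or `None` on failure.
fn tag<'a>(expected: &[u8], input: &'a [u8]) -> Option<&'a [u8]> {
    if input.starts_with(expected) {
        Some(&input[expected.len()..])
    } else {
        None
    }
}

/// Hypothetical macro (not nom's API): expands a list of tags into
/// nested `if let` checks at compile time. No list of parsers is built
/// or walked at runtime.
macro_rules! tags_in_order {
    // Last tag: its result is the result of the whole chain.
    ($input:expr, $first:expr) => {
        tag($first, $input)
    };
    // Several tags: match the first, then recurse on what is left.
    ($input:expr, $first:expr, $($rest:expr),+) => {
        if let Some(remaining) = tag($first, $input) {
            tags_in_order!(remaining, $($rest),+)
        } else {
            None
        }
    };
}

/// After expansion, this body is just three nested `if let` branches.
fn parse_http_version(input: &[u8]) -> Option<&[u8]> {
    tags_in_order!(input, b"HTTP", b"/", b"1.")
}
```

Here `parse_http_version(b"HTTP/1.1\r\n")` returns the remaining bytes `1\r\n`, while an input that does not start with `HTTP/1.` returns `None`.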
It is also a safe alternative to handwritten C parsers. nom bases its memory safety on Rust's compiler: it knows, at any moment, which part of the code owns which part of the memory, prevents out-of-bounds accesses, and automatically manages memory allocation and deallocation. And since that is not enough, some nom parsers were fuzzed to hell with American Fuzzy Lop, just to verify those claims.
The result? The only flaws that were found appeared, not in nom generated code, but in code written manually outside of nom: index calculations that could overflow if a specific value appeared in the input. And those could not result in memory corruption, just crashes.
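As an illustration of that class of bug (this is not the actual code the fuzzer flagged; `read_field` is a hypothetical helper), consider reading a field whose length comes from the input. Computing `offset + len` directly can overflow, and indexing with the result can panic. A hedged sketch of a version that returns `None` instead of crashing:

```rust
/// Hypothetical helper: reads a field of `len` bytes starting at `offset`.
/// Returns `None` instead of panicking when the request is invalid.
fn read_field(input: &[u8], offset: usize, len: usize) -> Option<&[u8]> {
    // `offset + len` could overflow if `len` comes from untrusted input;
    // `checked_add` turns that case into `None`.
    let end = offset.checked_add(len)?;
    // `get` returns `None` for an out-of-bounds range, where `input[offset..end]`
    // would panic.
    input.get(offset..end)
}
```

With a five-byte input, `read_field(&data, 1, 3)` returns bytes 1 to 3, while a length of `usize::MAX` or a range running past the end returns `None`.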
This has awesome implications: you can quickly write a parser that will be safe by default. This lets you test ideas, experiment with your design, without fear for your security.
You should now see where I'm going: with parsers that are easy to write, as fast as or faster than handwritten C, and safe by default, you can replace old and vulnerable C parsers. Rust can work without a runtime, and is easily embedded in C code. It has already been used to write extensions for Ruby, Python, Node.js and others. It is only a matter of time until it replaces the vulnerable parts of current C projects.
This is one of my long term goals: making reliable, safe building blocks to build our systems. Not only new bricks, but also replacing the old ones. This will require a tremendous effort, and nom is just the first step, but a big one.
To get started using nom, you can include it in your Rust projects from crates.io. Here are a few links you will find useful:
- Github repository Geal/nom
- Reference documentation
- Upgrading to nom 1.0
- Gitter chat room. You can also go to the #nom IRC channel on irc.mozilla.org, or ping 'geal' on Mozilla, Freenode, Geeknode or OFTC IRC
- Tutorial about parsing ISO8601 dates
- Making a new parser from scratch (general tips on writing a parser and code architecture)
- How to handle parser errors
- How nom's macro combinators work
Also, if you have existing code running older versions of nom, please take a look at the upgrade documentation.