Sunday, 01 August, 2004

Regular Expressions Make My Brain Hurt

Working with regular expressions makes my brain hurt.  Understand, I'm not new to this particular brand of torture:  I've been using regular expressions for at least 20 years.  Not every day, but often enough that I'm reasonably comfortable with them.  And I still find them confusing to write, difficult to decipher, and almost impossible to modify.  The "language" of regular expressions deserves the description "write only" much more than do APL or Perl.

Regular expressions are a powerful and concise way of expressing simple or complex text matching and replacement behavior.  The basics of the language are easy enough to learn and use--even for a novice programmer--but long and complex regular expressions confound even experienced programmers.  Much of working with regular expressions involves trial and error along with a whole lot of head scratching and searching the Internet for examples of regular expressions that do "something like" whatever it is you're trying to do.
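
To make the "matching and replacement" idea concrete, here is a minimal sketch using Python's standard re module.  The sample text and the date pattern are made up for illustration; they are not from any particular application.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Posted 2004-08-01, updated 2004-08-15."

# Match ISO-style dates: four digits, two digits, two digits,
# each captured in its own group.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Matching: collect every date in the text as a (year, month, day) tuple.
dates = date_pattern.findall(text)

# Replacement: rewrite each date as day/month/year using backreferences.
rewritten = date_pattern.sub(r"\3/\2/\1", text)

print(dates)
print(rewritten)
```

Even this simple pattern hints at the readability problem:  the backreferences in the replacement string only make sense if you count the parentheses in the pattern.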

The Internet is chock full of regular expression "tutorials," most of which appear to be derived from the same reference whose origins I am unable to determine.  All of these "tutorials" describe what regular expressions are, provide simple examples of their use, and then launch into highly detailed technical reference information that means almost nothing.  Gleaning practical information like how to use an advanced feature is difficult at best.

The Internet also is chock full of programs that claim to help build and test regular expressions.  I've yet to find one that does anything more than allow me to enter a regular expression and test its action against a block of text.  When I see the term "regular expression builder" used to describe a program, I expect to see a tool that actually assists in building the regular expression:  helping me construct the correct syntax and validating a regular expression for correctness, providing useful explanatory error messages when it finds incorrect syntax.  I'm very surprised that no such tool exists.
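
The validation half of that wish list can at least be approximated with what a regex engine already reports.  The sketch below assumes Python's re module, which raises re.error with a message and (usually) a position when a pattern fails to compile.  The check_pattern helper is a hypothetical name invented for this example, and it is nowhere near a real builder:  it only reports the first syntax error the engine finds.

```python
import re

def check_pattern(pattern):
    """Return None if the pattern compiles, else a short explanation.

    Hypothetical helper: it only surfaces the engine's own error
    message and position; it does not suggest a fix.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        # exc.pos is the index in the pattern where the problem was
        # detected; it can be None for some errors.
        where = "" if exc.pos is None else f" at position {exc.pos}"
        return f"{exc.msg}{where}"
    return None

# A valid pattern, an unbalanced parenthesis, and a dangling quantifier.
for candidate in [r"\d{4}-\d{2}", r"(\d{4}-\d{2}", r"*abc"]:
    problem = check_pattern(candidate)
    print(candidate, "->", "ok" if problem is None else problem)
```

A syntactically valid pattern can still match the wrong thing, so a check like this covers only the "correct syntax" part of the problem, not the "does what I meant" part.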

Granted, such a tool would be very difficult to create from a user interface perspective.  It's either impossibly difficult, or none of the eight people in the world who actually understand regular expression syntax in detail are willing to expend the energy required to create such a program.  I'm thinking that maybe I should become the ninth person to understand regular expressions and then write the tool myself.

One person who certainly understands regular expressions is Jeffrey Friedl, the author of the book Mastering Regular Expressions.  At over 400 pages, this is the definitive book on regular expressions.  Not only does it describe in detail the language of regular expressions and their behavior, but it does so with a focus on solving real problems--something that none of the other references I've seen does.  It includes sections on the different types of regular expression engines, descriptions of how expressions are processed, tips on creating efficient expressions, and individual chapters for Java, Perl, and .NET programmers.  If you need to understand regular expressions in detail (and if you're writing text processing applications, you do), you absolutely should read this book.