
 




Java Regular Expressions

Simon P. Chappell writes "Regular expressions (regex to their friends) are an incredibly powerful addition to most programmers' personal toolkit of techniques. Programming in a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling. Java was just such a language until the release of the 1.4.x series. Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of third-party libraries. With version 1.4.x, the corporate Java developer in the trenches received the power of regular expression pattern matching." Read the rest of Simon's review.
Java Regular Expressions
author: Mehran Habibi
pages: 255 (7 page index)
publisher: Apress
rating: 8/10
reviewer: Simon P. Chappell
ISBN: 1590591070
summary: A great starter for using regular expressions in Java


The book seems targeted at those who have solid Java programming skills but have not yet used the java.util.regex package. I see two types of Java programmers who might not have used the regex package: those who do not know about regular expressions, and those who know them but have not yet used them within Java. This book should satisfy both sets of users. The first group will benefit from the general introduction to regular expressions and the gentle introduction to using them within Java. The latter group will benefit from the more advanced material in the book.

The book is nicely structured and progresses easily through its subject matter. The first chapter is an introduction to regular expressions. While this is most obviously for the readers new to the subject, it will be useful for those more experienced, because not all regex engines are created equal and this chapter lays out the particular dialect of regular expressions used by the Java 1.4.x regex engine. The second chapter introduces the object model used by java.util.regex. This gives detailed explanations of the Pattern and Matcher objects as well as the new regular expression methods added to the standard String class.
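As a rough illustration of the object model described above (this is not an example from the book; the class name `RegexBasics` and the sample text are invented here), a minimal sketch of `Pattern`, `Matcher`, and the regex-aware `String` methods might look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the book.
class RegexBasics {
    // A Pattern is immutable, so compile it once and reuse it.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Counts runs of ASCII letters by stepping a Matcher through the text. */
    static int countWords(String text) {
        Matcher m = WORD.matcher(text);
        int count = 0;
        while (m.find()) {   // find() advances to the next match, if any
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Java 1.4 added regex";
        System.out.println(countWords(text));                   // 3
        // Regex methods on String (each one compiles the pattern internally)
        System.out.println("1.4.2".matches("\\d\\.\\d\\.\\d")); // true
        System.out.println(text.replaceAll("\\d", "#"));        // Java #.# added regex
        System.out.println(text.split("\\s+").length);          // 4
    }
}
```

The `String` methods are convenient for one-off use; an explicit `Pattern` is the usual choice when the same expression is applied repeatedly.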

The third chapter takes the reader into advanced regular expressions. While there is much that can be done using just the Pattern and Matcher objects, the path to the full power of regex travels through an understanding of groups (and subgroups) and qualifiers (more commonly called quantifiers). Regex groups are hard to explain until you've seen them in action, whereupon you may find yourself wondering how you ever managed without them. Mr. Habibi does an excellent job, both explaining them and introducing us to the unusual noncapturing subgroups. (I'd never heard of these before.) Qualifiers are the other side of the coin from groups. It's one thing to define a group and to say whether it is expected and whether it should be captured; it's equally important to be able to describe how many times that group is expected to occur, which is what qualifiers do.
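To make the group vocabulary concrete, here is a small sketch, assuming a made-up year/month format rather than anything from the book. It combines two capturing groups, one noncapturing group, and the counting operators the review calls qualifiers (quantifiers in the java.util.regex documentation). `GroupDemo` and `parse` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not taken from the book.
class GroupDemo {
    // (\d{4})    capturing group 1: exactly four digits (the year)
    // (?:-|/)    noncapturing group: a separator that is matched but not kept
    // (\d{1,2})  capturing group 2: one or two digits (the month)
    private static final Pattern YEAR_MONTH =
            Pattern.compile("(\\d{4})(?:-|/)(\\d{1,2})");

    /** Returns {year, month} if the whole input matches, otherwise null. */
    static String[] parse(String input) {
        Matcher m = YEAR_MONTH.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Group 0 is the whole match; the noncapturing group gets no number.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("2006/8");
        System.out.println(parts[0] + " and " + parts[1]); // 2006 and 8
        System.out.println(parse("06-08") == null);        // true: year needs four digits
    }
}
```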

Chapter four tackles the interesting challenges of using regex in an object-oriented language. Mr. Habibi describes the general principles of regex use as similar to those used with SQL through the JDBC interface. These principles are optimising connections, batching reads and writes, storing patterns externally, just-in-time compilation of patterns, and remembering that not every piece of String handling code needs to be written as a regex. All very useful advice.
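Two of those principles, storing patterns externally and compiling them only when first needed, can be sketched roughly as follows. This is one possible reading of the advice, not the book's code: `PatternCache` is a hypothetical class, and a `java.util.Properties` object stands in for whatever external store a project actually uses.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helper: regex source text lives outside the code,
// and each pattern is compiled at most once, on first use.
class PatternCache {
    private final Properties source;
    private final Map<String, Pattern> compiled = new ConcurrentHashMap<>();

    PatternCache(Properties source) {
        this.source = source;
    }

    /** Returns the compiled pattern for a name, compiling it on first request. */
    Pattern get(String name) {
        return compiled.computeIfAbsent(name, key -> {
            String regex = source.getProperty(key);
            if (regex == null) {
                throw new IllegalArgumentException("No pattern named " + key);
            }
            return Pattern.compile(regex);
        });
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("fiveDigits", "\\d{5}"); // would normally be loaded from a file
        PatternCache cache = new PatternCache(props);
        System.out.println(cache.get("fiveDigits").matcher("12345").matches()); // true
        System.out.println(cache.get("fiveDigits") == cache.get("fiveDigits")); // true: compiled once
    }
}
```

If the patterns are kept in a .properties file, note that the file format itself treats the backslash as an escape character, so backslashes in a regex have to be doubled there as well.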

Chapter five is the big examples chapter. All of the examples are intended to be practical: the kind of thing you might have to address at the day job. With examples covering Zip codes, telephone numbers, dates, searching text files and even validating an EDI document, he seems to have delivered on that intent. There are further examples in Appendix C, if the aforementioned patterns aren't enough.
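For a flavour of that kind of validation, here is a sketch of a US Zip code check. The pattern is a common one for the five-digit and ZIP+4 forms and is an assumption made here; the book's own pattern may well differ. `UsFormats` and `isZip` are made-up names.

```java
import java.util.regex.Pattern;

// Hypothetical validator, not the book's implementation.
class UsFormats {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("\\d{5}(?:-\\d{4})?");

    /** True if the entire string is a 5-digit or ZIP+4 code. */
    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isZip("12345"));      // true
        System.out.println(isZip("12345-6789")); // true
        System.out.println(isZip("12345-67"));   // false
    }
}
```

This only checks the shape of the string; whether a code is actually assigned is a lookup problem, not a regex problem.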

The writing and progression of material are good. The examples are very well thought out and explained. Many of the examples are built from first principles. Mr. Habibi seems to want to not only teach you how to use regular expressions, but also how to design them. He does this by working up from an understanding of the data until he has a working regex.

While it doesn't make any promises about being an encyclopedia of regex patterns, this book does contain enough of the normal business patterns to be a useful initial reference work, before turning to the Internet to search for patterns.

If you want an encyclopedic reference work on regex, then buy Jeffrey Friedl's Mastering Regular Expressions, which is published by O'Reilly. This is not that book, preferring to stick with the practical usage of regex.

This is a great starter book for developers who are new to using regular expressions in Java.


You can purchase Java Regular Expressions from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
This discussion has been archived. No new comments can be posted.

  • Wrong way round (Score:2, Interesting)

    by Tim Ward ( 514198 ) on Wednesday August 02, 2006 @04:30PM (#15834773) Homepage
    Regular expressions (regex to their friends) are an incredibly powerful addition to most programmers' personal toolkit of techniques. Programming in a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling.

    Er, no. It is only for trivial string handling that the regex approach is useful.

    For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analyser and a proper parser every time.
  • Re:Recursion? (Score:5, Interesting)

    by SIGALRM ( 784769 ) on Wednesday August 02, 2006 @04:34PM (#15834793) Journal
    Regular expressions aren't really meant for recursive solutions, but if we have recursive regular expressions, we can define our balanced-paren expression like this: first match an opening paren; then match a series of things that can be non-parens or another balanced-paren group; then a closing paren. Turned into Perl code, this becomes:

    $paren = qr/\(([^()]+|(??{ $paren }))*\)/x;
    When this is run on some text like
    (lambda (x) (append x '(hacker)))
    the following happens: we see our opening paren, so all is well. Then we see some things which are not parens (lambda ) and all is still well. Now we see (, which definitely is a paren. Our first alternative fails, we try the second alternative. Now it's finally time to interpolate what's inside the double-secret operator, which just happens to be $paren. And what does $paren tell us to match? First, an open paren - ooh, we seem to have one of those handy. Then some things which are not parens, such as x, and then we can finish this part of the match by matching a close paren. This polishes off the sub-expression, so we can go back to looking for more things that aren't parens, and so on.
  • Microsoft and regex (Score:4, Interesting)

    by truthsearch ( 249536 ) on Wednesday August 02, 2006 @04:40PM (#15834836) Homepage Journal
    Slightly off-topic, but...

    Back when my only experience was development on Windows I was very frustrated with the lack of good string handling in Microsoft languages (VB, T-SQL). If you didn't find a third-party library you had to write a lot of expensive code to do fancy string searches. Try writing recursion in VB6 without bringing your computer to a screeching halt.

    Then when I switched to Linux and open source I was shocked to learn that something as useful as regex had already been around for many years. Most of the Windows developers I knew had never even heard of it. It was tricky to learn but has paid off many times over in utility.

    Every developer is better off for knowing it. Even if they never use regex, the thought process involved in understanding it is quite interesting and educational.
  • What? (Score:3, Interesting)

    by avalys ( 221114 ) on Wednesday August 02, 2006 @04:43PM (#15834853)
    Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
    Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.

    I can understand restricting third-party libraries to those of a certain license, like BSD or LGPL, but a blanket ban without any exceptions for something as essential as regular expressions? That's just stupid.

    One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available.

    Is this just something the submitter dreamed up to fill space, or do companies actually do this?

  • Re:My main complaint (Score:4, Interesting)

    by Pxtl ( 151020 ) on Wednesday August 02, 2006 @04:48PM (#15834886) Homepage
    Well, does Java have a facility similar to C#'s @strings? In C#, a string prefixed with @ is literal, much like Python's raw (r"...") strings: backslashes are not treated as escape characters. Very handy for regular expressions.

    In general, C#'s regular expression package is very nice, except for the whole "groups" and "captures" thing.
  • by Heembo ( 916647 ) on Wednesday August 02, 2006 @05:21PM (#15835139) Journal
    One of the reasons we as programmers write code is to take a very complex idea, like a software application, and write something that a human engineer can understand. The KISS principle especially applies to coders.

    As I get older, my code has gotten more and more straightforward, because I consider the maintenance cycle of code to be more than 95% of the puzzle. And these days, I have more than one security analyst who is not a senior software engineer poking around my code.

    Regexes are not-so-readable and not-very-maintainable programming abstractions that should be avoided whenever possible. I prefer using string manipulation abstraction classes (such as my own version of StringTokenizer). They are not as fast and furious as other methods like lexical analysis, and the code is more bloated, but the code is Straight Forward And Easy To Read. There is a power in code of this nature, and my clients have thanked me more than once for focusing not on writing "cool code" but on writing "clean and simple" code. I just tried to paste in a few ugly regex samples, but Slashdot blocked me, calling them "junk characters". I agree! :)

    For example, take XPath: it is a clean and simple way to address XML objects. Sure, there is an additional level of abstraction, but you can look at an XPath query, even from a layman's point of view, and have a clear understanding of what it is doing.
  • Re:My main complaint (Score:2, Interesting)

    by Deef ( 162646 ) on Wednesday August 02, 2006 @06:56PM (#15835772)
    I sometimes do this:

    Pattern foo = Pattern.compile("c:/foo/bar".replace('/','\\'));

    or just put the above in a library method that does it automatically:

    Pattern foo = PatternUtils.compile("c:/foo/bar");

    which is handy if other replacements are made by that library method also:

    Pattern foo = PatternUtils.compile("({number}):{number}:({identifier})-{number}");
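The library method in the last comment is not shown in the thread, so the following is only a guess at what such a helper could look like. `PatternUtils` here is a hypothetical class, and the expansions chosen for `{number}` and `{identifier}` are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the commenter's real implementation is not shown.
class PatternUtils {
    // Assumed placeholder meanings, picked for this sketch only.
    private static final Map<String, String> PLACEHOLDERS = new LinkedHashMap<>();
    static {
        PLACEHOLDERS.put("{number}", "\\d+");
        PLACEHOLDERS.put("{identifier}", "[A-Za-z_]\\w*");
    }

    /** Expands readable placeholders into regex source text. */
    static String expand(String template) {
        String regex = template;
        for (Map.Entry<String, String> e : PLACEHOLDERS.entrySet()) {
            // String.replace(CharSequence, CharSequence) substitutes literally,
            // so neither the braces nor the backslashes need extra escaping.
            regex = regex.replace(e.getKey(), e.getValue());
        }
        return regex;
    }

    /** Expands the template, then compiles the result. */
    static Pattern compile(String template) {
        return Pattern.compile(expand(template));
    }

    public static void main(String[] args) {
        Pattern p = compile("({number}):{number}:({identifier})-{number}");
        Matcher m = p.matcher("12:34:foo_1-56");
        System.out.println(m.matches()); // true
        System.out.println(m.group(1));  // 12
        System.out.println(m.group(2));  // foo_1
    }
}
```

The appeal of this design is that the double-escaped backslashes live in one place, and call sites read closer to a description of the data.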
