
Java Regular Expressions

Posted by samzenpus
from the no-cream-or-sugar dept.
Simon P. Chappell writes "Regular expressions (regex to their friends) are an incredibly powerful addition to most programmers' personal toolkit of techniques. Programming using a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling. Java was just such a language until the release of the 1.4.x series. Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries. With version 1.4.x, the corporate Java developer in the trenches received the power of regular expression pattern matching." Read the rest of Simon's review.
Java Regular Expressions
author Mehran Habibi
pages 255 (7 page index)
publisher Apress
rating 8/10
reviewer Simon P. Chappell
ISBN 1590591070
summary A great starter for using regular expressions in Java


The book seems targeted towards those who have a solid level of Java programming skills, but who have not yet used the java.util.regex package. I see two types of Java programmers who might not have used the regex package: those who do not know about regular expressions, and those who know them but have not yet used them within Java. This book should satisfy both sets of users. The first group will benefit from the general introduction to regular expressions and the gentle introduction to using them within Java. The latter group will benefit from the more advanced material in the book.

The book is nicely structured and progresses easily through its subject matter. The first chapter is an introduction to regular expressions. While this is most obviously for the readers new to the subject, it will be useful for those more experienced, because not all regex engines are created equal and this chapter lays out the particular dialect of regular expressions used by the Java 1.4.x regex engine. The second chapter introduces the object model used by java.util.regex. This gives detailed explanations of the Pattern and Matcher objects as well as the new regular expression methods added to the standard String class.
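
As a rough illustration of that object model (not an example taken from the book), the sketch below compiles a Pattern once, walks the matches with a Matcher, and then calls the regex-aware String methods. The class name RegexObjectModelDemo and the sample inputs are made up for this note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexObjectModelDemo {
    // A Pattern is the compiled form of a regex; it is immutable and reusable.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    /** Collects every run of ASCII letters using a Matcher's find() loop. */
    static List<String> words(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WORD.matcher(text); // a Matcher holds the state of one scan
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(words("Java 1.4 added regex")); // [Java, added, regex]
        // The String convenience methods take the regex as a plain string.
        System.out.println("2006-08-02".matches("\\d{4}-\\d{2}-\\d{2}")); // true
        System.out.println("a,b;c".split("[,;]").length);                 // 3
        System.out.println("a  b   c".replaceAll("\\s+", " "));           // a b c
    }
}
```

The String methods are handy for one-off checks; for a regex used repeatedly, holding on to the compiled Pattern avoids compiling it again on each call.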

The third chapter takes the reader into advanced regular expressions. While there is much that can be done using just the Pattern and Matcher objects, the path to the full power of regex travels through an understanding of groups (and subgroups) and qualifiers. Regex groups are hard to explain until you've seen them in action, whereupon you may find yourself wondering how you'd ever managed without them. Mr. Habibi does an excellent job, both explaining them and introducing us to the unusual noncapturing subgroups. (I'd never heard of these before.) Qualifiers are the other side of the same coin. While it's one thing to define a group, and whether it's expected and should be captured, it's equally important to be able to describe the expected occurrence of those groups using qualifiers.
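
For readers who have not seen these features, here is a small sketch (mine, not the book's) of capturing groups, a noncapturing (?:...) subgroup, and the difference between a greedy and a reluctant qualifier (the Java documentation calls them quantifiers). The class name GroupsDemo, the version-string pattern and the inputs are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupsDemo {
    // (?:...) groups without capturing, so only the two (\d+) groups get numbers.
    private static final Pattern VERSION =
            Pattern.compile("(?:version|v)\\s*(\\d+)\\.(\\d+)");

    /** Returns {major, minor} from text such as "v1.4", or null if nothing matches. */
    static int[] majorMinor(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Returns the first match of regex in text, or null if there is none. */
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        int[] v = majorMinor("Java version 1.4");
        System.out.println(v[0] + "." + v[1]);                   // 1.4
        // Greedy .+ grabs as much as possible; reluctant .+? as little as possible.
        System.out.println(firstMatch("<.+>", "<b>bold</b>"));   // <b>bold</b>
        System.out.println(firstMatch("<.+?>", "<b>bold</b>"));  // <b>
    }
}
```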

Chapter four tackles the interesting challenges of using regex in an object-oriented language. Mr. Habibi describes the general principles for using regex as similar to those used with SQL through the JDBC interface. These principles are the optimising of connections, batching reads and writes, storing patterns externally, Just In Time compilation of patterns and remembering that not every piece of String handling code needs to be written as a regex. All very useful advice.
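
One possible reading of "Just In Time compilation of patterns" is to compile each regex the first time it is needed and reuse the compiled Pattern afterwards. The sketch below assumes that reading; PatternCache and its methods are hypothetical names, not the book's code, and computeIfAbsent needs Java 8 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

final class PatternCache {
    private static final Map<String, Pattern> CACHE = new ConcurrentHashMap<>();

    /** Compiles a regex on first request and hands back the same Pattern afterwards. */
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    /** Not every string test needs a regex: a plain prefix check is enough here. */
    static boolean isCommentLine(String line) {
        return line.startsWith("#");
    }

    public static void main(String[] args) {
        System.out.println(get("\\d+").matcher("42").matches()); // true
        System.out.println(get("\\d+") == get("\\d+"));          // true, compiled once
        System.out.println(isCommentLine("# stored patterns"));  // true
    }
}
```

The regex strings passed to get() could just as well come from an external file, which is how the "storing patterns externally" advice would fit in.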

Chapter five is the big examples chapter. All of the examples are intended to be practical; the kind of thing you might have to address at the day job. With examples covering Zip codes, telephone numbers, dates, searching text files and even validating an EDI document, he seems to have delivered on that assertion. There are further examples in Appendix C, if the aforementioned patterns aren't enough.
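
To give a flavour of this kind of example, here is a minimal ZIP code check. The pattern is my own assumption about the format (five digits, optionally followed by a hyphen and four more), not the one printed in the book, and ZipCodes is a made-up class name.

```java
import java.util.regex.Pattern;

final class ZipCodes {
    // Five digits, then an optional "-dddd" suffix held in a noncapturing group.
    private static final Pattern US_ZIP = Pattern.compile("\\d{5}(?:-\\d{4})?");

    /** True if the whole string looks like a 5-digit or ZIP+4 code. */
    static boolean isUsZip(String s) {
        return US_ZIP.matcher(s).matches(); // matches() requires the entire input to match
    }

    public static void main(String[] args) {
        System.out.println(isUsZip("60601"));      // true
        System.out.println(isUsZip("60601-1234")); // true
        System.out.println(isUsZip("6060"));       // false
    }
}
```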

The writing and progression of material are good. The examples are very well thought out and explained. Many of the examples are built from first principles. Mr. Habibi seems to want to not only teach you how to use regular expressions, but also how to design them. He does this by working up from an understanding of the data until he has a working regex.

While it doesn't make any promises about being an encyclopedia of regex patterns, this book does contain enough of the normal business patterns to be a useful initial reference work, before turning to the Internet to search for patterns.

If you want an encyclopedic reference work on regex, then buy Jeffrey Friedl's Mastering Regular Expressions, which is published by O'Reilly. This is not that book, preferring to stick with the practical usage of regex.

This is a great starter book for developers who are new to using regular expressions in Java.


You can purchase Java Regular Expressions from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
This discussion has been archived. No new comments can be posted.

  • When speed matters (Score:4, Informative)

    by SIGALRM (784769) on Wednesday August 02, 2006 @04:22PM (#15834717) Journal
    there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
    For those who can utilize third-party libs, consider evaluating this DFA/NFA automaton [brics.dk], a regexp package that is significantly faster than java.util.regex.

    However, like many things in computer science, speed gains come at a price. In this case, the regular expression language supported is not quite as rich as the JDK implementation.
    • by Ryan Amos (16972) on Wednesday August 02, 2006 @06:21PM (#15835549)
      When speed matters

      ...you don't use Java.

      (I know, let the flames commence! :)

      • by CompSciStud4U (877987) on Wednesday August 02, 2006 @07:35PM (#15836016)
        I'll take the bait. When Java was introduced in 1995 almost all compiler research had been on static compilation, such as in C or Fortran. When the popularity of Java started to rise, a lot of research effort, such as at IBM, was switched over to Just In Time (JIT) compilers. This was a pretty raw field at the time, so Java was horribly slow compared to C.

        Fast forward 11 years and the situation is quite different. I'm not sure about the Java compiler that comes distributed with the SDK, but a JIT compiler and virtual machine from another commercial source (I'll just stick with IBM) is now incredibly optimized compared to 1995. Large amounts of research have been done to catch up with the fact that statically compiled languages had a 30+ year head start. And JIT compiled languages could one day be faster than statically compiled ones due to new dynamic compilation techniques that use system resource data, such as cache misses, collected by the VM to continuously reoptimize portions of the byte code.

        And even the overhead of garbage collection may soon be lowered dramatically due to research at the University of Massachusetts http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf [umass.edu]

        I'm not going to say that Java is faster than C (or in this case Perl, a language specifically designed for parsing regular expressions), but the speed gap between the two is constantly closing to the point where it doesn't really matter that much anymore.
        • Well, I'd love to see at least one application written in Java that is fast, but so far I haven't seen any.

          I've got a dual Athlon MP2000+, and Azureus still is horribly slow compared to everything else I run on it.
          • Are you using the default Sun HotSpot JVM? If so, you're not following the criteria provided by the parent poster. Lots of people can write their own C compilers, but they're not going to be as time- or space-optimized as gcc or Intel's compilers.
          • Azureus is slow? Azureus is Java? So Java is slow.

            Thanks for your insight.
    • Anyone know if there's a Java implementation of "structural regular expressions" [berkeley.edu] as seen in the Sam [wikipedia.org] editor on Plan 9?
  • Me: I'll have a Grande Cafe au Lait please.

    Starbucks Employee: That'll be an hour's wages please.

    Me: Thanks! /me hands over cash, takes careful first sip.

    That's when you get to see my java regular expression.

    Generally it will be me wincing in pain because I just burned my tongue. Sometimes, if it's cooled enough, you'll hear a quiet "MmmMmmm" in the style of Family Guy's Herbert.
  • I tried to do a bit of recursion in regexes once, like ((\d+)\.)+, but that didn't work. It's too bad, because I don't think there's another way to dynamically match data in regexes. Other than this, they've served me very well all these years.
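
A note on the ((\d+)\.)+ example above, assuming the goal was to collect each number: in java.util.regex a group that repeats keeps only the text from its last iteration, so the usual workaround is to match one item at a time in a find() loop. RepeatedGroupDemo and its method names are invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedGroupDemo {
    /** Shows that a repeated capturing group only remembers its final iteration. */
    static String lastCaptured(String text) {
        Matcher m = Pattern.compile("((\\d+)\\.)+").matcher(text);
        return m.matches() ? m.group(2) : null;
    }

    /** Collects every number followed by a dot by matching one item per find(). */
    static List<String> allNumbers(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = Pattern.compile("(\\d+)\\.").matcher(text);
        while (m.find()) {
            numbers.add(m.group(1));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(lastCaptured("10.20.30.")); // 30
        System.out.println(allNumbers("10.20.30."));   // [10, 20, 30]
    }
}
```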
    • Re:Recursion? (Score:5, Interesting)

      by SIGALRM (784769) on Wednesday August 02, 2006 @04:34PM (#15834793) Journal
      Regular expressions aren't really meant for recursive solutions, but if we have recursive regular expressions, we can define our balanced-paren expression like this: first match an opening paren; then match a series of things that can be non-parens or another balanced-paren group; then a closing paren. Turned into Perl code, this becomes:

      $paren = qr/\(([^()]+|(??{ $paren }))*\)/x;
      When this is run on some text like
      (lambda (x) (append x '(hacker)))
      the following happens: we see our opening paren, so all is well. Then we see some things which are not parens (lambda ) and all is still well. Now we see (, which definitely is a paren. Our first alternative fails, we try the second alternative. Now it's finally time to interpolate what's inside the double-secret operator, which just happens to be $paren. And what does $paren tell us to match? First, an open paren - ooh, we seem to have one of those handy. Then some things which are not parens, such as x, and then we can finish this part of the match by matching a close paren. This polishes off the sub-expression, so we can go back to looking for more things that aren't parens, and so on.
    • Re:Recursion? (Score:2, Informative)

      by Reverend528 (585549)

      I tried to do a bit of recursion in regexes once, like ((\d+)\.)+, but that didn't work.

      By definition, Regular Expressions are limited to regular languages [wikipedia.org], thus can be expressed by Finite Automata [wikipedia.org]. This prohibits them from supporting recursion, but generally makes them easy to optimize.

    • Re:Recursion? (Score:4, Informative)

      by Anonymous Coward on Wednesday August 02, 2006 @05:08PM (#15835036)
      Regular expressions are only for regular languages. They are the simplest type of language and use a simple state machine (automaton) to do their language recognition.
      Context free languages may have recursion. They use a state machine (pushdown automaton) and a stack to recognize their languages.
      http://en.wikipedia.org/wiki/Context-free_language [wikipedia.org]
      This also contains links to other families of language and info on the automaton that can recognize them.
      Welcome to Theory of Computing!
      • And people wonder what a computing science degree is useful for in the real world...
      • Gah, and to think I passed that class :P I just hadn't realised that all that theory about automata and K* and whatnot applied to the real world!
  • Wrong way round (Score:2, Interesting)

    by Tim Ward (514198)
    Regular expressions (regex to their friends) are an incredibly powerful addition to most programmer's personal toolkit of techniques. Programming using a language that doesn't support them can be frustrating if you need to do any amount of non-trivial string handling.

    Er, no. It is only for trivial string handling that the regex approach is useful.

    For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analys
    • For non-trivial string handling (particularly if you feel like giving the authors of erroneous strings helpful error messages!!) I'll write a proper lexical analyser and a proper parser every time.

      You can outfit a regexp functor with error message handling, or exceptions, and if your project is embedded (certainly not trivial) or performance-dependent, I'm not sure that I'd write a lex/parser "every time". I guess it boils down to this: "trivial string handling" is semantic nonsense.

      • Re:Wrong way round (Score:3, Informative)

        by smallfries (601545)
        I'm not sure if you got the parent's point (apologies if you did). By non-trivial string handling he's talking about recursive structures, and the erroneous strings he's mentioning are probably programs as input to a compiler. The 'non-trivial' strings are the class of strings that you would need a full grammar in order to parse, rather than a reg-exp. But yeah, not every time - horses for courses and all that.
    • Re:Wrong way round (Score:4, Insightful)

      by smitty_one_each (243267) * on Wednesday August 02, 2006 @05:16PM (#15835096) Homepage Journal
      I would assert that if your input data are sufficiently irregular that you require a parser/lexical analyzer, you may have exceeded the bounds of "regular" expressions.
      • Absolutely. Why this should be surprising, I don't know. The very nature of DFAs is that they don't support counting. Thus, the minute you find yourself dealing with recursion (e.g., tags, brackets, etc.), regular expressions break down.

        However, if you're just doing vanilla text parsing with data that's not overly complex, regexs are an absolute godsend, and are far easier to use than a full lexer/parser package.
        • not to mention, most lexers are going to use regular expressions to identify tokens for the higher-order parsing operations. So RE are a good first step for anyone getting into lexer/parser wares anyway.
    • "Trivial" is relative. The sort of string processing you do with String.indexOf and other simple matching functions is trivial compared to what you do with regular expressions. Besides, once you've started using lexers and parsers, you've graduated from "string handling" to artificial linguistics.
  • by LadyLucky (546115) on Wednesday August 02, 2006 @04:37PM (#15834815) Homepage
    Are you serious? What kind of company would do that? It's madness!
  • My main complaint (Score:5, Informative)

    by kbielefe (606566) <karl.bielefeldt+slashdot@gma i l . c om> on Wednesday August 02, 2006 @04:39PM (#15834825)

    My main complaint about java regexps is that all the backslashes have to be quoted with a backslash, making them completely unreadable compared to a language that supports regular expressions natively, like perl (no, a standard library is not technically native support). "\d" becomes "\\d" and so forth. Does anyone know a simple way around this? We just started using java regexp's at work, so the extra backslashes don't bother most people, but they are extremely annoying to those of us with a lot of perl experience.

    P.S. How many slashdotters thought they'd be rolling in their graves by the time they heard an example of where perl is more readable than java?
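
To make the doubling concrete, here is a short sketch of how regex backslashes look inside Java string literals. It does not remove the source-level escaping; Pattern.quote (available since Java 5) only spares you the regex-level escaping when the text should be matched literally. BackslashDemo and the sample strings are made up for this note.

```java
import java.util.regex.Pattern;

final class BackslashDemo {
    // Regex \d+ : each regex backslash is written twice in the Java literal.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Regex \\ (one literal backslash) needs four backslashes in Java source.
    static final Pattern ONE_BACKSLASH = Pattern.compile("\\\\");

    // Pattern.quote() turns arbitrary text into a regex that matches it literally,
    // so the backslash and the dot below need no regex-level escaping.
    static final Pattern LITERAL_PATH = Pattern.compile(Pattern.quote("c:\\foo.bar"));

    public static void main(String[] args) {
        System.out.println(DIGITS.matcher("2006").matches());              // true
        System.out.println(ONE_BACKSLASH.matcher("\\").matches());         // true
        System.out.println(LITERAL_PATH.matcher("c:\\foo.bar").matches()); // true
        System.out.println(LITERAL_PATH.matcher("c:\\fooXbar").matches()); // false
    }
}
```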

    • by Kesch (943326) on Wednesday August 02, 2006 @04:46PM (#15834873)
      P.S. How many slashdotters thought they'd be rolling in their graves by the time they heard an example of where perl is more readable than java?


      I'm still amazed to find 'readable' and 'regular expressions' in the same context.
    • Re:My main complaint (Score:4, Interesting)

      by Pxtl (151020) on Wednesday August 02, 2006 @04:48PM (#15834886) Homepage
      Well, does Java have a facility similar to C#'s @strings? In C#, a string prefixed with @ is literal, much like Python's """ strings - no escape characters. Very handy for regular expressions.

      In general, C#'s regular expression package is very nice, except for the whole "groups" and "captures" thing.
    • two slashes "\\" is nothing. the real PITA begins when you need to do "\\\\"

      effing java.
    • I haven't tried this, but I suppose you could stick the regex's in a .properties file.
      • Re:My main complaint (Score:5, Informative)

        by _xeno_ (155264) on Wednesday August 02, 2006 @05:02PM (#15834986) Homepage Journal

        Backslashes in a .properties file have to be escaped with (guess what?) a backslash.

        So it, unfortunately, solves nothing.

        If you don't mind XML, you can use the XML properties format, but you're still adding a lot of extra code just so you don't have to deal with escape characters. There's, unfortunately, no good solution in Java. (There are no raw strings in Java.)
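
The .properties behaviour is easy to check. In the sketch below the file contents are fed in from a string rather than a real file; PropertiesEscapeDemo and the key name zip are assumptions for illustration. A single backslash before an ordinary character is silently dropped by Properties.load, so the regex only survives if it is doubled in the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapeDemo {
    /** Parses fileText as .properties content and returns the value stored under key. */
    static String load(String fileText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File line: zip=\\d{5}   (backslash doubled in the file) loads as \d{5}
        System.out.println(load("zip=\\\\d{5}\n", "zip"));
        // File line: zip=\d{5}    (single backslash) loads as d{5}
        System.out.println(load("zip=\\d{5}\n", "zip"));
    }
}
```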

    • My main complaint about java regexps is that all the backslashes have to be quoted with a backslash, making them completely unreadable compared to a language that supports regular expressions natively, like perl ...

      You're asking about Java regexps, but similar problems extend to other languages where the syntax, features and usage are different enough so that anyone with a basis in Perl is similarly annoyed, if not dumbfounded by the awkwardness and limitations. Any systems administrator will tell you
      • Nothing in actual Java or the Sun libraries fixes this gripe. But, have a look at the Jakarta Commons project's org.apache.commons.lang.StringEscapeUtils class [apache.org], particularly the StringEscapeUtils.escapeJava() methods [apache.org].

        It may be helpful, I haven't tried it. Would be particularly interesting to see if it'll correctly convert, say "\t" into "\\t" instead of a TAB. If it does, then you could use it to wrap the strings for the regexp pattern methods.

        • Okay, I've tested: should be good to use. Try this code out:

          import org.apache.commons.lang.StringEscapeUtils;

          public class EscapeTest {
              public static void main(String[] args) {
                  // The compiler turns "\t" into a TAB; escapeJava() prints it as backslash-t.
                  System.out.println(StringEscapeUtils.escapeJava("\tXXX"));
                  // For comparison: prints a real TAB followed by XXX.
                  System.out.println("\tXXX");
              }
          }

    • Re:My main complaint (Score:2, Interesting)

      by Deef (162646)
      I sometimes do this:

      Pattern foo = Pattern.compile("c:/foo/bar".replace('/','\\'));

      or just put the above in a library method that does it automatically:

      Pattern foo = PatternUtils.compile("c:/foo/bar");

      which is handy if other replacements are made by that library method also:

      Pattern foo = PatternUtils.compile("({number}):{number}:({identifier})-{number}");
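
PatternUtils above is the poster's own helper and its source isn't shown, so the following is only a guess at how such a method might look. The placeholder expansions ({number} as digits, {identifier} as a letter or underscore followed by word characters) are assumptions made for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternUtils {
    /**
     * Hypothetical helper: expands {number} and {identifier} placeholders, then
     * turns every forward slash into a backslash before compiling the regex.
     */
    static Pattern compile(String shorthand) {
        String regex = shorthand
                .replace("{number}", "/d+")              // assumed meaning: one or more digits
                .replace("{identifier}", "[A-Za-z_]/w*") // assumed meaning: a simple identifier
                .replace('/', '\\');
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        Pattern p = compile("({number}):{number}:({identifier})-{number}");
        Matcher m = p.matcher("12:34:foo_1-56");
        if (m.matches()) {
            System.out.println(m.group(1) + " " + m.group(2)); // 12 foo_1
        }
    }
}
```

One cost of this trick is that the shorthand can no longer express a literal forward slash, so it suits patterns where that character never appears.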

    • Put them in comments and use a tool to generate and test regular expressions. For the Eclipse IDE, there is QuickREx, it includes a paste function that automatically escapes the escapes.

      http://eclipse-plugins.2y.net/eclipse/rating_details_plugin.jsp?plugin_id=964 [2y.net]

      A good idea is to include the regular expressions in a comment as well. Most of the time creating and testing a regular expression takes most of the time anyway. If you really hate the escaped regular expressions, just put them in a resource (e.g.
    • Does anyone know a simple way around this?

      I've done this with C on Windows when I had one library that borked whenever you tried to use / in pathnames.

      Pick unicode characters for your special strings, e.g. . Next, map some handy keystroke to that in your editor. Then write a script to replace that with a standard Java string. Since it's not standard java, give it a special extension and add the script and extension to your makefile or ant or whatever you use.
  • The missing Regular Expressions is what kept me off Java and on Perl for a looong while. I started using ORO and since their introduction into Java itself I almost completely switched over. I really do hope Perl 6 will be released and lives up to its expectations.

    Having said that I really don't see why you have to devote a complete book on regex. A small tutorial does just fine.
    • > Having said that I really don't see why you
      > have to devote a complete book on regex.
      > A small tutorial does just fine

      I think it depends on how deep you want to go into regular expressions. Mastering Regular Expressions [oreilly.com] by Jeffrey Friedl is almost 500 pages but is an excellent treatment of the subject - by the time you're done reading it you'll feel comfy even with such madness as negative lookbehind.
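
For anyone curious what negative lookbehind looks like in java.util.regex, here is a small sketch. The task (find numbers that are not preceded by a dollar sign) and the class name LookbehindDemo are invented for illustration and are not taken from Friedl's book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookbehindDemo {
    // (?<!\$) asserts that the previous character is not '$';
    // \b stops the match from starting in the middle of a number.
    private static final Pattern NOT_A_PRICE = Pattern.compile("(?<!\\$)\\b\\d+");

    /** Returns the numbers in text that are not written directly after a dollar sign. */
    static List<String> plainNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NOT_A_PRICE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(plainNumbers("cost $100 and 200 items")); // [200]
    }
}
```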
  • Microsoft and regex (Score:4, Interesting)

    by truthsearch (249536) on Wednesday August 02, 2006 @04:40PM (#15834836) Homepage Journal
    Slightly off-topic, but...

    Back when my only experience was development on Windows I was very frustrated with the lack of good string handling in Microsoft languages (VB, T-SQL). If you didn't find a third-party library you had to write a lot of expensive code to do fancy string searches. Try writing recursion in VB6 without bringing your computer to a screeching halt.

    Then when I switched to linux and open source I was shocked to learn that something as useful as regex had already been around for many years. Most of the Windows developers I knew never even heard of it. It was tricky to learn but has paid off many times over in utility.

    Every developer is better off for knowing it. Even if they never use regex, the thought process in understanding it is quite interesting and educational.
    • VB has had regular expressions available to it for about 8 years now. In VBScript, they're built-in, and in VB you just make a COM reference to "Microsoft VBScript Regular Expressions 5.5". See this article for details - http://support.microsoft.com/default.aspx?scid=kb;en-us;818802 [microsoft.com]

      And don't let the date of the article fool you - though it was written in 2006, you've been able to use regexes in VB since the late 90's. That being said, I've always found Perl's implementation to be faster and easier to us
  • What? (Score:3, Interesting)

    by avalys (221114) on Wednesday August 02, 2006 @04:43PM (#15834853)
    Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries
    Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.

    I can understand restricting third-party libraries to those of a certain license, like BSD or LGPL, but a blanket ban without any exceptions for something as essential as regular expressions? That's just stupid.

    One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available.

    Is this just something the submitter dreamed up to fill space, or do companies actually do this?

    • One of the biggest advantages of Java is the enormous number of high-quality third-party libraries available...

      ... that make up for the lack of high-quality useful first-party packages.

    • I'm working at a three-letter acronym this summer for an internship, doing development on another three-letter acronym. We use a third-party open-source (GPL- hey, that's a TLA too! or three!) somethingorother - not strictly a library, not really Java, but, well, kinda similar... anyway. We can't ship this open source somethingorother with the product, or our lawyers will explode (no, that's not a good thing; when lawyers explode, they get everywhere). Apparently, we can't even mirror the somethingorother.
    • by VGR (467274)

      Sure, there were libraries like ORO that would provide regex support, but it wasn't built in and not many companies allow the use of 3rd party libraries

      Who's boneheaded enough to do this? I want to know so I can avoid buying anything from them, because their products are going to be overpriced by at least 50% due to the wasted effort.

      It's DLL Hell [wikipedia.org] all over again. Every time you use a third-party library, the user has to make sure it's installed. And in the classpath, unless they installed it as roo

  • ...and not many companies allow the use of 3rd party libraries.

    Who are these companies and what can possibly be their justification for such a blanket policy. I can understand for some ultra-high security/uptime systems with incredibly strict standards and processes who would need to put third party code through an extensive and expensive audit. But for the rest of us? No jUnit? log4j? Is Boost allowed? Good lord, I can't imagine programming in such a world.

    I hope I never work for one of these firms.

    • If you're developing software for someone else and you use a third-party library, you need to a) ship it with your product or b) require the user get it separately. The latter is a hassle (and hassling customers is bad). The former will make your lawyers explode.
      • Licensing is a valid concern, but one that most third party libraries handle quite neatly. Also, the vast majority of third party libraries I personally use are open source (for instance, junit, log4j, boost, anything from apache, etc.). I wonder how prevalent this is...

        Taft

    • I believe fear is the primary culprit here. Many places I've worked for/with only allow internally developed library use... And I'm sure half of it is swiped, stolen, or 'inspired' by popular, free, open source, 3rd party libraries.
    • Re:Wha-wha-what? (Score:2, Insightful)

      by JoshDM (741866)

      ...and not many companies allow the use of 3rd party libraries.
      Who are these companies and what can possibly be their justification for such a blanket policy.

      Actually there are a number of firms that contain multitudes of red tape that disable their employees from getting anything done without the barest of tools. I have witnessed major separations of "church and state" with these larger companies. This includes the company that did not allow the developers access to the servers, resulting

  • Somebody hasn't worked for "many companies." _Every_ company I've worked for allowed 3rd party libraries. (Sure, there are processes to make sure you don't do something stupid like ship a GPL library with a closed-source product, but that's just common sense.)
  • regex coach (Score:4, Informative)

    by mgkimsal2 (200677) on Wednesday August 02, 2006 @05:14PM (#15835087) Homepage
    I spoke about the "regex coach" tool from http://weitz.de/regex-coach/ [weitz.de] on my podcast (shameless plug!) http://webdevradio.com/ [webdevradio.com] - it's a great tool for helping visually walk through the regex creation process, especially for complex needs.
    • Re:regex coach (Score:2, Informative)

      This tool, by the way, was written in Common Lisp, using Edi's own library

      CL-PPCRE [weitz.de] - portable Perl-compatible regular expressions for Common Lisp

      A library which typically outperforms Perl's own regex engine.
  • by Heembo (916647) on Wednesday August 02, 2006 @05:21PM (#15835139) Journal
    One of the reasons we as programmers write code is to take a very complex idea, like a software application, and write something that a human engineer can understand. The KISS principle especially applies to coders.

    As I get older, my code has gotten more and more straightforward, because I consider the maintenance cycle of code to be more than 95% of the puzzle. And these days, I have more than one security analyst who is not a senior software engineer poking around my code.

    RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible. I prefer using string manipulation abstraction classes (such as my own version of StringTokenizer). They are not as fast and furious as other methods like lexical analysis, and the code is more bloated, but the code is Straight Forward And Easy To Read. There is a power in code of this nature, and my clients have thanked me more than once for not focusing on writing "cool code" but for writing "clean and simple" code. I just tried to paste in a few ugly regex samples, but slashdot blocked me calling them "junk characters" I agree! :)

    For example, take XPATH, this is a clean and simple way to address XML objects. Sure, there is an additional level of abstraction, but you can look at an XPATH query, even from a layman's point of view, and have a clear understanding as to what it is doing.
    • RegEx's are not-so-readable and not-very-maintainable programming abstracts that should be avoided whenever possible.

      If a regex isn't quickly comprehensible to you, either a) the regex is badly written, or b) you need more practice with regex's.

      Seriously, it's very rare for me to come across a regex I'm unable to comprehend. And for more complex ones, Perl certainly allows you to intersperse the regex with comments (I don't recall if Java allows this, though it does support a significant subset of Perl re
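
On the question of comments inside a Java regex: java.util.regex does support them through the Pattern.COMMENTS flag, which ignores whitespace in the pattern and treats # as starting a comment that runs to the end of the line. A minimal sketch, with the date pattern and the class name CommentedRegexDemo chosen only for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentedRegexDemo {
    // With Pattern.COMMENTS, whitespace is ignored and '#' begins a comment,
    // so each piece of the regex can sit on its own annotated line.
    static final Pattern ISO_DATE = Pattern.compile(
            "(\\d{4})  # year\n" +
            "-         # separator\n" +
            "(\\d{2})  # month\n" +
            "-         # separator\n" +
            "(\\d{2})  # day",
            Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = ISO_DATE.matcher("2006-08-02");
        if (m.matches()) {
            System.out.println(m.group(1) + " / " + m.group(2) + " / " + m.group(3)); // 2006 / 08 / 02
        }
    }
}
```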
    • I used to think the same thing. Back in '99 a guy I was working with would produce a regex and I had no idea what that strange looking thing did. I got a book on Perl and spent quite a bit of time wrapping my head around regular expressions. That's probably the only thing I retained from Perl because I really don't like the language. I started using the ORO package in Java to do regular expressions and switched to the standard library when it was introduced in 1.4. Java's syntax is nearly identical to Perl'
  • I'm glad that it's there, and I suppose it was useful during my prototype phase, but a little profiling revealed that my app was spending half its time parsing input. Dumping out the input to String and sometimes char[] and doing the parsing myself in hand tooled code almost completely erased the speed hit I was taking on load.
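
The poster's code isn't shown, so the following is only a sketch of what "hand tooled" parsing can look like for the simplest case, splitting a line on commas with indexOf instead of a regex. HandRolledSplit is a made-up name, and whether this is actually faster for a given application is something only profiling can tell.

```java
import java.util.ArrayList;
import java.util.List;

final class HandRolledSplit {
    /** Splits line on commas without a regex, keeping empty fields (including trailing ones). */
    static List<String> splitOnComma(String line) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int comma;
        while ((comma = line.indexOf(',', start)) >= 0) {
            fields.add(line.substring(start, comma));
            start = comma + 1;
        }
        fields.add(line.substring(start)); // whatever follows the last comma
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitOnComma("a,b,,c")); // [a, b, , c]
    }
}
```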
    • HOORAY - another reason to avoid the ugliness and maintenance nightmare that is RegEx. Thank you for your wise post. You are a paragon of godly inspired wisdom, my Java son! :)
  • Any company that doesn't allow, nay, embrace third party jarballs is missing 98% of the point of Java. The language is so-so, the built in libraries are nice, but not infinite - but the ability to load componentized, versioned, packaged third-party tools is priceless.
  • If I were to ask everyone to start programming in assembly language, I suspect that I would be laughed at. Yet with regular expressions that is exactly what we are doing. If you take a look at the history of regular expressions, you will find staring right back at you the guts of compiler theory with state machines, finite state automata, etc. Instead of asking for regular expressions, programmers should be asking for higher level pattern matching facilities. Something as simple as finding the balanced
    • Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult.

      It's not difficult. It's impossible. Perhaps you should start off by using the right tool for the right job.
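
In Java, where java.util.regex has no recursive construct, the "right tool" for this example can be as small as a depth counter, which stands in for the stack because there is only one kind of bracket. This is a minimal sketch under that assumption; BalancedParens and topLevelGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class BalancedParens {
    /** Returns each top-level parenthesised group, or throws if the parens don't balance. */
    static List<String> topLevelGroups(String text) {
        List<String> groups = new ArrayList<>();
        int depth = 0;  // how many '(' are currently open
        int start = -1; // index of the '(' that opened the current top-level group
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                if (depth == 0) {
                    start = i;
                }
                depth++;
            } else if (c == ')') {
                if (depth == 0) {
                    throw new IllegalArgumentException("unmatched ')' at index " + i);
                }
                depth--;
                if (depth == 0) {
                    groups.add(text.substring(start, i + 1));
                }
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("unclosed '('");
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(topLevelGroups("(a+b)/((c-d)+e)")); // [(a+b), ((c-d)+e)]
    }
}
```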
    • ...is a parser. Invented about the same time. But those are typically based on transformation rules and regular expressions to tokenize your input.

      You could always build your own regular expression compiler. It's not unheard of. But I submit that the "language" is small enough that it's not worth it.

      • While technically correct, I think it's more to the point to say that parsers are based on tree structures or expansion rules. True transformation rules are so much more powerful, but with far less appealing computational properties.
    • > Something as simple as finding the balanced parentheses in the string: (a+b)/((c-d)+e) using a regular expression is difficult.

      It's in fact impossible in true regular expressions since it requires you to maintain a stack.
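
      To make the "stack" point concrete, here is a minimal Java sketch of a check that does the job outside the regex engine. With only one kind of bracket, a depth counter is enough to stand in for the stack. The class name BalancedParens and method isBalanced are made up for illustration.

      ```java
      // Hypothetical example: verify that parentheses are balanced by
      // tracking nesting depth, which a true regular expression cannot do.
      public class BalancedParens {
          public static boolean isBalanced(String s) {
              int depth = 0; // number of currently open '('
              for (int i = 0; i < s.length(); i++) {
                  char c = s.charAt(i);
                  if (c == '(') {
                      depth++;
                  } else if (c == ')') {
                      depth--;
                      if (depth < 0) {
                          return false; // a ')' with no matching '('
                      }
                  }
              }
              return depth == 0; // anything still open means unbalanced
          }

          public static void main(String[] args) {
              System.out.println(isBalanced("(a+b)/((c-d)+e)")); // prints true
              System.out.println(isBalanced("(a+b))/((c-d)+e")); // prints false
          }
      }
      ```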

      > Yet there have been languages that have advanced string matching capabilities around since the 60's (start looking at Snobol -- which is still alive -- and some of its descendants).

      Advanced matching is coming in Perl6 (which is runnable right now, http://www.pugscode.org./ [www.pugscode.org] Along
      • To name one, the most direct descendant of Snobol is Icon, though to some even that is "old", there is an OO version of Icon, Unicon, being actively developed, as well as an implementation of Icon that "compiles" down to the Java bytecode (jcon).

        Yes -- the point was that a regular expression doesn't handle such things as searching for balanced parentheses. However even old Snobol had the facility for dealing with balanced parentheses without getting into full grammars and parsers

        --- [ full snobol exa

  • I recently wrote a small app based on "Filter Builder" by ActiveState. It's called Pattern Sandbox [arizona.edu] and has helped me rapidly prototype regexes for both Java and Perl (because the Java dialect is very similar to Perl's). I made Pattern Sandbox because it was so annoying to write a regex, compile, get to that part of the code/interface, and then finally try it just to find that it does not work correctly so I have to repeat this process until I get it right. If you are using Java regexes on a regular basis
    • Am I the only one that finds it quite easy to get regexs right just by, you know, typing them in? If a regex fails for me, 99% of the time, it's because my input data is in a different format from what I expected. But I've almost never needed any kind of "explorer" tool... that smacks of "tweak it until it works", which is never a good idea, IMHO...
      • > Am I the only one that finds it quite easy to get regexs right just by, you know, typing them in?

        Nope. But I develop spam filter rules all the livelong day. These sometimes demand that 10 or so very hairy regexes (zero-width assertions and all) all fire in conjunction, then they have to be tweaked slightly to work whenever the spam mutates slightly. You have no idea how convenient it is to have a tool like Pattern Sandbox that will light up the matches when you incrementally tweak a rule expression so y
  • Great things about the Java 1.4+ regex support, from my perspective, include that (1) it's nearly as full-featured as Perl's regexes (and thus far better than Javascript's); and (2) it's usable in web browsers and via embedded applets.

    Those were both key to helping me create Regex Powertoy [powertoy.org], an interactive visual regex tester, much like others mentioned in this discussion -- but fully implemented in a browser. It's in JavaScript and DHTML, with a Java applet for the full-featured and step-controlled regex m

  • If you only program in Java, and you have yet to use regexes, then I could see why you might possibly want this book. But how is it that much better than a general-purpose regex book (of which there are several)? I would think it would be more useful to have a book that covers regexes as a computing concept and then talks about the differences/limitations of different implementations (grep, sed, Java, JavaScript, Perl, etc.). Is Java still a big enough buzzword to sell books?
  • And now to celebrate this new-found ability to manipulate strings easily:

    s/trench,/trench/;

    Ah, I knew that would make me feel better.
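
    For the Java-only crowd, roughly the same substitution using the String methods added in 1.4 might look like the sketch below. The class name TrenchFix and method fix are invented for illustration.

    ```java
    // Hypothetical example: the Java counterpart of s/trench,/trench/
    public class TrenchFix {
        public static String fix(String text) {
            // replaceFirst treats its first argument as a regular expression
            // and replaces only the first match, like s/// without the g flag.
            return text.replaceFirst("trench,", "trench");
        }

        public static void main(String[] args) {
            System.out.println(fix("developer in the trench, received the power"));
            // prints: developer in the trench received the power
        }
    }
    ```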

  • not many companies allow the use of 3rd party libraries

    I assume the review author hasn't worked for many companies then. I have yet to find any company that doesn't use third party packages. Logging, XML parsing and unit testing are just the first three things that spring to mind when I consider what might require a third party package. As for the "DLL hell" that someone alleges in a post to this thread, it's virtually nonexistent. You ship the third party packages with your application (as a single JAR

