
News for nerds, stuff that matters

XML Co-Creator says XML Is Too Hard For Programmers

Posted by CowboyNeal on Tue Mar 18, 2003 08:10 AM
from the harshest-critic-always-self dept.
orangerobot writes "Tim Bray, one of the co-authors of the original XML 1.0 specification, has a new entry on his website explaining why he's been feeling unsatisfied lately with XML, and says his last experience writing code for handling XML was 'irritating, time-consuming, and error-prone.' XML has always drawn a divided response from the technical community. The anti-XML community has several sites stating their positions."

Related Stories

IT: Tim Bray Says RELAX (180 comments)
twofish writes to tell us that Sun's Tim Bray (co-editor of XML and the XML namespace specifications) has posted a blog entry suggesting RELAX NG be used instead of the W3C XML Schema. From the blog: "W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on Relax NG, also known as ISO Standard 19757, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs."
  • Too hard? (Score:5, Funny)

    by Ledskof (169553) on Tuesday March 18 2003, @08:11AM (#5535708)
    Sounds like Visual Basic programmers are complaining or something.
    • Re:Too hard? by Omkar (Score:2) Tuesday March 18 2003, @08:32AM
      • Re:Too hard? by Ledskof (Score:1) Tuesday March 18 2003, @08:57AM
        • Re:Too hard? by Ledskof (Score:1) Tuesday March 18 2003, @11:28AM
      • Re:Too hard? by Pxtl (Score:3) Tuesday March 18 2003, @09:39AM
        • Re:Too hard? by Xerithane (Score:1) Tuesday March 18 2003, @10:44AM
      • Re:Too hard? by kryonD (Score:1) Tuesday March 18 2003, @10:04AM
        • Re:Too hard? (Score:5, Insightful)

          You know, using VB is just code reuse. It's just reusing more code than you're used to. It's got some serious strengths. The app you write in a couple of days, the VB programmer can toss out after lunch. How about data-aware controls? Those are a pain in the ass in C/C++, although you can make it easier by using third-party components. Like ActiveX controls. Which are a pain in C/C++, but are painless in VB. On the other hand, your code won't be small, you'll be linking to a massive runtime, and you're using a language whose syntax makes me feel dirty.
          Oh, and if you're making web-based apps, wtf are you using C for?
          • Re:Too hard? by pyrrho (Score:1) Tuesday March 18 2003, @05:08PM
            • Re:Too hard? by arkanes (Score:2) Tuesday March 18 2003, @06:51PM
              • Re:Too hard? by pyrrho (Score:1) Wednesday March 19 2003, @12:28AM
          • Re:Too hard? by jd_esguerra (Score:1) Wednesday March 19 2003, @12:36AM
          • Re:Too hard? by dfn5 (Score:2) Friday March 28 2003, @10:48AM
        • Re:Too hard? by Evil Grinn (Score:3) Tuesday March 18 2003, @10:51AM
        • Re:Too hard? by crazyphilman (Score:2) Tuesday March 18 2003, @11:00AM
          • Re:Too hard? by kryonD (Score:2) Tuesday March 18 2003, @06:02PM
            • Re:Too hard? by Caine (Score:1) Tuesday March 18 2003, @07:58PM
              • Re:Too hard? by kryonD (Score:1) Tuesday March 18 2003, @08:50PM
            • Re:Too hard? by Tet (Score:3) Wednesday March 19 2003, @10:13AM
        • Re:Too hard? by EriondII (Score:2) Tuesday March 18 2003, @11:06AM
    • Re:Too hard? by hatchet (Score:1) Tuesday March 18 2003, @08:39AM
    • Re:Too hard? by twitter (Score:2) Tuesday March 18 2003, @08:50AM
      • Re:Too hard? by twitter (Score:2) Tuesday March 18 2003, @09:40PM
    • Re:Too hard? by WPIDalamar (Score:3) Tuesday March 18 2003, @09:08AM
      • Re:Too hard? (Score:5, Insightful)

        by khuber (5664) on Tuesday March 18 2003, @09:31AM (#5536141)
        It's stupid to have a general purpose XML parser, when you only need a small subset of functionality.

        Yeah, the world needs more half-assed barely functioning and noncompliant XML parsers.

        Seriously I think it's much more robust to just use a normal XML parser. You get all the character set support. If someone hacked up their own parser at work I would reject it in a code review. There's no sense in maintaining your own XML parser these days; they are a commodity.

        -Kevin

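To make khuber's point about character-set support concrete, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`; the sample bytes are invented. The parser honours the encoding declaration and decodes character and entity references, while a quick "assume UTF-8" approach, the kind a hand-rolled parser might start with, fails on the same input.

```python
import xml.etree.ElementTree as ET

# Latin-1 bytes announced by the XML declaration, plus character and entity
# references that a conforming parser must decode.
RAW = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>Caf\xe9 &#233; &amp; &lt;tea&gt;</name>'

text = ET.fromstring(RAW).text
print(text)  # Café é & <tea>

# A hand-rolled reader that simply assumes UTF-8 trips over the same bytes.
try:
    RAW.decode("utf-8")
    naive_ok = True
except UnicodeDecodeError:
    naive_ok = False
print("naive UTF-8 decode succeeded:", naive_ok)  # False
```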
      • Re:Too hard? by Billly Gates (Score:3) Tuesday March 18 2003, @10:50AM
        • Re:Too hard? by johnnyb (Score:2) Tuesday March 18 2003, @01:02PM
          • Re:Too hard? by Anonymous Coward (Score:1) Tuesday March 18 2003, @06:04PM
            • Re:Too hard? by markhb (Score:1) Wednesday March 19 2003, @12:26PM
      • But what subset are you using? Can't call it XML by Ars-Fartsica (Score:2) Tuesday March 18 2003, @12:05PM
      • Re:Too hard? by blibbleblobble (Score:1) Tuesday March 18 2003, @03:36PM
      • Re:Too hard? by Sayjack (Score:2) Friday March 28 2003, @08:03PM
    • Re:Too hard? by sib183 (Score:1) Tuesday March 18 2003, @10:44AM
    • Re:Too hard? by crazyphilman (Score:2) Tuesday March 18 2003, @10:48AM
      • Re:Too hard? by ckaminski (Score:1) Tuesday March 18 2003, @11:03AM
        • Re:Too hard? by crazyphilman (Score:2) Tuesday March 18 2003, @11:15AM
          • Re:Too hard? (Score:5, Insightful)

            by EastCoastSurfer (310758) on Tuesday March 18 2003, @01:37PM (#5538161)
            The market for *real* programmers has been destroyed by corporate America.

            I think that the *real* programmers that you have talked about all write libraries now. These guys all have jobs at the tool makers like MS, Apple, etc...

            Businesses in general don't want (and generally don't need) *real* programmers, they want software engineers. They want someone who can sit down, work out some requirements and provide a timely, cost-effective solution. It has taken me some time to fully realize this, but the right technical solution is not always the right business solution. The PHB couldn't care less whether the app is written in VB, C, or Java, as long as the application works within their parameters. It is those parameters, specified by the people paying for the software, that will direct the language/technology you ultimately use.
    • Re:Too hard? by murdocj (Score:1) Tuesday March 18 2003, @12:35PM
    • Re:Too hard? by whereiswaldo (Score:3) Tuesday March 18 2003, @04:53PM
      • Re:Too hard? by dwsauder (Score:2) Tuesday March 18 2003, @09:42PM
  • Hah. (Score:4, Funny)

    by termos (634980) on Tuesday March 18 2003, @08:15AM (#5535723)
    (http://termos.devcave.net/)
    They should just be glad they're not coding COBOL, INTERCAL or Befunge!
    • Re:Hah. by jkrise (Score:1) Tuesday March 18 2003, @08:29AM
    • Re:Hah. by Surak (Score:3) Tuesday March 18 2003, @08:52AM
      • Re:Hah. by desktopheap (Score:1) Wednesday March 19 2003, @03:44AM
  • Really? (Score:3, Interesting)

    Well, programming *is* a hard task, and simplifying it is about building layers and layers of better abstractions to machine code and binary data.

    Without XML, what would you normally do? Create a flat text file and read it using whatever syntax you like that day. I agree XML is ugly as hell to type in manually, but at least it's a standard, and every programming language in use today can handle it in a standard way: DOM, SAX, whatever.
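The "standard way" this comment mentions can be seen side by side: the same made-up document read once through a DOM (the whole tree is built first) and once through SAX (the parser calls back as it goes). A minimal sketch with Python's standard `xml.dom.minidom` and `xml.sax` modules; the document, element names, and function names are illustrative only.

```python
import xml.dom.minidom
import xml.sax

DOC = b"<items><item id='1'>apple</item><item id='2'>pear</item></items>"

def names_via_dom(data):
    """DOM: parse the whole document into a tree, then walk it."""
    dom = xml.dom.minidom.parseString(data)
    return [node.firstChild.data for node in dom.getElementsByTagName("item")]

class ItemHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each piece of markup."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_item = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buf = []

    def characters(self, content):
        # Text may arrive in several pieces, so buffer it.
        if self._in_item:
            self._buf.append(content)

    def endElement(self, name):
        if name == "item":
            self.names.append("".join(self._buf))
            self._in_item = False

def names_via_sax(data):
    handler = ItemHandler()
    xml.sax.parseString(data, handler)
    return handler.names

print(names_via_dom(DOC))  # ['apple', 'pear']
print(names_via_sax(DOC))  # ['apple', 'pear']
```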
    • Re:Really? by protocoldroid (Score:1) Tuesday March 18 2003, @08:23AM
    • Re:Really? (Score:5, Funny)

      by Anonymous Coward on Tuesday March 18 2003, @08:26AM (#5535773)
      To paraphrase:

      XML is like:

        * SGML without configurability
        * HTML without forgivingness
        * LISP without functions
        * CSV without flatness
        * PDF without Acrobat
        * ASN.1 without binary encodings
        * EDI without commercial semantics
        * RTF without word-processing semantics
        * CORBA without tight coupling
        * ZIP without compression or packaging
        * FLASH without the multimedia
        * A database without a DBMS or DDL or DML or SQL or a formal model
        * A MIME header which does not evaporate
        * Morse code with more characters
        * Unicode with more control characters
        * A mean spoilsport, depriving programmers the fun of inventing their own syntaxes during work hours
        * The first step in Mao's journey of a thousand miles
        * The intersection of James Clark and Oracle
        * The common ground between Simon St. L and Henry Thompson
        * The secret love child of Uche and Elliotte
        * Microsoft's secret weapon against Sun's Open Office
        * Sun's secret weapon against Microsoft's Office
        * The town bicycle
    • Re:Really? by phrantic (Score:3) Tuesday March 18 2003, @08:37AM
      • Re:Really? by captredballs (Score:2) Tuesday March 18 2003, @12:50PM
    • Oh please! (Score:5, Interesting)

      by gwappo (612511) on Tuesday March 18 2003, @08:50AM (#5535903)
      It's annoying when posters get presumptuous. The people complaining in the article are by all means elite programmers; proclaiming XML is okay because "programming *is* a hard task" is nonsense, in the same league as "HLLs are for wussies, real men code in assembly" and other crap.

      The criticism of XML is accurate, correct and valid, if only for the simple reason that the code needed to interface with the libraries is 90% plumbing and 10% business solution. That 90% plumbing leaves the opportunity for _a lot of bugs_ to be created and for any solution using XML to become a resource hog.

      Having a standard interchange format like XML is a fun thing, and "good", as it allows standardized processing of these formats. However, the article identifies a clear gap in the tooling, and that gap needs to be addressed for XML to become a widespread success instead of just another hyped buzzword.

    • plist by hendridm (Score:2) Tuesday March 18 2003, @10:41AM
    • Programming *is* a hard task by Martin Spamer (Score:2) Tuesday March 18 2003, @11:46AM
    • Really! by GoatEnigma (Score:1) Tuesday March 18 2003, @12:20PM
    • Re:Really? by Feztaa (Score:2) Tuesday March 18 2003, @01:43PM
      • Re:Really? by Bedouin X (Score:2) Tuesday March 18 2003, @08:36PM
      • Re:Really? by juhaz (Score:1) Tuesday March 18 2003, @10:06PM
  • But XML is great for computers... by Max Romantschuk (Score:1) Tuesday March 18 2003, @08:18AM
  • A good point (Score:3, Insightful)

    by shish (588640) on Tuesday March 18 2003, @08:19AM (#5535737)
    (http://www.shishnet.org/)
    Sure it sucks, but it's a *standard* that everyone can use, and there are many libraries for it, so you don't need to write your own parsing code.
    • Re:A good point (Score:4, Insightful)

      by jilles (20976) on Tuesday March 18 2003, @09:13AM (#5536038)
      (http://www.jillesvangurp.com/)
      Not only is it a standard, it appears to be the only widely accepted standard. Not using it currently boils down to going back to the hacked together, generally incompatible data formats of the past. Reinventing the wheel still is a popular way of passing time but it has never been very productive.

      People often fail to see the point of widely adopted standards, but the bottom line is that they make it easier to reuse functionality that conforms to the standard. There are now both SAX- and DOM-based parsers for most common programming languages. Basically, if you spend some time figuring out how these APIs work, you can work with XML from almost any language.

      That is not the problem. What is a problem is that everybody is introducing their own xml based languages and in many cases forget to publish the appropriate xml schema/dtd.

      Now, the guy who is complaining here is a Perl programmer who has to process data that is passed to him in XML form. His point is that it is easier for him to throw together a bunch of regular expressions to do his thing than it is to use some off-the-shelf validating parser with a generic DOM/SAX-based API. Good for him that his job is so simple that a bunch of regular expressions does the trick. I'd hate to maintain his code, though, and I suspect he doesn't get much reuse beyond the odd copy-paste.
    • Re:A good point (Score:4, Insightful)

      Amen, and amen.

      Yes, standards suck. But they suck in a way that is consistent and allows other sucky things to talk to other sucky things.

      I'll bet 802.11b is a really crappy standard. But as long as I can pick up interchangeable devices for $50 at the local computer store, I'll live in ignorant bliss.

    • Re:A good point by snakecoder (Score:1) Tuesday March 18 2003, @04:05PM
    • Re:A good point by Delphix (Score:1) Friday March 28 2003, @04:38PM
    • Re:A good point by Delphix (Score:1) Friday March 28 2003, @04:41PM
  • Don't Blink by Cnik70 (Score:1) Tuesday March 18 2003, @08:20AM
  • It's about tools, libraries (Score:5, Interesting)

    by Anonymous Coward on Tuesday March 18 2003, @08:20AM (#5535743)
    Well, first he chose a bad tool (Perl regexp) for XML processing, and then complains about his tools being insufficient.

    Using Perl regexps to parse XML is silly, because there's too much variability (e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling. You can do it, of course, but it quickly gets messy.

    There are a number of tools and libraries (for Perl or other languages) beyond plain DOM and SAX that use proper XML parsers and are reasonably easy to use. He should use one of those and stop complaining.
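The attribute-order and multi-line problems described above are easy to reproduce. A minimal Python sketch, assuming a made-up `<book>` record: a regex written against one layout silently misses an equivalent document, while the standard library's `xml.etree.ElementTree` returns the same values for both.

```python
import re
import xml.etree.ElementTree as ET

# Two equivalent documents: attribute order differs and the second start tag spans lines.
DOCS = [
    '<book id="42" lang="en">Dune</book>',
    '<book lang="en"\n      id="42">Dune</book>',
]

# A regex written against the first layout: fixed attribute order, one line.
NAIVE = re.compile(r'<book id="(\d+)" lang="(\w+)">(.*?)</book>')

def naive_extract(doc):
    m = NAIVE.search(doc)
    return m.groups() if m else None

def parser_extract(doc):
    # The parser does not care about attribute order or whitespace inside tags.
    book = ET.fromstring(doc)
    return (book.get("id"), book.get("lang"), book.text)

for doc in DOCS:
    print(naive_extract(doc), parser_extract(doc))
# ('42', 'en', 'Dune') ('42', 'en', 'Dune')
# None ('42', 'en', 'Dune')
```

The regex could of course be made more permissive, which is exactly where it starts to get messy.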
    • Re:It's about tools, libraries (Score:5, Informative)

      by kinnell (607819) on Tuesday March 18 2003, @08:34AM (#5535819)
      As he says in the article, the reason he uses Perl regexps is that the tools/libraries have to read the entire file. If this is a long stream, it's grossly inefficient: you have to load the entire thing into a massive tree structure in memory. If the job can be done serially with regexps without using a noticeable amount of memory or time, then that is often better. This is the point of the article: there is a choice between a method that is often grossly inefficient for real-world problems (XML libs) and a fast but messy method (Perl regexps). Neither is really satisfactory, hence the complaint.
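On the memory concern, not every XML library has to materialise the whole file. A minimal sketch with Python's `xml.etree.ElementTree.iterparse`, assuming a hypothetical feed of `<order amount="..."/>` records under one root: each record is handled when its end tag arrives and is then discarded by clearing the root, so memory stays roughly bounded by one record rather than by the whole stream. This is Python's standard library, not one of the Perl tools Bray was using.

```python
import io
import xml.etree.ElementTree as ET

def count_large_orders(source, threshold):
    """Count <order amount="..."/> records above `threshold` in one streaming pass."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)          # the first event is the root's start tag
    count = 0
    for event, elem in context:
        if event == "end" and elem.tag == "order":
            if float(elem.get("amount", "0")) > threshold:
                count += 1
            root.clear()             # discard the records processed so far
    return count

def make_feed(n):
    """Build a hypothetical feed: one root element wrapping n small records."""
    body = b"".join(b'<order amount="%d"/>' % i for i in range(1, n + 1))
    return io.BytesIO(b"<orders>" + body + b"</orders>")

print(count_large_orders(make_feed(1000), 900))  # 100
```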
    • Re:It's about tools, libraries by skillet-thief (Score:2) Tuesday March 18 2003, @08:47AM
    • Re:It's about tools, libraries by PigleT (Score:2) Tuesday March 18 2003, @08:51AM
    • Re:It's about tools, libraries (Score:5, Interesting)

      by Sique (173459) on Tuesday March 18 2003, @08:51AM (#5535910)
      (http://127.0.0.1/)
      No. It is not. It is about basic computer science.

      XML is a grammar of Chomsky Type 2 (context free grammar). So you need a stack machine (or equivalent) to parse the whole (left or right) subtree to get your information. This may be fine for small data (like config files), but it takes a huge amount of memory space if you have real world data like the SWIFT file you have to parse for a special transaction. What he is complaining about is exactly this: Lots of parsing to get a simple datum.

      With regexp your parsing is much faster, because you can concentrate on substrings, you can parse them without using a stack, you can use them in stream context. But regexp are Regular Expressions (Chomsky Type 3 grammar), so they are in fact just a subset of XML and not able to parse XML completely.

      One of the links in the article points to another rant, where the author wants rules for a restricted XML. Unfortunately, the ideas he proposes are in fact context-sensitive, so they are Chomsky Type 1 (context-sensitive grammar) and a superset of XML instead of a simplified subset. Does anyone remember the Earley algorithm, with something that can be described as a multi-dimensional stack?

      Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.
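Sique's point that matching nested elements needs more than a finite-state machine can be shown with a toy. In the Python sketch below (one tag name, no attributes, comments or CDATA, all simplifying assumptions), a non-greedy regex stops at the first closing tag, while a scan that keeps a depth counter, a degenerate stack, finds the matching one.

```python
import re

DOC = "<div>outer <div>inner</div> tail</div>"

# A non-greedy regex stops at the FIRST closing tag, so the nesting is lost.
naive = re.search(r"<div>(.*?)</div>", DOC).group(1)
print(naive)  # outer <div>inner

def outermost_div_content(text):
    """Return the content of the first top-level <div>, using a depth counter.

    Tags are still found with a regex (the tokens are regular); the counter
    supplies the unbounded memory that matching nested pairs requires.
    """
    depth = 0
    start = None
    for m in re.finditer(r"</?div>", text):
        if m.group() == "<div>":
            if depth == 0:
                start = m.end()
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                return text[start:m.start()]
    return None

print(outermost_div_content(DOC))  # outer <div>inner</div> tail
```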
      • Re:It's about tools, libraries by protonman (Score:1) Tuesday March 18 2003, @09:34AM
      • Re:It's about tools, libraries by Anonymous Coward (Score:1) Tuesday March 18 2003, @10:01AM
        • Re:It's about tools, libraries (Score:4, Interesting)

          by Sique (173459) on Tuesday March 18 2003, @10:15AM (#5536459)
          (http://127.0.0.1/)
          No, I am suggesting that in general you have to use a stack machine. Surely you can use degenerate trees instead of fully balanced trees to store your data. And a concatenation of elements is a regular expression (and a degenerate tree). But then you are already making assumptions about the data you get. With such limiting assumptions you can easily streamline your code, but you lose the full power of XML along the way. And you need a grammar that makes sure you don't mix terminals and nonterminals.

          The trouble already starts if you use escape characters to mark nonterminals and escape those characters with themselves to mark them as terminals. Those markings are still regular, but you already lose some speed-ups. For instance \\ matches \\" and \\\", but one means just \ and the end of the string, and the other one means \" and the string continues. The only way to stay out of the mess is to use a strictly left-bound parser that first parses for all escape characters and then for the nonterminals, which already makes your parser a (local) 2-pass parser.
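The backslash ambiguity above goes away if the input is scanned strictly left to right, so that each backslash consumes the character after it before any quote is treated as the end of the string; here the two passes Sique describes are interleaved in one loop. A minimal Python sketch of that idea (the function name is hypothetical). Note that XML itself uses entity references rather than backslash escapes, so this illustrates the general parsing point only.

```python
def end_of_string(text, start):
    """Index of the quote that closes the string literal opened at text[start].

    Scanning strictly left to right, a backslash always consumes the character
    after it, so a run of backslashes is resolved pair by pair before any quote
    is treated as structural.
    """
    if text[start] != '"':
        raise ValueError("no opening quote at start")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2            # skip the escaped character, whatever it is
        elif ch == '"':
            return i          # an unescaped quote ends the string
        else:
            i += 1
    raise ValueError("unterminated string")

a = r'"ab\\" rest'      # \\ is an escaped backslash, so the next " closes the string
b = r'"ab\\\" rest"'    # \\ then \" : that quote is escaped and the string continues
print(end_of_string(a, 0), end_of_string(b, 0))  # 5 12
```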
      • Re:It's about tools, libraries by Len (Score:3) Tuesday March 18 2003, @10:07AM
      • Re:It's about tools, libraries (Score:5, Informative)

        by Loma (584330) on Tuesday March 18 2003, @12:54PM (#5537794)
        You have used many big words, and you may have your language levels incorrect, but you are clearly wrong in one respect:

        Generic XML parsers are memory intensive and can't be as fast as regular expressions. That's just computer science. Deal with it.


        Well, I've written my own XML parser, as well as a compiler for a simplified version of C, so I think I'm somewhat qualified to talk about this. A generalized XML parser is not memory intensive, unless you are a very bad programmer. All you need is a depth-first stack, which will be as high as your XML tree is deep. And given that a stack of size N can handle a tree of up to X^N nodes (with X children per node), you are definitely going to run out of disk space before you run out of RAM. In other words, the memory required for parsing an XML tree is trivial.

        An XML parser is one of the simplest parsers imaginable. It's a sophomore task to create a state machine to process the generic L(1) (or is it L(0)?) XML grammar. And as you should know, a state machine for an L(1) grammar is as fast as you can get.

        Anything you do with regular expressions will be much more complicated. As I'm sure you know, regular expressions are turned into state machines before being used to process the input. And almost all regular expression state machines are much more complicated than the state machine you need for an XML parser. In an XML parser, definite boundaries exist on elements such as:
        '<' and '>'


        Regular expressions are not this smart. For example, looking for the substring "abc" in the longer string "abababaaabbbabcabababac" already generates a state machine that is more complicated than the one needed for XML parsers.

        Back to the "memory" intensive nature of XML parsers. If you parse your XML tree into a nested hashmap structure, then the memory needed will be proportional to the number of nodes in the XML tree. Maybe this is what you meant by "memory intensive". However, this is totally unnecessary. You can easily construct an XML parser to look for the specific elements you care about. Then you only get those elements, and you only need to allocate the memory for the elements required.
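Loma's claim that a streaming parser needs memory proportional to the depth of the document rather than its size can be sketched with Python's `xml.sax`: the handler below keeps only a stack of open element names and buffers text only inside a chosen target element. The `TargetCollector` class and the generated document are illustrative assumptions.

```python
import xml.sax

class TargetCollector(xml.sax.ContentHandler):
    """Keep a stack of open element names plus the text of target elements.

    The stack grows with document depth, not document size; text is buffered
    only while inside an element named `target`.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.stack = []        # names of currently open elements
        self.max_depth = 0
        self.found = []        # text of every <target> element seen
        self._buf = None

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.max_depth = max(self.max_depth, len(self.stack))
        if name == self.target:
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == self.target:
            self.found.append("".join(self._buf))
            self._buf = None
        self.stack.pop()

# Hypothetical document: 500 records, but nesting is only three levels deep.
records = "".join(f"<rec><id>{i}</id><note>n{i}</note></rec>" for i in range(500))
handler = TargetCollector("id")
xml.sax.parseString(f"<log>{records}</log>".encode(), handler)
print(len(handler.found), handler.found[:3], handler.max_depth)
# 500 ['0', '1', '2'] 3
```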
      • Re:It's about tools, libraries by rp (Score:1) Wednesday March 19 2003, @07:20AM
      • Re:It's about tools, libraries by Sique (Score:1) Tuesday March 18 2003, @12:38PM
    • Re:It's about tools, libraries (Score:5, Informative)

      by PigleT (28894) on Tuesday March 18 2003, @09:01AM (#5535962)
      (http://pig.sty.nu/)
      I agree that it's about tools and libraries. And this is what I think about them, too.

      At work, I brush up against XML occasionally, mostly for documentation or data-resultset purposes. In my own time, I use it in my photo gallery: result-sets from database queries get converted to XML and then spat out through XSLT in Sablotron, straight to the web. For all the hoops it goes through, it's actually still quite nippy.

      However, I also dislike it intensely.

      I've written a blog-like system-news announcement board using a Ruby CGI against PostgreSQL as a backend. I can pull back a result-set: a simple table-thing with each row being a text announcement, with half a dozen fields (when posted, by whom, etc). And I wanted to output this in HTML form for the web, in plain text to send to a user who wanted it via email every day, and in s-exp form for my own gratification.
      However, the first problem you run into is the formatting. A textarea in an HTML form gives no line-wrapping (wanted for plaintext output, but only in specific fields) and embeds ^M characters everywhere. When the output is HTML, those ^Ms want to become br tags. When the output is plaintext or sexp, they want to become \n. Simple, if ONLY there were a way of doing either elementary reformatting or search-and-replace in XSLT. There is, but s/// is about 10 lines' worth, if my googling is to be believed. That makes it non-optimal for one of its primary uses: making transformations on big blocks of text-based data, and it can't even edit within a node correctly? Pathetic.
      Why shouldn't I just write 3 output methods in my Ruby CGI script that take the result-set directly to text, HTML or sexp formats, with the power of Ruby to do a #gsub("^M", "\n") on just the fields I want, in a tiny few extra characters of code?
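PigleT's alternative, separate output methods fed straight from the result set, might look like this. It is a Python sketch rather than Ruby, and the row fields and function names are hypothetical; the one detail taken from the comment is that textarea input carries carriage returns (the ^M characters), which become `<br>` in HTML and plain newlines elsewhere.

```python
import html

# Hypothetical row from the announcements result-set; field names are made up.
ROW = {"posted": "2003-03-18", "author": "admin",
       "body": "Disk upgrade tonight.\r\nExpect 10 minutes of downtime."}

def body_lines(row):
    """Split the textarea field on CR LF / CR, the ^M line breaks a browser sends."""
    return row["body"].replace("\r\n", "\n").replace("\r", "\n").split("\n")

def to_html(row):
    # In HTML the line breaks become <br>, and the text itself must be escaped.
    body = "<br>\n".join(html.escape(line) for line in body_lines(row))
    return f"<p>{html.escape(row['author'])}, {html.escape(row['posted'])}</p>\n<p>{body}</p>"

def to_text(row):
    # In plain text the line breaks become ordinary newlines.
    return "\n".join([f"{row['author']}, {row['posted']}", *body_lines(row)])

def sexp_string(s):
    """Quote a string using the usual Lisp reader conventions."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(row):
    body = sexp_string("\n".join(body_lines(row)))
    return (f"(announcement (posted {sexp_string(row['posted'])})"
            f" (author {sexp_string(row['author'])}) (body {body}))")

for render in (to_html, to_text, to_sexp):
    print(render(ROW))
```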

      Now to tackle what you've said:

      "Using Perl regexps to parse XML is silly"

      No, it's not. Perl regexps are highly featureful, pre-existing code. I'd be surprised if libxml *didn't* use regexps in its XML parsers, frankly.

      "e.g. attributes in any order, elements covering multiple lines) that regexps aren't good at handling."

      These things are not a problem. You can easily match an attribute occurring, as it does, within an opening tag, and pull out both the name and the contents. Using that to set a variable of the given name in your program (a highly important part, given that XML is a data-transfer format and it's the internal representation afterwards that is its whole raison d'être) is trivial. Thus, Perl wins.
      Multi-line matching is explicitly catered for in Perl, with /m or /s on the end of the regexp.

      "There's a number of tools and libraries "...

      Indeed there are. And you know what? When I've got a small paragraph (under 10 lines) of data that I want to transfer from A to B, the last thing I'm going to do is invoke a 600KB library so I can use a pompous and fashionable set of functions to produce "XML", when perl/ruby/sh have all had perfectly valid "print" or "echo" commands for the past decade or more. If the output is valid XML, you've no reason to diss the method used to produce it.
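Producing XML with plain print statements works, provided the data is escaped. A minimal Python sketch using the standard `xml.sax.saxutils.escape` and `quoteattr` helpers (the `record_xml` function and its sample values are made up), with a round trip through `xml.etree.ElementTree` to check that the printed text really is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def record_xml(kind, text):
    """Build a one-element XML record with ordinary string formatting.

    escape() replaces &, < and > in character data; quoteattr() escapes the
    value and wraps it in a suitable pair of quotes.
    """
    return f"<msg kind={quoteattr(kind)}>{escape(text)}</msg>"

out = record_xml('alert "disk"', "usage > 90% & rising")
print(out)  # <msg kind='alert "disk"'>usage &gt; 90% &amp; rising</msg>

# Round-trip through a real parser to confirm the printed text is well-formed.
back = ET.fromstring(out)
print(back.get("kind"), "|", back.text)  # alert "disk" | usage > 90% & rising
```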

      As a final example, I've also had a few documents of my own to write at work. I had two options: either sit down and set up emacs to handle XML sources smoothly, so I can open and close tags at the push of a key-chord the way I *want* to create the stuff, or program a small sub-language. Lisp, in the form of _librep_, won the day, with a few small functions to produce strings based on the input. And guess what? Because this is a programming language rather than a mere text-transforming language, I made a CGI out of it, and can embed programs within my "data", too, without feeling the urge to write to the W3C about it.
      Editing it is an absolute dream - opening and closing paragraphs of text is a piece of cake and fits the way I want to work. (Maybe you like looking at spikey angle-bracket characters, I
      dun
    • Re:It's about tools, libraries by jgerman (Score:2) Tuesday March 18 2003, @10:31AM
    • Re:It's about tools, libraries by Polo (Score:2) Tuesday March 18 2003, @05:59PM
    • Tools: Castor, JAXB and Pull Parsers by kupci (Score:1) Tuesday March 18 2003, @06:24PM
    • Re:It's about tools, libraries by TekPolitik (Score:2) Friday March 28 2003, @05:16PM
    • Re:It's about tools, libraries by fymidos (Score:1) Wednesday March 19 2003, @01:15AM
  • I tend to agree. (Score:3, Funny)

    by NetDanzr (619387) on Tuesday March 18 2003, @08:20AM (#5535744)
    The last book on XML I read and understood was XML for Dummies.
  • Alas, XML by Joe the Lesser (Score:1) Tuesday March 18 2003, @08:21AM
    • Re:Alas, XML by 6hill (Score:2) Tuesday March 18 2003, @08:37AM
  • Maybe he should have read Knuth (Score:5, Insightful)

    by thogard (43403) on Tuesday March 18 2003, @08:23AM (#5535756)
    (http://web.abnormal.com/)
    XML parsing (just like the TeX language) has a problem: when there are problems in the input files, the situation diverges into two different cases, one of which requires infinite memory and the other infinite time to deal gracefully with errors.

    None of this would ever have been needed had CS been taught properly. There are other concepts to describe how files are to be organized. Some of the systems date from the 1950s. BNF (which seems to work very well for programmers to describe file formats to other programmers) dates from the early 1960s. What was needed is a BNF-type grammar that is machine readable.

    Would XML ever have taken off if the web used something sane and not a hacked version of a nasty text formatting system from decades ago?
  • This does not bode well by fudgefactor7 (Score:1) Tuesday March 18 2003, @08:23AM
    • Re:This does not bode well (Score:5, Insightful)

      by JimDabell (42870) on Tuesday March 18 2003, @08:32AM (#5535801)
      (http://www.jimdabell.com/)

      Did you actually read the article?

      I can sum it up very easily:

      • Callbacks irritate him.
      • It's not always practical to build a tree in-memory.

      He's looking for a nicer API for processing XML; he's not looking to replace XML entirely.

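The two complaints summarised above, callbacks and in-memory trees, are what pull parsing addresses. A minimal sketch using Python's standard `xml.dom.pulldom`, with an invented feed: the loop asks for the next event instead of registering callbacks, and only the `<title>` subtrees it asks for are expanded into DOM nodes.

```python
from xml.dom import pulldom

# Invented feed: we only care about the titles, not the bulky <blob> parts.
DOC = """<feed>
  <entry><title>First</title><blob>lots of data</blob></entry>
  <entry><title>Second</title><blob>lots of data</blob></entry>
</feed>"""

def titles(xml_text):
    """Pull events one at a time; expand only the <title> elements into DOM nodes."""
    events = pulldom.parseString(xml_text)
    found = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            events.expandNode(node)           # build the subtree for this node only
            found.append(node.firstChild.data)
    return found

print(titles(DOC))  # ['First', 'Second']
```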
    • Re:This does not bode well by ChimChim (Score:2) Tuesday March 18 2003, @08:36AM
    • Re:This does not bode well (Score:4, Insightful)

      by Random Walk (252043) on Tuesday March 18 2003, @08:39AM (#5535843)
      After reading the article, I would say he tries to use XML for something it is not very suitable for, and argues that in this case the available libraries are not useful (surprise ...).

      XML is not a stream - it has a hierarchical tree structure, and IMHO is not useful for anything that (a) by its very nature is a continuous stream of data (say, a log file), or (b) wants to be processed as a stream (because it's big, and would require too much memory to be handled as a single data structure).

      The problem seems to be that XML is good for portability and standardization, and therefore is abused for things it's not well suited for (the well-known 'if all you have is a hammer, every problem looks like a nail' syndrome).

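For the case where data arrives as a stream, some parsers can be fed incrementally. A minimal sketch with Python's `xml.etree.ElementTree.XMLPullParser` and a hypothetical `<log>` of `<line>` records: each record is reported as soon as its end tag has been fed, before the root element closes. `feed()` also accepts data split in the middle of a tag; the chunks here are kept on tag boundaries only so the output is easy to follow.

```python
import xml.etree.ElementTree as ET

# Hypothetical log delivered in pieces, e.g. as reads from a socket.
CHUNKS = [b"<log>", b"<line>boot</line>", b"<line>ready</line>", b"</log>"]

def records_as_they_arrive(chunks):
    """Return (chunk_index, text) for each <line>, recorded when its end tag is fed."""
    parser = ET.XMLPullParser(events=("end",))
    seen = []
    for i, chunk in enumerate(chunks):
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "line":
                seen.append((i, elem.text))
                elem.clear()          # the record's contents are no longer needed
    parser.close()
    return seen

print(records_as_they_arrive(CHUNKS))  # [(1, 'boot'), (2, 'ready')]
```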
    • Re:This does not bode well by sdcharle (Score:1) Tuesday March 18 2003, @10:06AM
  • Not Bode Well by Devil Ducky (Score:1) Tuesday March 18 2003, @08:23AM
  • My last XP with XML by YeeHaW_Jelte (Score:2) Tuesday March 18 2003, @08:24AM
  • o-xml (Score:3, Interesting)

    by barnaclebarnes (85340) on Tuesday March 18 2003, @08:25AM (#5535766)
    (http://dirtyeye.com/)
    My friend Martin wrote an XML language called o-xml, which can be found here [o-xml.org]. I'm not the best programmer in the world, but I have actually found it somewhat easy to learn and use.
    • This is just awful by athmanb (Score:2) Tuesday March 18 2003, @08:52AM
    • Re:o-xml by twitter (Score:2) Tuesday March 18 2003, @08:53AM
    • Re:o-xml by Carewolf (Score:1) Tuesday March 18 2003, @10:27AM
    • Re:o-xml by molofaha (Score:1) Wednesday March 19 2003, @06:39AM
    • Re:o-xml by Pond823 (Score:1) Wednesday March 19 2003, @10:51AM
  • did he miss the whole libxml thing? (Score:5, Insightful)

    by Arethan (223197) on Tuesday March 18 2003, @08:25AM (#5535769)
    Too hard and error prone? What the fuck? The whole reason XML is nice is because it is a standard formatting. Meaning that you write the library to read/write it once, and it works for all cases. libxml is a good example of this. I've used it on a few occasions, and found it to be more than adequate for what I needed.

    Now, don't mistake me for being a pro-XML monkey. I think the whole XML revolution is a bunch of hot air, and that people are getting all excited over the ridiculous misconception that tagged text is a new data format (considering that it has been around at LEAST since the early 80s). And I absolutely do not want to get started on SOAP. Why anyone wants to lump RPC calls in with HTTP traffic to make it more difficult to firewall is beyond me.

    But whining that XML is hard is bullshit. Use a library to do all of the handling for you. That's what they are for.
  • Anybody can explain me by Anonymous Coward (Score:2) Tuesday March 18 2003, @08:27AM
  • by acomj (20611) on Tuesday March 18 2003, @08:28AM (#5535779)
    (http://www.plocp.com/)
    I looked into learning XML.
    Beyond the simple tagging there's SAX, DOM, XSLT, DTDs, XML Schema, XForms, OIL, etc.

    Seemed like a royal pain.

    The goal of having standard document formats is a great one, but where is the web repository of those standards for different types of documents? For example, years ago I saw a simple DTD for drawing shapes/curves. This would be great for drawing programs. The document seems long gone.

    What do I do if I have data I want to save in a format that others can read?

    Apple is starting to use XML for file formats (Keynote, Apple's PowerPoint competitor, stores its documents as XML). Hopefully this will start to take off.

  • In other news... (Score:5, Funny)

    by DeepDarkSky (111382) on Tuesday March 18 2003, @08:28AM (#5535781)
    Programming is just too hard for programmers.

    Come on, XML is not THAT complicated. Besides, there are so many different facets to it. I think most people have the most difficult time figuring out what it is (similar in some ways to the .NET puzzlement), what it can be used for, and which one of those uses to, um, use.

    As with anything worth doing, it takes a little time to learn and familiarize yourself with XML before you can really get into it and make it useful, just like programming itself.

    Of course, compared to HTML, which any grade school kid can write (I don't have any proof of this claim; it's all hearsay), XML's uses go far beyond edit-save-browse-repeat. I think everyone needs to find their own little niche of usefulness in the XML universe before moving on to the other areas.

  • Not worth it by AppyPappy (Score:2) Tuesday March 18 2003, @08:28AM
  • Iron guys by buddha42 (Score:1) Tuesday March 18 2003, @08:32AM
    • Re:Iron guys by Minna Kirai (Score:1) Tuesday March 18 2003, @05:33PM
  • Short summary (Score:5, Informative)

    by Anonymous Coward on Tuesday March 18 2003, @08:33AM (#5535812)
    Tim Bray thinks that callback-based XML APIs are a bit awkward to use. He would prefer something like a pull parser (see http://www.xmlpull.org for examples in Java) over the current Perl XML APIs.

    And he would probably want to be able to parse parts of documents ("XML Fragments"), rather than whole documents.

    I agree with his views (though I don't use Perl much). But this is *not* the end of XML or anything. Tim just has some thoughts about how the XML APIs in Perl could be better. Not very exciting, perhaps...
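    If you haven't seen the pull style, here is a minimal sketch. It uses Python's xml.etree.ElementTree.XMLPullParser as a stand-in: xmlpull.org itself is a Java API, and Bray's complaint was about Perl. The caller feeds in data and asks for the next events instead of registering callbacks. The fragment helper relies on the common workaround of wrapping the fragment in a synthetic root element, which is not a standard API.

```python
import xml.etree.ElementTree as ET

def pull_titles(chunks):
    """Yield the text of each <title> element as soon as its end tag is seen.

    `chunks` is any iterable of strings (e.g. pieces read from a socket);
    the loop asks the parser for events instead of registering callbacks.
    """
    parser = ET.XMLPullParser(events=("end",))

    def drain():
        for _event, elem in parser.read_events():
            if elem.tag == "title":
                yield elem.text

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()
    yield from drain()  # events produced while closing are still readable

def parse_fragment(fragment):
    """Parse a fragment with several top-level elements by wrapping it in a
    synthetic root; the wrapper name is arbitrary."""
    return list(ET.fromstring("<fragment>" + fragment + "</fragment>"))
```

Here the chunk boundaries can fall anywhere, even inside a tag. For example, `pull_titles(["<feed><title>Fi", "rst</title>", "</feed>"])` still yields `"First"`.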
  • He is right, I think. (Score:4, Interesting)

    by expro (597113) on Tuesday March 18 2003, @08:34AM (#5535821)

    Among other things ...

    (1) They need to eliminate the doctype can of worms. Unfortunately, this cries out for an alternative solution for character entities.

    (2) Namespaces need to be simplified and better integrated into the core of the language. Expanding on this, there need to be much better mechanisms for modularizing parts of the markup so that it isn't necessary to parse and hold everything in memory to make sense of it.

    (3) There needs to be clean-up and standardization of element IDs and references, integrating them with (1) and (2).

    Do others have more? Should this be done compatibly with XML?

    I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

    • Re:He is right, I think. (Score:5, Insightful)

      by kalidasa (577403) on Tuesday March 18 2003, @09:03AM (#5535981)
      (Last Journal: Monday October 29, @09:37AM)

      1. Doctype is necessary. Perhaps you've never tried handling a very complex text (a big DocBook text or a big TEI text). You need to know what kind of text you're dealing with, and there's no way to come up with one universal solution for all kinds of texts. The only character entities needed are the handful of named entities that are part of the standard: &lt; &gt; &amp; etc. The rest can be handled by Unicode (including the PUA) and transcoding (if you are using an ISO 8859 encoding and you need a character outside that encoding, then you need to rethink the encoding you've chosen to use; UTF-8 is your friend). Entities really are good for more complex units (strings, etc.) rather than single characters. What character entities have to do with DOCTYPE is beyond me.

      2. True

      3. Standardize element IDs? Element IDs are part of the text, not part of the structure. They're simply a way of making it easier to access random parts of a text.
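      Here is a small Python sketch of point 1, using the standard-library ElementTree parser. Text escaped with only the predefined entities survives a round trip with arbitrary Unicode, and numeric character references work without a DTD. A named HTML-style entity such as &eacute;, however, is rejected because no DTD declares it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def round_trip(text):
    """Escape text using only the predefined entities, parse it back, return it."""
    return ET.fromstring("<p>" + escape(text) + "</p>").text

def parses(xml_text):
    """True if xml_text is a well-formed XML document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    print(round_trip("Café < Ω & more"))   # non-ASCII needs no entities at all
    print(parses("<p>caf&#233;</p>"))      # numeric character reference: fine
    print(parses("<p>caf&eacute;</p>"))    # undeclared named entity: error
```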

      I believe that we really need a standard for arbitrary abstract data models, with XML as just one syntactic representation, but I would have to go into long details to justify this.

      So you're saying we need a meta-meta-language? The *MLs are a standard for arbitrary abstract data (and text) models (because not all texts are hierarchical like DBs).

      I think the problem here is that DB programmers (I'm excepting Bray from this) are overusing XML for very simple DB tasks that it wasn't intended for. If you're just doing a 40-field, 30,000-record flat DB, XML is NOT the solution. But it is the best solution for complex non-hierarchical data (i.e., books, etc.).

      As for Bray, I don't think he's saying XML itself (the markup standard alone) is too hard, that it should be abandoned. I think he's saying we haven't come up with simple enough ways of accessing XML data through APIs. But of course that wouldn't be a spicy enough meatball for the Taco.

    • Re:He is right, I think. by Bazzargh (Score:2) Tuesday March 18 2003, @02:20PM
  • His idiom. (Score:5, Insightful)

    by palad1 (571416) on Tuesday March 18 2003, @08:40AM (#5535847)
    He's basically saying he'd like other coders to be able to write code the way he sees fit.
    [quote]
    while () {
    next if (XX);
    if (X|||X)
    { $divert = 'head'; }
    elsif (XX)
    { &proc_jpeg($1); }
    # and so on...
    }
    [/quote]

    Repeat after me: I will never leave parsing XML up to a regexp, especially if my XML may contain CDATA and comment sections. I will never...

    Unless you are 100% certain that the file you are parsing is directly under your control (i.e., no comments, no CDATA sections, attributes always in the same order, same indentation, same bloody encoding [pardon my French]), you will just have to access the data using some kind of DOM or abstract tree representation.
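    A quick Python sketch shows why (the sample page is invented). A regular expression happily "finds" a title inside a comment and returns the raw CDATA markup. A real parser, here the standard-library ElementTree, skips the comment and unwraps the CDATA.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<page>
  <!-- <title>Draft title</title> -->
  <title><![CDATA[Q&A: <b> vs <strong>]]></title>
</page>"""

def titles_by_regex(text):
    # Naive pattern matching: fooled by the comment, and returns CDATA markup as-is.
    return re.findall(r"<title>(.*?)</title>", text)

def titles_by_parser(text):
    # The parser drops comments and turns CDATA sections into plain text.
    return [t.text for t in ET.fromstring(text).iter("title")]

if __name__ == "__main__":
    print(titles_by_regex(DOC))
    print(titles_by_parser(DOC))
```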

    I don't think he's claiming that no one uses XML; he seems to deplore the fact that some people don't get it at all and resort to heavy-duty tools for trivial tasks [thus justifying his example above].

    Basically, XML is quite simple, but that's not the issue. The problem is that XML bundles ACTUAL DATA; it's all about the complexity of that data, not the API used to access it [although writing a DOM implementation is a real pain].
  • XML is good (Score:5, Interesting)

    by Ender Ryan (79406) on Tuesday March 18 2003, @08:42AM (#5535855)
    (Last Journal: Monday November 27 2006, @04:43PM)
    I don't understand why so many people complain about XML so much. It's really quite useful for storing arbitrary data. We have several hundred thousand text-based documents where I work, and they were a total nightmare until I converted the whole thing (well, I'm not done yet...) to XML.

    The documents are generally displayed as HTML on the web, but they're also read by a couple different programs for different purposes. When I first started here, it was mostly a mess of poorly hand-written HTML, but thankfully there were *only* about 20k documents at the time.

    I was charged with the task of writing said programs to read these damn files. Unfortunately, they weren't all marked up the same way...

    Now that we have XML and standard libraries for reading XML, it makes handling these documents a snap. Any program that needs to read them can simply have an XML parser plugged into it. The integrity of the documents themselves is maintained by the fact that they don't work if they're not properly marked up. So all these documents work, 100% all the time, and writing programs to read said documents is very simple and not prone to errors.
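    The "they don't work if they're not properly marked up" property makes a handy batch check. Here is a minimal Python sketch, assuming each input is a file path (or an open file object) that should contain one XML document. It checks only well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def check_well_formed(sources):
    """Return {source: error message} for every input that fails to parse.

    Each source may be a file path or an open file object.
    """
    errors = {}
    for source in sources:
        try:
            ET.parse(source)
        except ET.ParseError as exc:
            errors[source] = str(exc)  # e.g. "mismatched tag: line 1, column ..."
    return errors

if __name__ == "__main__":
    import sys
    for path, message in check_well_formed(sys.argv[1:]).items():
        print(f"{path}: {message}")
```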

    Yay for XML! :)

    So, to sum up, XML is doing what it was meant to do, no less. Unfortunately, it's probably also doing a bit more. XSL, anyone? Yeck. Why not just have a standard XML scripting language? Why does the language need to be valid XML itself?

  • JAXB by Hellvetica (Score:1) Tuesday March 18 2003, @08:44AM
    • Re:JAXB by elsegundo (Score:2) Tuesday March 18 2003, @09:18AM
    • Re:JAXB by Y-Leen (Score:2) Tuesday March 18 2003, @12:08PM
      • Re:JAXB by Anonymous._.Coward (Score:1) Tuesday March 18 2003, @12:12PM
  • "Load into memory" vs. "Callbacks" (Score:4, Informative)

    by itsallinthemind (659911) on Tuesday March 18 2003, @08:44AM (#5535872)
    (http://www.blueoxide.com/)
    Say what you will about Microsoft (and many of you have), but they really got it right with their XmlReader class in .NET. It streams the document like SAX (the "callback" interface Tim mentions in his comments), but lets the programmer move a cursor over the document manually instead of handling everything in event handlers fired by the parser (which I agree can be a real headache, especially in highly variable or deeply nested documents).
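    The difference is easiest to see side by side. This is a Python analog, not the .NET XmlReader API: the same extraction is written once against the callback interface (xml.sax) and once against a pull cursor (xml.dom.pulldom). The sample document is invented. Note that the callback version has to carry its own state between events.

```python
import xml.sax
from xml.dom import pulldom

DOC = "<orders><order id='1'><item>pen</item><item>ink</item></order></orders>"

class ItemHandler(xml.sax.ContentHandler):
    """Callback style: the parser drives, so state lives in the handler."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.buffer = []
        self.items = []

    def startElement(self, name, attrs):
        if name == "item":
            self.in_item = True
            self.buffer = []

    def characters(self, content):
        if self.in_item:  # text may arrive in several pieces
            self.buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self.buffer))
            self.in_item = False

def items_callback(text):
    handler = ItemHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.items

def items_pull(text):
    """Pull style: the caller advances the cursor and expands only what it needs."""
    items = []
    stream = pulldom.parseString(text)
    for event, node in stream:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            stream.expandNode(node)  # build just this element's subtree
            items.append(node.firstChild.data)
    return items
```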

    XML is just one of the tools in our collective toolbox. Use it where it helps you solve a problem. Don't bother if it doesn't.

  • XML is not a programming language... (Score:3, Insightful)

    by borgdows (599861) on Tuesday March 18 2003, @08:45AM (#5535879)
    ... it's a convenient format to store and retrieve hierarchical information, that's all.
  • Yea can be hard by Anonymous Coward (Score:1) Tuesday March 18 2003, @08:45AM
  • by g4dget (579145) on Tuesday March 18 2003, @08:47AM (#5535883)
    I have to agree that XML has serious problems.

    Now, I have to say: a universal syntax for tree-structured data is very useful: experience since the 1970s with one such universal syntax, Lisp, has shown that. It is unfortunate that XML is about the worst imaginable implementation of that idea. XML combines being a nuisance to type with having comparatively complex semantics and lots of redundant features.

    What is ironic is that the same "real world programmers" who wax ecstatic about XML also condemn Lisp as too complicated and too difficult to read. The universal syntax that XML aspires to, Lisp syntax delivered many decades ago. It's just that prejudice and ignorance caused people to re-invent the wheel (and in square form, too) in the form of XML.

    I am pretty torn between whether XML is a blessing or a curse. We really need something like it, but XML is so bad that it may not even live up to the level of "poorly designed industry standard but better than nothing".
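    To illustrate the Lisp comparison, here is a minimal Python sketch that renders an XML element tree as an S-expression. The attribute notation (@ (name "value")) is loosely modeled on SXML. The function names are hypothetical, and the quoting is deliberately simple.

```python
import xml.etree.ElementTree as ET

def quote(s):
    """Quote a string as a Lisp-style string literal."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'

def to_sexp(elem):
    """Render an element as (tag (@ (attr "value") ...) child ...).

    Whitespace-only text between elements is dropped.
    """
    parts = [elem.tag]
    if elem.attrib:
        attrs = " ".join(f"({k} {quote(v)})" for k, v in sorted(elem.attrib.items()))
        parts.append(f"(@ {attrs})")
    if elem.text and elem.text.strip():
        parts.append(quote(elem.text))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail and child.tail.strip():  # text that follows the child
            parts.append(quote(child.tail))
    return "(" + " ".join(parts) + ")"

if __name__ == "__main__":
    print(to_sexp(ET.fromstring('<para id="p1">Hello <em>world</em>!</para>')))
```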

    • I agree, of course... (Score:5, Insightful)

      by alispguru (72689) <bane@nOsPAM.gst.com> on Tuesday March 18 2003, @09:27AM (#5536125)
      (Last Journal: Thursday November 13 2003, @03:44PM)
      Given my .sig, how could I disagree?

      XML got one thing right over unadorned S-expressions - document packaging, specifically versioning and character-set labeling. XML inherited this from SGML, and it's one of the few things it took from there that was actually worth keeping.
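      Here is a short Python sketch of what that packaging buys you (sample data invented). The same element stored as ISO-8859-1 bytes and as UTF-8 bytes decodes identically, because the parser reads the encoding from the XML declaration instead of making the caller guess it.

```python
import xml.etree.ElementTree as ET

def decode_labeled(raw_bytes):
    """Parse raw bytes and return the root element's text; the parser takes
    the character encoding from the XML declaration."""
    return ET.fromstring(raw_bytes).text

latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><name>Müller</name>'.encode("iso-8859-1")
utf8_doc = '<?xml version="1.0" encoding="UTF-8"?><name>Müller</name>'.encode("utf-8")

if __name__ == "__main__":
    print(latin1_doc == utf8_doc)                                    # different bytes...
    print(decode_labeled(latin1_doc) == decode_labeled(utf8_doc))    # ...same text
```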

      For a good laugh, read the Origin and Goals [w3.org] section of the XML spec. Of the ten goals for XML listed there:

      XML shall be straightforwardly usable over the Internet.

      XML shall support a wide variety of applications.

      XML shall be compatible with SGML.

      It shall be easy to write programs which process XML documents.

      The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

      XML documents should be human-legible and reasonably clear.

      The XML design should be prepared quickly.

      The design of XML shall be formal and concise.

      XML documents shall be easy to create.

      Terseness in XML markup is of minimal importance.

      I'd say two of them were met, but were bad ideas (SGML compatibility, terseness unimportant), and five of them were completely missed (ease of use, human legibility, quickly designed, formal and concise, ease of creation).

      Thirty per cent is a failing grade, folks...

    • Re:XML: bad implementation of a good idea by Ed Avis (Score:2) Tuesday March 18 2003, @10:54AM
  • ANSI X.12 by wardk (Score:1) Tuesday March 18 2003, @08:47AM
  • In related news... (Score:3, Funny)

    by arvindn (542080) on Tuesday March 18 2003, @08:48AM (#5535891)
    (http://arvindn.livejournal.com/ | Last Journal: Monday June 16 2003, @12:39AM)
    It's now official. C++ creator admits it was all a hoax! Read on for the details of the stunning scoop...

    On the 1st of January, 2003, Bjarne Stroustrup gave an interview to the IEEE's 'Computer' magazine.

    Naturally, the editors thought he would be giving a retrospective view of twelve years of object-oriented design, using the language he created.

    By the end of the interview, the interviewer got more than he had bargained for and, subsequently, the editor decided to suppress its contents, 'for the good of the industry' but, as with many of these things, there was a leak.

    Here is a complete transcript of what was said, unedited and unrehearsed, so it isn't as neat as planned interviews.

    Interviewer: Well, it's been a few years since you changed the world of software design, how does it feel, looking back?

    Stroustrup: Actually, I was thinking about those days, just before you arrived. Do you remember? Everyone was writing 'C' and, the trouble was, they were pretty damn good at it. Universities got pretty good at teaching it, too. They were turning out competent - I stress the word 'competent' - graduates at a phenomenal rate. That's what caused the problem.

    Interviewer: Problem?

    Stroustrup: Yes, problem. Remember when everyone wrote Cobol?

    Interviewer: Of course, I did too

    Stroustrup: Well, in the beginning, these guys were like demi-gods. Their salaries were high, and they were treated like royalty.

    Interviewer: Those were the days, eh?

    Stroustrup: Right. So what happened? IBM got sick of it, and invested millions in training programmers, till they were a dime a dozen.

    Interviewer: That's why I got out. Salaries dropped within a year, to the point where being a journalist actually paid better.

    Stroustrup: Exactly. Well, the same happened with 'C' programmers.

    Interviewer: I see, but what's the point?

    Stroustrup: Well, one day, when I was sitting in my office, I thought of this little scheme, which would redress the balance a little. I thought, 'I wonder what would happen if there were a language so complicated, so difficult to learn, that nobody would ever be able to swamp the market with programmers?' Actually, I got some of the ideas from X10, you know, X Windows. That was such a bitch of a graphics system that it only just ran on those Sun 3/60 things. They had all the ingredients for what I wanted: a really ridiculously complex syntax, obscure functions, and pseudo-OO structure. Even now, nobody writes raw X Windows code. Motif is the only way to go if you want to retain your sanity.

    Interviewer: You're kidding...?

    Stroustrup: Not a bit of it. In fact, there was another problem. Unix was written in 'C', which meant that any 'C' programmer could very easily become a systems programmer. Remember what a mainframe systems programmer used to earn?

    Interviewer: You bet I do, that's what I used to do.

    Stroustrup: OK, so this new language had to divorce itself from Unix, by hiding all the system calls that bound the two together so nicely. This would enable guys who only knew about DOS to earn a decent living too.

    Interviewer: I don't believe you said that...

    Stroustrup: Well, it's been long enough, now, and I believe most people have figured out for themselves that C++ is a waste of time but, I must say, it's taken them a lot longer than I thought it would.

    Interviewer: So how exactly did you do it?

    Stroustrup: It was only supposed to be a joke, I never thought people would take the book seriously. Anyone with half a brain can see that object-oriented programming is counter-intuitive, illogical and inefficient.

    Interviewer: What?

    Stroustrup: And as for 're-useable code' - when did you ever hear of a company re-using its code?

    Interviewer: Well, never, actually, but...

    Stroustrup: There you are then. Mind you, a few tried, in the early days. There was this Oregon company - Mentor Graphi

  • Still good for some things by krygny (Score:2) Tuesday March 18 2003, @08:49AM
  • too hard (Score:5, Funny)

    by PhilipMatarese (609325) on Tuesday March 18 2003, @08:52AM (#5535916)
    Admitting something is too hard is too hard for programmers.

    Now I'll go read the article.
  • SSAX by Anonymous Coward (Score:1) Tuesday March 18 2003, @08:53AM
  • by BeerSlurpy (185482) on Tuesday March 18 2003, @08:53AM (#5535925)
    We use XML heavily in a project I'm working on at my company. Some genius decided that everything should be in xml, and that we would use XSLT for a lot of the data manipulation. Naturally we also make heavy use of DTD and SAX. Lots of XML related technologies.

    I can tell you now that XML is a Bad Thing. It strives to excel at too many things at once, and becomes inefficient and complex as a result.

    XML tries to eliminate the step of writing parsers for data, although writing parsers has never been a significant part of application development to begin with. Its rigidity instead forces you to waste time taking the output of the parser (a complex tree) and putting it into a meaningful form. XML document tree traversal = 10000x more complex than getting column data out of a ResultSet... Unfortunately, it is also a billion times slower to parse XML than it is to perform a medium-complexity database query.

    The real problem is that XML only partly addresses the problems that relational databases solved years ago (organizing data and making it accessible), and it does so without any of the efficiency benefits of a well-designed database server. In my opinion, 90+ percent of the places where XML is being used today would be better served by using columns in a relational database table to store object fields. You get indexing, you get universal, simple and efficient searching, and you get speed.

    XML has too many faults to really list in one short post. The truth of the matter is that it tries to do too many things and DOESN'T DO ANYTHING WELL. Sort of like someone who tries to be skilled in all musical instruments but ends up being, at best, mediocre in a few of them.
  • XML - I don't mind by Sir Runcible Spoon (Score:2) Tuesday March 18 2003, @08:54AM
  • XML is a MARKUP language (Score:4, Insightful)

    by kahei (466208) on Tuesday March 18 2003, @08:54AM (#5535930)
    (http://www.hwacha.net/)
    ...and for doing generic markup in a relatively simple way, it's good.

    For storing arbitrary data, and for use as a message format (as in SOAP), it's not so good, because it has markup-like features such as the distinction between attributes and elements and the distinction between text and element nodes. (The latter in particular is a huge pain; I wish people would agree to use text nodes only in leaf elements.)

    This is why XML parsers/generators, once they get into entities and DTDs and so on, become really a lot more complicated than they would need to be if XML just stored a tree of elements.

    However, it's the standard, so we might as well just shut up and use it.

    My opinions have no special importance but it *is* important to remember that XML is a markup format that is being used mostly for things other than markup.
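    Both distinctions show up as soon as you write code against a tree API. Here is a minimal Python sketch using ElementTree (the records are invented). The first function reads the same fields whether they were encoded as attributes or as child elements. The second collects the text nodes of a mixed-content element; ElementTree keeps text that follows a child in that child's .tail attribute.

```python
import xml.etree.ElementTree as ET

# The same "record" written two ways, plus a mixed-content paragraph.
AS_ATTRS = '<person name="Ada" born="1815"/>'
AS_ELEMS = "<person><name>Ada</name><born>1815</born></person>"
MIXED = "<p>Call <b>now</b> or <i>later</i>.</p>"

def fields(xml_text):
    """Read name/born whether they were stored as attributes or as child elements."""
    person = ET.fromstring(xml_text)
    return {key: person.get(key) or person.findtext(key) for key in ("name", "born")}

def text_pieces(xml_text):
    """Collect every text node of a mixed-content element, in document order."""
    p = ET.fromstring(xml_text)
    pieces = [p.text]
    for child in p:
        pieces += [child.text, child.tail]  # tail = text after the child's end tag
    return pieces
```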

  • Let's ban the use of XML in public. by Boss, Pointy Haired (Score:2) Tuesday March 18 2003, @08:55AM
  • marketing hype/uncritical technophilia detector by moregan (Score:1) Tuesday March 18 2003, @09:00AM
  • similar problem with MathML (Score:5, Insightful)

    by e**(i pi)-1 (462311) on Tuesday March 18 2003, @09:00AM (#5535958)
    (http://www.math.harvard.edu/~knill | Last Journal: Thursday May 29 2003, @08:11PM)
    It might be too late to correct some things in XML. The good thing
    about XML is that whatever emerges in the future, it will always be
    possible to convert old documents into any new form using simple tools.

    The critics have a point: unlike LaTeX or HTML, which can be written
    easily by hand, XML can become too bloated to be authored directly
    by humans.

    Similar problem with MathML:

    LaTeX: $x^5+3x-9=0$

    MathML:

    <mrow>
      <mrow>
        <msup>
          <mi>x</mi>
          <mn>5</mn>
        </msup>
        <mo>+</mo>
        <mrow>
          <mn>3</mn>
          <mo>&InvisibleTimes;</mo>
          <mi>x</mi>
        </mrow>
        <mo>-</mo>
        <mn>9</mn>
      </mrow>
      <mo>=</mo>
      <mn>0</mn>
    </mrow>

    You can write complicated formulas directly in LaTeX, but it is
    almost impossible to do so in MathML, where one has to rely
    on tools to generate it (e.g. export it with Mathematica or
    TeX -> MathML converters). Wouldn't it be nice if browsers
    understood a basic version of LaTeX? (IBM's techexplorer plugin
    has shown that it is possible.)
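    As a rough illustration of the kind of tool meant here, this Python sketch uses ElementTree to build presentation MathML for a polynomial equation in x. It is a toy resting on several assumptions: integer coefficients, a single variable, and hypothetical helper names. It is not how Mathematica or any real converter works. For x^5+3x-9=0 it produces the same structure as the markup shown earlier in this comment, except that the invisible-times operator is emitted as the character U+2062 instead of the &InvisibleTimes; entity.

```python
import xml.etree.ElementTree as ET

def el(tag, *children, text=None):
    """Tiny helper: build an element with optional text and child elements."""
    node = ET.Element(tag)
    node.text = text
    node.extend(children)
    return node

def monomial(coef, power):
    """MathML for coef * x^power, with coef > 0 (e.g. (3, 1) -> 3 x, (1, 5) -> x^5)."""
    if power == 0:
        return el("mn", text=str(coef))
    x = el("mi", text="x")
    if power > 1:
        x = el("msup", x, el("mn", text=str(power)))
    if coef == 1:
        return x
    return el("mrow", el("mn", text=str(coef)), el("mo", text="\u2062"), x)

def polynomial_equation(terms):
    """MathML for sum(terms) = 0, where terms are (coef, power) pairs, coef != 0."""
    lhs = el("mrow")
    for i, (coef, power) in enumerate(terms):
        if i > 0:
            lhs.append(el("mo", text="+" if coef > 0 else "-"))
        elif coef < 0:
            lhs.append(el("mo", text="-"))  # leading unary minus
        lhs.append(monomial(abs(coef), power))
    return el("mrow", lhs, el("mo", text="="), el("mn", text="0"))

if __name__ == "__main__":
    print(ET.tostring(polynomial_equation([(1, 5), (3, 1), (-9, 0)]), encoding="unicode"))
```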
  • hard to parse .. by ciupman (Score:1) Tuesday March 18 2003, @09:02AM
  • K.I.S.S. by xyote (Score:1) Tuesday March 18 2003, @09:03AM
    • Re:K.I.S.S. by Frans Faase (Score:1) Tuesday March 18 2003, @09:57AM
  • XML is only the beginning by ojQj (Score:1) Tuesday March 18 2003, @09:06AM
  • reusable loop structure by Anonymous Coward (Score:1) Tuesday March 18 2003, @09:09AM
  • Eating Your Own DTDogfood by Minix (Score:1) Tuesday March 18 2003, @09:10AM
  • Huh? by Sparky69 (Score:2) Tuesday March 18 2003, @09:13AM
  • XML Problems by withak53 (Score:1) Tuesday March 18 2003, @09:13AM
  • Processing arbitrary data is error-prone.... by MrBandersnatch (Score:2) Tuesday March 18 2003, @09:14AM
  • XML is simple and powerful... by parabyte (Score:2) Tuesday March 18 2003, @09:16AM
  • XML parsing models (Score:4, Informative)

    by HalfFlat (121672) on Tuesday March 18 2003, @09:17AM (#5536060)

    If I understand it correctly, the author is lamenting that neither of the standard ways of parsing XML in a scripting language fits the straightforward model of scanning for something relevant and then acting upon it. The two models are: 1) read in the whole file and build a tree (takes up too much memory, is slow, etc.); or 2) use a callback interface.

    The style of perl script he was seeking was a simple loop model:
    while () {
    next if /ignorable/;
    if (/thing-one/) { ... }
    elsif (/thing-two/) { ... }
    ...
    }

    To me the thing that distinguishes this the most from the provided XML parsing interfaces is that it has a minimal amount of state.

    So isn't what is needed a corresponding structure to the while () above that iterates over the tree-nodes of the XML-encoded data structure, in a depth-first preorder traversal (to avoid having to build the whole tree first)? One could imagine a parser object that scans through the XML file returning nodes (and their parent history) while maintaining an absolute minimum of state. If one wanted to build an in-memory representation of a subtree given a node, then one can always do so when one finds the node one wants.

    Such an interface wouldn't be good for integrity verification or the like, but for the sort of application the author was talking about, it would seem ideal. Much less flexible than the normal models, sure, but much easier to work with when the problem fits this sort of description. Perhaps I'm underestimating the difficulty of the task, but it doesn't sound too hard to write, given that it is doing so much less than the fully-featured XML parsing interfaces.
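    Here is a rough Python sketch of that interface, built on the standard library's ElementTree.iterparse (the walk() name and the sample page are hypothetical). Elements are yielded in preorder at their start tags, together with their ancestor path, and each finished subtree is cleared to keep memory small. The loop at the bottom mirrors the shape of the Perl loop above.

```python
import io
import xml.etree.ElementTree as ET

def walk(source):
    """Yield (path, element) for every element in document (pre)order.

    `path` is the tuple of tags from the root down to this element. Elements
    are yielded at their start tag, so attributes are available but text and
    children may not be yet. Each element is cleared once its end tag is
    seen, so copy out anything you need before advancing the loop.
    """
    path = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            yield tuple(path), elem
        else:
            path.pop()
            elem.clear()  # drop the finished subtree

PAGE = """<html><head><meta name="x"/></head>
<body><h1>Title</h1><img src="a.jpg"/><img src="b.png"/></body></html>"""

divert, jpegs = None, []
for path, elem in walk(io.StringIO(PAGE)):
    if path[-1] == "meta":
        continue
    if path[-1] in ("h1", "h2", "h3", "h4"):
        divert = "head"
    elif path[-1] == "img" and elem.get("src", "").endswith(".jpg"):
        jpegs.append(elem.get("src"))
```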

    The other problem is the awkwardness of using XML in O-O languages, as addressed in the article [fawcette.com] that Tim Bray links to. Though I haven't used this particular program, this seems to be the problem that FleXML [sourceforge.net] is trying to address. When you don't need all of the flexibility that XML can provide, but instead have a fixed schema that your XML representation follows, why not have your parser built automatically to read it? People have used lex/flex for scanning text files for decades; in these days of XML Schema, it should be even easier. If FleXML lives up to its promise, it will be. Has anyone here used FleXML who is willing to comment on how well it addresses these sorts of problems?

  • Proposed interface to satisfy his requirements by Gollum (Score:1) Tuesday March 18 2003, @09:19AM
  • This guy must be dum then! by Idimmu Xul (Score:2) Tuesday March 18 2003, @09:20AM
  • C#?!? by ultrabot (Score:2) Tuesday March 18 2003, @09:22AM
  • What XML can't do.... by twoslice (Score:2) Tuesday March 18 2003, @09:31AM
  • WSDL - ugh! by TheSync (Score:2) Tuesday March 18 2003, @09:36AM
  • WIMPS by Bill, Shooter of Bul (Score:1) Tuesday March 18 2003, @09:37AM
  • by cdthompso1 (648972) on Tuesday March 18 2003, @09:37AM (#5536197)
    (http://www.agilegraphics.com/)
    Tim Bray's article, if you didn't read it, is right on the money. The last paragraph basically states that XML is the best alternative to the data interchange problem because it provides a consistent format. Some of you guys who are rounding up the mob and lighting buildings on fire calling for book burnings and the downfall of all XML have to read the article! You're not in agreement with Tim when you say, "Sure, I think XML sucks, too."

    So to be clear, XML is here to stay. (An example of XML penetration: there is a working schema for using XML in the farming industry [agxml.org]!) Just imagine the chaos that will ensue once MS Office saves all documents in true XML.

    My take on the problem Tim's really talking about: inconsistency, and the proliferation of people who want to be the next prodigy in their area of expertise. There are so many parsers and interfaces, even within a single language, because vendors want to put their own spin on everything. The resulting alphabet soup confuses the hell out of people. This has even happened in the open source world: I can do a Google search on "php xml parsing" and read articles on no fewer than 10 different approaches. For the average guy who has been told by a project manager, "We need to take these XML files from our business partner, extract the data, and store it in our database," you need a standard approach. That's not meant to stifle thought and innovation; yes, you should take the initiative to understand whether an event-driven approach (SAX parser) or an in-memory object model approach (DOM parser) is right for the job. After all, you do get paid to do this, so earn your keep! But the XML community hasn't done a good job of specifying best practices and leading people by the nose to a solution. Every XML book I've seen furthers the confusion, with each author offering a slight variation on how to do things, leading programmers/scripters/whatevers to use the approach they most recently read about rather than the one time has proven to be the most efficient.
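    For the project-manager scenario above, the plain tree approach is often enough. Here is a minimal Python sketch, assuming a small partner feed (its element and attribute names are invented) that gets loaded into an in-memory SQLite table. For very large files, an event-driven parser (SAX or iterparse) would replace ET.fromstring.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical partner feed; element and attribute names are invented.
FEED = """<orders>
  <order id="1001"><customer>Acme</customer><total>19.95</total></order>
  <order id="1002"><customer>Globex</customer><total>5.00</total></order>
</orders>"""

def load_orders(xml_text, conn):
    """Parse the feed into a tree and insert one row per <order>; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    rows = [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in ET.fromstring(xml_text).iter("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_orders(FEED, conn)
    print(conn.execute("SELECT customer FROM orders WHERE total > 10").fetchall())
```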

    Part of this is the divide between the .Net guys, the Java camp, the Perl/PHP folks, etc., but in the spirit of interoperability, maybe the XML promoters just need to dumb things down a bit to get some simple concepts and best practices into the hands of Joe Sixpack Programmer. Maybe a central authority, a la java.sun.com or php.net?

  • An idea... by Fnkmaster (Score:2) Tuesday March 18 2003, @09:42AM
  • XML is bad like Democracy is bad. It's just better than the alternatives.

    I had a problem at work when we switched from AutoCAD to SolidWorks. Our manufacturing software couldn't read the new BOM files, which were Excel .xls files. Without ever having looked at our system's BOM files before, I wrote a program that read the .xls and built a proper XML BOM file our system could read. If our system weren't using XML, who knows how long it would have taken me to figure out the intricacies of a proprietary file format.
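    Here is a minimal Python sketch of that kind of converter. Reading .xls requires a third-party library, so this assumes the spreadsheet has been exported to CSV with part, description and qty columns. The <bom>/<item> layout is hypothetical, not the commenter's actual format.

```python
import csv
import io
import xml.etree.ElementTree as ET

def bom_to_xml(csv_text):
    """Turn CSV rows (part, description, qty) into a hypothetical <bom> document."""
    bom = ET.Element("bom")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(bom, "item", part=row["part"], qty=row["qty"])
        item.text = row["description"]  # ElementTree escapes & and < on output
    return ET.tostring(bom, encoding="unicode")

if __name__ == "__main__":
    print(bom_to_xml("part,description,qty\nP-100,Bracket & bolt,4\nP-200,Hinge,2\n"))
```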
  • Ease of SAX by porter235 (Score:1) Tuesday March 18 2003, @10:04AM
  • XPath makes XML bearable.... by tcopeland (Score:1) Tuesday March 18 2003, @10:11AM
  • RFC822 by semanticgap (Score:2) Tuesday March 18 2003, @10:16AM
  • XPATH by boatboy (Score:1) Tuesday March 18 2003, @10:17AM
  • not hard... just slow. by AssFace (Score:1) Tuesday March 18 2003, @10:26AM
  • REXML by leoboiko (Score:2) Tuesday March 18 2003, @10:27AM
  • Since when is XML a "programming language" by callipygian-showsyst (Score:2) Tuesday March 18 2003, @10:27AM
  • Maybe it's just me... by Gibble (Score:1) Tuesday March 18 2003, @10:30AM
  • Python iterators solve this problem by Internet Dog (Score:1) Tuesday March 18 2003, @10:43AM
  • Why not abreviate the element close tag? by Zaiff Urgulbunger (Score:1) Tuesday March 18 2003, @11:11AM
  • XML in PHP by horza (Score:2) Tuesday March 18 2003, @11:13AM
  • @#$%! XML by QuackQuack (Score:1) Tuesday March 18 2003, @11:16AM
  • Replace fgets by Spazmania (Score:2) Tuesday March 18 2003, @11:26AM
  • XML in a (very tiny) nutshell by tupshin (Score:1) Tuesday March 18 2003, @11:26AM
  • XML Technology by hackus (Score:2) Tuesday March 18 2003, @11:43AM
  • Java XML Parsing (Score:3, Interesting)

    by SurfTheWorld (162247) on Tuesday March 18 2003, @11:46AM (#5537225)
    (http://www.wxnet.org/ | Last Journal: Thursday November 18 2004, @08:24PM)
    Let's decompose the XML parsing "problem" (if one actually exists) into smaller components that we can reasonably discuss. XML parsing is too broad a topic to discuss intelligently, but if you limit it to XML parsing in Java you suddenly have a topic small enough to be manageable. So let's discuss XML parsing in Java.

    When XML was first introduced, there were no standard libraries in the JDK to facilitate parsing. What's more, the few projects out there varied wildly in how you actually used their DOM tree or SAX callback mechanism. This isn't necessarily a Bad Thing (tm), it's the same problem every emerging technology faces: immature tools. This is basic biology - lots of competing implementations (life forms), each struggling for community (resources).

    So, time goes by, and eventually a handful of implementations emerge dominant. Some dominate due to performance, and some dominate because of ease of use of the API. The victors in this game then sometimes go through a merging process of their own, where the performance victors lend technology to ease of use API victors. After a lot of merging (and flames usually), one or two projects emerge out of the XML kingdom as the dominant players. In my opinion, in the world of Java these are Xalan (Xerces) and Dom4J.

    During the maturation process, Sun comes along and looks at the technology and says "Wow this XML stuff is really here to stay. What implementations are out there, and what similarities exist between them? How can we facilitate growth of these projects?" They realize that certain classes (like org.xml.sax.InputSource) are common entities in both projects (even if the class InputSource doesn't exist), and they standardize it. For a reference to all of the XML standards implemented in the JDK, do a search on java.sun.com for JAXP, JAXM, and JAXB (just to name a few).

    At this point, the XML projects come back and work in support so that they can be "JAXP compatible" (again, this is part of the biological process of evolution). This ensures that the projects work well with whatever Sun ships in the JDK.

    In the end (which is really where we are now) you end up with a pluggable architecture, where the JDK provides some common functionality or interfaces that are implemented by open source projects.

    Java XML parsing was damn hard back in the day; you had to marry your code to a specific project. But these days, with the standardization that has taken place (thanks Sun!), as long as you write code that uses the JAXP specification you can plug any JAXP-compliant parser into your app and things *should* work.

    The difficult problem is getting other entities (Application Servers for example) to get up-to-date with the standards. WebLogic 6.1 comes with a non-JAXP compliant parser, and thus doesn't work with the latest JDK, Xalan, etc.

  • XML is a bad database by crustBro (Score:1) Tuesday March 18 2003, @11:56AM
  • Complex solution to a complex problem by 3247 (Score:2) Tuesday March 18 2003, @12:04PM
  • There is an alternative... by oren (Score:2) Tuesday March 18 2003, @12:13PM
  • XML is hard by mugnyte (Score:2) Tuesday March 18 2003, @12:25PM
  • XML::Twig by amoe (Score:1) Tuesday March 18 2003, @12:47PM
  • YAML is one by whytheluckystiff (Score:1) Tuesday March 18 2003, @12:52PM
    • Re:YAML is one by Colonel Panic (Score:2) Tuesday March 18 2003, @01:55PM
  • GREAT! by Pope Raymond Lama (Score:1) Tuesday March 18 2003, @01:00PM
  • Newline delimited text is bad too! by gammoth (Score:1) Tuesday March 18 2003, @01:04PM
  • Document format by mivok (Score:1) Tuesday March 18 2003, @01:09PM
  • Boring.... by shadowpuppy (Score:1) Tuesday March 18 2003, @01:10PM
  • Hooray! by tedgyz (Score:1) Tuesday March 18 2003, @01:13PM
  • Java Properties is a nice alternative by tedgyz (Score:1) Tuesday March 18 2003, @01:15PM
  • JAXB to the rescue! by hieronymouSteve (Score:1) Tuesday March 18 2003, @01:17PM
  • XML processing is hard, but XSLT is lot harder! by tungwaiyip (Score:1) Tuesday March 18 2003, @01:37PM
  • What about YAML? by Colonel Panic (Score:2) Tuesday March 18 2003, @01:59PM
  • what about a flat perl markup language? by tr!p!x (Score:1) Tuesday March 18 2003, @02:10PM
  • What's the big deal by fanatic (Score:2) Tuesday March 18 2003, @02:21PM
  • Not hard just misused by awol (Score:1) Tuesday March 18 2003, @02:52PM
  • Push/pull low-memory hybrid - plug for my XMLIO by Stele (Score:1) Tuesday March 18 2003, @03:52PM
  • My project is using XML. by sirrube (Score:1) Tuesday March 18 2003, @04:43PM
  • Re:xml (Score:5, Informative)

    by Pyromage (19360) on Tuesday March 18 2003, @08:23AM (#5535757)
    (http://slashdot.org/)
    XML isn't intended for web pages. That's what you missed:

    Its biggest use right now is data interchange: moving bits between one magic widget and another. And for that, HTML sucks; it just can't represent arbitrary data. Programming languages (C++, Java) are for instructions, not data.
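
    To make "moving bits between widgets" concrete, here is a minimal sketch, assuming Java 5 or later (for setTextContent/getTextContent) and a flat record whose keys are legal XML element names. The sender writes the record with DOM and an identity Transformer; the receiver reads it back with the same standard APIs, and characters like < and & are escaped and unescaped automatically. The <record> format and the RecordInterchange class are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecordInterchange {
    // Sender side: writes <record><key>value</key>...</record>.
    static String toXml(Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("record");
            doc.appendChild(root);
            for (Map.Entry<String, String> f : fields.entrySet()) {
                Element child = doc.createElement(f.getKey()); // key must be a legal XML name
                child.setTextContent(f.getValue());
                root.appendChild(child);
            }
            // A Transformer with no stylesheet is an identity transform: DOM in, text out.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize record", e);
        }
    }

    // Receiver side: reads the same format back into a map.
    static Map<String, String> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> fields = new LinkedHashMap<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) { // skip any whitespace text nodes
                    fields.put(n.getNodeName(), n.getTextContent());
                }
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse record", e);
        }
    }
}
```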

    Use-wise, XML fits in perfectly where it is. What Tim Bray is talking about is programming for it: the available interfaces are very counter-intuitive.
    • Re:xml by Covener (Score:1) Tuesday March 18 2003, @08:33AM
    • Re:xml by expro (Score:2) Tuesday March 18 2003, @09:39AM
    • Re:xml by frisket (Score:2) Tuesday March 18 2003, @09:49AM
  • Re:xml (Score:3, Informative)

    When you're writing an application and have to decide what format messages should be written in, or what type of file configuration data should be stored in, most people say, "Why, XML, of course. That way we're guaranteed that it is extensible, transformable, and readable by anyone who would ever need to read it." Granted, there are lots of other document formats for which that is true, but they are not industry standards. As long as there is a schema, everyone will accept it. And if it's not in the format they would like, they are free to run it through an XSL transformation. Easy as pie.
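
    A minimal sketch of the "run it through an XSL transformation" step, using the standard javax.xml.transform (TrAX) API and assuming Java 1.4 or later. The input vocabulary (<config> with <entry key=".." value=".."/> children), the stylesheet, and the class name are hypothetical; the stylesheet flattens the XML into key=value lines, the sort of format a consumer who doesn't want XML might ask for.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ConfigToProperties {
    // Hypothetical XSLT 1.0 stylesheet: one "key=value" line per <entry>.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='config/entry'>"
        + "<xsl:value-of select='@key'/>=<xsl:value-of select='@value'/>"
        + "<xsl:text>&#10;</xsl:text>" // newline after each entry
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the input document.
    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }
}
```

    The producer never has to know the consumer's preferred format; the mapping lives entirely in the stylesheet.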

    XML is not hard, but it is a discipline. It requires a lot of reading and a fair amount of practice, but once you have it down, that's it. And from now on, your document storage design decisions (barring any space/memory constraints) are made for you.
  • Re:xml by BFKrew (Score:2) Tuesday March 18 2003, @08:29AM
    • Re:xml by SoupIsGoodFood_42 (Score:2) Tuesday March 18 2003, @06:52PM
  • Re:xml (Score:3, Insightful)

    by Uller-RM (65231) on Tuesday March 18 2003, @08:31AM (#5535795)
    (http://www.poweredbyg.nu/)
    Since you apparently know nothing about XML, try reading the article. You'll learn something new, and you won't have to talk out your ass on this topic.

    XML's not a language -- it's a grammar, a guide of sorts, for hierarchical data storage. You design file formats that conform to XML. The goal is that such a file format is easy to read in any language or on any platform (given an XML processor/parser for that platform), since your data is stored as plain, human-readable text (UTF-8 by default).
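
    A minimal sketch of the "grammar, not a language" point, assuming Java 1.4 or later: a stock SAX parser accepts a made-up vocabulary it has never seen, because it only enforces XML's syntax rules, and it rejects a document whose tags don't nest. The <inventory>/<widget> format in the usage note and the WellFormedness class are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedness {
    // Returns true if the text is well-formed XML, whatever its element names are.
    static boolean isWellFormed(String text) {
        try {
            // DefaultHandler ignores content events, and its fatalError()
            // rethrows the SAXParseException without printing anything.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // syntax error: the document is not well-formed
        } catch (Exception e) {
            throw new IllegalStateException("parser unavailable or I/O failure", e);
        }
    }
}
```

    For example, isWellFormed("<inventory><widget id='7'>sprocket</widget></inventory>") returns true, while isWellFormed("<inventory><widget></inventory>") returns false. Checking that the elements are the ones your format expects is a separate job for a DTD or schema.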

    Might as well poke fun at the rest of your idiocy -- as it happens, HTML 4 is pretty close to being XML-conformant, and the W3C's now pushing XHTML which is fully conformant.

    Granted, a lot of people treat XML as another buzzword, the way that OOP once was. It's not a magic bullet -- it's just a guide to making cross-platform file formats, and it works pretty well for that.
    • Re:xml by FireAtWill (Score:2) Tuesday March 18 2003, @09:17AM
      • Re:xml by Uller-RM (Score:2) Tuesday March 18 2003, @09:33AM
  • Re: of course there is!! by borgdows (Score:1) Tuesday March 18 2003, @08:37AM
  • Re:of course there is! (sorry for the prev post) by borgdows (Score:1) Tuesday March 18 2003, @08:39AM
    • by borgdows (599861) on Tuesday March 18 2003, @08:42AM (#5535858)
      arggh!!! fuck'in XML tags!! lol

      <?xml version="1.0" encoding="bork"?>
      <troll>
      <sovietrussiathing>In SOVIET RUSSIA, XML standardizes YOU!!</sovietrussiathing>
      <offtopic>Let's bomb the French!</offtopic>
      <flamebait>Anyway, XML is for losers!</flamebait>
      </troll>
  • Re:xml by mystery_boy_x (Score:1) Tuesday March 18 2003, @08:41AM
    • Re:xml by cicho (Score:2) Tuesday March 18 2003, @10:46PM
  • Re:Let me get this straight... by Saddam Hussein (Score:1) Tuesday March 18 2003, @08:41AM
  • WTF? (Score:5, Informative)

    XML isn't a replacement for Java or C++. Neither is HTML. You're looking at three separate areas there.
    HTML is a page description language.
    C++ and Java are data processing languages.
    XML is a data description language.

    You can certainly describe a page using XML, and I see no reason why you couldn't construct a programming language using XML syntax, but how on earth are you going to store data in C++ or Java?
    • Re:WTF? by _xeno_ (Score:2) Tuesday March 18 2003, @03:04PM