Alternatives To .DOC As Standard WP Format?

D. C. Sessions asks: "I'm on the Software Task Group of a standards body (JEDEC) which is, among other things, responsible for the DDR memory standard. You may have heard of it. Currently standards drafts must be submitted in an editable word processing format, which right now is interpreted as FrameMaker or MS Word. I find this not only offensive but dangerous, since these standards -should- outlive the current MS software that can manipulate them. I've gotten some sympathy on 'bit rot' from the rest of the committee based on showing what current flavors of Word do to documents saved with older versions, but the problem is this: What do I propose as a replacement?" Two that come to mind right off of the top of my head are LaTeX and, of course, HTML. Any other formats that can work just as well as .DOC in most situations and are cross-platform to boot?

"It should (obviously) be an open file format, preferably with an open source tool to access it. It absolutely must be usable on LoseBlows, should be usable on Mac, and (for my own sake) on Linux and Solaris. It must be capable of structured documentation, numbering, tables, and embedded vector graphics. I just don't know of such a beast at present."

  • In the end, I have found that the only useful cross-platform document formats are TeX, RTF, and SGML/XML DTDs. However, your mileage will vary no matter what (each editor seems to save RTF in its own way and you really can't have StyleSheets that cross over.)

    To put it bluntly.. other than ASCII (7bit) you are going to lose some people, mangle documents, and just about always have someone complain about not being able to see stuff. Those, sadly, are the cold hard facts of life..

    I have found that Microsoft found out that the desktop killer app was word processing and killed anyone else in it :)

    Happy holidays
  • PDF is better. Smaller, better font embedding, better cross platform support, incremental download/display for web download etc.

    But, much as we hackers love to edit postscript by hand, normal people don't. The original message said editable

  • by Eloquence ( 144160 ) on Saturday December 23, 2000 @09:03AM (#541784)
    • HTML print results are unpredictable, formulas are hard to layout, and page design is impossible.
    • LaTeX is bad at handling images, and there are no easy editors for the Windows platform.
    • RTF has been killed by Microsoft with dozens of different implementations. (Some of them omit important things like footnotes.)
    • SDW (Star Office) is just as proprietary as Microsoft's DOC, but supported by fewer platforms.
    • PDF is a print format, text extraction is more difficult, and it's bad for PDAs.
    • TXT is insufficient for most tasks.

    XML may be a way out, but there's no XML-based document format on the horizon. (I don't know about this Open E-Book stuff, though.) All in all, the OSS community has failed to provide an open, flexible document format that could compete with MS Word. I'm as unhappy with that as you are, but if you want to change it, all word processor developers must get together and formulate a standard. Is this ever going to happen? Note that most closed-source word processors want to bind their users to their product by using a proprietary, closed format.


  • I think a bunch of you are missing the point. We shouldn't want to choose an office suite and make everyone use that. Having StarOffice's document format as the "standard" is just as bad as using MS's .doc. What we need is a file format that all the suites can use. This community's software is more about freedom --choices-- than anything else, so I don't really think we should be attempting to limit ourselves to one particular application to do all our word processing.
  • What's the point, Microsoft will just come out with a new version

    Too right! I used to try to keep up with .doc 2 .txt (or rtf or html) translators, but I've just given up. Now if I receive a .doc attachment, the sender gets an angry "This is a non-Microsoft site, please resend your mail in a readable format." To one persistent re-offender I sent as an attachment the output of cat /dev/urandom > urgent.doc.

  • Postscript is a spectacular presentation language for the final product, but it isn't much for editing. This guy wants something for "living" documents that people are going to have to edit...
  • LaTeX is awesome, and should run on all platforms. Of course it is not your mom's format, but what is more powerful than TeX? Easier than HTML, renders the same on all machines, and automates away many of the problems associated with larger projects. With a brilliant WYSIWYG editor for LaTeX/TeX, even your mother could use it to make dazzling slides, books, research papers, or grocery lists! Of course, you could use Nedit, Vim, or even (stinky) Emacs. The perfect tool, if only more would use it!
  • What I noticed about USB was that it emerged from a rather obscure ghetto after Windows 98 came out.

    All my Pentium motherboards have headers on them for USB, and that means motherboards from long before the imac came out. They're easy to plug a cable/connector onto, and voila! USB on all my PC systems.

    The zeal of Mac users continues to amaze us.
  • by pen ( 7191 )
    In case you're wondering, DocBook is here [oasis-open.org]. Or you can read the text only [oasis-open.org] version.


  • I seem to remember that MS developed RTF as a way of exchanging documents between Macs & PCs. As the original poster stated, MS has changed RTF quite a bit over the years, usually to follow the changes that they've made to Word. But at least the changes have been documented and are available on the web. A quick search with google will turn up several of the RTF specs. Most word processors that I know of will support RTF and there is at least one open source word processor (Ted [nllgg.nl]) that uses RTF exclusively. I've used it and it's pretty good.

  • i've asked this so many times in threads like this, but i always seem to get in too late to get any responses. i'd like to ask once again what's wrong with PDF documents?

    it's my understanding that PDF is an open format. in fact, i've even heard that part of the reason why Apple used DisplayPDF in MacOS X is because they would have had to license Postscript from Adobe while PDF was royalty-free. if this is the case, why is it that opensource advocates hail Postscript, but denounce PDF?

    i know that PDF is the format when you want to ensure that pages are printed correctly. that being said, PDFs are still able to store text-content, they're compressed so they're a reasonable size, and they're cross-platform (lots of programs can read PDFs these days, not just Acrobat)

    now for the topic at hand, i understand that standards definitions would be best presented in a format that doesn't waste so much space on presentation: content is what should matter, which is why a Framemaker file format or XML might be best. but for casual documents, why don't we use PDF? it's surely a lot better than DOC.

    so i'm asking: what's wrong with PDF? why can't more programs write to PDF as an export option? why can't more programs read PDFs for editing? am i missing something here?

    please, somebody knowledgeable: enlighten me.

    - j

  • I like RTF too - but did Microsoft really author it? I recall first seeing RTF around 1991 as an internet-centric file format. MS seemed to push the .DOC format over .RTF, and I thought that was because .DOC was spawn of MS, while RTF was not. I could be wrong.
  • LaTeX is a great program -- but LaTeX is for typesetting, not for word processing ... it's true most LaTeX users combine the two distinct phases into one messy process ...

    There is however a Windows TeX word processor whose name escapes me -- but it reads / writes TeX files as its native format and allows you to write LaTeX files in an interactive format, which IMHO is a lot better than the "edit and compile" paradigm ...

    This would really be the best of both worlds ... the unix heads can have their programming language latex, and the windows-bred secretaries can have a program that they can work with as well ...

  • IMHO, most of the responses here are galactically missing the point. The question is (or, at least, should be) "What is a format we can move to that everyone can read?" and not "What text editor and markup language should we force everyone to start writing in?"

    All the posts arguing for TeX, DocBook, XML, Star Office or Pathetic Writer are forgetting that a group that demands submissions in .doc format is obviously receiving them from people using Word and turning them over to other people using Word. Forcing everyone to use LaTeX or XML (or to write LaTeX or XML in Word) is a guarantee that the whole thing will grind to a halt.

    HTML is an option; XML is not until Microsoft adds it as a "Save As" option.

  • by Anonymous Coward
    Yes, XML appears to be the most viable alternative; and if I recall correctly Linus Torvalds suggested two years ago that it should be the native format for any Linux word-processor. However, since then KWord and others have come out, and I still haven't seen any attempts to support it... :(

    In my own little start-up project, I am in desperate need of an Excel-file format replacement. I am contemplating over XML, but besides being a lousy programmer I am even worse at reading specs... :(

    But anyhow, XML seems to be coming. My work will abandon development of Java GUIs (on top of C++ programs) in favor of XML GUI! That way any program can be called from within browsers, without the need for platform specific virtual machines!

    My suggestion is: go for XML!
  • The problem with picking a standard is that file formats change over time. Any WP format more complicated than an ASCII text file has gone through numerous incarnations as features are added/fixed through various revisions of software. Short of coming up with a new standard, you'll have to make compromises somewhere, be it in portability, reliance on a proprietary package or capability.

    Perhaps it's time to actually have some standards body define a standard format for word processing, that presents an acceptable minimum of functionality. The cries of XML! XML!, while partially missing the point, as XML itself isn't up to the task, might be a start, since, at least in theory, an XML-based format would be both extensible and maintain backwards compatibility, and have the added benefit of being relatively easy to write implementations of.

    Of course, opinions ( ie (_)*(_)s ) on what exactly constitutes the minimum acceptable functionality may vary but, as you know, committees are good at making sure that nobody is any happier with the results than anybody else.
    Why doesn't such a standard exist already? Simple, no company wants to write a stable, open spec. for a minimal document, and if somebody were to attempt to do so other companies would not likely give it acceptance. This is why some committee, be it an organization, such as ISO, or a group of 'interested parties' agreeing to work together would be the best situation. This is probably outside of JEDEC's charter, but y'all may be able to pass the suggestion onto the appropriate parties.

    So, in closing, such a spec would need to be:
    • simple
    • powerful
    • open
    • extensible
    • ensure backwards compatibility
    • ... and most importantly set in stone
    As you mentioned in your write-up, RTF fit most of these, but under the sole control of Microsoft it kept changing. Perhaps, with their blessing, RTF would be a good basis for the new spec, as long as you can solidify the format, and keep it fixed for a length of time, ensuring backwards compatibility between revisions.

  • The Internet Engineering Task Force [ietf.org] (IETF) publishes all its standards (the RFCs [rfc-editor.org]) for the Internet in American Standard Code for Information Interchange (ASCII). You can also submit the document in PostScript, but the ASCII is the primary reference.

    ASCII is searchable, printable, indexable, and forward compatible essentially forevermore. Everyone can display it correctly, anywhere. There is no better format for any kind of International standard. The IETF debates the question of superseding ASCII as the standard format about every other year, but we've yet to identify any other format that has ASCII's advantages.

    HTML might one day replace ASCII in this capacity, but it needs to be stable for longer than it has been, and the HTML generators out there never generate correct HTML (ever looked at web pages in iCab [www.icab.de]? It has a built-in syntax checker, and even slashdot comes up with errors, all the time). Until those problems are fixed...
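As a rough illustration of the "7-bit ASCII only" rule discussed above, here is a minimal Python sketch that reports every byte outside the ASCII range in a draft. The function name and the sample drafts are made up for this example; this is not a tool the IETF or JEDEC is known to use.

```python
def non_ascii_positions(data: bytes):
    """Return (line_number, column, byte_value) for every byte above 0x7F.

    Lines and columns are 1-based and count bytes, not characters.
    """
    found = []
    for line_number, line in enumerate(data.split(b"\n"), start=1):
        for column, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything above 0x7F is outside 7-bit ASCII
                found.append((line_number, column, byte))
    return found


# Hypothetical sample drafts: one pure ASCII, one containing U+2264 (a
# "less than or equal" sign) encoded as three UTF-8 bytes.
clean = b"1. Scope\nThis document defines timing.\n"
dirty = "1. Scope\nTiming \u2264 5 ns\n".encode("utf-8")

if __name__ == "__main__":
    print(non_ascii_positions(clean))
    print(non_ascii_positions(dirty))
```

A check like this is trivial to re-implement decades later, which is arguably part of the appeal of plain ASCII as an archival format.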

  • Well LaTeX files can compile into nice looking .pdf files which are viewable on any platform, plus they look exactly the same on every platform. Postscript also prints out very nicely and can be handled by just about any printer and platform. LaTeX is all I use for all my papers and documents I need to write.

    There's also XML. I'm not sure how portable and consistent documents look using XML but it's supposed to be a portable document format.

  • Of course PDF is the solution to transmitting finished documents, but we're talking editable documents here.

    Unless today's word processors can load & save PDF as if it were their native format, I don't see PDF as a solution here.

  • One of the things that the Sun/StarOffice project is doing, is to create this "OpenOffice" set of standards, with the current StarOffice codebase being the reference implementation. The set of standard document formats you wish for is one of their specific goals. Formats rich enough to handle the needs of business documents, but open enough to be vendor-neutral. Initially, the OpenOffice formats will be implemented by both StarOffice (the open source office suite) and StarPortal (Sun's upcoming online version).

    I think this would be a good place to start. To make it even more buzzword-compliant, the OpenOffice formats will be XML-based. I'd be very happy to see the OpenOffice formats adopted not only by Star/Sun, but also by Abi, Gnumeric, K-office... who knows, maybe someone could even write a plug-in for MS Office to load and save documents in OpenOffice format. (If it's successful enough, MS will eventually have to do it themselves.)
  • Just FYI, Word 2000 document format is backwards compatible with 97. And yes, it runs under Win95.

    Microsoft makes this claim with each new generation of its office products. It has always turned out to be a lie [slashdot.org].

  • correct link [slashdot.org]
  • A lot of the posts I see in this thread talk about XML, etc. You have to keep in mind that not everyone knows how, or wants to learn how to code just to write up a damn document!! I know I like doing some HTML / XML / ETC, but for your average joe that just wants to throw together a document, resume, or whatnot, anything other than Click Here, Format this with pretty fonts, and all that jazz is just too damn difficult!
  • TeX (and the LaTeX frontend) runs on about as many platforms as linux.
    Actually, TeX and LaTeX run on far more platforms than Linux ...
    many people think the learning curve is high, but this isn't necessarily so.
    Trying to make the masses learn TeX or LaTeX is a big mistake -- they'll just go back to Word. The trick would be to write a WYSIWYG word processor that saves documents as TeX. There's already a few out there, but they're not ready to take on Word yet.

    TeX is a good idea. XML is probably better, and far more likely to actually happen. Of course, there's a zillion different ways that a paper could be stored in XML, so XML alone isn't the magic bullet. But it's a good start.

  • Don't use HTML, at least use XHTML making sure that you segregate style from content.

    What's wrong with HTML? I think it should be used as a standard for document interchange. In fact, guess what! It already is! The prejudice you have against HTML seems to be based on some sort of beautiful idea of stylistic perfection. Well, I don't give a shit about that - HTML is here, it's now, and it can be read by loads of apps and it's an open standard. HTML is the solution to your portable document problems. You're reading this alright, aren't you?

    KTB:Lover, Poet, Artiste, Aesthete, Programmer.

  • Specify .doc AS the standard .... and start a standardization process on it .... take it out of M$'s hands so that it becomes a non-issue
  • In a market place, it doesn't work. You do not, in fact, have a "solid piece of wood in your hands" -- if you don't give people what they want and what they believe to be most useful, they will go elsewhere.
    As is often the case, what people believe they want and what they really want are two different things. (Consult any Zen master, or software requirements analyst, for further enlightenment.) It can take some forceful education and interrogation to get people to realize this and tell you what they truly want and need.

    People working on large documents need tools and formats that focus on document structure. A bunch of very smart people looked deeply into the problem years ago and came up with the idea of markup languages.

    If you want to displace .doc as a standard, you have to be willing to give people the tools they want to use, and not the tools you think they should use.
    Actually .doc got to be a de-facto "standard" exactly because managers gave their employees the tool the manager thought they should use.

    Tom Swiss | the infamous tms | http://www.infamous.net/

  • None of the alternatives to .DOC solves the problem.
    • HTML seems reasonable, but HTML documents are collections of files, not a single file, which makes moving them around a problem. There's also no entity-oriented graphics system (a "draw program") for HTML. (Well, there's Flash, but that's overkill for simple diagrams.)
    • TeX and its derivatives are too programmer-oriented and don't handle images, diagrams, or tabular material easily.
    • DocBook is text only.
    • StarOffice has the right feature set, but doesn't have enough market share. That may change. (I was a StarOffice fan for a while, but the early version sucked, and I had to give in and buy Word 97.)

    There's a window to do something about this right now, because Microsoft is tightening the screws on Office 2000 pricing. The amount of money companies have to spend on Microsoft Office is about to increase substantially. Technically, documenting the StarOffice format very well and encouraging other efforts to use it would be a good way to get started on the problem. From a business perspective, VA Linux or Red Hat ought to try pushing a desktop distribution that takes one install and provides the tools needed by, say, 70% of office workers. (Hint: support the first few companies that try this in a big way, to find out what's needed.)

  • OpenOffice is seeking to address this. This may be of no consequence for someone needing a solution today, but I thought I'd mention it.

    the link is xml.openoffice.org [openoffice.org]. Draft formats are available for download, and you can follow the development on the mailing list there as well.

  • by QuoteMstr ( 55051 ) <dan.colascione@gmail.com> on Saturday December 23, 2000 @08:46AM (#541819)
    Staroffice has all the features you describe, and is portable.
  • by martin-k ( 99343 ) on Saturday December 23, 2000 @08:47AM (#541822) Homepage
    RTF (Rich Text Format) could have been a sensible cross-platform, cross-application solution, were it not for Microsoft continually "enhancing" this format by globbing on new features in uncoordinated ways.

    -Martin

  • Did XML kidnap your cat?

    Nope, just trying to clear up some issues.

    I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML"

    I don't think it was implied. It was mentioned casually. But that wasn't my point. Choosing to use XML is like choosing between a binary and a text document format. Just saying "use XML" doesn't mean a whole lot. The format itself is really the DTD that's used. Whether or not writing a good DTD was implied, it is certainly a whole lot more complicated than the poster was making it out to be. XML is no magic wand.

    How could it possibly be device-dependent? This is just text, we're talking about.

    It's waaay too easy to make things device-dependent. For instance, think about printing a modern, full-featured HTML page. It is a device-dependent language; it's meant to work within a browser, of a certain size range, with a certain colour depth, etc., etc.. It will look great in your browser, but it doesn't lend itself well to printing. So you have to choose your language/DTD carefully.

    Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.

    Okay, sorry. So, if you want to use XML, you'll need a good DTD, *and* a good XSL(or a good application). :)

    Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.

    Now, I'm only going to argue semantics on this one. "Easy" is subjective. You're right, it's easy to look into the document and edit it, but that doesn't make editing easy. I can easily look into a MS Word document and edit it. That doesn't mean I'll do anything useful, nor does it mean it'll be fast.

    I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.

    Well, you obviously know what you're talking about :) The reason why I replied to that post was that while it might be implied to you and me, it might not be implied to everyone. The tone of that post struck me as, "Use XML - it's easy and simple," whereas using XML is not necessarily so simple nor easy. Lots of work to be done if you'll be writing your own DTD, and lots of learning to do if you don't.

    Thanks for the reply, though :)

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
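To make the point above concrete (that "the format itself is really the DTD"), here is a minimal Python sketch. It is not real DTD validation: the `ALLOWED_CHILDREN` table is a hypothetical stand-in for a DTD's content model, and the element names are invented for illustration. It only shows that well-formed XML says nothing about whether a document follows the agreed structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, standing in for a DTD: each element name maps
# to the set of child element names it may contain.
ALLOWED_CHILDREN = {
    "document": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def structure_errors(xml_text):
    """Return a list of human-readable violations of ALLOWED_CHILDREN."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != "document":
        errors.append(f"root element must be <document>, found <{root.tag}>")

    def walk(element):
        allowed = ALLOWED_CHILDREN.get(element.tag)
        if allowed is None:
            errors.append(f"unknown element <{element.tag}>")
            return
        for child in element:
            if child.tag not in allowed:
                errors.append(
                    f"<{child.tag}> is not allowed inside <{element.tag}>"
                )
            walk(child)

    walk(root)
    return errors


# Both samples are well-formed XML; only the first follows the content model.
GOOD = (
    "<document><title>DDR</title><section><title>Scope</title>"
    "<para>Some <emphasis>text</emphasis>.</para></section></document>"
)
BAD = "<document><para>stray paragraph</para><blink>hi</blink></document>"

if __name__ == "__main__":
    print(structure_errors(GOOD))
    print(structure_errors(BAD))
```

A real deployment would use an actual DTD or schema with a validating parser; the sketch only shows why agreeing on the content model is the hard part.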
  • by tobyjaffey ( 132850 ) on Saturday December 23, 2000 @08:50AM (#541824)

    Use a nice SGML/XML application like DocBook. Tools for manipulation are free, anyone can write DocBook, with or without specialist tools (it looks a lot like HTML to the layman).

    Don't use HTML, at least use XHTML making sure that you segregate style from content. If you must use HTML, use stylesheets so that formatting is consistent.

    But, my recommendation would be to use DocBook (SGML) and use stylesheets and nice free parsers to output TeX, ASCII, RTF, HTML and whatever else people want.
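The "one structured source, many output formats" idea above can be sketched in a few lines of Python. This is an illustration only: it handles a tiny subset of DocBook-style element names (`article`, `section`, `title`, `para`), the sample document is invented, and real DocBook is normally processed with stylesheets rather than hand-written functions like these.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample document using a few DocBook-style element names.
SOURCE = """<article>
  <title>Memory Interface Draft</title>
  <section>
    <title>Scope</title>
    <para>This draft covers signal timing.</para>
  </section>
</article>"""


def para_text(para):
    """Collect the text of a paragraph, including text inside child elements."""
    return "".join(para.itertext()).strip()


def to_text(article):
    """Render the article as plain text with underlined section headings."""
    lines = [article.findtext("title").upper()]
    for section in article.findall("section"):
        heading = section.findtext("title")
        lines.extend(["", heading, "-" * len(heading)])
        for para in section.findall("para"):
            lines.append(para_text(para))
    return "\n".join(lines)


def to_html(article):
    """Render the same article as a minimal HTML fragment."""
    parts = ["<h1>" + escape(article.findtext("title")) + "</h1>"]
    for section in article.findall("section"):
        parts.append("<h2>" + escape(section.findtext("title")) + "</h2>")
        for para in section.findall("para"):
            parts.append("<p>" + escape(para_text(para)) + "</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    article = ET.fromstring(SOURCE)
    print(to_text(article))
    print(to_html(article))
```

Because the source records what each piece of text is rather than how it looks, adding another output format means adding another renderer, not re-editing the document.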

  • Anything with source that is plain text (HTML, SGML, XML, RTF) and that is based on a published standard should be the requirement. That guarantees two things. The first is that there will be tools in the future that can read it even if the format itself is abandoned at some point by all of its users. The second is that it is documented in a publicly accessible way for the whole world to see.

    TeX doesn't meet that second requirement as much as I love it.
  • I apologize, and you're right :) The poster didn't mention that a good DTD would need to be written(a lot of work, I might add), and I didn't mention that a good set of XSLs would need to be written :)

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • I just happened to buy StarOffice 5.2 for $40 two days ago. Then I went to work yesterday to discover that the company documents were now in Word 2000 format. I still have a Windows 95 box at work for MS Outlook and Word. No one was sure if Word2K would run on Win95, so that meant I would have to a) "upgrade" to Win98 or Win2k, and b) upgrade to a new machine. So I installed StarOffice instead, which supports word2k format. I installed it on Win95, Linux, and Solaris. I can even use it from my FreeBSD box as an X application on Solaris.

    Try that with Word (97, 2k, 2.001k, etc etc).
  • Watch your tone. We're having a good discussion here. Of course, I'm assuming that you're posting anonymously because you don't like cookies - not because you're a troll.

    The poster didn't answer the question that had been asked very well. They talked about XML as a good thing, but they didn't talk about the bad things(which you must know about when trying to make an informed decision). I was just trying to clear the issue up a bit. The bad things about XML being that you've got to write a good DTD, and good XSLs, etc., etc..

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • Microsoft documents the .DOC format, probably as well as they understand it themselves, and there are reasonably good converters/readers for it, some even open source (OpenOffice contains one of them).

    The problem with .DOC is the typical Microsoft problem: Microsoft beats other people to market by "just getting the job done". They hack up what needs to be done, if it works 90% it's OK, and maybe they document it later. They are even proud of that and seem to think it's the right way to go because they actually beat everybody else to market; let's hope the customers will wake up to this and stop buying.

    The latest .DOC format is supposedly XML (with embedded binary). That will help somewhat, in that it will at least make the text and other important information accessible without a complex OLE infrastructure. But taking full advantage of the information found in .DOC will still not be possible. The .DOC format often contains scripting and all sorts of other extensions, and the actual semantics of those can depend heavily on the environment or be buried deep inside some MS Word module.

    Note that inside Microsoft, there now seems to be another approach, NetDocs [cnet.com]. If it delivers what it claims to, a fully XML-based standards-compliant, end-user document and information management system, I have my doubts that that will catch on--it is way out of character for Microsoft.

  • I think that tex would be the best format as such. I would rather see it be tex than HTML, and certainly rather see it be tex than doc. You don't want to give a company control over the format, especially not for a hardware setup. DOC is WAYYY too volatile. It also seems a bit bulky. As for HTML, also kind of bulky, good at what it's used for, but certainly not a replacement for tex.

    BTW, I wouldn't think of it as a replacement for the DOC format, I would think of it as doing things right from the start. Doc is good for what it does, but what I think you are describing is MUCH more suited to tex.

  • Agreed. LaTeX is bound to be a frequently suggested alternative, but IMHO it's the wrong answer: XML has been designed for exactly this purpose, and it fits your needs perfectly.

    XML can easily (dare I suggest it, trivially) be transformed into LaTeX documents - in fact, this is the approach my current employers take for a number of types of business documents - XML is the format for representing the data, and LaTeX or HTML or whatever can become the format used for making this available to the user - XSL transformations allow us to take a language-independent set of data and translate it into a document in a suitable format.

    If you want true independence from proprietary data formats (and open source applications can have data formats that are just as restrictive as closed source applications to most users) then XML is the only real choice right now - a well defined XML document should be readable even *without* a parser, and with a well-defined DTD and a series of appropriate XSL files, you can select your own viewer application. What could possibly be better? Certainly not Word, StarOffice, LaTeX or any of the other competitors in this arena.
  • Both LaTeX and HTML suffer from the fact that they are evolving standards. You would have to also pick a version, and face the fact there might come a day when there is a loss of backwards compatibility.

    I think the best idea is something that is extra simple, and unlikely to change in the future; that is ASCII.

  • Sorry, but as an old TeX-head, I can tell you with satisfaction that writing one document in it does not make you literate in its varied commands. TeX does have a ridiculously high learning curve, and added to which it is only a display markup language - it does nothing to infer context and meaning in the content itself, which we've all learned by now is something you want to preserve.
  • Now, I'm only going to argue semantics on this one. "Easy" is subjective. You're right, it's easy to look into the document and edit it, but that doesn't make editing easy. I can easily look into a MS Word document and edit it. That doesn't mean I'll do anything useful, nor does it mean it'll be fast.

    Granted. I imagine things will get easier as editors become more widespread that are geared towards editing XML documents. The editor could make sure you stay within the DTD, speed up the writing time involved.. Until then it's all being done by hand.

    Well, you obviously know what you're talking about :) The reason why I replied to that post was that while it might be implied to you and me, it might not be implied to everyone. The tone of that post struck me as, "Use XML - it's easy and simple," whereas using XML is not necessarily so simple nor easy. Lots of work to be done if you'll be writing your own DTD, and lots of learning to do if you don't.

    Yeah, looking over the original post again, he probably should've been more clear. It sounds like he's been using XML for awhile, and forgot about the issues involved in actually learning it. :) Fortunately most of the work is initial stuff.

    Been fun. :)
  • My reason for suggesting a published standard was not a slur on TeX. As I said, I love it. The advantage of standards is that, in theory at least, they are not under the control of a single person, company, or reference implementation.

    I agree with you about TeX's stability. After using several different incompatible tools through the 80's for my resume, I finally put it in TeX and stayed with TeX for a decade. I'm considering HTML or XML now, but I haven't made the switch.
  • The difference is in the purpose of the markup - XML is (generally, with a good design) syntactic markup. LaTeX is entirely structural markup, specifying not *what* a particular element is, but how it is to be displayed.

    I think you are confused. LaTeX *is* designed with generalized structural markup in mind. (OTOH TeX focuses on specific markup.) In LaTeX you use commands like \section and \chapter and \emph, and (generally) not layout markup commands like ``italics'' or font sizes.

    ``LaTeX is, to a large extent, an example of a `generic markup language' (GML). Thanks to the class file mechanism, the visual style of the various document elements are described in a single place outside of the source document itself'' (The LaTeX Companion, 7).

    I hope that clarifies things.

    --Ben

  • by bhurt ( 1081 ) on Saturday December 23, 2000 @11:51AM (#541851) Homepage
    Consider using TeX/LaTeX, postscript, or an XML/SGML variant, like DocBook or HTML.

    Basically, what you want is a format that fits the following criteria:
    1) The original text can be easily gotten out of the format. This way even if the programs that read the file go the way of the dodo, future programs could still recover the data.
    2) The specification is fully open and documented, and preferably stable and mature.
    3) At least one open-source program handles displaying/converting the format. I would recommend storing a copy of this program in the same place as the standards themselves- including shipping source with standards CDs.
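
    Criterion 1 can be checked mechanically. Here is a minimal Python sketch, assuming the standard were stored as XML; the tag names (`spec`, `section`, `para`, `emph`) are made up for illustration and not taken from DocBook or any real DTD. It recovers the text once with a stock parser and once with nothing but a regular expression, which is roughly what a future reader without the original tools would have to fall back on.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are invented for illustration.
SAMPLE = """<spec>
  <title>Example Memory Standard</title>
  <section>
    <heading>Scope</heading>
    <para>This draft covers <emph>timing</emph> parameters.</para>
  </section>
</spec>"""


def text_with_parser(xml_source: str) -> str:
    """Recover the text using a standard XML parser."""
    root = ET.fromstring(xml_source)
    # itertext() yields every text fragment in document order.
    fragments = " ".join(root.itertext())
    # Collapse the whitespace left over from indentation.
    return " ".join(fragments.split())


def text_without_parser(xml_source: str) -> str:
    """Fallback: replace anything that looks like a tag with a space."""
    stripped = re.sub(r"<[^>]+>", " ", xml_source)
    return " ".join(stripped.split())


if __name__ == "__main__":
    print(text_with_parser(SAMPLE))
    print(text_without_parser(SAMPLE))
```

    The regex fallback ignores entities, comments and CDATA, so treat it as a last resort rather than a parser.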

    You've gotten over the hardest part already- you've realized you have a problem.

    Brian
  • You mean people edit those Microsoft Word documents at the byte level? I didn't know that, I was always under the impression that they cheated with some program called "Word". Well, apparently such cheater programs are not allowed...
  • You don't want it working exactly like Word, that would destroy the whole idea. Having a nice GUI is fine, great in fact. However, it should work semantically. And, no, the people typing don't have to worry about DTD restrictions and XSL - that's handled by the application. Look on freshmeat for Morphon to see a good XML application for end-users.
  • SDW (Star Office) is just as proprietary as Microsoft's DOC, but supported by fewer platforms.
    Ahem? Staroffice (well, Openoffice) can parse its own files and is GPLed. Its format is documented as well.
  • by tobyjaffey ( 132850 ) on Saturday December 23, 2000 @09:18AM (#541863)

    HTML is great, XHTML (or at least HTML >= 4) is better.

    The problem with HTML is that it was designed to be a markup language for simple documents, so it has headings, subheadings, titles, paragraphs etc. However, as people wanted to do more and more stylistic things with it, the language was extended by the W3C. But most people kept just bastardising it by using heading tags to make things big and bold tags to emphasize things.

    HTML is a big, nasty mix of structured document and stylistic tags. What HTML 4 strict does is to say that HTML is just a structure language with no formatting info. Then you use CSS or XSL to do the style work, which is a much more sane and portable approach.

  • Unfortunately though, RTF cannot be structured, at least as most programs use it.
  • As previous posters have pointed out, TeX runs on more than just Linux =)

    There are a few problems with using TeX/LaTeX. The first is that TeX tries to do paragraph-by-paragraph layout, and often winds up in tight spots that it doesn't need to. The average user wouldn't have a clue about what to do with overfull hboxes.

    Another problem is that it's not really possible to do WYSIWYG, and those people who use lots of spaces instead of tabs (even with variable width fonts, heh) will have a rough time adjusting to that. People will complain about things like "well in Word the line wraps this way..." BTW this is a problem with Word itself; its figure placement is really screwy.

    Finally, those of us in academia who write papers in LaTeX can no longer look down on those whose use of Word is obvious by the terrible aesthetics of their papers.

    Obviously, there are lots of advantages, and for Microsoft, possibly the nicest thing about TeX is that there are no known bugs. (not that Microsoft will have any problem adding some...)
  • Most biz owners, small biz or enterprise, usually have the latest version of word.

    My experience is the opposite. Where I work, Office 97 is the standard, along with Windows 95 and Windows NT 4.0. The company (Fortune 50 corporation) is conservative about upgrading to new versions of software. They don't want to spend money on new software unless there is a compelling reason to do so. I don't know anyone who has Windows 2000 or Office 2000 on their work PC.

  • by Fantastic Lad ( 198284 ) on Saturday December 23, 2000 @12:19PM (#541872)
    I still love RTF despite the flaws.

    For straight wordprocessing where no layouts are required, it's great. It's ascii with the expressive power of italics and extended symbol recognition. For straight word smithing, that's all anybody needs.

    Here's what I do:

    1. Do all wordprocessing using a very basic text editor which saves very basic RTFs.

    2. Import those files to whatever layout program is needed. (Quark, Pagemaker, whatever.)

    The stability of RTFs lies in two areas: first, there will ALWAYS be a selection of homemade editors available upon which to do your writing, and second, there is no financial incentive for big software manufacturers to make RTFs un-importable to their suites and layout packages.

    This means that doing all your basic work in RTF will make files readable for a long time to come.

    In any case, particularly in the print publishing field, today's software is finally about as good as it needs to get. There's no real reason to switch tools. Unlike with graphics technology, Times Roman simply doesn't need to motion blur and bump map for a writer to work his or her craft. As long as we all keep our old copies of WordPerfect and Pagemaker, everything is fine.

    Fantastic Lad

  • As far as I know, staroffice is not available on a PPC platform, only x86 and sparc. The author specifically said he wanted it to work on MacOS
  • Anywho, DOC is a biz standard. This isn't going to change.

    Till the next version of Word is released, then...it changes!

  • That's true, but how often do people use these things?
  • 1. It's itself not a standardized standard, and the different dialects are evolving continuously.
    2. A document can be rendered quite differently by different browsers.
    3. You can't even get things like page numbers in HTML documents.
  • by dbarclay10 ( 70443 ) on Saturday December 23, 2000 @09:28AM (#541884)
    I'm sorry, but I have to disagree with you.

    XML is nothing more than a concept - you store data and text within "tags". The tags can be of pretty much any name. The data can be anything. This isn't a standard, it's not even a format.

    Basically, XML boils down to: store it in a text file, delimit data, fields, and content by tags. Sorry, that doesn't cut it. You have to do more.

    No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.

    So, the question becomes, which DTD? In order to compete with the competition (LaTeX, HTML, PostScript), it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.

    Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus(for this application), it needs to be standard.

    Dave

    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • Just a slight correction, the DoD standard for documents is the 2 latest revisions of MS's .DOC format.

  • by Speare ( 84249 ) on Saturday December 23, 2000 @09:29AM (#541887) Homepage Journal

    I hadn't heard of DocBook, so I went fishing on docbook.org [docbook.org] for some basic info.

    The state of the documentation for this product is fairly lacking. (Hey, it's a DOCUMENT application!) There's no "getting started with DocBook" stuff. There's no official tutorial.

    The closest thing to a tutorial I found is this page: DocBook intro [lanl.gov]. I'll excerpt the front page.

    • DocBook intro
    • Here is my tutorial on DocBook. I never completed it, but it is still useful, since others don't focus on a complete beginner tutorial.

      Last modified: Mon Jul 27 11:19:57 1998

    Frankly, this sums up my issue with many Open Source projects: making a technically superior tool is not enough to generate wide user acceptance. There has to be an easy migration path from what the user's already got.

    DocBook needs at least ONE of the following to get people going:

    RTF/DOC/FrameMaker/TeX to DocBook converters, supporting at least a good 75% of basic features,

    A usable migration tutorial that assumes the user already makes RTF/DOC/FrameMaker/TeX documents,

    A usable editor that shows the results, even if it has to be two-paned to show both source and results.

    I'm not flaming Open Source in general, but this is not the first time I have heard of a tool that would fit my needs exactly, except they put very large barriers to entry in my path.

  • by iapetus ( 24050 ) on Saturday December 23, 2000 @10:08AM (#541888) Homepage
    The difference is in the purpose of the markup - XML is (generally, with a good design) syntactic markup. LaTeX is entirely structural markup, specifying not *what* a particular element is, but how it is to be displayed. As a result, XML is easier to tailor to a particular application's needs - from the XML document you can trivially create the equivalent LaTeX document. The same does not hold true the other way round.
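
    To give an idea of what the XML to LaTeX direction might look like, here is a rough Python sketch. The element names (`doc`, `section` with a `title` attribute, `para`, `emph`) are hypothetical, and a real setup would more likely use XSL; this only shows that the mapping is a plain tree walk.

```python
import xml.etree.ElementTree as ET

# Characters that need escaping in LaTeX body text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
}


def escape(text):
    """Escape LaTeX special characters one at a time (None becomes '')."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in (text or ""))


def inline(elem):
    """Render an element's mixed content: its text plus any children."""
    parts = [escape(elem.text)]
    for child in elem:
        if child.tag == "emph":
            parts.append(r"\emph{" + inline(child) + "}")
        else:
            # Unknown inline elements: keep their text, drop the markup.
            parts.append(inline(child))
        parts.append(escape(child.tail))
    return "".join(parts)


def to_latex(xml_source):
    """Turn the hypothetical <doc> format into a small LaTeX article."""
    root = ET.fromstring(xml_source)
    lines = [r"\documentclass{article}", r"\begin{document}"]
    for section in root.iter("section"):
        lines.append(r"\section{" + escape(section.get("title")) + "}")
        for para in section.findall("para"):
            lines.append(" ".join(inline(para).split()))
            lines.append("")  # blank line ends the paragraph in LaTeX
    lines.append(r"\end{document}")
    return "\n".join(lines)


SAMPLE = (
    '<doc><section title="Scope &amp; Purpose">'
    "<para>Costs 100% <emph>less</emph> effort.</para>"
    "</section></doc>"
)

if __name__ == "__main__":
    print(to_latex(SAMPLE))
```

    The sketch flattens nested sections to one level and handles only one inline element, so it is nowhere near a full converter.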
  • The whole document writing process has to be as transparent as selecting fonts, size, justification, etc. with a simple mouse click on an icon or scrolling menu.

    No. First, we start with unlearning past mistakes. It is often handy to have a nice, solid piece of wood in your hands at this point, as we teach "You do not want to change fonts and sizes. You want to think about your document's structure and mark it up accordingly."

    Yes, we don't have to beat that into "the average John and Jane Doe" or "the average secretary" who just wants to type up a one page letter, but when people are creating real documents structure should be in the front of their minds. Otherwise they're fscked from the start, regardless of technology choices.

    Tom Swiss | the infamous tms | http://www.infamous.net/

  • There's a running joke at my office on my constant threats to start doing all wordprocessing in HTML,
    Why leave it as a joke? Last contract I had, I wrote up all my intra-team proposals and documents in HTML. (These were, I should note, short documents, three or four printed pages tops, so lack of large-structure layout wasn't a problem.) Didn't have to leave the comfort of my Emacs window, didn't have to worry that they'd be unreadable two years from now when M$-Word was no longer backwards-compatible, didn't have to deal with dancing paperclips or crashing Macs.

    Tom Swiss | the infamous tms | http://www.infamous.net/

  • The only way that DOC files could be made a standard would be the public release of the internal file specifications so that everyone can use them.

    . . . . right.

    I can see M$ going for this one right now. (HA HA HA!)

    This means that the file format would have to be made a part of the public domain.

    IANAL, but I think this would take a prodigious amount of legal wrangling.

    I personally prefer a format like xml or html where you can see the tags, etc. and figure out what is going on, if someone made a mistake. Mind you, this is just me, just a personal preference.

    I also wonder about designing a file format for the future, given the various changes in technology. As an example, there is a new technology that has been demonstrated providing 3d displays in shocking detail, no special glasses needed. Not a Moving Picture yet, but you get the idea. How to incorporate this? The file format has to be scalable and adaptable.

    MS Word does a really horrible job at things like books, where it is better to use a page layout program like Pagemaker.

    So it looks like we have to re-invent the wheel here, and include all of those features that make the best sense. Yet another Open Source project for the masses.

    Don't look so enthuthiastic now!

    ;-)

  • by aphr0 ( 7423 ) on Saturday December 23, 2000 @10:18AM (#541893)
    Thanks for showing the maturity everyone has come to expect from the linux community.

    Hey linsux users - grow up.
  • When I'm at a conference or using a laptop in general, I do write in html because a) it gives you massive cred to any shoulder-surfers and b) it can be less power-intensive and less prone to laptop mouse problems.
  • ...but you've missed the point of XML (yes, it IS a standard and yes, it IS a format - I think you'll find that all tools for working with XML are very consistent) and an important mention in my post.

    Certainly, you have to assign meaning to tags in order for data to be formatted correctly. The whole point of XML is that data carry traits and structure (which of course, can be inherited).

    This is where the concept of a template would come in. I had mentioned this but you must have looked over it.

    You have a set of rules defined that determine what certain tags do. Very similar to HTML now (table, p, b, div, etc. are all assigned functionality). With XML, these templates can even be a part of the document with tags that flag them as such. The trick is to put as little of this in the hands of the word processor itself.

    I never said "XML! XML!" all by itself. XML is fairly abstract. Obviously we need everything that works along side of it and I'm talking about all supporting technologies if I'm talking about XML. If you read the article again, you'll notice the question was about document formats, not whether or not we'll need templates to go along with our XML formatted data.

    :-P

  • Surely, if you expect the average user to type commands like

    \raisebox{-12.8mm}{% \setlength{\unitlength}{1truemm} \includegraphics[width=50 pt,height=50 pt,keepaspectratio=true]{logo2.bmp} }

    you're right, but I don't. Positioning, scaling and using graphics within LaTeX is far from easy. And we don't have to discuss in which ways MS Word sucks -- it will never find its way onto my harddisk. (I personally use TXT, LaTeX, HTML and StarOffice, depending on the task at hand.)

    The question is not whether something is possible in LaTeX, the question is how long it takes the average user to do it.

    --

  • Yeah.. I can see how it's easily portable with graphics, doing chapters, PAGE BREAKS, headers and footers on pages (which may or may not be common) and have you ever pulled in HTML to edit on any of the above?

    Ok, I'll bite.

    Firstly, you seem to be missing the main point of the question. This isn't about finding a generic format for page layout - it's about how to best transfer specification documents so that they can be written anywhere and read anywhere. HTML works wonderfully for this.

    Secondly, _yes_, you can do all of the above, when it makes _sense_ to do so.

    Page breaks? Easy. Have a set of linked documents instead of one big page. This is useful under some circumstances (like dividing a large document into sections).

    Chapters? Um, you _have_ the tools to emphasize chapter headings, and you _have_ page breaks if you really feel you have to use them. Where's the problem?

    Graphics? If I need a figure, that's what the image tag is for. If I want to have anything fancier than an image in a box... then I should have someone else write the standards document. Again, we aren't making magazine articles here - the goal is to find a format suitable for a *technical description*. Visual gingerbread is _counterproductive_; it distracts the reader.

    Headers/footers? Frames work fine for that, if you have a real reason to use them. I personally can't think of any, for this application. For my own documents, if I'm writing something that must be pretty, I use a script to prettify things consistently.

  • What about Emacs/XEmacs?

    Show me an out-of-the box installation of Emacs for Windows that not only does decently install & configure the program without much user interaction but also gives you all the info you need to know to write letters, including an easy interface to select templates for common tasks.

    No, you can't expect the average user to acquire this info by themselves. Emacs is even too much for a geek like myself.

    --

  • What, exactly, is the problem with the Word document format? Couldn't open source programmers add features after Microsoft stops using it (analyze it, "emulate" it, build off it)?
  • If you want true independence from proprietary data formats (and open source applications can have data formats that are just as restrictive as closed source applications to most users) then XML is the only real choice right now - a well defined XML document should be readable even *without* a parser, and with a well-defined DTD and a series of appropriate XSL files, you can select your own viewer application. What could possibly be better? Certainly not Word, StarOffice, LaTeX or any of the other competitors in this arena.

    I'm not sure why you include LaTeX in this list. I'm not sure which, LaTeX or XML, would be best for the proposed use, but LaTeX most certainly *is* readable even without a `parser.' The other aspects of XML and LaTeX are where the two formats differ but both are designed as structured markup saved in ASCII.

    --Ben

  • Did XML kidnap your cat?

    No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.

    I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML".

    So, the question becomes, which DTD? In order to compete with the competition(LaTeX, HTML, PostScript)

    That's just the point. It doesn't need to compete with other formats. The process goes something like this: Write a good DTD that fulfills all your needs, and allows for easy extension and specialization later on. Then, write XSL for exporting the format to whatever other formats are useful. HTML obviously for web display, PostScript for printing, maybe one for PDF, even. (Though encouraging the use of PDF probably isn't any better than encouraging the use of Word's DOC)

    it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.

    How could it possibly be device-dependent? This is just text, we're talking about.

    Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.

    Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.

    Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus(for this application), it needs to be standard.

    I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.
  • The key issue, IMHO, this company needs to decide is what they want most from documentation: presentation or content.
    Microsoft .doc format (and StarOffice's .sdw format) are very presentation-centric. The only thing that matters to them is how the printed page will look. PDF, PS, and many other formats share this limitation. Ideally one should focus on the content of the documentation, and allow it to expand without massively reformatting the page every time.

    My company has run into this issue already. We open up our Product Requirement Documentation to modification as needed, and thereby lose all the formatting the Product Management staff has worked hard to get in there. Ever tried adding a paragraph on a page with an image anchored to a page position in MS Word? You get my drift.

    If you choose to use the DocBook DTD (Document Type Definition) with XML (Extensible Markup Language) or SGML (Standard Generalized Markup Language), you can use an off-the-shelf DSSSL (Document Style Semantics and Specification Language) stylesheet or create your own to customize how the "compiled" raw SGML/XML should look. An earlier poster said there is no good documentation on DocBook and SGML/XML. Bull Hockey, there's a full-fledged guide on how you can create standards-compliant, flexible DocBook available as the "LDP Author's Guide" at http://www.linuxdoc.org. [linuxdoc.org]

    Matt Barnson

  • by FattMattP ( 86246 ) on Saturday December 23, 2000 @10:33AM (#541919) Homepage
    Maybe if you had bothered to look around docbook.org a little more you would have noticed that there is an entire O'Reilly book available online and for free [docbook.org] about Docbook and how to use it. You can also purchase the dead trees version from your local bookstore.
  • My two favorites: Rich Text Format (.rtf) and HTML. HTML has obvious advantages, but the disadvantage that it really wasn't designed for word processing as such. RTF was a format that, I believe, Microsoft came up with as an interchange format for word-processor documents. I've found it does most everything needed to keep formatting and such intact, it's readable and writeable by most WP software (MSWord, WordPerfect and StarOffice that I've confirmed by use). It's also a plain-ASCII format, if you've no word processor you can pull it up in a text editor and get at the actual text if you really have to. And it hasn't had changes made to it in many years, stability is a definite plus for a long-term storage format.

  • You might want to investigate XML, with a suitable DTD, e.g. DocBook for technical manuals. Also, SVG is an XML-based format for vector graphics, which always seemed to be the point at which SGML-based efforts had trouble.

    Tool support for this combination may not be so good or inexpensive, but you can be fairly sure the content will survive and be usable in many different environments.
  • There are numerous things that make HTML a poor choice for documentation.

    First, there's the aforementioned kluge of the HTML standards, but if one is writing documentation, he should stick to pure structuring (at least at first) anyway. If I write an entire document using <p> and <hX> tags, sure it'll be portable, 100% compatible with the W3C guidelines and so on, but there's more than that.

    HTML, unlike many other more complicated mark-up languages, has poor support for "book" features. Headers, footers, generation of table of contents, page numbering, margins, cross-platform printing support. The list goes on and on, but if you're doing anything but looking at it in a browser, HTML is not a good choice for documentation.

    So that's why HTML is not the best choice for documentation, not because of any grandiose "stylistic perfection" ideas. Furthermore, HTML is no more or less open than many other mark-up standards (e.g. SGML, XML, TeX), and they're all roughly on the same line in terms of portability (if you get the right tools, that is).

    Basically, HTML makes a good "quick and dirty" documentation tool, but if you want your options open (wide open), SGML (or maybe XML in a few months) is the way to go.
  • by Weezul ( 52464 )
    Seriously, how can anyone consider anything different from LaTeX for serious writing (unless they have a publisher with trained monkeys to rewrite it in TeX)? Hyperlinks are the ONLY feature missing from LaTeX, but LaTeX is about the only system with a good clean way to handle the old-fashioned hyperlinks (i.e. index, figure numbering, etc.).

    The point is that you must use LaTeX if you want your work to ever appear respectable in print, so the question is: does your publisher want to TeX it themselves from your draft or do they want you to TeX it, i.e. it's a question of money. If you're an autonomous institution which does its own publishing and does not have ass loads of money then you really need to make people TeX it.

    Now, there are SGML systems which produce TeX and HTML, but they may not handle pictures properly, so you should be very careful. Actually, there are ways to include hyperlinks in LaTeX. The resulting dvi file can be compiled to an HTML file. This is almost surely the very best way to typeset your documents since you can write a TeX macro to treat images properly for conversion to postscript OR HTML. It would work something like this: your images would be compiled to both .eps and .jpg, the TeX macro would embed the .eps and the URL for the .jpg into the dvi, the dvi could be converted into both .ps and .html without losing the pictures. There are some issues regarding the placement of the images when converted to HTML, but nothing a LaTeX hacker could not fix.

    Jeff
  • XML isn't just "text files with tags". XML comes with standards for describing the grammar of those files, with a standard language for describing transformations, and with standards for performing physical layout. In terms of tools, there are standard libraries for many languages to read and write XML, and standard APIs for manipulating XML once it has been parsed.

    XML is pretty verbose and ugly. It's not the most convenient format to type in. But, in some sense, it finally extends the traditional UNIX approach to more complex data types. UNIX used to give you scanf, printf, and plain text files with fields. XML now extends that to parsing, generating, and transforming tree-structured types. That's really great, and it is really useful.
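
    As a rough illustration of the parse/transform/generate cycle on a tree, here is a small Python sketch using the standard library's ElementTree. The document layout (nested `section` elements with a `title` child) and the `number` attribute are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: nested sections, each with a <title> child.
SAMPLE = (
    "<doc>"
    "<section><title>Scope</title>"
    "<section><title>Terms</title></section>"
    "</section>"
    "<section><title>Timing</title></section>"
    "</doc>"
)


def number_sections(parent, prefix=""):
    """Transform step: give every <section> a hierarchical number attribute."""
    count = 0
    for child in parent.findall("section"):  # direct children only
        count += 1
        label = f"{prefix}{count}"
        child.set("number", label)
        number_sections(child, prefix=label + ".")


def numbered_titles(xml_source):
    """Parse, transform, and return (number, title) pairs in document order."""
    root = ET.fromstring(xml_source)           # parse
    number_sections(root)                      # transform
    return [(s.get("number"), s.findtext("title")) for s in root.iter("section")]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    number_sections(root)
    # Generate: serialize the modified tree back to XML text.
    print(ET.tostring(root, encoding="unicode"))
```

    The same tree could just as well be handed to an XSLT processor; the point is only that numbering becomes a property computed from structure rather than typed by hand.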

  • TeX output is pretty good, and LaTeX markup is pretty good, too.

    Where TeX falls way short is in the way it is programmed and extended. The TeX processor is more like a machine language, with registers, lots of side effects, hooks, and global variables. Doing non-trivial transformations in TeX is incredibly hard, and even the best macro packages often don't get it quite right.

    XML's approach is both more modern and much simpler: you describe transformations on the parse tree. XSLT and XSL-FO are what corresponds to the programmable guts of TeX.

    Most likely, what is going to happen is that many documents will be authored in XML, many document styles will be described in XSLT and XML Schema, and TeX will be used not for defining macro packages, but merely for performing the last stages of physical layout.

  • Hi,

    I'm a maintainer/lead coder on a couple of OS Office Projects: AbiWord (http://www.abisource.com) and wvWare (www.wvware.com). I've written quite a few import/export filters for AbiWord.

    AbiWord is an excellent OS word processor which already handles lots of the existing formats that you speak of: DOC, XHTML, DBK, RTF, et al. They each have their own merits, advantages, and disadvantages.

    XHTML is not a good layout language. It has all of the same problems that HTML and thus web pages have: i.e. WYSIWYG formatting is next to impossible to achieve.

    DBK is nice, except that DBK wasn't really meant to be a WP file format. Its tags carry with them lots of semantic information that WPs generally just don't care about. Its layout tags leave much to be desired for a WP. There just isn't a clean mapping of DBK->WP tags.

    RTF is really slick (even though it is kinda old). Basically, anything that the AbiWord format can represent, RTF can do. This is a really good choice for your format.

    ABW (or your WP's favorite native format) is always nice because it maps neatly onto your feature-set.

    DOC support isn't really all that bad anymore. If you know what wvWare is (if you don't, see www.wvware.com), you know that it can convert DOC files into just about any format that you'd like. It can do this through either the command-line version or through the 50KB associated library. AbiWord uses wv to import DOCs. The importer is about 1100 LOC. I'm currently also writing DOC export support into wvWare (and thus AbiWord). Our DOC importer is *significantly* better than the one that OO has released. That will probably change soon, since Sun hired wvware's ex-maintainer to work on OO ;) Our DOC exporter currently exports something that looks like DOC at about 10 paces - i.e. it's not really DOC format, but it's getting there.

    Anyway, hope that this helps,
    Dom
  • Lots of places (esp. US DOD & auto-industry) faced this problem years ago and came up with a stable, reasonable solution:
    SGML
    It's open, cross-platform, flexible & has a long heritage. If you want to embed graphics call out to a Postscript file.

    Framemaker speaks it, WordPerfect speaks it, I dunno about MS Word, and of course it can be pumped out into lots of other formats (eg HTML, XML, etc.)

    It's not a perfect solution but it's widely available and fairly future-proof. Your specs should be about content anyway, let the reader concern themselves with presentation.

  • The advantages of using cross-platform and implementation independent standards are threefold:

    1) XML separates your content (XML) from your structure (DTD) from your presentation (XSL) leading to far more concise and rational documents.

    2) They are open standards, unlike Word, FrameMaker or other proprietary formats.

    3) The tools for document creation are open (and closed,) cross-platform and not dependent on the largesse of any single source.
  • I vote for HTML. Yes, it isn't great for fine layout control, but you don't *NEED* perfect layout control. We're writing standards specs here, not doing graphic design.

    The advantages: HTML is readable on any platform under the sun (and quite a few in caves), and most word processors can export using it.

    If the documents have figures, they can be saved as one of .gif/.png/.jpg, and read by most browsers.

    This is the only way I've found to get MS Word-users to give me readable documents, among other things.
  • I'd use DocBook. DocBook is a system for writing structured documents using SGML and XML. DocBook provides all the elements you'll need for technical documents of all kinds. A number of computer companies use DocBook for their documentation, as do several Open Source documentation groups, including the Linux Documentation Project (LDP). With the consistent use of DocBook, these groups can readily share and exchange information. With an XML-enabled browser, DocBook documents are as accessible on the Web as in print. The format is used by O'Reilly and Associates and they were one of the original creators of the specifications. You can find more information at these links:
  • by Bob Clary ( 224900 ) on Saturday December 23, 2000 @10:59AM (#541953)

    DocBook is your friend

    DocBook is a lot to digest at one time, but it is well worth the effort. Personally I prefer DocBook XML and use Norm Walsh's XSLT stylesheets to transform the XML to anything I want... HTML, PDF, whatever.

    Here are some resources for your reading pleasure.

    DocBook is Open Source, freely available on all platforms of interest, can be used for simple documents to complex books, separates presentation from content, and is extensible. What more could you want from a document format?

  • Postscript. You can get programs to read it free for any OS, it's device independent, ANYTHING can render into it, and it gives you superb control over the content of your document. It is truly the lowest common denominator.
  • Reliable reverse-engineering of the doc format *has* been performed. Both Staroffice and Abiword can work with doc files just fine.
  • Your examples really aren't relevant in this case. Firstly, it's actually easier for an automated process to generate text from RTF than the other way round, since the RTF document must include additional mark-up. Your second example is even further from the mark - you obviously can't create the required PNG file from a text file (or not easily), so it's not a good storage mechanism. However, given a well-defined XML document structure it becomes trivial to use XSL to transform it into a LaTeX document representing the same data, or an HTML document, or some other custom format. And XML is intended to be used for this very purpose.

    I've actually been through this process at work - we shifted from using a proprietary file format for our invoices to using an XML representation which can then be used to generate a range of views, from HTML for viewing on the intranet to LaTeX for printing and sending to customers. It's a great solution, and it means that we're not tied down to using LaTeX - at a later stage we can change to another document format, and all that needs to be changed is the XSL document for all of our invoices to be available in the new format. If the documents were originally stored in LaTeX format, we would not be able to do this - a change in the output format would require all the invoices to be re-entered (as was the case when we switched *from* the first proprietary format) or a large amount of custom code to be written.
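
    The one-source, many-views setup described above can be sketched roughly as follows. This is an illustration only: it uses plain Python functions in place of XSL, and the invoice schema (invoice, customer, item, qty, price) is invented for the example, not the format we actually use.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical invoice schema; element and attribute names are invented.
INVOICE = """<invoice number="1001">
  <customer>Example Corp</customer>
  <item qty="2" price="9.50">Widget</item>
  <item qty="1" price="20.00">Gadget</item>
</invoice>"""


def rows(invoice):
    """Yield (description, quantity, line total) for each item."""
    for item in invoice.findall("item"):
        qty = int(item.get("qty"))
        yield item.text, qty, qty * float(item.get("price"))


def as_html(invoice):
    """View for the intranet."""
    body = "".join(
        f"<tr><td>{escape(d)}</td><td>{q}</td><td>{t:.2f}</td></tr>"
        for d, q, t in rows(invoice)
    )
    return f"<table>{body}</table>"


def as_latex(invoice):
    """View for printing. No LaTeX escaping is done here; real text needs it."""
    body = "\n".join(f"{d} & {q} & {t:.2f} \\\\" for d, q, t in rows(invoice))
    return "\\begin{tabular}{lrr}\n" + body + "\n\\end{tabular}"


doc = ET.fromstring(INVOICE)
print(as_html(doc))
print(as_latex(doc))
```

    Switching to a new output format means writing one more function like as_latex; the stored invoices stay as they are.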
  • by Gendou ( 234091 ) on Saturday December 23, 2000 @08:59AM (#541961) Homepage
    ...but I think XML is the clear answer here. XML is already very mature, can be used in a number of situations, and can incorporate more than just text.

    You can even embed binary data in an XML document (with a tiny bit of creativity) for all those people who like to populate their files with custom fonts, clipart, graphs, etc. (This is accomplished through something, say... <BINARY CLIPART><DATA>[image data]</DATA></BINARY CLIPART>. You get the idea.)
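
    One common way to do this is to base64-encode the bytes and store them as element text. A rough sketch, with assumptions stated: XML element names cannot contain spaces, so the example uses a made-up binary element with a type attribute instead of the tag written above, and the "image" is just placeholder bytes.

```python
import base64
import xml.etree.ElementTree as ET

# Stand-in for real image bytes; any binary payload works the same way.
image_bytes = bytes(range(256))

# Hypothetical element and attribute names, not part of any standard schema.
doc = ET.Element("document")
clip = ET.SubElement(doc, "binary", {"type": "clipart", "encoding": "base64"})
clip.text = base64.b64encode(image_bytes).decode("ascii")

xml_text = ET.tostring(doc, encoding="unicode")

# Reading it back: parse the XML, then decode the element text.
parsed = ET.fromstring(xml_text)
restored = base64.b64decode(parsed.find("binary").text)
```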

    How about special configuration parameters? You could incorporate tags that would handle the way a document is viewed by different people ("are you a techie, marketing drone, webbie, etc" -> certain data becomes visible).

    The biggest advantages here are obviously the standards provided by XML (thank you W3C). Its uses are broad. It's got high quality interpreters on ALL platforms (especially JAXP for Java - it's a joy to work with *g*).

    The only standards we'd really have to focus on would be which tags would be considered "key" tags.

    What else do you need? Doesn't OpenOffice already use XML as its standard document type?

    Sure I could be wrong on this, so don't berate me too much. I've just had a lot of positive experience working with XML for sooo many different applications.
  • At the university writing lab where I "work" (well, they pay me, and sometimes I do things that involve effort), students are constantly bringing in Word documents that they can't open. There's a million reasons. They wrote it in Works. They took the disk out while Word had the file open. Or the #1 reason: they wrote it in Word 2000. So they tour the university computer labs in search of something that will open their document. They try Word 97 on our machines, Word 2000 on the ones upstairs, different Word 2000 on the NT boxes in the engineering building. It goes on and on. Sometimes it works, sometimes not.

    Often I try to convince the English department to teach their students to use something more compatible, like HTML or RTF. There is little demand for images and tables, and when there is it is really part of a Powerpoint sedative or spreadsheet. But they always say, we have to use Word because it is the standard. Meaning that it is the universally compatible format that everyone can read. Now am I just crazy here? Don't answer that. If Word were in fact any kind of standard, why do we have the Tower of Babel with all these incompatible Word documents? Word may be the standard word processor, but there is no standard Word format. There are a dozen different Word formats.

    Everyone might as well use whatever weird word processor they like, and pass along a second copy in plain text every time they try to move the document to another machine. The net effect would be the same.



  • by Anonymous Coward
    Lasts hundreds of years, and OCR scanning keeps getting better and better.
  • I ran into this at work last year. We let people submit to us as DOC/WPD/PS (PS usually come from LaTeX) and we convert them all into PDF. There is an Acrobat reader for Win32/MacOS/Linux/Solaris etc etc. It has served us very well, as there is already a cross platform suite of tools available, and it allows for embedding graphics with text (which was a huge concern for us).
  • How do you embed graphics, usable independent spreadsheet imports, and audio, among other things required for presentations / documents?

    XML might be great for text but keeping binary data in it isn't the best of plans. Size being the other issue, but if always saved / restored with gzip or something...
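
    To put a rough number on the size issue, here is a small sketch assuming the binary data is base64-encoded before it goes into the XML (one common approach, not the only one). The payload is random placeholder bytes.

```python
import base64
import gzip
import os

payload = os.urandom(3000)  # stand-in for image or audio data

encoded = base64.b64encode(payload)   # what would sit inside the XML
compressed = gzip.compress(encoded)   # what would be written to disk

# Base64 emits 4 output bytes for every 3 input bytes.
overhead = len(encoded) / len(payload)

# Loading reverses the two steps.
restored = base64.b64decode(gzip.decompress(compressed))

print(len(payload), len(encoded), len(compressed))
```

    The encoded form is a third larger than the raw bytes. gzip can claw back some of that, but how much depends on the data, so it is worth measuring on real files.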
  • by gimpboy ( 34912 ) <john,m,harrold&gmail,com> on Saturday December 23, 2000 @09:00AM (#541979) Homepage
    You shouldn't have to worry about the standards outliving TeX - Donald Knuth designed it with that in mind.

    TeX (and the LaTeX frontend) runs on about as many platforms as Linux.

    the output of a tech document is quite frankly spectacular. you just don't get this kind of quality with the word processing programs that are out there today.

    many people think the learning curve is high, but this isn't necessarily so. my advisor says that LaTeX has a one paper activation energy. ie it takes you about one document to learn most of what you need to know to get things done... and once you use it you will find it hard to use anything else in the future.

    use LaTeX? want an online reference manager that
  • I must add my hearty approval. As the maintainer of the Bugzilla Guide (http://www.trilobyte.net/barnsons) I am immensely enjoying writing documentation in SGML instead of some lame proprietary format.
    Another positive benefit of using SGML: All Department of Defense (IIRC) documentation must be SGML. So if you're ever going to have to maintain government documents, SGML is a great choice!

    Matt Barnson

  • The way most Word documents are embedded with objects, you'd almost need to reverse-engineer the entire Windows OS. Embed this Visio graph, this equation, this COM object. Bleh!

