jd's Journal: Why is wordprocessing so primitive?

This is a serious question. I'm not talking about the complexity of the software, per se - if you stuffed any more macros or features into existing products, they'd undergo gravitational collapse. Rather, I'm talking about the whole notion on which word-processors, desktop publishing packages and even typesetting programs such as TeX are based.

What notion is that? That each and every type of writing is somehow magical, special and so utterly distinct from any other type of writing that special templates, special rules and special fonts are absolutely required.

Of course, anyone who has actually written anything in their entire life - from a grocery list onwards - knows that this is nonsense. We freely mix graphics, characters, designators, determinatives and other markings, from scribblings through to published texts. If word-processing is to have maximum usefulness, it must reflect how we actually process words, not some artificial restraint that reflects hardware limitations that ceased to exist about twenty years ago.

The simplest case is adorning the character with notation above it, below it, or as subscript or superscript to either the left or right. With almost no exceptions, this adornment will consist of one or more symbols that already exist in the font you are using. Having one special symbol for every permutation you feel like using is a waste of resources and limits you to the pitiful handful of permutations programmed in.

The next simplest case is any alphabet derived from the Phoenician alphabet (which includes all the Roman, Cyrillic and even Greek alphabets). So long as the program knows the language you want to work in, the translation rules are trivial. The German Eszett (ß) is merely a character that replaces a double s when typing in that language. A simple lookup table - hardly a difficult problem.
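As a rough illustration of the lookup-table idea, here is a minimal Python sketch. The table name `SUBSTITUTIONS`, the language key `"de-simplified"` and the function `apply_language_rules` are all made up for this example, and the blanket "ss becomes ß" rule is a deliberate simplification: real German orthography decides between ss and ß by context, so a real table would need context-sensitive patterns.

```python
import re

# Hypothetical per-language substitution tables: a list of
# (compiled regex, replacement) pairs, applied in order.
# "de-simplified" is an illustrative rule set, not real German orthography.
SUBSTITUTIONS = {
    "de-simplified": [(re.compile(r"ss"), "ß")],
    "en": [],
}


def apply_language_rules(text: str, language: str) -> str:
    """Apply every substitution rule registered for the given language."""
    for pattern, replacement in SUBSTITUTIONS.get(language, []):
        text = pattern.sub(replacement, text)
    return text


print(apply_language_rules("Strasse", "de-simplified"))  # Straße
print(apply_language_rules("Strasse", "en"))             # Strasse
```

Because the rules are regular expressions rather than plain strings, the same table shape could carry context-dependent rules without changing the function.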

Iconographic and Ideographic languages are just an extension to this. You specify a source language and a destination language, and provided you have such a mapping, one word gets substituted with one symbol. You could leave the text underneath and use it as a collection of filenames for grabbing the images, if you wanted to make it easy to edit and easy to program. As before, you already have all the symbols you're ever likely to want to overlay, so you're not talking about having every possible image in a distinct file. Anything not provided can be synthesized.

Other languages can be more of a bugbear, but only marginally. A historical writing style like Cuneiform requires two sizes of line, two sizes of circle, a wedge shape and a half-moon shape. Everything else is a placement problem and can be handled with a combination of lookup tables, rotations and offsets. Computationally, this is trivial stuff.

If the underlying engine, then, has a concept of overlaying characters with different offsets and scales, rotating characters, using lookup tables on regular expressions, and doing simple substitutions as needed, you have an engine that can do all of the atomic operations needed for word-processing or desktop publishing.
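To make the atomic operations a little more concrete, here is a minimal Python sketch of composing a glyph from primitives using offsets, scales and rotations. Everything in it is an assumption made for illustration: the `Placement` record, the `PRIMITIVES` table with its single "bar" stroke, and the `PLUS` recipe are invented names, and strokes are modelled as bare lists of points rather than real font outlines.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]
Stroke = list[Point]


@dataclass(frozen=True)
class Placement:
    """One primitive placed inside a glyph (hypothetical record layout)."""
    primitive: str            # key into PRIMITIVES
    dx: float = 0.0           # horizontal offset, applied last
    dy: float = 0.0           # vertical offset, applied last
    scale: float = 1.0        # uniform scale, applied first
    rotation_deg: float = 0.0  # counter-clockwise rotation about the origin


# Hypothetical primitive table: a single unit-length horizontal bar.
PRIMITIVES: dict[str, Stroke] = {
    "bar": [(0.0, 0.0), (1.0, 0.0)],
}


def transform(stroke: Stroke, p: Placement) -> Stroke:
    """Scale, then rotate, then translate every point of a stroke."""
    theta = math.radians(p.rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in stroke:
        x, y = x * p.scale, y * p.scale
        out.append((x * cos_t - y * sin_t + p.dx,
                    x * sin_t + y * cos_t + p.dy))
    return out


def compose(recipe: list[Placement]) -> list[Stroke]:
    """Build a glyph as the list of its transformed primitive strokes."""
    return [transform(PRIMITIVES[p.primitive], p) for p in recipe]


# A "+" built from the same bar twice: once as is, once rotated 90 degrees.
PLUS = [
    Placement("bar", dx=-0.5),
    Placement("bar", dy=-0.5, rotation_deg=90.0),
]

for stroke in compose(PLUS):
    print(stroke)
```

The point of the sketch is that the glyph is only a recipe over shared primitives; a derived glyph would be another short recipe, not another stored shape.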

This method has been used countless times in the past, but past computers didn't have the horsepower to do a very good job of it. Word-processing has also been stifled in general by the idea that it's a glorified typewriter and that it operates on the character as the atomic unit. What I'm talking about is a fully compositional system. Each end-result character may be produced by a single source symbol, but that would be entirely by chance, as would any connection between any given source symbol and what would be considered a character by the user.

If it's so good, why isn't it used now? Three reasons. First, it's slow: composing every single character from fundamental components is not a simple process. Second, it's not totally repeatable: two nominally identical characters could potentially differ, because floating-point results depend on the exact sequence of operations that produced them, which is why you rarely test floating-point values for strict equality. Third, it puts a crimp on the font market: most fonts are simple derivatives of a basic font, and the whole idea of composition is that simple derivatives are nothing more than a lookup table or macro.
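The repeatability problem can be shown in a few lines of Python. In this sketch the same nominal offset is reached by two different routes, and the results differ in the last bit; the helper `same_position` and its tolerance are assumptions chosen for the example, not part of any existing system.

```python
import math


def same_position(a, b, tol=1e-9):
    """Compare two (x, y) positions within a tolerance instead of exactly."""
    return (math.isclose(a[0], b[0], abs_tol=tol)
            and math.isclose(a[1], b[1], abs_tol=tol))


# The same nominal offset, reached by three small steps or by one large one.
stepped = (0.1 + 0.1 + 0.1, 0.0)
direct = (0.3, 0.0)

print(stepped == direct)               # False
print(same_position(stepped, direct))  # True
```

A compositional engine would presumably need this kind of tolerant comparison, or fixed-point arithmetic, anywhere it has to decide whether two composed shapes are "the same".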

If it's all that, then why want it? Because it makes writing any ancient or modern alphabet trivial, because you can do more in 20 fonts than you can do on existing systems with 2,000, and because it would bugger up the whole Unicode system which can't correctly handle the systems it is currently trying to represent. (The concept behind Unicode is good, but the implementation is a disaster. It needs replacing, but it won't be until someone has a provably superior method - which is the correct approach. It just means that a superior method needs to be found.)


  • I'm not sure I get your main point (actually, I'm pretty sure I don't), but I do want to clarify something about the German character ß (Eszett). It does not replace all ss combinations [everything2.com].

    Under the old rules (which are still applicable until July 31, 2005), one uses a ß over ss when the sharp s sound is at the end of a word, before a hard t, or after a long vowel or diphthong sound. For example, Wasser (short vowel), außen (long vowel), heißen (long diphthong), Fluß (end of the word),

    • by jd ( 1658 )
      My main point is that the "atoms" you use in writing are not whole characters - they are a set of segments (in the case of most cursive scripts) or a set of symbols (in the case of virtually everything else) that are compounded to make a character. For convenience, it is handy to think of a character as a whole thing, but that is neither how it is written nor how it is read.

      If you had an underlying engine that manipulated text the way humans manipulate text, it would:

      • Be more flexible -- you can build an
      • I'm fine with plain old Times New Roman. Don't need nothin' else. Hell, I was happy back when we only had one choice of font and we didn't even have lower case letters. :) Actually, in a lot of ways, things were simpler then.

        I'm sure you see something here, but it's beyond me.

        • by jd ( 1658 )
          Actually, I'm not particularly artistic. I come from a programming background, and am simply looking at writing as nothing more than a hierarchy of primitives and operands. Which, really, is all it is. However, in the same way that no sane programmer ever writes the same block of code multiple times (if they're going to use something multiple times, they lob it in a macro or function and then refer to it by name), no sane calligraphic word-processor would require you to compose things more than once. In fac
          • Again, I'll refer to TeX and LaTeX.

            Yes. An awesome system.

            WYSIWYG didn't exist until later and as it was general practice to work on content first and presentation last, nobody gave a damn whether the print fonts could be shown on the screen. It simply didn't matter.

            This is one reason that LaTeX is so awesome. It frees you up from worrying about how something looks and lets you focus on the content - until you're ready to really think about how it looks. One problem with WYSIWYG (at least for me), is th

            • by jd ( 1658 )
              The physical would be the appearance - be it layout, stylizing, or whatever. The logical is the text you are actually wanting to enter. When you print in final form, you see what is logical, in the manner and form described by what is physical. Yeah, the terminology is a little screwy, but it gets to the heart of things.

              So, if I were to have a paragraph of text, in which some word in it is set in italics, the logical layer (using this terminology) would be a block of text. No formatting, just text. One or

              • So, you're wanting a pipelined LaTeX? Is that the idea? I suspect there's more that I'm still not getting, but perhaps that's because I'm already happy with LaTeX.

                I recently watched "What The @#$^ Do We Know?" on Google Video, and it had an interesting story (probably not true, but interesting nonetheless) about native Americans and the first European settlers. They said that for days the masts of the tall ships were visible on the horizon, but the native Americans simply did not see them because the mas

                • by jd ( 1658 )
                  Imagine a pipelined LaTeX where you just type the text. No section tags, no font tags, no escaped special characters. Just text.

                  The next layer in has just tags, no text, and doesn't repeat common structures. It's pure layout, nothing else.

                  The third layer through to the second to last are your style sheets, but written in a clean, functional way and not the splodge that so many regrettably are, and contain pure LaTeX commands for splicing atomic font elements together to build all the regular and special cha

            • by jd ( 1658 )
              Almost forgot. IMAO is given in the Usenet guides as "In My Arrogant Opinion", but if you think my insanity is worthy of a different interpretation of the A, then please feel free. If my writing were a little clearer, there might be a little less insanity, but if you think there's a nugget of an idea in there that might be genuine gold, then don't let me get in the way of you calling both the trash and the gold for what they respectively are.
  • "Why is wordprocessing so primitive?"

    Message -- Backwards Compatibility -- Media-{predominantly print/includes some broadcast}

    Word Processing Programs are inherently primitive, because they are connectors between work produced on computers and 2 dimensional output formatting to ease distribution. In veneration of otiose ontological opacity; it is another of the many delimiting artifices indicating we are experiencing the end of the digital age. There is also a 'lowest common denominator' of intended aud

    • by jd ( 1658 )
      Wow. Now, that was a response and a half. (It was fractal.) I even understood most of it, which is scary. :) Seriously, it's given me quite a bit to think about. Thanks.
      • if I could communicate my thoughts this way. Fractal is a good description, order from the chaos. It was an attempt to show that the source for the primitive is largely language itself. I don't really have any good solutions for it though. I threw in the boolean and the definition list, because I figured you'd be able to understand what I was driving at, but many without a science/math background would be flailing at comprehension if I played semantic games like that with them. Not so much with a simple
