IBM Unveiling New Transcoder Technology
JavaNPerl writes "This Infoworld article states that IBM is about to beta a transcoder that would translate content based on the client. This may get rid of some of the headaches in coding HTML and JavaScript for different clients one day and also make more content available for handhelds." It's like the Holy Grail - keep seeing glimpses of potential systems, but this sounds like it may be the real thing.
XML (Score:1)
Re:HTML x.0 is NOT portable! (Score:1)
The HTML 3.2 spec was a braindead way to make the common use of HTML standard, i.e. let's pollute HTML with the crap Netscape and Microsoft came up with. Then W3C had to remove the same crap in HTML 4.0 *sigh*.
If you had a clue what HTML is (or at least tried to be until "webdesigners" had any influence) it wouldn't come as a surprise that different user agents present information in different ways. Or maybe you can explain to a blind user why they should care about your colors or typefaces?
If "webdesigners" didn't misuse HTML they wouldn't have to spend weeks of headache time and the rest of us wouldn't have to suffer from their work.
Repeat after me - HTML is NOT a layout language.
Re:Duh (Score:1)
Otherwise, HTML's habit of treating the entire file as a single page creates terrible viewing problems which only get worse on smaller screens.
This would be fixed by a standard which treated your browser window/viewport as an entire page, and formatted accordingly.
This might also get rid of some of the problems with people printing out stuff all the time -- they do that precisely because there's no document format which presents each view of the document as a complete page, with document progress indicators and links and everything.
In other words, if you change the width of your window, the document instantly repaginates.
I suspect I'm not being very clear at this point -- let me know if I'm making sense to anyone.
-Billy
Re:retry: why HTML alone isn't sufficient (Score:1)
IBM's solution clearly fails to provide that (there's no way a server-side solution could), which is why I said it falls short.
But that doesn't mean that the optimum solution is a single web page -- the problem there, as always, is that on each monitor screen (or printout page) you get partial chunks of information. This can be as trivial as half-visible lines or as nasty as huge tables sliced into incoherent parts.
The true solution is on the client side.
Ooops, I said you were wrong. Well, really, you're not. Single-document is the way to go whenever possible. It's just that it isn't the solution to this problem, only an essential part of the correct solution.
-Billy
retry: why HTML alone isn't sufficient (Score:2)
This is because, among other problems, people don't like to be shown part of a set of information. HTML implicitly assumes that every web page (HTML file) is a single viewer page, but not even my 21" screen will show all of most web pages (so most pages can't possibly be a single page).
The problem only gets worse when you add small screen sizes, such as the Pilot.
The solution is to be a little smarter about displaying information -- you have to think about how much your user can see at once, and be careful not to show them any half-lines of info or half-table cells. Essentially, you should do a single page of repagination every time the user presses the down arrow or pagedown.
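A minimal sketch of that kind of repagination, assuming a character-cell viewport; the function names and the `width`/`height` parameters are made up for illustration and don't come from any real browser or from the IBM product:

```python
import textwrap

def paginate(text, width, height):
    """Split text into pages that each fit a viewport of width x height characters."""
    lines = []
    for paragraph in text.split("\n"):
        # wrap() returns [] for an empty paragraph, so keep the blank line explicitly
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    # Each page holds only whole lines; nothing is cut in half at the viewport edge.
    return [lines[i:i + height] for i in range(0, len(lines), height)]

def show(pages, number):
    """Render one page followed by a simple document progress indicator."""
    body = "\n".join(pages[number])
    return f"{body}\n[page {number + 1} of {len(pages)}]"

text = "HTML treats the whole file as one page. " * 6
wide = paginate(text, width=60, height=4)
narrow = paginate(text, width=20, height=4)  # same document, smaller window
print(show(wide, 0))
print(show(narrow, 0))
```

Resizing the window just means calling `paginate` again with the new dimensions; a real reader would also have to keep table cells and images whole, which this sketch doesn't attempt.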
Instead of doing this, modern file display programs either assume that their only job is to display the entire document (HTML, less, MS Word in draft mode) or only display optimally for a single page size which almost no users have on their monitors (Acrobat, Word in page preview, ghostscript).
This IBM server is interesting because it forces a current format to be reasonably presented on multiple device -- or window -- sizes. It doesn't remove the need for a real document reader, but it sure is better than we've had before.
-Billy
This is much more than a transcoding app. (Score:1)
Check it out at http://www.almaden.ibm.com/cs/wbi/ [ibm.com]. You can download a free but closed-source :( version for non-commercial use.
Beer recipe: free! #Source
Cold pints: $2 #Product
This is possibly very clever (Score:3)
What this may do (I couldn't tell from the article) is clean up dirty HTML to make it portable. If so, then yes, it's a fairly clever little piece of software.
It's not just limited to HTML, either, and may take input in a number of different formats (Word, PDF, SGML, XML, more?). The thing they have to ensure is that their transcoding backbone is extensible enough to cover all possibilities. If they fall into the trap of aiming for the lowest common denominator, it'll be doomed. Hopefully, IBM are smarter than that...
Buzzword generator on overtime (Score:1)
Re:missing the point (Score:1)
This product seems to want to take care of the second problem so you don't have the first problem.
Right?
Re:HTML x.0 is NOT portable! (Score:1)
I think you are missing the point. HTML is meant to look different in different browsers, or with different style sheets. It specifies the structure of the document, and then it's up to the browser to display it. Hence the name, Hypertext Markup Language rather than Hypertext Display Language.
If you want pages to look exactly the same everywhere, use PDF or some other display language.
Re:HTML 4.0 (Score:1)
Not really - it specifies how to render tables, for example, but it's entirely up to the user what font to choose for headings. This is what style sheets are for.
... (Score:1)
--
Re:I believe I've found a /. bug... (Score:1)
On Fri, 25 Jun 1999, Chris Abbey wrote:
> Rob, would it be possible to add & constructs (> é " etc...)
> to the allowed html formating list for comments? Thx. -=Chris
I think if you post your messages in "Plain Text" mode, those characters
are encoded for you. You can't really mix and match, but it works if you
need those characters.
some points missed (Score:2)
Browsers are *one* target recipient, but by no means the ONLY one... palm pilots for example have very little use for full blown HTML 3.x - let alone CSS and embedded frames, but this technology can target them*. [ *note: I am assuming that this is the same technology that was demonstrated by the SanFrancisco project at JavaOne this past June, where both a palm pilot and a browser received the same content tailored to their UI and ran the same application logic. If it is then what it can really do will blow your socks off.... if it isn't then I can't wait to see what Research has up its sleeve. ('cause ya's just KNOW us developers don't get license to play with stuff this cool on corp's time...). ]
Think about this as an *Information Developer* (not an HTML developer, or a "webdesigner")... what do you really want to accomplish? Separation of content from format? Yes. Targeted formatting for a wide variety of presentation systems? Yes! Maintenance of sanity in the process? YES!
OK, so let's play a hypothetical... you need to put up a simple content component (the building block of a larger information presentation). Let's say you want this component to be called a slashdot poll. So you script up the first poll's content:
[topic name="best PHB tormentor"
code="phbtorm"
choice1="frabble-do-hickie"
choice2="whakka-loofla"
choice3="nimrod-doodle"
choice4="source code to current project"]
(yeah, totally made up grammar... I'm sure it doesn't look anything like that.)
then you write the first conversion sheet, with a target of text/html
[b]Slashdot Poll[/b]
[FORM action="http://slashdot.org/pollBooth.pl"]
[B]$name[/B][BR]
[INPUT type=hidden name=topic value=$code]
[INPUT type=radio name=aid value=1]$choice1[BR]
[INPUT type=radio name=aid value=2]$choice2[BR]
[INPUT type=radio name=aid value=3]$choice3[BR]
[INPUT type=radio name=aid value=4]$choice4[BR]
[INPUT type=submit value=Vote][/FORM]
of course you'd really need a lot more (just look at what really is wrapping up the poll) and also be a tad more generic so that you could have a counter that says how many options, and a loop to iterate and build the form and all that presentation gorp that means nothing to INFORMATION DEVELOPERS. Then you'd turn around and create a second conversion sheet that tells your phonemail system how to present this as a VRS. "Today's Slashdot poll is $name. Press or say one for $choice1. Press or say two for $choice2. [...]" (I can already hear the voice of Stephen Hawking asking what the past tense of ping is....)
Now once you've written all those conversion sheets, you're done with them. (unless you want to change your display style for a given target) From then on you can update your information in one form and guarantee that it will be "properly" presented in all your target platforms.
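To make the hypothetical concrete, here is a rough sketch of the same idea in Python: one piece of content, two conversion sheets. It only illustrates the principle and says nothing about how IBM's transcoder works; the `"voice"` target name, the function names and the dictionary layout are all invented for the example.

```python
import html
from string import Template

# The content, written once. Field values follow the made-up poll above.
poll = {
    "name": "best PHB tormentor",
    "code": "phbtorm",
    "choices": ["frabble-do-hickie", "whakka-loofla",
                "nimrod-doodle", "source code to current project"],
}

def to_html(poll):
    """Conversion sheet for text/html: loop over the choices to build the form."""
    row = Template('<INPUT type="radio" name="aid" value="$n">$choice<BR>')
    rows = "\n".join(row.substitute(n=i, choice=html.escape(c))
                     for i, c in enumerate(poll["choices"], start=1))
    page = Template('<B>Slashdot Poll</B>\n'
                    '<FORM action="http://slashdot.org/pollBooth.pl">\n'
                    '<B>$name</B><BR>\n'
                    '<INPUT type="hidden" name="topic" value="$code">\n'
                    '$rows\n'
                    '<INPUT type="submit" value="Vote"></FORM>')
    return page.substitute(name=html.escape(poll["name"]),
                           code=html.escape(poll["code"]), rows=rows)

def to_voice(poll):
    """Conversion sheet for a voice response system (hypothetical target)."""
    words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
    prompts = " ".join(f"Press or say {words[i]} for {choice}."
                       for i, choice in enumerate(poll["choices"]))
    return f"Today's Slashdot poll is {poll['name']}. {prompts}"

# One sheet per target; adding a new device means adding one entry here.
SHEETS = {"text/html": to_html, "voice": to_voice}

def render(poll, target):
    return SHEETS[target](poll)

print(render(poll, "text/html"))
print(render(poll, "voice"))
```

Changing the poll means editing only the `poll` dictionary; neither sheet has to be touched.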
Some of you may start down that tired old line that this is what HTML is for, and that new features like CSS give you this. Well yeah, HTML _was_intended_ for this kind of thing - then the "webdesigners" got their hands on it. At that point you have to resort to half assed hacks like CSS to even attempt to preserve the format independent nature of any SGML.
HTML is OK for the role it has been led into, but it isn't fulfilling some of the niches people hoped it would because it has been bent too far into another "niche" - the web. Some of the varied devices (in addition to html and VRS mentioned above) that are potential targets include:
(*> our old friend the green screen - no, it is NOT dead!
(*> page readers - enabling tech. for the blind
(*> custom viewers - imagine having the poll as a captive tk/tcl app on your enlightenment docking bar?
(*> translation systems - the I in IBM is never forgotten... imagine the hassles in translating a billion web pages from English to say Hebrew (right to left) or Kanji (top to bottom, and (I think) right to left)
(*> PDAs - for those who think PDAs will consume HTML lay off the frappuccino for a while and get a firm grip on reality. Without trying too much to sound like the Linus sound bite about the Nokia 9000's 'mediocre phone, lousy PDA, miserable web browser' - try reading
But will anyone use it? Well, let's take two case studies of places that COULD have used it... for the first let's look at a major hardware company that lost out on a $250K deal because their web site people had "revamped" their entire corp. web presence to use all the nifty new toys and didn't have the time/resource to update all the old product datasheets... so they dumped them off the servers completely instead: "so they wouldn't clash" with their "consistent face to the consumer".
For a second let's take good ole Hollerith's Analytical Legacy... last spring they decided to change their page design for all external pages
!--Left Navigation here... --
Now that I've written my content I want to go back and reformat it for HTML... but alas I gotta choose either to show html tags (Extrans) or use html tags
Re:This is great for the industry (Score:1)
IBM has been on the XML bandwagon for some time, with their development of xml4j, xml4c (Java and C++ XML parsers) and LotusXSL (a Java XSL implementation). See IBM AlphaWorks [ibm.com] for more info.
Re:HTML x.0 is NOT portable! (Score:2)
A large part of the problem is that authors are focused too much on the visual presentation[1], rather than the semantic meaning of the data being presented.
People forget that denoting something as a list (be it ordered, unordered, or list of definitions) is more important than the list being displayed indented with little swirly bullets next to it.
Remember -- different page renderings are good -- not everyone has the same needs or wants from data presentation.
[1] This is especially silly when there's no guarantee that a page will be rendered visually
A Bad Thing(tm) (Score:1)
XML within XML (Score:1)
One question: XML makes no provisions for security/certification. Will this remain a problem for a lower layer, or will a DTD for this also be designed? Can one nest XML within XML?
XML/XSL and DTD Hell (Score:2)
Maybe this is what IBM has done... created a replacement framework for these tedious steps.
There are many mediators on the Web. (Score:1)
What you're talking about is called a mediator [lfw.org].
I've been doing this for years. In 1995 i came out with Shodouka, which is a mediator that replaces JIS codes with GIF images so anybody can see Japanese text even if they don't have fonts installed. In 1996 i created MINSE [lfw.org], a semantic expression language with a mediator that lets anybody see math expressions directly embedded in web pages. MINSE was special because, like this so-called "transcoder", it would translate the equation into an appropriate form for the browser: use Netscape, and you get nice antialiased GIF images of the equations; use Lynx, and you get a good attempt at ASCII art. It is still the only way to easily put math on the Web that anyone can view.
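The per-browser behaviour described here can be sketched in a few lines. This is not MINSE's actual code: the function name, the `/render` URL scheme and the user-agent check are assumptions made purely for illustration.

```python
import html
from urllib.parse import quote

def render_expression(expression, user_agent):
    """Pick a representation of one math expression for the requesting client."""
    if user_agent.lower().startswith("lynx"):
        # Text-only browser: fall back to the plain-text form of the expression.
        return expression
    # Graphical browser: point at an image the mediator would generate.
    # The /render endpoint is hypothetical.
    return (f'<img src="/render?expr={quote(expression)}" '
            f'alt="{html.escape(expression)}">')

print(render_expression("x^2 + 1", "Lynx/2.8.1rel.2 libwww-FM/2.14"))
print(render_expression("x^2 + 1", "Mozilla/4.5 [en] (X11; I; Linux 2.2.5 i686)"))
```

The author writes the expression once; the mediator, sitting between server and browser, decides which form each client receives.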
In 1997 i did Crit [crit.org], which enabled anybody to make public annotations on any Web page for the first time. You might want to check that out too (source code is available). It makes all links bidirectional and allows you to make links from your document to a specific phrase in the target document. As a side benefit it shows you some useful metadata about the document which browsers often hide. Again, since it's a mediator, anyone can view or create annotations using any browser -- people running Lynx can attach annotations to anything too.
The whole idea of "Web middleware" has been around for a long time. I'm pretty sure i was first, but many others have done similar things since then (e.g. Anonymizer, babelfish, and so on). Rohit Khare wrote a nice paper [objs.com] in 1998 summarizing the idea and various applications.
-- ?!ng
Re:More or less (Score:1)
Re:What about this? (Score:1)
Re:Multiple browsers? (Score:1)
Re:Content Enhanced (Score:1)
All the html I write is "content enhanced" and designed to be seen in any browser. I don't have problems with backward compatibility because I stay away from using tags that are designed to rigidly control the display on the *user's* browser. That was the whole point to HTML and the browser, wasn't it? So that the user gets to control the display of the content? I absolutely stay away from any tag that is browser or platform specific. I want the greatest potential audience for my *content*.
I guess I'm old guard. In the earliest days of the internet (as some will remember), your access could be terminated for trying to sell something to another person. Just think how much bandwidth we would have available now if there was no spam or "get rich quick...", and worse, chain email.
Newer is not necessarily better. I have conveyed my message here and didn't use a single HTML tag.
This has not been intended as flamebait, just an expression of *my* opinion.
Russ
Re:HTML x.0 is NOT portable! (Score:1)
Microsoft and Netscape added those "evil" tags on demand. Companies wanted a medium to produce their pretty sites in. Nothing else existed, so they used the web. Of course, the "better" course of action would've been to design a medium for this purpose... probably using vector-based graphics and exact specification of display. But, at the moment, HTML is a layout language -- just because Tim Berners-Lee didn't intend or expect it to be used as such doesn't mean it's illegal to use it that way.
UNIX wasn't designed as a gaming platform, or to be used as a home system.. who cares?!? What can it be used for?
Many advertisers don't care about the tiny demographic of blind users, or users without the latest browser. The percentage is insignificant to the advertiser. These advertisers choose to make decisions like that, based on projected return. Many of us make our living building these sites. However, although we tend to prefer building good quality sites, sometimes we can't. The budget doesn't allow. This makes us restrict our sites to the latest browsers, or one particular branch of browsers (like Netscape).
Now what can we do about it? Fight a war between those who want the twiddles and those who don't? OR, how about building a new medium that allows both groups to co-exist? The idea of translating a bad format for compatibility reeks. GIGO.
We've got all of this noise about XML and XSL, but how much of this is designed by "webdesigners" for "webdesigners"? The concept of "building the right product, rather than building the product right" seems to be completely lost on the W3C. Until they realise that their output isn't necessarily used the way they intended, and then sit down and create something generic, flexible and powerful, we're screwed, and we're going to have this gibbering tower of babel in perpetuity.
In my opinion, the idea of separating content from presentation is a sound one. However, what won't work is disregarding the importance of presentation over content. Sometimes presentation is more important than content: e.g. for persuading PHBs. It's a sad, odious fact that PHBs control the $$$s. The $$$s pay our bills.
It's all really about target audiences... the advertisers have target audiences (and I'm afraid geeks who don't want to look at pretty sites are the minority at the moment), and the technologies have target audiences. Things like Flash have particular client target audiences, and those client target audiences have their own consumer target audiences. Use the right tool for the right job. Unfortunately, HTML is the wrong tool for any job right now. What's even worse is that there's no alternative.
(Apologies for the rambling... I'm sure this post has some valuable points hidden inside!)
Re:webmethods (Score:2)
Duh (Score:2)
Re:XML within XML (Score:1)
webmethods (Score:2)
Re:HTML x.0 is NOT portable! (Score:1)
KV
Re:Yes and No (Score:1)
Joe User loves pretty eye candy (usually at the expense of content). So an app that's based on HTML is easy to make eye candy for, and thus "easier" for the user. Plus, it makes things a bit easier for the back-end designer - it's easier to refresh the eye candy to the "now" look than to modify a program to do the same (just a few bits of HTML here and there...).
Unfortunately, lots of eye candy, no content. What's worse is javascript...
It would also be a nice place... (Score:1)
This is great for the industry (Score:1)
But hey, that's not the point. If I am not mistaken, the thing builds on the philosophy of XML (if not on XML itself). It's about time that more XML compliant tools appear on the market, and hopefully these tools will be extendable, and can interact via XML. Seeing that IBM not only jumps onto the XML bandwagon (which is not surprising, as they used SGML for a long time), but also starts to deliver the tools, is a great thing!
Yes and No (Score:1)
Static web sites are pretty rare these days.
Kludge (Score:1)
HTML 4.0 (Score:1)
As far as I know, HTML 4.0 is pretty precise. The page can be rendered only in one way correctly.
With CSS, it's possible to create layouts which render OK on older browsers.
I'm not really in the web development business, but I try to create pages that render perfectly on standards compliant browsers and OK on the rest.
In my opinion, this is the only solution. If customers don't see a difference between browsers, why should they upgrade to a standards compliant one?
Re:HTML 4.0 (Score:1)
Excuse my stupidity.
This is not for fixing javascript & html... (Score:3)
HTML x.0 is NOT portable! (Score:4)
The reason is quite simple: people don't upgrade their browsers. Look at www.gnu.org [gnu.org] for pete's sake! That page is specifically designed to be Lynx 2.0 compatible because use of "novelty tags" like (included in the HTML 3.2 spec) will break those clients. As a result, the page is fairly ugly.
Choose an involved combination of "standard" tags and it's a fairly safe bet that Netscape 3.0 will display it differently than Opera, which will display it differently than IE4, which will display it differently than Netscape 4.5, etc, etc.
The human is the bottleneck. People don't see a powerful incentive to upgrade their browsers, so they don't. Hence webdesigners like Rob Malda spend weeks of headache time on making their pages BassAckwards 2.7 compliant.
This transcoder, if it works, will really be a boon.
-konstant
missing the point (Score:1)
This makes me feel nauseous. Please just write to spec, people! Don't encourage the fragmentation. Web Standards Now! [webstandars.org]
I believe I've found a /. bug... (Score:1)
When I previewed that previous comment, it looked just great... but when I posted it, I guess the lt and gt entities got re-parsed :-) Here is paragraph #3 without entities:
Sorry about that, folks!
Not just compliance, but well-formedness (Score:2)
You're on the right track if you're saying that this technology wouldn't even be needed if we could convince people to stick to standards. But you also need one other thing -- well-formedness.
You can stick to HTML 4.0 [w3.org], even the "Strict" dialect (which I encourage everyone to do!), and still have pages that completely blow up when pulled up outside one of the Big Two. On the website that I accidentally deleted some time ago, I had struggled for some time to not just reach for HTML 4.0 compliance but for well-formedness.
It meant using elements for what they were intended for. It meant never using a table for anything other than tabular data. It meant using when I wanted emphasis and it meant using when I wanted to mark up code fragments (mind you, I'm stuck using right now because /.'s HTML filter doesn't permit !) It took some fiddling.
But I turned out with a set of pages that were not only easier to maintain, but CSS applied very cleanly to them, making them pretty and consistent-looking, and they rendered perfectly on any hand-held or speech-synthesizing device you could throw at me. My information was useful to everyone, and that was the best high of all.
I strongly encourage everyone to pursue well-formedness. The more important stuff that is well-formed instead of hacked, the better browsers we'll get, too!
Useful (Score:2)
Also, such middleware for multiple deployability will allow those who are delivering fat content with thin design to do so very easily by coding the content in XML and then transponding...
Nothing earth-shattering here, but useful...
We don't need this! (Score:1)
All we really need is one (or both) of two things:
Something like this IBM thing is overkill. My guess it will be used by cell phone companies on their proxy servers and otherwise die on the vine. Or, at least, I hope so.
Jack
Sounds ok (Score:2)
Imagine in the future, two web ports, 80 and 81, 81 for low bandwidth and handheld devices
More or less (Score:1)
Re:Multiple browsers? (Score:1)
Do we have to keep IBM's implementation? (Score:2)
Will the framework be available as an API (or preferably an open standard), so that in the future programs (RealEncoder, generator, etc..) could be written to automatically include information in their output for specific formats (i.e. text-only, audio-only, mixed).
This could be a huge step forward, especially in the days of PCS-based browsers, but can it be done in such a way as to not lock a company down to a specific vendor??
What about this? (Score:1)