

Feature: Whatever Happened to ANDF?

Bruce Stephens has written in with a writeup on something he considers pretty cool; perhaps you'll agree. It's about something you may not have heard of: ANDF. Looks interesting for the hardcore.

Whatever happened to ANDF?

Once upon a time, there was a neat technology called ANDF: Architecture Neutral Distribution Format. It even merits a place in the GNU Project's task list: provide a decompiler for ANDF. (I'm not sure whether this item is still there; it's probably not high on the list of tasks!)

I haven't seen it mentioned much in years, however.

How does ANDF work?

ANDF is a format: it's a flattened representation of the abstract syntax tree for a program. Programs are compiled using a number of tools:

  1. A producer, which produces target-independent ANDF from source
  2. A linker, which links together some target-independent ANDF capsules
  3. An installer, which combines target-independent capsules and target-specific ones, and knows how to produce target-specific code.

Much of step 3 can actually be fairly portable. Many optimizations can be cast as portable manipulations of an abstract syntax tree; what's target-specific is which of those manipulations you apply. More than that, ANDF can represent a range of levels of detail, so even quite low-level work can be done using code that is shared between targets.

So, if you want to draw a line between developers (who compile code to produce binaries) and users (who just want to use the code), you have a range of places where you can put it. Somewhere between steps 2 and 3 seems obvious, but it might fit somewhere inside step 3, depending on what optimizations the developer wants to make, and what kinds can sensibly be left to the user to do on installation.

APIs

To make all this work properly, you need to abstract APIs. A C program which #includes <stdio.h> would tend to pick up lots of details about the size of FILE (and probably internal details, if some functions were macros that delved inside FILE). That would be hard to install on a completely different target.

So, such APIs are represented explicitly. #include <stdio.h> makes available tokens like FILE and so on, with well-defined properties but with undefined implementation. When installing, the installer provides implementations. Tokens are nodes in the abstract syntax tree (possibly with subtrees as arguments), and at installation time these are substituted with their implementation on the specific target.

And, obviously, the installer may be able to perform further optimizations on the substituted tree (since it's still a well-defined abstract syntax tree).

The potential benefits

With suitable coordination and standardisation, binaries could be distributed which users could install wherever they wanted. The substitution of macros for the various paths (prefix and so on) could be deferred, to be provided by a simple capsule which gets linked on installation.

Different implementations of some things could be provided. For example, programs could use MMX-style operations (through a suitably defined API), and users with suitable processors could install with a capsule which uses the MMX features of their processor. Users without such processors wouldn't lose anything, since they'd install with software implementations of the API. These would not (necessarily) be function calls; they'd be proper macro-like things, so further optimizations could be performed. Thus, there'd probably be no disadvantage for people without MMX processors compared to native compilation (with conditional macros, say).

Portability could be greatly improved; people would program to well-defined APIs, and users with those APIs available would be able to use the results. They'd even be able to get and use the results in binary (without having to compile it themselves). In a different operating system. On a different processor (with different endianness, with a different word-size, even).

An excuse to hide source

Wouldn't this just encourage concealing source? After all, the Windows world has lots of "freely available" programs. The catch is that they're binary only, so you can't improve them, or learn from them.

Possibly. But there are good, pragmatic reasons for wanting to share source, and these would remain. Apache would still have been developed, and would still be distributed in source form, even if we'd had this technology years ago. And people will sell binaries without source regardless of the obstacles you put in their way; this technology would simply change things a little.

P-code, JVM, ...

The format isn't much like previous universal intermediate languages. Largely, they were abstractions of machines; this is an abstraction of programs, or of programming languages. That is, it retains all the interesting information you need for performing optimizations; it just loses details like variable names.

Vaporware

"This could be cool. But it doesn't exist/costs money/is proprietary and closed source."

Judge for yourself: TenDRA. I'm not a lawyer, but that copyright looks pretty Open Source(TM) to me. (There are probably patents covering it, but who can tell, with software patents?)

  • You can download source, documentation, tools
  • APIs for ISO C, POSIX, XPG3/4, X11R5 and probably a few I've forgotten
  • Producers for C and C++; the installers are claimed to be good for Intel ELF and SPARC, and a number of others (possibly of lesser quality) are included; performance seems acceptable compared with egcs on my Linux box

What it doesn't include is a C++ library. (A port of libstdc++ would be nice, if anybody wants something to do.) The C++ looks pretty complete in other respects, however, although I'm not a C++ expert, either. The whole thing looks cleanly written. The C checker is worth the download on its own (IMHO).

The C and C++ producers can write out their symbol tables in a documented format. Such information would be helpful when writing a source navigator (!), which is what I'm (slowly) working on at the moment. I'm writing a Perl module which will read in and parse such files, which people could then process in wacky ways. (My plan is to write the information to a relational database like PostgreSQL, but I'm a little way off that.)

EGCS

I just don't see where this technology fits, given that there's already egcs, which produces pretty good code on more targets than just about anything else.

The hacker in me feels it darned well ought to fit somewhere, though. This is cool technology; it deserves to be played with.
