F4Codec - Slashdot User

Journal Journal: A New Data Encoding Scheme

Journal by F4Codec on Friday March 14, 2003 @08:10AM

A New Data Encoding Scheme

The computer industry from its very inception has used the binary system to represent all types of information. This system has much to recommend it's use. It is simple to construct hardware based upon this system and all the normal arithmetic and logical operations can be performed upon it.

However, it is the authors opinion that this system is not as optimal as it could be and that some rather simple changes to the method of encoding data can bring about large increases in storage, communications and reliability.

The Idea

The basic ideas stem from the seemingly simple idea of replacing the binary method of encoding by a different scheme. In essence, all that is required is to take the value normally represented by a 1 or true state, and replace this with two 0's or false states. e.g.

decimal binary new scheme 9 1 0 0 1 00 0 0 00 13 1 1 0 1 00 00 0 00

At first sight there doesn't seem to be much of an advantage in this scheme of encoding but as the rest of this paper will attempt to prove, the benefits are enormous when applied in the proper way.

This method, whilst not binary is also not strictly unary. It has therefore been christened sesquinary (from sesqui - one and a half).

Applications

File Storage.

An intelligent operating system can make great use of the encoding. As an example, a file need not be stored as the complete set of bits. All that is required is for the operating system to keep count of the ``number'' of zero's in the file. In the case of the UNIX operating system this would mean that the entire disc would consist of inodes. Each inode, instead of referencing blocks would keep a count of the number of zeros. For large files, double and triple indirection could be applied - see the section below on compression. Obviously, for small files, the single indirection is more cost-effective but with larger files it would pay to move towards more indirection as a saving of space. A flag in the inode could keep count of the number of indirections currently performed.

This scheme does have some overhead in the updating of random access files, in that the operating system must first ``unpack'' the file, perform the update, and then repack the file. This could probably be done in virtual memory for most operations though.

Networking.

In networking, this method really comes into it's own. To begin with, there are practically no bandwidth limitations. The problems inherent in normal communication over serial and phone lines stems from the ability to detect the transitions between two states. Once this transition is removed, and the data is in effect transitionless, the bandwidth of the circuit is only reliant on the speed with which zeros can be pumped down the line by the hardware (and the rate at which they can be received of course).

Another advantage comes in the standard ethernet environment. Normally an ethernet transceiver must wait for a clear slot to arrive, transmit the packet and detect if a collision occurred, if so it must retry. With the all zero encoding method transmission can take place at any point, there is in effect nothing on the ethernet that can scramble the signal as all hosts are transmitting zeros.

Compression

As hinted at above the possibilities for compression are fantastic. You can forget Huffman encoding and Lempel-Ziv can take a walk! The compression techniques can reduce any amount of data to 1 number, although that number may be larger than the convenient word size of a given architecture. The basic algorithm is outlined below.

while (length(data) > 1) { data = count_zeros(data); iteration ++; } return iteration;

This can be also be changed to do essentially the above but in N steps for large files.

Hardware

It is expected that there may be some implementation problems associated with the hardware of this device. However, the benefits appear to outweigh the drawbacks in many ways. To begin with, the memory using this technique should be simple. There is no need to invert bits or to even sense the bits - they should all be zero anyway. Memory failure can be detected very easily, no need for complex CRC checks - any 1 bits are obviously due to failing memory.

Another advantage is that all memory is effectively permanent, as there is no state to be saved. This means computers built using this model should be unaffected by power-outs and be impervious to crashes.

Encryption

This scheme also seems to lend itself to data encryption. The details have not been fully worked out and may appear in a second paper once the decryption algorithms have been straightened out.

Parallelism and Data-flow

Again, this method has more advantages for parallel hardware. Shared memory is particularly easy to implement for the same reasons that the ethernet is easy - effectively there are no changes in memory state so collisions can't happen (unless defective memory is present).

Implementation

There have been some doubts raised about the hardware realisation of this technique, but in general this can probably be attributed either to the resistance to change generally found, or by manufacturers protecting their own interests. The vast benefits that this method seems to have though should mean that once it is taken up it will clean up in the computer industry.

Slashdot Top Deals