davidwr's Journal: iso/tar/zip/whatever on-the-fly builder
(c) 2005 davidwr of slashdot
iso/tar/zip/whatever builder:
Purpose:
Allow web sites to "store" many slightly-different customized archives or CD-images
without storing the actual images on the server.
Patent potential:
This is patently obvious; it took less than an hour to cook up a blueprint.
Besides, this or something very similar has almost certainly already been done.
No patent potential.
Commercial applications:
No reason commercial environments cannot benefit from these ideas.
Variations:
Variations are endless.
Input:
Type of output file
Source files and instructions to get/unpack source files
List of files to package
Location of destination file
Block-level post-compression command and block size
Output:
Output file broken down into pieces, with either instructions on how to get each piece
or the contents of the piece itself.
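A minimal sketch of how a server might turn one piece into bytes, using the three action names that appear in the example control file (include, interpret, bytefill). The function name produce_piece, the base_dir parameter, and the length check are assumptions made for illustration; they are not part of the blueprint.

```python
import subprocess
import tempfile
from pathlib import Path

def produce_piece(action, argument, length, base_dir="."):
    """Return exactly `length` bytes for one piece of the output file (hypothetical helper)."""
    base = Path(base_dir)
    if action == "include":
        # The piece is stored verbatim in a file.
        data = (base / argument).read_bytes()
    elif action == "bytefill":
        # The argument is a two-digit hex byte value, e.g. "00".
        data = bytes([int(argument, 16)]) * length
    elif action == "interpret":
        # Run the instruction script and capture whatever it writes to stdout.
        data = subprocess.run(
            ["sh", argument], cwd=base, check=True, stdout=subprocess.PIPE
        ).stdout
    else:
        raise ValueError(f"unknown action: {action!r}")
    if len(data) != length:
        raise ValueError(f"{action} {argument}: expected {length} bytes, got {len(data)}")
    return data

# Build the last two pieces of the example: 5 fill bytes, then a 17-byte checksum file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "checksum.txt").write_bytes(b"0123456789abcdef\n")  # stand-in content
    tail = produce_piece("bytefill", "00", 5) + produce_piece("include", "checksum.txt", 17, tmp)
```

Checking the length of each piece against its offset range is one way to catch a script that writes more or less than the control file promised, since every later offset depends on it.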
Example output:
controlfile.txt:
#===============
DATA:
#startoffset-endoffset - action which sends output to stdout
000000-00ffff - include directory.header
010000-012345 - interpret file1.txt.instructions
012346-02468a - interpret file2.txt.instructions
02468b-02468f - bytefill 00
024690-0246a0 - include checksum.txt
POSTPROCESSING:
compression command '%1 | blocklevelcompressor' # %1 is a block of data
compression index compressionindex.txt
compressionindex.txt:
========================
#If the user requests any part of a compressed block,
#rebuild the entire compressed block on the fly and send only
#the parts that are needed.
#
#uncompressed block start-uncompressed block end - compressedfilestart-compressedfileend
000000-00ffff - 000000-007352
010000-01ffff - 007353-00f8ab
020000-0246a0 - 00f8ac-012fa0
directory.header:
================
[binary data representing the header to the file]
file1.txt.instructions:
======================
#!/bin/sh
## shows file-by-file compression
## -O makes tar write the extracted file to stdout instead of to disk
tar -xOf archive.tar file1.txt | gzip -c
## do not send gzip's output anywhere but stdout
file2.txt.instructions:
======================
#!/bin/sh
cat file2.txt
checksum.txt:
============
[checksum goes here]
****end example output****
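As a rough illustration of how the DATA section above could be read, here is a sketch that parses the lines into segments, checks that they cover the output without gaps, and finds which segments a byte range touches. The function names and the treatment of offsets as inclusive hex values are assumptions based on the example; the blueprint does not specify a parser.

```python
import re

# Matches lines such as "000000-00ffff - include directory.header".
LINE_RE = re.compile(r"^([0-9a-fA-F]+)-([0-9a-fA-F]+)\s+-\s+(\w+)\s+(.+)$")

def parse_data_section(text):
    """Return a list of (start, end, action, argument) with inclusive offsets."""
    segments = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "DATA:":
            in_data = True
            continue
        if line.endswith(":"):          # any other section header, e.g. POSTPROCESSING:
            in_data = False
            continue
        if not in_data or not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized DATA line: {line!r}")
        segments.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3), m.group(4)))
    return segments

def check_contiguous(segments):
    """Verify the segments start at 0 and tile the output; return the total size."""
    expected = 0
    for start, end, _action, _arg in segments:
        if start != expected or end < start:
            raise ValueError(f"gap or overlap at offset {start:06x}")
        expected = end + 1
    return expected

def segments_for_range(segments, first, last):
    """List (action, argument, lo, hi): the slice of each segment inside [first, last]."""
    hits = []
    for start, end, action, arg in segments:
        if end < first or start > last:
            continue
        hits.append((action, arg, max(start, first) - start, min(end, last) - start))
    return hits

EXAMPLE = """\
DATA:
#startoffset-endoffset - action which sends output to stdout
000000-00ffff - include directory.header
010000-012345 - interpret file1.txt.instructions
012346-02468a - interpret file2.txt.instructions
02468b-02468f - bytefill 00
024690-0246a0 - include checksum.txt
POSTPROCESSING:
compression command '%1 | blocklevelcompressor'
"""

segments = parse_data_section(EXAMPLE)
total_size = check_contiguous(segments)
hits = segments_for_range(segments, 0x012340, 0x012350)
```

In this sketch a request for bytes 012340-012350 touches the tail of the file1.txt piece and the first bytes of the file2.txt piece, so only those two instruction files would need to be run.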
End-user action:
User goes to web site, runs a configuration program to get only
certain files, then is given the url to his tarball, iso, or what-have-you.
When the user asks for {ftp, http}://somesite/hiscustomfile the
file is generated on the fly using the controlfile.txt file.
It can even be retrieved "in the middle," a la ftp-resume, using the
offsets in compressionindex.txt first, then those in controlfile.txt.
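The resume lookup might work roughly as follows: given the compressed byte range the client wants, find the compressed blocks it overlaps, rebuild each of those blocks from its uncompressed range, and send only the requested slice. The index values are copied from the example compressionindex.txt; the name plan_resume and the shape of the returned plan are assumptions for illustration.

```python
# (uncompressed_start, uncompressed_end, compressed_start, compressed_end), all inclusive,
# taken from the example compressionindex.txt.
COMPRESSION_INDEX = [
    (0x000000, 0x00ffff, 0x000000, 0x007352),
    (0x010000, 0x01ffff, 0x007353, 0x00f8ab),
    (0x020000, 0x0246a0, 0x00f8ac, 0x012fa0),
]

def plan_resume(index, first, last):
    """For a request for compressed bytes [first, last], list the blocks to rebuild.

    Each entry names the uncompressed range to regenerate (by way of controlfile.txt)
    and the slice of the freshly compressed block that should actually be sent.
    """
    plan = []
    for u_start, u_end, c_start, c_end in index:
        if c_end < first or c_start > last:
            continue  # this block lies entirely outside the requested range
        plan.append({
            "rebuild": (u_start, u_end),
            "send_from": max(first, c_start) - c_start,
            "send_to": min(last, c_end) - c_start,
        })
    return plan

# Resume a download at compressed offset 0x008000 and continue to the end of the file.
plan = plan_resume(COMPRESSION_INDEX, 0x008000, 0x012fa0)
```

Here the first block is skipped entirely, the second is rebuilt in full but only its tail is sent, and the third is rebuilt and sent whole.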
NOT SUITABLE FOR:
This is not suitable for compression formats that compress the entire file as a single
stream rather than in independent blocks, i.e. where the earlier parts of the file affect
how the later parts are compressed.
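By contrast, a block-wise scheme keeps every block independent. The sketch below uses Python's gzip module as a stand-in for the unspecified "blocklevelcompressor" (an assumption, as are the function name and the test data): each block becomes its own gzip member, and an index like compressionindex.txt falls out of the loop. Passing mtime=0 keeps the gzip header free of timestamps so that rebuilding a block gives the same bytes.

```python
import gzip

def compress_blocks(data, blocksize):
    """Compress each block on its own; return the compressed bytes and an index."""
    out = bytearray()
    index = []
    for u_start in range(0, len(data), blocksize):
        block = data[u_start:u_start + blocksize]
        packed = gzip.compress(block, mtime=0)   # one independent gzip member per block
        c_start = len(out)
        out += packed
        # (uncompressed start, uncompressed end, compressed start, compressed end), inclusive
        index.append((u_start, u_start + len(block) - 1, c_start, len(out) - 1))
    return bytes(out), index

data = bytes(range(256)) * 600           # 153600 bytes of stand-in content
compressed, index = compress_blocks(data, 0x10000)

# Rebuild only the second block, without touching the first or third.
u_start, u_end, c_start, c_end = index[1]
rebuilt = gzip.compress(data[u_start:u_end + 1], mtime=0)
same_bytes = rebuilt == compressed[c_start:c_end + 1]
```

Because concatenated gzip members form a valid gzip file, the whole result still decompresses with ordinary tools, at the cost of a somewhat worse compression ratio than a single stream.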
Improvements:
Instead of a bunch of loose text files, controlfile.txt,
compressionindex.txt, and the other files can themselves
be part of an organized file. The old-style Microsoft .ini
file with [section headers] seems obvious, as does a .tar, .zip,
or .tgz file. XML also seems obvious and very "webbish."
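As one possible take on the .tar variant, the loose control files could be packed into a single tar blob and read back by name. This is only a sketch using Python's tarfile module; the bundle and unbundle names and the file contents are placeholders, not part of the original idea.

```python
import io
import tarfile

def bundle(files):
    """Pack a {name: bytes} mapping into one uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()

def unbundle(blob):
    """Read every regular file back out of the archive as a {name: bytes} mapping."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers() if m.isfile()}

# Placeholder contents; a real bundle would hold the files from the example output.
control_files = {
    "controlfile.txt": b"DATA:\n02468b-02468f - bytefill 00\n",
    "compressionindex.txt": b"000000-00ffff - 000000-007352\n",
    "checksum.txt": b"[checksum goes here]\n",
}
blob = bundle(control_files)
restored = unbundle(blob)
```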
See related slashdot comment.