Encryption Security

Fast Random Number Generation For Encrypted FS?

Signal 11 asks: "I've been reviewing different filesystem-level encryption schemes and so far, I have found only one solution - applying a kernel patch and using the loopback device. The problem is in generating large amounts of random data to seed the initial filesystem - it takes about 16 hours to create 20GB of pseudo-random data from /dev/urandom. Is there a faster (and equally secure) way to generate large amounts of random data?" Any clues? I'd figure a kernel patch to turbocharge /dev/urandom (while not losing too much security) might be another route to take.
  • Hrmm, try multiple file writes to the same file?

    Like...

    cat /dev/urandom >> /home/signal11/bigfile &

    cat /dev/urandom >> /home/signal11/bigfile &

    followed by a few more...

    Maybe that would fill it up faster? I'd try it out for ya, but I only have about 3 gig left and I'm not about to go munching on it :-) Or maybe I'm just dumb and missed the point of the question?

  • by nweaver ( 113078 ) on Friday June 23, 2000 @10:01PM (#979422) Homepage

    One question to ask is how much randomness you really need. Depending on the application, you may have no choice but to sit down and let the system's mix of pseudo-random number generation and physical measurements generate enough bits. IIRC, Linux systems use a mix of measurements (disk timing, keystrokes, mouse behavior) and a pseudo-random number generator to create these bits. This makes them pretty GOOD random bits, but slow to gather.

    But often you can get away with a GOOD, well-seeded pseudo-random number generator to produce the mass of bits you need. You only actually need, say, 256 truly random bits to get the job done.

    In which case, the easiest thing to do is grab a bunch of 8-sided dice and roll them a bunch of times to seed your favorite cryptographically secure PRNG, then use that to get all the pseudorandom bits you want. (8-sided dice are nice because each roll generates 3 bits of entropy; a sketch of this approach follows below.)

    For information on PRNGs, go look through Applied Cryptography or a similar text.


    Nicholas C Weaver
    nweaver@cs.berkeley.edu
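
    A minimal sketch of that recipe (the SHA-256 counter-mode expansion below is a stand-in for "your favorite cryptographically secure PRNG", not a vetted design, and the roll values are placeholders):

        # Pack d8 rolls (3 bits each) into seed bytes, then expand the
        # seed with a hash in counter mode. Illustration only.
        import hashlib, struct

        def seed_from_d8(rolls):
            """Pack die rolls (values 0-7) into bytes, 3 bits per roll."""
            buf, bits, out = 0, 0, bytearray()
            for r in rolls:
                buf = (buf << 3) | (r & 7)
                bits += 3
                while bits >= 8:
                    bits -= 8
                    out.append((buf >> bits) & 0xFF)
            return bytes(out)

        def expand(seed, nbytes):
            """Stream block i = SHA256(seed || i); concatenate until nbytes."""
            out, i = bytearray(), 0
            while len(out) < nbytes:
                out += hashlib.sha256(seed + struct.pack("<Q", i)).digest()
                i += 1
            return bytes(out[:nbytes])

        rolls = [(i * 5 + 3) % 8 for i in range(86)]  # placeholder: roll real dice!
        print(expand(seed_from_d8(rolls), 64).hex())  # 86 d8 rolls ~ 258 bits of entropy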

  • by Daffy Duck ( 17350 ) on Saturday June 24, 2000 @12:27AM (#979423) Homepage
    My main question is: what is the reason for filling the file with random data before creating the filesystem?

    And the reason I ask is that I've just installed the international patch and started to mess around with the loopback filesystem, and I've noticed that even after the random initialization the encrypted data isn't completely random-looking.

    As a simple test, I made a 1MB encrypted filesystem using the serpent cipher, and inside that I created a 900K file full of a repeating 8-byte sequence. Then I unmounted the filesystem.

    The 1MB container file is compressible down to 320K by gzip. This doesn't seem right to me - shouldn't there be almost no redundancy in the encrypted file? Further investigation shows that about 930 of the 1024 1K blocks in the container occur in repeated groups of four blocks. The data inside each block are certainly scrambled beyond human recognition, but isn't this exposure of the cleartext's redundancy a sign of something wrong?

    Is it my fault? I followed the encrypted loopback HOWTO to the letter. Anyone know what's up?

  • On a related note, I was wondering if anyone is aware of a fractal algorithm (non-repeating, chaotic) that requires only integer math and truncation. I've got an idea for an encryption algorithm, and its basis is that it needs a reasonably quick fractal algorithm...
  • by drig ( 5119 ) on Saturday June 24, 2000 @06:56AM (#979425) Homepage Journal
    Linux's random device uses a SHA-1 hash to mix the entropy pool and produce decent random numbers. SHA-1 may be a bit more than you need. If you were to change linux/drivers/char/random.c to use a faster hash like MD5 instead of SHA-1, you'd probably get a noticeable speedup.
  • One big issue with cryptography is the need to eliminate patterns in the ciphertext. One defining quality of a fractal algorithm is that it produces repeated patterns. As such, most fractals would not be directly applicable to cryptography. Of course, if there were some way of hiding the patterns (maybe by concentrating on the borderland areas), this wouldn't apply.
  • I don't have my copy of Applied Cryptography handy, so take all the specifics here with a grain of salt. I think such behavior is not the fault of the loopback system or the cipher itself, but of the chaining mode being used. I doubt it uses ECB, but CBC or CFB can produce this behavior too. CBC XORs each plaintext block with the previous ciphertext block before encryption (CFB encrypts the previous ciphertext and XORs it into the plaintext) to disguise the fact that two blocks have the same plaintext. Within one continuous chain that works, but a disk driver typically restarts the chain for every disk block so blocks can be decrypted independently - and if the IV it restarts with is a constant, disk blocks with identical contents (your 8-byte sequence is exactly the cipher's block size, so every disk block looks the same) will still produce identical ciphertext. This can probably be fixed with per-block IVs or one of the more esoteric chaining modes, but the problem with most of these (and CBC) is that they don't let you do the encryption in parallel (like on MMX), so you have to take the good with the bad. (A toy demonstration of the mode difference follows below.)
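
    To see the mode difference concretely, here is a toy sketch (SHA-256 stands in for a real block cipher, so this only illustrates ciphertext patterns, not reversible encryption): under ECB, identical plaintext blocks yield identical ciphertext blocks; under CBC, each block is chained into the next.

        # Toy ECB-vs-CBC demo; a hash as a stand-in "block cipher" (not invertible).
        import hashlib

        BLOCK, KEY = 8, b"demo-key"

        def E(block):
            return hashlib.sha256(KEY + block).digest()[:BLOCK]

        def ecb(pt):
            return b"".join(E(pt[i:i+BLOCK]) for i in range(0, len(pt), BLOCK))

        def cbc(pt, iv):
            out, prev = [], iv
            for i in range(0, len(pt), BLOCK):
                prev = E(bytes(a ^ b for a, b in zip(pt[i:i+BLOCK], prev)))
                out.append(prev)
            return b"".join(out)

        pt = b"ABCDEFGH" * 4               # repeating 8-byte plaintext, as in the test above
        print(ecb(pt).hex())               # the same 16 hex chars repeat four times
        print(cbc(pt, b"\x00" * 8).hex())  # every block differs
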
  • Maybe you should look at hardware designed to deliver random numbers. I just found this page [auckland.ac.nz] which has some listed (use Find to jump down to the Random Numbers section).

    I think many of these are serial-port attachments, obviously a lot slower than what /dev/urandom is producing, so dumping straight to disk isn't an option - but what such hardware gets you is a reliable, high-quality source of random bits with which to seed a pseudo-random process. Looking at the man page for urandom on OpenBSD (I'm assuming the one in Linux is no different), it doesn't block when the entropy pool runs low, so without a high-quality source of randomness, over 20GB your entropy pool's quality is pretty likely to run low if it's relying on system activity alone.

    /dev/urandom isn't *that* slow: 20GB in 16 hours works out to about 364KB/sec (unless you meant 20Gb, in which case it's only about 45KB/sec). I don't know how cryptographically sound it would be, but you could consider using a CD-ROM of random data (see the link above) as a starting point. A CD-ROM drive should be able to deliver a lot more KB/sec for /dev/urandom or something else to process on the way to your 20GB.

  • I think that for really strong, government-proof security, I'd have more confidence in a hardware generator of real random numbers. But this question seems to be about the speed of generation, and I would suspect that as the bandwidth goes up, those devices tend to produce less random output.

    Rating a random number generator requires careful analysis to check that it is decent, and I don't think that you can prove that a string of bits is random. It is basically the same problem as proving an encryption system unbreakable.

    Check out this guy's page: http://webnz.com/robert/index.html [webnz.com] and in particular this section: http://webnz.com/robert/true_rng.html [webnz.com] I ran across Robert Davies' site a long time ago; it may actually have been something posted to Slashdot in the early days.

  • audio-entropyd [mindrot.org]

    This is a small program to reseed the Linux kernel random number generator with data from a soundcard.

    Audio is read periodically from a stereo soundcard; the difference between the left and right channels is taken, hashed, and the result credited to the KRNG.

    Using the difference between the left and right channels should eliminate some external signals (e.g. 50/60Hz mains hum). (A rough sketch of the pipeline follows below.)
    --------
    "I already have all the latest software."

  • Well, how fast can you grab input from video boards? A 1/4-screen 320x240 stream of 8-bit bytes is 76KB per frame. Feed it an often-changing channel, or "snow" on an unused channel. Or several channels on several video cards, mixed together.

    This is of course fast but only somewhat random. There are a lot of similarities between consecutive frames, and an unused channel is particularly susceptible to intentional attack by a nearby transmitter.

    For more randomness, this can be run through further randomizing algorithms. The obvious example is the SGI LavaRand [sgi.com].

  • The 1MB container file is compressible down to 320K by gzip. This doesn't seem right to me - shouldn't there be almost no redundancy in the encrypted file? Further investigation shows that about 930 of the 1024 1K blocks in the container occur in repeated groups of four blocks. The data inside each block are certainly scrambled beyond human recognition, but isn't this exposure of the cleartext's redundancy a sign of something wrong?
    It depends on the scheme used. It appears to me that each of the "virtual sectors" of your encrypted filesystem is being encrypted separately with one of four keys - and of course, identical data encrypted with identical keys will produce identical ciphertext!
    What you actually want is for each VSector to be "seeded" with a random value (probably as part of the key) to prevent this - but in practice, having large numbers of sectors with identical data in them isn't likely, so in normal use the additional overhead isn't worth the special case. (A sketch of per-sector seeding follows below.)
    It may be worth bearing in mind, though, if you ever need to hide *how much data* you have: seed unused VSectors with pseudorandom or straight counter values, and make sure any file structures have sufficient internal structure (pointers and so forth) that identical blocks of plaintext will come out as different file data.
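
    A sketch of the per-sector seeding idea (the hash-derived IV is just one possible construction, not necessarily what any real patch does):

        # Derive a distinct IV per virtual sector so identical sector
        # contents encrypt differently. KEY is a placeholder.
        import hashlib, struct

        KEY = b"\x00" * 32

        def sector_iv(sector, iv_len=16):
            return hashlib.sha256(KEY + struct.pack("<Q", sector)).digest()[:iv_len]

        print(sector_iv(0).hex())
        print(sector_iv(1).hex())  # differs, so equal plaintext sectors diverge
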
  • I think I read on Intel's site that the PIII and Celeron2 have a built-in hardware random number generator. That would probably be the way to go.
    But now that I surf around their site, I might have been thinking of this: http://developer.intel.com/design/security/rng/rng.htm

    Oh well, you be the judge.

    Peace out.
  • Chaotic algorithms (one type of fractal) are non-repeating, deterministic, and non-predictable. The idea is that a minute change in the input results in a massive change in the output. The determinism, of course, allows the results to be reproduced, which is what makes two-way encryption possible.

    Chaotic algorithms, if properly chosen, should make excellent hashing algorithms. Of course, most chaotic algorithms are not truly chaotic, but merely approach chaos. Integer chaotic algorithms, I suspect, will fall into repetition even more often than floating-point ones (the fixed-point sketch below shows the idea). That's not the point. The point is to get things Good Enough(tm).

    I recall reading on Slashdot about 2 years ago that a Japanese researcher had discovered a fractal algorithm that doesn't repeat until 2^895 iterations. That would make it sufficient for a hashing algorithm. Combine this with some of the techniques used in 3DES, and you should be able to get a reasonably secure system with a key length that approaches infinity. Of course, such a system would be illegal to export from the US (I'm a US citizen, I'm afraid). However, a publication including said encryption algorithm should be covered by the 1st Amendment to the Constitution ;-)

    At any rate, your notion that fractal algorithms repeat is, quite simply, wrong. There are very simple rules that make fractal algorithms possible, but the fact of the matter is that they are deterministic, non-predictable algorithms (at least, using any conventional math that I am aware of).

    If anyone has any further interest in this, let me know. I'll need some non-US help implementing any algorithms. And, of course, their programming will have to be independent of any help from me (merely helping a foreigner understand encryption outside the confines of the 1st Amendment constitutes a crime equivalent to the trade of nuclear arms...).
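
    For the curious, here is what an integer-only chaotic map looks like in practice (a toy, emphatically NOT an encryption scheme): the logistic map x -> 4x(1-x) in 16.16 fixed point, showing sensitive dependence on initial conditions. As noted above, truncation tends to drop integer versions into short cycles sooner than floating point.

        # Logistic map in 16.16 fixed-point integer arithmetic.
        SCALE = 1 << 16                  # represent x in [0, 1] as [0, SCALE]

        def step(x):
            # x_{n+1} = 4 * x_n * (1 - x_n), all integer, with truncation
            return (4 * x) * (SCALE - x) // SCALE

        a, b = 1000, 1001                # two seeds one fixed-point step apart
        for i in range(40):
            a, b = step(a), step(b)
        print(a, b)                      # trajectories diverge within a few dozen steps
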
  • I am an InfoSec professional, but this is not professional advice. (Gads, I hate lawsuit-happy cultures like ours...)

    Bruce Schneier's YARROW is the only PRNG I'm aware of which has actually gone through formal cryptanalysis. I'm not overly fond of YARROW--it's extremely slow with its 3DES/SHA-1 core--but the fact that it's been cryptanalyzed makes it much more trusted than almost any other PRNG.

    Substituting a fast cipher like Blowfish or an AES candidate, and replacing the hash with MD5 or somesuch, would result in a much faster YARROW. Unfortunately, it would also invalidate the existing cryptanalysis, since the modified version hasn't undergone the same scrutiny. Still, if I remember correctly, YARROW is intended to be both hash- and cipher-agnostic. (A loose sketch of the generator's shape follows below.)
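
    For flavor, a loose sketch of the generator shape being described: a keyed PRF run in counter mode, with something like Yarrow's "generator gate" of periodically rekeying through a one-way function so a captured key can't be run backwards. This toy uses a hash-based PRF instead of Yarrow's 3DES core, and has had exactly the cryptanalysis you'd expect (none).

        # Counter-mode generator with periodic one-way rekeying (toy sketch).
        import hashlib, struct

        class GatedPRNG:
            GATE = 10                                # rekey every 10 blocks (arbitrary)

            def __init__(self, seed):
                self.key = hashlib.sha256(seed).digest()
                self.counter = 0

            def block(self):
                out = hashlib.sha256(self.key + struct.pack("<Q", self.counter)).digest()
                self.counter += 1
                if self.counter % self.GATE == 0:    # the "gate": old key becomes unrecoverable
                    self.key = hashlib.sha256(self.key + b"gate").digest()
                return out

        g = GatedPRNG(b"256 truly random bits go here")
        stream = b"".join(g.block() for _ in range(32))
        print(len(stream), stream[:8].hex())
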
  • I think the idea was to build the encryption scheme first. I began searching for a suitable algorithm about 4 years ago. To date, the only suitable algorithms I have found require floating-point math. I didn't have the collective mind of Slashdot behind me then; now I do. So, if you have a *suggestion*, please let me know if you are aware of any integer fractal/chaos algorithms. No more, no less. I'm not trying to claim that I have some kick-ass encryption scheme. I have an idea. I'd like to take the idea to a prototype. Until then, I keep searching.

    Once done, I will turn over my work to the world. I don't want to hoard it. I merely want strong encryption that is fast enough to handle video streams, and far more secure than any of the conventional 2-way hashing algorithms. Even the most robust of those can be defeated with a finite amount of computing power. Use a key that approaches infinite length and you get a far more secure algorithm; this is the idea behind 3DES (as opposed to DES, at least): increased key length == more security. That said, a properly chosen fractal algorithm should approach randomness (or at least come close to being non-predictable). The randomness should allow for a more secure key, and the fractal part should take care of the non-repeating part (that is, the predictability part). If it's effectively a one-time pad, all the better. The deterministic part of the fractal algorithm should allow both encryption and decryption, since both sides can generate the key by feeding the exact same inputs to the fractal algorithm.

    These aspects are well known. The infinite-length one-time pad is sort of the panacea of encryption. Since the US prevents the export of such algorithms, research in this area is somewhat stunted. I have no commercial interest in this algorithm; thus, I thought I might be able to help out in a field where business (American business, anyway) cannot.

    Now, I repeat my question: do any of you know of an integer-based fractal algorithm (that is, a fractal algorithm that doesn't use the division operator, and preferably doesn't multiply by non-integer numbers)?
  • "Rating a random number generator requires careful analysis to check that it is decent, and I'm don't think that you can prove that a string of bits is random. It is basically the same problem as proving an encryption system unbreakable"

    Isn't any string of digits from an irrational, transcendental constant, such as PI, provably random?
  • Hey!

    I may be missing something here, but what good encryption system would need 20GB of random data? What is that, like half your hard disk space? If you wanted to do a sort of XOR job - a one-time pad over the whole disk - you'd need that much, but 20GB? That's a lot of hard disk space.

    I'd go for something different if I were you. I like the look of TCFS, but I'm fairly new to Linux so I don't think I'll be using it just yet. Here's an extract from the security HOWTO:

    "6.10. CFS - Cryptographic File System and TCFS - Transparent Cryptographic File System

    CFS is a way of encrypting entire directory trees and allowing users to store encrypted files on them. It uses an NFS server running on the local machine. RPMS are available at http://www.zedz.net/redhat/, and more information on how it all works is at ftp://ftp.research.att.com/dist/mab/.

    TCFS improves on CFS by adding more integration with the file system, so that it's transparent to users that the file system is encrypted. More information at: http://edu-gw.dia.unisa.it/tcfs/.

    It also need not be used on entire filesystems. It works on directory trees as well."

    You could try either of them. As I may have said, I like the look of TCFS. The website in the quote may have been removed; if so, I know it's available at http://tcfs.dia.unisa.it/ [unisa.it].

    The Linux Journal has a good article [linuxjournal.com] that you could also look at.

    Just my $0.02

    Michael Tandy

  • Hey!

    If you want a good source of random numbers, try downloading DIEHARD [fsu.edu] for Linux.

    There are also some RNG links at SecurityFocus [securityfocus.com].

    My copy of diehard includes 'makewhat.exe', a FAST RNG. It makes 11MB chunks of data in... like... less than a second with the faster generators. Check it out if you want.

    Michael Tandy

  • Well, it is random in the sense that the digits are distributed evenly. But it is not random in the sense that if you know the preceding digits (and that they come from PI), you can easily predict the next one. There are different definitions of "random" involved here.

    If you took the first N digits of PI and used them to encrypt your hard disk, say by simply XORing them, it would be fairly easy to break the encryption and read the disk. So you need a string of digits which has some stronger randomness property than the digits of PI. I like to think of it in terms of minimum description length: you can write a very short program to generate the digits of PI, but you can't for a truly random string of digits. (A concrete demonstration follows below.)
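
    To make the description-length point concrete: this complete program prints the digits of PI forever (a spigot algorithm; this streaming formulation is due to Jeremy Gibbons). The whole "key" fits in a few lines, which is exactly why PI's digits carry no cryptographic secrecy.

        # Unbounded spigot: yields decimal digits of pi one at a time.
        def pi_digits():
            q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
            while True:
                if 4*q + r - t < n*t:
                    yield n
                    q, r, n = 10*q, 10*(r - n*t), 10*(3*q + r)//t - 10*n
                else:
                    q, r, t, k, n, l = (q*k, (2*q + r)*l, t*l, k + 1,
                                        (q*(7*k + 2) + r*l)//(t*l), l + 2)

        gen = pi_digits()
        print("".join(str(next(gen)) for _ in range(20)))  # 31415926535897932384
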
  • You know, the more I think about it, the more the CD-ROM idea seems like a bad one. No matter how you manipulated the random data off the CD, having a copy of that CD would still be a huge benefit to a potential cracker. I think those CDs of random data are only good for testing the randomness of sets of data.
  • 1) I'm not sure about the serpent setup, but DES (and others) can use an "initialization vector" dependent upon the disk block's relative (or absolute) position. Identical plaintext blocks will then *not* give you identical ciphertext blocks, so something is wrong. (If the serpent loopback doesn't use per-block IVs, then it's an inappropriate setup for this application.)

    A secondary factor, already mentioned, is that the cipher should be run in CBC mode within the disk block. That shouldn't be a problem since any block cipher can be run in CBC mode - it's coded at a higher abstraction level.

    2) the "randomness" in a well-encrypted file is not due to the encryption - it's due to the compression that occurs before the encryption. Non-randomness is a problem with byte-wide encryption (including variants like Viriege(sp?) and Playfair ciphers), but it's far harder to attribute any significance to non-randomness in a 8- or 16-byte cipher running in CBC mode.

    3) the reason for preloading the disk with random data is simple: you eliminate the implicit known-plaintext problem that comes with empty (and nulled) sectors. With random data, the only known plaintext on the entire encrypted disk is the superblock, plus *maybe* a file or two at an unknown location.
  • You use that random data to eliminate most of the "known plaintext" on the disk. That's the same reason you need to use a real cryptographic PRNG, not a weak one which is easily cracked.

    On the one hand, this doesn't offer you *that* much more protection - I would rather use a strong algorithm and zero-initialized blocks than a weak algorithm and random blocks. On the other hand, if you feel it's appropriate to use an encrypted FS in the first place you shouldn't begrudge small improvements with definite benefits.
  • Using a cheap BT848 you can grab and write 30 fps at that resolution and depth, so you could potentially get 2.2 megs a second. For a 20GB volume, that's just over two and a half hours.

    By simply increasing the resolution to the break-even point (512x384 at 16bpp, I believe, is where the older Brooktrees flatline) you could easily saturate any IDE device, and mow that sucker down in under an hour.

    A queer side effect would be that any super-hyper-sharp NSA cracker might be able to make out a snowy episode of Dragonball-Z through your data, and know the key off the bat!!

    Oh, and does anyone else get Johnny Mnemonic deja vu?

