
Journal davidwr's Journal: Labeling unused disk space on readiness for use

Goal:
=====
=====

Create a framework so any newly-developed filesystem can have a user-tunable and user-extensible mechanism for handling deleted files and deallocated blocks.

Problem:
=======
=======
Filesystems allocate new disk from the free disk pool based on factors OTHER than the free disk space's "readiness" to be re-used. Filesystems tend to optimize for quickly locating available space or for read-write performance after the space is assigned to a particular file.

In some cases, you want to preserve a deleted block until certain actions can be taken. This may be to aid in file recovery, or to scrub a block multiple times before using it for real data.

The solution:
============
============

Tag deleted blocks with the following information:
=================================================
Arbitrary information added by a deleted-block handler (DBH), including priority level assigned by the DBH.
The arbitrary information includes whatever is needed to help determine how "valuable" the data is, possibly including the time of deletion, the userid of the deleter, the process name of the deleter, the previous owner of the file, the previous inode number of the file, the block-offset into the file, and other information. Typically it will just be the time of deletion.
The priority level DBH_CURRENT_PRIORITY will range from 0=DBH_UNPROCESSED to MAXPRIORITY=DBH_FULLYPROCESSED, with higher-priority blocks getting preference to lower-priority blocks during allocation.
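
As a rough illustration of the tag described above, here is a minimal Python sketch. The value chosen for MAXPRIORITY, the field names, and the tie-break on deletion time are all assumptions made for the example; the proposal itself does not fix any of them.

```python
from dataclasses import dataclass, field
import time

DBH_UNPROCESSED = 0
DBH_FULLYPROCESSED = 7  # assumed value for MAXPRIORITY; not fixed by the proposal


@dataclass
class DeletedBlockTag:
    """Hypothetical per-block tag written when a block is deallocated."""
    block_number: int
    priority: int = DBH_UNPROCESSED                # DBH_CURRENT_PRIORITY
    deleted_at: float = field(default_factory=time.time)
    extra: dict = field(default_factory=dict)      # optional: userid, inode, offset, ...

    def __post_init__(self):
        if not DBH_UNPROCESSED <= self.priority <= DBH_FULLYPROCESSED:
            raise ValueError("priority out of range")


def allocation_order(tags):
    """Higher-priority blocks first; ties go to the older deletion (an assumption)."""
    return sorted(tags, key=lambda t: (-t.priority, t.deleted_at))
```

In the common case where only the time of deletion is kept, `extra` simply stays empty.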

The filesystem itself will record the following parameters:
==========================================================
DBH_PRIORITY_HARD_CUTOFF = n >= 0
DBH_PRIORITY_SOFT_CUTOFF = n >= DBH_PRIORITY_HARD_CUTOFF
DBH_SOFT_CUTOFF_ACTION = {skip, fix}
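
A minimal sketch of how these three parameters might be held and validated, assuming the soft cutoff is meant to be at least the hard cutoff and that both are bounded by MAXPRIORITY. The class name and the value of MAXPRIORITY are made up for the example.

```python
from dataclasses import dataclass

MAXPRIORITY = 7  # assumed; the proposal does not fix a value


@dataclass(frozen=True)
class DbhParams:
    """Hypothetical container for the filesystem-wide DBH parameters."""
    hard_cutoff: int          # DBH_PRIORITY_HARD_CUTOFF
    soft_cutoff: int          # DBH_PRIORITY_SOFT_CUTOFF
    soft_cutoff_action: str   # DBH_SOFT_CUTOFF_ACTION: "skip" or "fix"

    def __post_init__(self):
        if not 0 <= self.hard_cutoff <= MAXPRIORITY:
            raise ValueError("hard cutoff must be in 0..MAXPRIORITY")
        if not self.hard_cutoff <= self.soft_cutoff <= MAXPRIORITY:
            raise ValueError("soft cutoff must be in hard_cutoff..MAXPRIORITY")
        if self.soft_cutoff_action not in ("skip", "fix"):
            raise ValueError("action must be 'skip' or 'fix'")
```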

Tunable while mounted:
=====================
A filesystem's DBH_PRIORITY_HARD_CUTOFF, DBH_PRIORITY_SOFT_CUTOFF, and DBH_SOFT_CUTOFF_ACTION are all tunable while the filesystem is mounted. Likewise, the DBH routine itself can be replaced while the filesystem is mounted. Whether mounted or unmounted, changing values can have side-effects, so it is recommended that any such change be carefully controlled to prevent disaster. One way to do this is to lower the cutoff priorities to 0; another is to raise the DBH_CURRENT_PRIORITY of all existing deleted blocks to above the soft cutoff. More sophisticated means would examine each deleted block on a block-by-block basis and make an intelligent decision. This takes time and is not recommended on an otherwise-busy system.

Discussion:
==========

Deleted blocks whose DBH_CURRENT_PRIORITY is less than DBH_PRIORITY_HARD_CUTOFF will be unavailable for use by non-privileged users. If the only blocks available are below DBH_PRIORITY_HARD_CUTOFF, then call the DBH to perform additional cleanup.

Deleted blocks whose DBH_CURRENT_PRIORITY is between DBH_PRIORITY_HARD_CUTOFF and DBH_PRIORITY_SOFT_CUTOFF will either be skipped until they are the only available blocks left, or an immediate call will be made to the DBH to perform additional cleanup, depending on the value of DBH_SOFT_CUTOFF_ACTION. If the only blocks available are below DBH_PRIORITY_SOFT_CUTOFF, then call the DBH to perform additional cleanup.
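
The rules above might be sketched roughly as follows. This is only one reading of them: it assumes the allocator walks its candidate blocks in whatever order it prefers for layout reasons, treats a priority equal to the soft cutoff as ready, and leaves out the privileged-user case. The function name and return shape are hypothetical.

```python
def choose_block(candidates, hard_cutoff, soft_cutoff, action):
    """Pick a block from `candidates`, a list of (block_number, priority)
    pairs in the allocator's preferred order.

    Returns (block_number, needs_dbh), or None if there are no candidates.
    needs_dbh is True when the DBH must clean the block up before use.
    """
    between, below_hard = [], []
    for block, priority in candidates:
        if priority >= soft_cutoff:
            return block, False            # ready for immediate reuse
        if priority >= hard_cutoff:
            if action == "fix":
                return block, True         # call the DBH right away
            between.append((priority, block))   # "skip": keep as a fallback
        else:
            below_hard.append((priority, block))
    # Nothing was ready: fall back to the best skipped block, then to
    # blocks below the hard cutoff, calling the DBH in either case.
    for pool in (between, below_hard):
        if pool:
            _, block = max(pool)
            return block, True
    return None
```

With "fix" the allocator gets the block it wanted at the cost of an immediate cleanup; with "skip" it trades layout for speed until nothing better is left.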

Examples:
========
========

A typical DBH might do the following:
=====================================
If the file is less than 24 hours old, preserve it and keep DBH_CURRENT_PRIORITY at 0.
Then, on a time-available, lowest-priority basis, sweep the entire filesystem overwriting each block first with 0's then with alternating patterns. At each pass, raise DBH_CURRENT_PRIORITY.
Set DBH_PRIORITY_HARD_CUTOFF at 1 and DBH_PRIORITY_SOFT_CUTOFF at the maximum value.
DBH_SOFT_CUTOFF_ACTION is set to fix.
The typical "fix" action will be to overwrite the data enough times to raise DBH_CURRENT_PRIORITY to DBH_PRIORITY_SOFT_CUTOFF.
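
A toy version of such a DBH, working on in-memory blocks rather than a real disk, might look like the sketch below. It assumes the 24 hours are measured from the time of deletion, that one overwrite pass raises the priority by one, and that three passes (zeros, then two alternating bit patterns) reach the maximum. The block layout and function names are invented for the example.

```python
PATTERNS = [b"\x00", b"\xaa", b"\x55"]   # zeros, then alternating bit patterns
MAXPRIORITY = len(PATTERNS)              # assumed: one priority step per pass
GRACE_SECONDS = 24 * 60 * 60


def scrub_pass(block, now):
    """Run one overwrite pass on `block`, a dict with 'data' (bytearray),
    'priority' and 'deleted_at'. Returns True if a pass was performed."""
    if now - block["deleted_at"] < GRACE_SECONDS:
        return False                     # preserved; priority stays where it is
    if block["priority"] >= MAXPRIORITY:
        return False                     # already fully processed
    pattern = PATTERNS[block["priority"]]
    block["data"][:] = pattern * len(block["data"])
    block["priority"] += 1
    return True


def fix(block, soft_cutoff, now):
    """The 'fix' action: scrub until the soft cutoff is reached.
    Returns True if the block is now ready for reuse."""
    while block["priority"] < soft_cutoff and scrub_pass(block, now):
        pass
    return block["priority"] >= soft_cutoff
```

A background sweep would simply call scrub_pass on every deleted block whenever the system is idle.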

A typical security-conscious environment:
=========================================
Run a medium-priority task to scrub sectors.
Set DBH_PRIORITY_HARD_CUTOFF and DBH_PRIORITY_SOFT_CUTOFF to the maximum priority.
Set DBH_SOFT_CUTOFF_ACTION to skip.

A typical performance-oriented environment:
===========================================
Set DBH_PRIORITY_HARD_CUTOFF at 0 and DBH_PRIORITY_SOFT_CUTOFF at 1.
Set DBH_SOFT_CUTOFF_ACTION to skip or fix depending on which gives better overall system performance.
This will give preference to sectors that have been overwritten one time.

Disabling this feature entirely:
===============================
Set DBH_PRIORITY_HARD_CUTOFF at 0 and DBH_PRIORITY_SOFT_CUTOFF at 0.
This makes DBH_SOFT_CUTOFF_ACTION moot.
Install a stub, do-nothing DBH. It won't ever be called once the filesystem is mounted.

Using this to prioritize deleted sectors by age:
===============================================
Routinely update DBH_CURRENT_PRIORITY based on age, with most-recently-deleted files having a value of 0 and files that are very old having a maximum priority.
Set DBH_PRIORITY_HARD_CUTOFF to a value corresponding to the minimum time you guarantee files will be kept and DBH_PRIORITY_SOFT_CUTOFF to a higher value.
Set DBH_SOFT_CUTOFF_ACTION to skip.
Set the DBH handler to make the block available if DBH_CURRENT_PRIORITY is greater than DBH_PRIORITY_SOFT_CUTOFF.
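
One way to map age onto priority, and to derive the hard cutoff from a guaranteed retention time, is sketched below. The linear mapping, the integer-seconds units, and both function names are assumptions; any monotonic mapping would serve the same purpose.

```python
def priority_from_age(age_seconds, max_age_seconds, max_priority):
    """Linear mapping: just-deleted blocks get 0, blocks at or beyond
    max_age_seconds get max_priority. All arguments are integers."""
    if age_seconds <= 0:
        return 0
    if age_seconds >= max_age_seconds:
        return max_priority
    return age_seconds * max_priority // max_age_seconds


def hard_cutoff_for_retention(retention_seconds, max_age_seconds, max_priority):
    """Smallest priority p such that every block with priority >= p has
    been deleted for at least retention_seconds (ceiling division)."""
    if retention_seconds > max_age_seconds:
        raise ValueError("retention longer than the age range being mapped")
    return -(-retention_seconds * max_priority // max_age_seconds)
```

For example, with a seven-day range mapped onto priorities 0 to 7, a one-day retention guarantee corresponds to a hard cutoff of 1.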

Alternative method to prioritize deleted sectors by age which checks blocks on-demand:
=====================================================================================
Routinely update DBH_CURRENT_PRIORITY based on age, with most-recently-deleted files having a value of 0 and files that are very old having a maximum priority.
Set DBH_PRIORITY_HARD_CUTOFF to 0; it is ignored.
Set DBH_PRIORITY_SOFT_CUTOFF to 1.
Set DBH_SOFT_CUTOFF_ACTION to fix.
Set the DBH handler to make the block available if the time since deletion is long enough.
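
The on-demand variant could be sketched like this, with the handler deciding at allocation time whether a block is old enough. The dict layout, the function names, and the choice to scan blocks in list order are assumptions made for the example.

```python
def on_demand_dbh(tag, now, min_age_seconds, soft_cutoff=1):
    """Hypothetical 'fix' handler: release the block only if it was
    deleted at least min_age_seconds ago."""
    if now - tag["deleted_at"] < min_age_seconds:
        return False
    tag["priority"] = max(tag["priority"], soft_cutoff)
    return True


def allocate(tags, now, min_age_seconds):
    """Return the number of the first block the handler agrees to
    release, or None if every deleted block is still too recent."""
    for tag in tags:
        if on_demand_dbh(tag, now, min_age_seconds):
            return tag["block"]
    return None
```

Here the cost of checking ages is paid at allocation time instead of by a background task.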

Performance impact:
===================
===================

Formatting and mounting a filesystem will have a small additional overhead to write and read fs-wide values.

While a filesystem is mounted, additional memory is needed to hold additional filesystem metadata.

Any operation that requests a block will have an overhead as DBH_CURRENT_PRIORITY is checked and, if necessary, the DBH is called to make a block available.

Any operation that requests a block may not get the block it wants, leading to a sub-optimal layout of the file on disk.

Any operation that requests a block may fail due to lack of available blocks when it otherwise would not have.

Any operation that frees a block will have an overhead while the block's DBH_CURRENT_PRIORITY and other arbitrary data is set. This can probably be made very simple and fast if additional data isn't kept.

If the user-level free-block scavenging task does not get enough opportunity to run, the system can degenerate to a point where every block is below DBH_PRIORITY_HARD_CUTOFF and only root can use the system. If DBH_PRIORITY_HARD_CUTOFF is set to 0, then the degenerate case will have every block being made available as needed, possibly a time-consuming operation. The latter can be a design feature, as it is in the example "Alternative method to prioritize deleted sectors by age which checks blocks on-demand" above.

Benefit:
========
========

The reuse of free disk space becomes a tunable parameter.
This can aid in file recovery and in legal compliance for data retention and destruction.

Requirements of a filesystem:
============================
============================
Any filesystem that implements this will need hooks or callbacks in the appropriate places, such as:
initialization, volume-formatting, volume-mounting, volume-unmounting, block-allocation, block-deallocation, etc.
It will also need a way to store information about deleted sectors in non-volatile storage and a way to store additional information in memory.
To the extent that information is recorded, this information should be quick to generate. Information such as the current time is quick to generate. Information such as the previous owner of a block may not be in all filesystems, and in some situations the information may have been destroyed prior to deleting the block. Some operating systems or filesystems may require an "assistant" routine that is called before any file is removed to temporarily record useful information.
A well-defined data block that says "here is a list of easy to find things and here are their values or here is where to find them" will be useful to make user-written deleted-block-data-saving routines more portable across filesystems and operating systems. This data block will be populated by filesystem- and operating-system-specific routines when files are deleted or blocks deallocated.
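
To make the idea of hooks and a portable data block a little more concrete, here is a minimal sketch. Every name in it is hypothetical; it only shows the shape such an interface might take, with a do-nothing handler as the default.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class DeletionInfo:
    """Hypothetical portable data block. Only the time of deletion is
    required; anything the filesystem cannot supply cheaply stays None."""
    deleted_at: float
    deleter_uid: Optional[int] = None
    process_name: Optional[str] = None
    previous_owner: Optional[int] = None
    inode: Optional[int] = None
    block_offset: Optional[int] = None

    def known_fields(self):
        """Return only the fields the filesystem actually filled in."""
        return {k: v for k, v in asdict(self).items() if v is not None}


class DeletedBlockHandler:
    """Do-nothing DBH; a real handler would override these hooks."""

    def on_format(self, volume): pass
    def on_mount(self, volume): pass
    def on_unmount(self, volume): pass

    def on_block_freed(self, block_number, info):
        return 0                   # initial DBH_CURRENT_PRIORITY

    def on_block_requested(self, block_number, current_priority):
        return current_priority    # no extra cleanup performed
```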

History:
========
========
Many filesystems, including DOS's FAT, preserve some information about the names and other meta-data for deleted files to aid in reconstruction.
Microsoft's NTFS has the concept of a "tombstone" to hold recently-deleted data.

Implementation:
===============
This has not been implemented yet. This is a high-level description of what such a system might look like.
