Digital_Mercenary's Journal: ZFS Basics Tutorial found on Net::Thank the tubes!

ZFS Tutorial Part 1
Learning to use ZFS, Sun's new filesystem.

ZFS is an open source filesystem used in Solaris 10, with growing support from other operating systems. This series of tutorials shows you how to use ZFS with simple hands-on examples that require a minimum of resources.

In this tutorial I hope to give you a brief overview of ZFS and show you how to manage ZFS pools, the foundation of ZFS. In subsequent parts we will look at ZFS filesystems in more depth.

This tutorial was created on 2007-03-07 and last revised on 2008-08-24.
ZFS Tutorial Series

1. Overview of ZFS & ZFS Pool Management
2. ZFS Filesystem Management, Mountpoints and Filesystem Properties

Let your hook be always cast; in the pool where you least expect it, there will be a fish. - Ovid
Getting Started
You need:

* An operating system with ZFS support (a quick check is shown after this list):
o Solaris 10 6/06 or later [download]
o OpenSolaris [download]
o Mac OS X 10.5 Leopard (requires ZFS download)
o FreeBSD 7 (untested) [download]
o Linux using FUSE (untested) [download]
* Root privileges (or a role with the appropriate ZFS rights profile)
* Some storage, either:
o 512 MB of disk space on an existing partition
o Four spare disks of the same size
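
A quick way to confirm your system has working ZFS support is to ask zpool which on-disk ZFS versions it understands; on most releases this prints a version list (the exact output varies by release):

Code: Select all
        # zpool upgrade -v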

Using Files

To use files on an existing filesystem, create four 128 MB files, e.g.:

Code: Select all
        # mkfile 128m /home/ocean/disk1
        # mkfile 128m /home/ocean/disk2
        # mkfile 128m /home/ocean/disk3
        # mkfile 128m /home/ocean/disk4
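
mkfile is specific to Solaris and Mac OS X. If your system lacks it (e.g. FreeBSD or Linux), dd can create equivalent 128 MB files - a sketch, assuming the same /home/ocean paths:

Code: Select all
        # dd if=/dev/zero of=/home/ocean/disk1 bs=1024k count=128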

Code: Select all
        # ls -lh /home/ocean

total 1049152
-rw------T 1 root root 128M Mar 7 19:48 disk1
-rw------T 1 root root 128M Mar 7 19:48 disk2
-rw------T 1 root root 128M Mar 7 19:48 disk3
-rw------T 1 root root 128M Mar 7 19:48 disk4

Using Disks

To use real disks in the tutorial, make a note of their names (e.g. c2t1d0 or c1d0 under Solaris). You will be destroying all the partition information and data on these disks, so be sure they're not needed.
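
If you're not sure what your disks are called, on Solaris the format utility lists the disks it can see (quit without selecting a disk to leave everything untouched):

Code: Select all
        # format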

In the examples I will be using files named disk1, disk2, disk3, and disk4; substitute your disks or files for them as appropriate.
ZFS Overview

The architecture of ZFS has three levels. One or more ZFS filesystems exist in a ZFS pool, which consists of one or more devices* (usually disks). Filesystems within a pool share its resources and are not restricted to a fixed size. Devices may be added to a pool while it's still running, e.g. to increase the size of a pool. New filesystems can be created within a pool without taking other filesystems offline. ZFS supports filesystem snapshots and cloning of existing filesystems. ZFS manages all aspects of the storage: volume management software (such as SVM or Veritas) is not needed.

*Technically a virtual device (vdev), see the zpool(1M) man page for more.

ZFS is managed with just two commands:

* zpool - Manages ZFS pools and the devices within them.
* zfs - Manages ZFS filesystems.

If you run either command with no options it gives you a handy options summary.
Pools

All ZFS filesystems live in a pool, so the first step is to create a pool. ZFS pools are administered using the zpool command.

Before creating new pools you should check for existing pools to avoid confusing them with your tutorial pools. You can check what pools exist with zpool list:

Code: Select all
        # zpool list

no pools available

NB. OpenSolaris now uses ZFS, so you will likely have an existing ZFS pool called syspool on this OS.
Single Disk Pool

The simplest pool consists of a single device. Pools are created using zpool create. We can create a single disk pool as follows (you must use the absolute path to the disk file):

Code: Select all
        # zpool create herring /home/ocean/disk1
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
herring 123M 51.5K 123M 0% ONLINE -

No volume management, configuration, newfs or mounting is required. You now have a working pool complete with mounted ZFS filesystem under /herring (/Volumes/herring on Mac OS X - you can also see it mounted on your Mac desktop). We will learn about adjusting mount points in part 2 of the tutorial.
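
You can confirm the new filesystem and its mount point with zfs list; it should show a herring filesystem mounted at /herring (the sizes you see may differ slightly):

Code: Select all
        # zfs list herring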

Create a file in the new filesystem:

Code: Select all
        # mkfile 32m /herring/foo
        # ls -lh /herring/foo

-rw------T 1 root root 32M Mar 7 19:56 /herring/foo

Code: Select all
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
herring 123M 32.1M 90.9M 26% ONLINE -

The new file is using about a quarter of the pool capacity (indicated by the CAP value). NB. If you run the list command before ZFS has finished writing to the disk you will see lower USED and CAP values than shown above; wait a few moments and try again.

Now destroy your pool with zpool destroy:

Code: Select all
        # zpool destroy herring
        # zpool list

no pools available

On Mac OS X you need to force an unmount of the filesystem (using umount -f /Volumes/herring) before destroying it, as it will be in use by fseventsd.
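
On Mac OS X the sequence would therefore look something like this (assuming the default /Volumes mount point):

Code: Select all
        # umount -f /Volumes/herring
        # zpool destroy herring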

You will only receive a warning about destroying your pool if it's in use. We'll see in a later tutorial how you can recover a pool you've accidentally destroyed.
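
(As a preview: zpool import with the -D option lists pools that have been destroyed but can still be imported; we'll put it to proper use in that later tutorial.)

Code: Select all
        # zpool import -D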
Mirrored Pool

A pool composed of a single disk doesn't offer any redundancy. One method of providing redundancy is to use a mirrored pair of disks as a pool:

Code: Select all
        # zpool create trout mirror /home/ocean/disk1 /home/ocean/disk2

Code: Select all
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 123M 51.5K 123M 0% ONLINE -

To see more detail about the pool use zpool status:

Code: Select all
        # zpool status trout

pool: trout
state: ONLINE
scrub: none requested
config:
NAME                    STATE     READ WRITE CKSUM
trout                   ONLINE       0     0     0
  mirror                ONLINE       0     0     0
    /home/ocean/disk1   ONLINE       0     0     0
    /home/ocean/disk2   ONLINE       0     0     0

errors: No known data errors

We can see our pool contains one mirror of two disks. Let's create a file and see how USED changes:

Code: Select all
        # mkfile 32m /trout/foo

Code: Select all
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 123M 32.1M 90.9M 26% ONLINE -

As before, about a quarter of the disk has been used, but the data is now stored redundantly over two disks. Let's test it by overwriting the first disk's label with random data (if you are using real disks you could physically disable or remove a disk instead):

Code: Select all
        # dd if=/dev/random of=/home/ocean/disk1 bs=512 count=1

ZFS automatically checks for errors when it reads/writes files, but we can force a check with the zpool scrub command.

Code: Select all
        # zpool scrub trout

Code: Select all
        # zpool status

pool: trout
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed with 0 errors on Wed Mar 7 20:42:07 2007
config:
NAME                    STATE     READ WRITE CKSUM
trout                   DEGRADED     0     0     0
  mirror                DEGRADED     0     0     0
    /home/ocean/disk1   UNAVAIL      0     0     0  corrupted data
    /home/ocean/disk2   ONLINE       0     0     0

errors: No known data errors

The disk we used dd on is showing as UNAVAIL with corrupted data, but no data errors are reported for the pool as a whole, and we can still read and write to the pool:

Code: Select all
        # mkfile 32m /trout/bar
        # ls -l /trout/

total 131112
-rw------T 1 root root 33554432 Mar 7 20:43 bar
-rw------T 1 root root 33554432 Mar 7 20:35 foo

To maintain redundancy we should replace the broken disk with another. If you are using a physical disk you can use the zpool replace command (the zpool man page has details). However, in this file-based example I remove the disk file from the mirror and recreate it.
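
With a real disk the one-step replacement would look something like this (hypothetical device names; check the zpool man page for the exact syntax on your system):

Code: Select all
        # zpool replace trout c2t1d0 c2t2d0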

Devices are detached with zpool detach:

Code: Select all
        # zpool detach trout /home/ocean/disk1

Code: Select all
        # zpool status trout

pool: trout
state: ONLINE
scrub: scrub completed with 0 errors on Wed Mar 7 20:42:07 2007
config:
NAME                    STATE     READ WRITE CKSUM
trout                   ONLINE       0     0     0
  /home/ocean/disk2     ONLINE       0     0     0

errors: No known data errors

Code: Select all
        # rm /home/ocean/disk1
        # mkfile 128m /home/ocean/disk1

To attach another device we specify an existing device in the mirror to attach it to with zpool attach:

Code: Select all
        # zpool attach trout /home/ocean/disk2 /home/ocean/disk1

If you're quick enough, after you attach the new disk you will see a resilver (remirroring) in progress with zpool status.

Code: Select all
        # zpool status trout

pool: trout
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 69.10% done, 0h0m to go
config:
NAME                    STATE     READ WRITE CKSUM
trout                   ONLINE       0     0     0
  mirror                ONLINE       0     0     0
    /home/ocean/disk2   ONLINE       0     0     0
    /home/ocean/disk1   ONLINE       0     0     0

errors: No known data errors

Once the resilver is complete, the pool is healthy again (you can also use ls to check the files are still there):

Code: Select all
        # zpool status trout

pool: trout
state: ONLINE
scrub: resilver completed with 0 errors on Wed Mar 7 20:58:17 2007
config:
NAME                    STATE     READ WRITE CKSUM
trout                   ONLINE       0     0     0
  mirror                ONLINE       0     0     0
    /home/ocean/disk2   ONLINE       0     0     0
    /home/ocean/disk1   ONLINE       0     0     0

errors: No known data errors

Adding to a Mirrored Pool

You can add disks to a pool without taking it offline. Let's double the size of our trout pool:

Code: Select all
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 123M 64.5M 58.5M 52% ONLINE -

Code: Select all
        # zpool add trout mirror /home/ocean/disk3 /home/ocean/disk4

Code: Select all
        # zpool list

NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 246M 64.5M 181M 26% ONLINE -

This happens almost instantly, and the filesystem within the pool remains available. Looking at the status now shows the pool consists of two mirrors:

Code: Select all
        # zpool status trout

pool: trout
state: ONLINE
scrub: resilver completed with 0 errors on Wed Mar 7 20:58:17 2007
config:
NAME                    STATE     READ WRITE CKSUM
trout                   ONLINE       0     0     0
  mirror                ONLINE       0     0     0
    /home/ocean/disk2   ONLINE       0     0     0
    /home/ocean/disk1   ONLINE       0     0     0
  mirror                ONLINE       0     0     0
    /home/ocean/disk3   ONLINE       0     0     0
    /home/ocean/disk4   ONLINE       0     0     0

errors: No known data errors

We can see where the data is currently written in our pool using zpool iostat -v:

Code: Select all
        # zpool iostat -v trout

                               capacity     operations    bandwidth
pool                         used  avail   read  write   read  write
---------------------------  -----  -----  -----  -----  -----  -----
trout                        64.5M   181M      0      0  13.7K    278
  mirror                     64.5M  58.5M      0      0  19.4K    394
    /home/ocean/disk2            -      -      0      0  20.6K  15.4K
    /home/ocean/disk1            -      -      0      0      0  20.4K
  mirror                         0   123M      0      0      0      0
    /home/ocean/disk3            -      -      0      0      0    768
    /home/ocean/disk4            -      -      0      0      0    768
---------------------------  -----  -----  -----  -----  -----  -----

All the data is currently written on the first mirror pair, and none on the second. This makes sense as the second pair of disks was added after the data was written. If we write some new data to the pool the new mirror will be used:

Code: Select all
        # mkfile 64m /trout/quuxx

Code: Select all
        # zpool iostat -v trout

                               capacity     operations    bandwidth
pool                         used  avail   read  write   read  write
---------------------------  -----  -----  -----  -----  -----  -----
trout                         128M   118M      0      0  13.1K  13.6K
  mirror                     95.1M  27.9M      0      0  18.3K  9.29K
    /home/ocean/disk2            -      -      0      0  19.8K  21.2K
    /home/ocean/disk1            -      -      0      0      0  28.2K
  mirror                     33.2M  89.8M      0      0      0  10.4K
    /home/ocean/disk3            -      -      0      0      0  11.1K
    /home/ocean/disk4            -      -      0      0      0  11.1K
---------------------------  -----  -----  -----  -----  -----  -----

Note how a little more of the data has been written to the new mirror than the old: ZFS tries to make best use of all the resources in the pool.

That's it for part 1. In part 2 we will look at managing ZFS filesystems themselves and creating multiple filesystems within a pool. We'll create a new pool for part 2, so feel free to destroy the trout pool.
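
Cleaning up works just as it did for the herring pool earlier:

Code: Select all
        # zpool destroy trout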

If you want to learn more about the theory behind ZFS and find reference material, have a look at the ZFS Administration Guide, OpenSolaris ZFS, ZFS BigAdmin and ZFS Best Practices.
