Software

Journal: Time to PAUSE, and DFS Progress (of sorts)

Journal by TTK Ciar

I got a PAUSE account!

As of today I am "TTKCIAR" at pause.perl.org, a full-fledged member of the open source perl community, and capable of contributing software to CPAN, the central repository of perl modules, for others to download and use! Exciting day!

I have a bunch of code to contribute too .. but there's one problem: most of it totally falls short of PAUSE's standards for contributed code.

Look at the KS module, for instance. KS is a repository of handy functions which I have accumulated over the last nine years. They're useful and mature functions, but scarcely documented and need to be broken out into category-specific submodules. It's not "the way" to have functions for converting Brinell Hardness units to Vickers Hardness units rubbing elbows with file-locking functions and networking functions and string manipulation functions, all in the same module. The unit conversion functions need to go into modules in the "Physics" namespace, and the network functions need to go into modules in the "Net" namespace, etc. The documentation needs to be brought up to PAUSE standards as well.

It's work I knew I needed to do, but it was easy to put it off as long as I didn't have a PAUSE account. But now that I do, there's no more putting it off! I just need to find time to do it.

The first module I publish might be a relatively young one, a concurrency library called Dopkit. It's something I've been wanting to write for years, but I just finished writing and debugging it yesterday. There are many concurrency modules in CPAN already, but most of them require considerable programming overhead and require that the programmer wrap their head around the way concurrency works. These are reasonable things to do, but I've often thought it would be nice if it could be made trivially easy for the programmer to make loop iterations perform in parallel, without changing from the familiar loop construct. Dopkit ("Do Parallel Kit") provides functions that look and act like the familiar perl loop syntax -- do, for, foreach, and while -- and chops up the loop into parts which execute concurrently on however many cores the machine has. The idea is to put very few demands on the programmer, who needs only to load the module, create a dopkit object, and then use dop(), forp(), foreachp(), and whilefilep() where they'd normally use do, for, foreach, and while(defined(<SOMEFILE>)). There are some limitations to the implementation, so the programmer can't use Dopkit *everywhere* they'd normally use a loop, but within its limitations Dopkit is an easy and powerful way to get code running on multiple cores fast.
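
To make the "same loop, but parallel" idea concrete, here is a minimal sketch in Python rather than perl. It is not Dopkit's interface, which isn't shown in this entry: the name foreachp is borrowed from the text, but its signature here is an assumption, and it uses a thread pool from Python's standard library instead of whatever mechanism Dopkit uses internally.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def foreachp(body, items, workers=None):
    """Run body(item) for every item concurrently; return results in input order.

    Hypothetical stand-in for a parallel 'foreach'; not Dopkit's real API.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs,
        # so the caller sees what a sequential loop would have produced.
        return list(pool.map(body, items))


def square(n):
    """Example loop body."""
    return n * n


results = foreachp(square, range(8))
```

The caller only swaps the loop construct for a function call, which is the low-demand style described above. In standard CPython, threads will not speed up CPU-bound loop bodies; swapping in ProcessPoolExecutor would be the usual way to spread such work across cores.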

Dopkit suffers from the same documentation deficit as KS, but at least it's already "categorized" -- as soon as I can get the documentation written, it should be published as Parallel::Dopkit. KS will take significant refactoring.

Most of the perl in my codecloset is embarrassingly primitive (I wrote most of it in my early days of perl, before I was very proficient with it), but there are a few other modules on my priority list to get into shape and publish. My "dy" utility has proven a tremendously useful tool over the years, but is in desperate need of rewriting. It started out life as a tiny throwaway script, and grew features organically without any design or regard for maintainability. I've been rewriting its functionality in two modules, FileID and DY (which should probably get renamed to something more verbose). When they're done, the "dy" utility itself should be trivially implementable as a light wrapper around these two modules. Another tool I use almost every day is "select", which is also in need of being rewritten as a module. I haven't started that one yet.

In other news I stopped dorking around with FUSE and linux drivers, and dug into the guts of my distributed filesystem project. Instead of worrying about how to make it available at the OS level for now, I've simply written a perl module for abstracting the perl filesystem API. As long as my applications use the methods from my FS::AnyFS module instead of perl's standard file functions, transitioning them from using the OS's "native" filesystems to the distributed filesystem should be seamless. This is only an interim measure. I want to make the DFS a full-fledged operating-system-level filesystem eventually, but right now that's getting in the way of development. Writing a linux filesystem driver will come later. Right now I'm quite pleased to be spending my time on the code which actually stores and retrieves file data.
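
The abstraction idea can be sketched as follows, in Python rather than perl. FS::AnyFS's real methods aren't listed in this entry, so every class and method name below is hypothetical; the in-memory backend merely stands in for a distributed store.

```python
import os


class LocalBackend:
    """Stores files on the OS's native filesystem under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, name):
        return os.path.join(self.root, name)

    def write(self, name, data):
        with open(self._path(name), "wb") as fh:
            fh.write(data)

    def read(self, name):
        with open(self._path(name), "rb") as fh:
            return fh.read()

    def exists(self, name):
        return os.path.exists(self._path(name))


class MemoryBackend:
    """Stand-in for a distributed backend: keeps file contents in a dict."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = bytes(data)

    def read(self, name):
        return self.files[name]

    def exists(self, name):
        return name in self.files


def save_report(fs, text):
    """Application code: it depends only on the shared backend interface."""
    fs.write("report.txt", text.encode("utf-8"))
    return fs.read("report.txt").decode("utf-8")
```

Because save_report() never touches the OS file functions directly, switching it from the native filesystem to another store is a matter of passing in a different backend object.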

Questions posted by other slashdot users focussed my attention on how I expect to distinguish my DFS from the various other distributed filesystem projects out there (like BTRFS and HadoopFS). I want it to do a few core things that others do not:

(1) I want it to utilize the Reed-Solomon algorithm so it can provide RAID5 and RAID6-like functionality. This will produce a data cluster which could lose any two or three (or however many the system administrators specify) servers without losing the ability to serve data, without the need to store all data in triplicate or quadruplicate. BTRFS only provides RAID0, RAID1, and RAID10 style redundancy -- if you want the ability to lose two BTRFS servers without losing the ability to serve all your data, all data has to be stored in triplicate. That is not a limitation I'm willing to tolerate. Similarly, the other distributed filesystems have "special" nodes which the entire cluster depends on. These special servers represent SPOFs -- "Single Points Of Failure". If the "master" server goes down, the entire distributed filesystem is unusable. Avoiding SPOFs is a mostly-solved problem. For many applications (such as database and web servers), IPVS and Keepalived provide both load-balancing and rapid failover capability. There's no reason not to have similar rapid failover for the "special" nodes in a distributed filesystem.
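
To illustrate why parity beats plain replication on storage cost, here is a minimal Python sketch of the simplest case: a single XOR parity block, as in RAID5. This is not Reed-Solomon and not the DFS's code. It survives the loss of only one block, whereas Reed-Solomon generalizes the same idea to several parity blocks using finite-field arithmetic. All function names are made up for the example.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data_blocks):
    """Return the data blocks plus one parity block (one block per server)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index from the surviving blocks.

    The XOR of every block in a stripe is all zeroes, so the XOR of the
    survivors equals the missing block.
    """
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = encode([b"AAAA", b"BBBB", b"CCCC"])
rebuilt = recover(stripe, 1)  # pretend the server holding block 1 died
```

Here three blocks of data occupy four blocks of storage and tolerate one lost server; mirroring the same data to get that tolerance would occupy six.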

(2) I want the filesystem to be continuous. Adding storage, replacing hardware, or allocating storage between multiple filesystem instances should not require interruption of service. This is a necessary feature if the filesystem is to be used for mission-critical applications expected to stay running 24x7. Fortunately I've done a lot of this sort of thing, and haven't needed to strain thus far to achieve it. (On a related note, I still chuckle at the memory of Brewster calling me in the middle of the night from Amsterdam in a near-panic, following The PetaBox's first deployment. The system kept connecting to The Archive's cluster in San Francisco and keeping itself in sync, and nothing Brewster could do would make it stop. The data cluster's software interpreted all of his attempts to turn the service off as "system failures" which it promptly auto-corrected and restored. It was a simple matter to tell the service to stop, but Brewster has a thing against documentation.)

(3) I want the filesystem to perform well with large numbers of small files. This is the hard part for filesystems in general, and it's something I've struggled with for years on production systems. None of the existing filesystems handle large sets of very small files very well, and most redundancy schemes such as RAID5 do not address the problem (and in some ways compound it -- as RAID5 arrays get larger, the minimum data that must be read/written for any operation also gets larger). In my experience, most real-life production systems have to deal with large numbers of small files. Just running stat() on a few million files is a disgustingly resource-intensive exercise. RAID1 helps, but the CPU quickly becomes the bottleneck. One of my strongest motivations for developing my own filesystem is to address this problem. I don't want to be struggling with it for the next ten years. I am tackling this problem in three ways: First, filesystem metadata is replicated across multiple nodes, for concurrent read-access. Second, filesystem metadata is stored in a much more compact format than the traditional inode permits. Many file attributes are inherited from the directory, and attribute data is only stored on a per-file basis when it is different from the directory's. This should improve its utilization of main and cache memories. Third, the filesystem API provides low-level methods for performing operations on files in batches, and implementations of standard filesystem functions (such as stat()) could take advantage of these to provide superior performance. For instance, when stat() is called to return information about a file, the filesystem could provide that information for many of the other files in the same directory as well. This information would be cached in the calling process' memory space by the library implementing stat() (with mechanisms in place for invalidating that cache should the filesystem metadata change), and subsequent calls to stat() would return locally cached information when possible. This wouldn't help in all situations, but it would help when the calling application was trying to stat() all of the files in a directory hierarchy -- a common case where high performance would be appreciated.
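
The batched-stat idea can be sketched in Python against an ordinary local filesystem. This is only an illustration of the caching pattern, not the DFS's API: the class and method names are hypothetical, os.scandir() stands in for a batch metadata call, and invalidation is left as a manual call rather than something the filesystem triggers.

```python
import os


class DirStatCache:
    """Answers stat() from a per-process cache filled one directory at a time."""

    def __init__(self):
        self.cache = {}  # absolute path -> os.stat_result
        self.scans = 0   # how many directory scans were needed

    def stat(self, path):
        path = os.path.abspath(path)
        if path not in self.cache:
            # One miss pulls in metadata for every entry in the directory.
            self._load_dir(os.path.dirname(path))
        if path not in self.cache:
            # Not covered by the scan (for example a filesystem root).
            self.cache[path] = os.stat(path)
        return self.cache[path]

    def _load_dir(self, directory):
        self.scans += 1
        with os.scandir(directory) as entries:
            for entry in entries:
                self.cache[os.path.abspath(entry.path)] = entry.stat()

    def invalidate(self, directory):
        """Drop cached entries for one directory, e.g. after its metadata changes."""
        directory = os.path.abspath(directory)
        stale = [p for p in self.cache if os.path.dirname(p) == directory]
        for path in stale:
            del self.cache[path]
```

Calling stat() on every file in a directory then costs one directory scan instead of one lookup per file, which is the access pattern the paragraph above is aiming at.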

I don't know how long it will take to implement such a system. What work I've already done is satisfying, but it just scratches the surface of what needs to be done, and I can barely find time to refactor and comment my perl modules, much less spend hard hours on design work! But I'll keep at it until it's done or until the industry comes up with something which renders it moot.

User Journal

Journal: Eating fruit

Journal by TTK Ciar

This journal entry was originally titled "Getting bought and eating fruit" because I wanted to talk a little about Discovery Mining (the small datamining company which has employed me these past few months) getting purchased by Interwoven. But I kept writing paragraphs and deleting them, keeping in mind the scary documents they made me sign which, paraphrased, read "don't you say nuthin' about nuthin'". I'll just vaguely mention that it seems to be a good thing, and people are generally happy about it, and I'm not getting rich from the transaction. My responsibilities and compensation have remained about the same. Now, onward to something which I can say something about: The House!

Cobalt and I just got back from doing some yardwork on the new property, hacking off dead tree limbs and resalting the water softener and putting together some furniture, that sort of thing. It wasn't all work, however. We also wandered around the property, looking at things and grazing on random things we found growing on trees and bushes. We found plums ready to be picked and eaten, and some apples, and more blackberry berries than you can shake a stick at. I commented to her that if the world ended we could just hide there and eat berries for the rest of our lives. There's really more fruit there than the two of us can possibly eat. I think my co-workers will be getting more than just free eggs! There was also a pear tree, but none of its pears were even close to ripe.

The contractor has been doing really good work. We have a totally redone electrical system now, and new raised redwood decks in the front and back, and various other things the house was sorely needing. We were discussing today how much of the painting we wanted him to do, and how much we would do ourselves. We purchased about eight gallons of paint at Kelly Moore (of various colors) and unloaded them when we were there to work on the grounds. I have many, many pictures of the place, but haven't gotten them online because I haven't had the time to write captions for them. I like to do that, but maybe getting them up is more important than captioning them. I'll update here when they're up.

This has been a rather boring journal entry. It is boring because I am extremely tired, but wanted to get something into my journal, but most of the interesting things in my life right now I cannot talk about. Perhaps the next entry will be more interesting.

-- TTK

User Journal

Journal: Brief update: calc, dvm, distributed filesystem 2

Journal by TTK Ciar

Quick Rundown

Progress on the new house is .. well, progressing. We're having a really hard time of it, but may be getting more help soon. At least our contractor is doing a brisk job on his end -- the deathtrap electrical system has been redone, and cat6 connects patch panels in various rooms to a master panel in the room that will be my office, and the repaired bathroom will have its fixtures installed soon. We got a riding lawn mower, which is kind of annoying to use but also thrilling at times -- there are some pronounced hills in our front yard. I need to trim some low-hanging tree branches before next using it, though.

Calc3 is almost, but not quite, useful. This despite its being 150% the size of calc2. I've been abstracting things nicely and putting in hooks for all the things I want it to eventually become, which eats time and lines of code. I'm also impatient enough, though, that next time I get to it, I'll just focus on getting the calculator mode minimally useful. If it's useful enough that I start using it for my day-to-day work, it should become obvious what my priorities should be, and what features need rethinking.

Some time ago, I was hot after developing a distributed filesystem, not only because I needed one for my own use, but because I wanted to make some money. I know of a guy who runs a business which would benefit tremendously from being able to offer a distributed filesystem on his products, and figured he might be willing to license the thing, if I could get something working.

When I switched jobs, money ceased to be so much of an issue (did I mention that The Archive pays its employees about 2/3 what they're worth? Consider it mentioned), so I backburnered the idea. Lately I've been revisiting it because, well, I still need one. The original plan was to develop a "tolerably functional" interim solution fast, then work on the "preferred" solution.

My idea for the interim solution was to use linux's "network block device" (nbd) driver to make really big network-mounted RAID5 and/or RAID6 filesystems, then use SAMBA to export them, and write some automated monitoring and control logic to manage the system and to provide user and administrative interfaces. This would have the advantage of being relatively fast and easy to develop, and would leverage linux's existing software-RAID technology to make the cluster's disks look like one huge filesystem (or a few). It would also have several disadvantages, though: Users' accesses to each distributed RAID array would be funnelled through a single server, making it both a SPOF and a performance bottleneck. Adding new space to existing filesystems would not be seamless, necessitating significant service disruption as the array rebuilt itself. Linux's RAID6 bites, performance-wise, and would limit the robustness of the cluster -- if three hosts went down, or even the wrong three disks, the filesystem would cease functioning. Finally, it turns out that nbd is not really ready for production use, so basing the filesystem on nbd would be like building a house on sand.

On top of it all, very little (if any) of the work I'd put into this interim solution could then be used to develop the preferred solution. It would all be throwaway code. The only real appeal was that I'd perhaps get a sellable product sooner, and be making a little money (hopefully) while developing the preferred solution.

Since I'm no longer feeling pressured to get something out fast, I might as well just start work on the preferred solution, which would provide distributed access (so users' connections could use any host's network connection for their accesses, alleviating the most severe bottleneck), better-than-RAID6 redundancy, and seamless addition of new space to existing filesystems.

Towards this goal, it makes sense to start with a project I've already been working on: DVM. It would provide the robust messaging system the cluster would need to chat amongst itself, and I need it for other projects anyway. It also makes sense to get DVM working first, because things might happen in the meantime which make the distributed filesystem project moot. Google might publish its GFS and make it free for everyone to use. Ceph might come out with a better distributed filesystem than mine (I had sought a job there, but it didn't work out -- they have more technical talent than business-running talent). EMC or some other company might open-source their solution. Hell, I might even lose interest. If any of these things happen, then any work I've done which is specific to my distributed filesystem project will be wasted effort. But if I've spent my time working on DVM, then none of my work will have been wasted, because DVM has many uses.

Sometimes it seems like all roads lead back to DVM. Some of the functionality I want for calc3 will want DVM, too. So I'm working on DVM. This iteration is being developed in C, so it's slow going. I have a hash library which implements perl-like hash tables ("dictionaries", for the philistines amongst my readership), which helps speed things up, but it's still going to be a long while before I get it to a useful state. One advantage to my frustration is that it's encouraging me to really, *really* avoid early optimizations and unnecessary features (which one should avoid anyway -- it's just good engineering). Those can come later. This iteration is looking a lot more like conventional old-school unixy networking code than my other attempts, with big select()-and-loop constructs and the like. Maybe that's a good thing. We'll see.

-- TTK

User Journal

Journal: Quick interlude: select

Journal by TTK Ciar

I just posted the latest iteration of the "select" utility (for unixy systems) here.

"select" selects rows and columns from STDIN like the SQL "SELECT" statement selects rows and columns from a table (more or less). It understands how to parse tab-delimited text, CSV, hash, XML (poorly), and JSON. It can also emit any of these formats to STDOUT .. even if the format coming in on STDIN is different. This can make it also handy for simply translating between different text formats.

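For readers who haven't used a tool like this, here is a rough Python sketch of the core idea: pick columns and filter rows from delimited text, then emit a different format. It is not the "select" utility's code or command-line interface; the function names and the sample data are invented for the example, and only CSV input with JSON or tab-delimited output is covered.

```python
import csv
import io
import json


def select(text, columns, where=None, delimiter=","):
    """Parse delimited text with a header row; return the chosen columns
    of the rows for which where(row) is true (all rows if where is None)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = []
    for row in reader:
        if where is None or where(row):
            rows.append({col: row[col] for col in columns})
    return rows


def to_json(rows):
    """Emit the selected rows as a JSON array of objects."""
    return json.dumps(rows)


def to_tsv(rows, columns):
    """Emit the selected rows as tab-delimited text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


sample = "name,size,kind\nalpha,10,text\nbeta,2000,image\ngamma,30,text\n"
picked = select(sample, ["name", "size"], where=lambda r: r["kind"] == "text")
```

Keeping parsing, selection, and emitting as separate steps is what lets the input and output formats differ.
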
At The Internet Archive, I was living in data format hell. Our third-party contributors were pushing metadata at us in all manner of formats, the (very powerful) Petabox department refused to deal with anything other than XML, and the Collections department (where I worked for the last couple of years) had a whole lot of existing tools which used hash-format, but I was personally transitioning from hash to JSON. I already had a "selectcol" utility which could read tab and hash formats, so I evolved it in the direction I so very desperately needed.

At my current position at Discovery Mining, I've needed to translate between even more data formats, and the need for selecting rows and columns from large tabulated datasets has become even more pressing, so I forked "select" into a public domain part and a proprietary part. Improvements I develop on my own time will go into both, while improvements made during company hours will go into DM's copy. Since I'll be actively using this tool to get real work done, its development is likely to stay lively. So if you find the need, feel free to check on its status from time to time. I'll try to push out changes at least every other month.

Another recent update: prefix, which is mostly a timestamping tool. I'll write some documentation when I get around to it. Without arguments, it simply loops through STDIN, prepending a timestamp and emitting the result to STDOUT. With arguments it can prepend/append many other things to its input to generate realtime-annotated output. Yeah, I'll write some documentation.
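
Until that documentation exists, the no-argument behavior can be approximated with a few lines of Python. This is a sketch of the described behavior only, not the prefix tool itself; the function name, the timestamp format, and the injectable clock parameter are all assumptions made for the example.

```python
import time


def prefix_lines(src, dst, clock=time.time, fmt="%Y-%m-%d %H:%M:%S"):
    """Copy lines from src to dst, prepending a UTC timestamp to each one.

    clock is injectable so the behavior can be checked with a fixed time.
    """
    for line in src:
        stamp = time.strftime(fmt, time.gmtime(clock()))
        dst.write(f"{stamp} {line}")
        # Flush per line so the annotations appear in real time downstream.
        dst.flush()
```

Calling prefix_lines(sys.stdin, sys.stdout) would turn it into a filter for the middle of a pipeline.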

-- TTK

User Journal

Journal: House!, Geekly Stuff, and Yet More Projects

Journal by TTK Ciar


Touching on some other subjects: House!, cloning the sfgate site, dvm, and calc3.

Yay! The Escrow Has CLOSED!

Well, we finally did it! We are the proud owners of a house, with a key and everything! Cobalt and I are beside ourselves with joy and excitement. We've felt as though our lives were on hiatus for these last few years, while we lived in this rental cottage and saved our pennies. The housing market kept going up as we saved, but with the market in a slump, and with considerable help from my parents (thanks, guys!) we've finally caught up with our dream.

For years, when cobalt or I talked about something we wanted to do, the talk would start with, "When we have our house, ..". Cobalt got really tired of that, so we amended it to "When we have our new life, ..", which added enough sarcasm to take the edge off the reality of our situation. We just haven't had the indoor space to do any of a number of things. The kitchen is too small to cook together, and there isn't room for a table on which to work leather, solder a circuit, fix gadgets, sew, or even draft a drawing. This rental has a huge yard (the largest yard we've ever had), but because it is a rental we're sharply limited in what we may do with it.

I have stubbornly tried working on various projects on the front porch, but it has been very frustrating having to unbox my tools and materials, work on them for a while, then box them back up and bring them back inside. The setup/teardown process eats half the time I should be using to get stuff done. Also, when it's even a bit windy outside, projects get bits of windborne dirt, pollen, leaves, and twigs stuck to them -- a big problem when glue, varnish, or bearings are involved!

Cobalt has been frustrated too. Most of her critters have been living in "slum" habitats for years now, and that's taken its toll on their health and on her enjoyment of them. It's one thing to have geckos living in a glass terrarium set out in the living room and filled with living plants, and quite another to have those geckos living in sterilite boxes stacked on shelves with paper towels for bedding.

Living here has taken its toll on our health as well. Because every single wall and every single corner is filled with "vertical storage" (bookshelves, cabinets, dressers, etc) to hold our critters and our stuff, it's very difficult to keep the cottage clean. Combine that with the entropic influence of four cats, two dogs, and a parrot, and the influx of dust kicked up outside by the chickens, and the windborne pollen endemic to Sonoma County, and this place can get really toxic very fast. My allergies kicked into high gear a few years ago, and I'm having to take singulair, nasonex, and loratadine to keep them under control. Cobalt's allergies have been starting to give her trouble, too, in the form of sinus headaches. She also gets so grossed out by the filth that she stops eating, and gets very depressed and agitated. We both do what we can to keep it clean, but it's not always enough .. in fact, it's often not enough. Like, almost all of the time, it's not enough.

The new house is going to be different. We are resolute in this. The living spaces will be kept sparsely-populated with furniture, to make cleaning easier, and we will keep work spaces separate from storage space, to keep our living and work areas roomy and free of clutter (and thus easy to keep clean, or at least that's the theory).

Having spaces in which to work will be such a blast! Cobalt and I have been huddling together over drawings of the house's floorspace, deciding who will use what, which spaces can usefully be shared, and what spaces will be purposed to what tasks. One room will have my drafting table and a workbench for small, "clean" projects (fiddling with computer hardware and repairing small appliances) and cobalt's sewing and crafts. Another will be my office, which will be a welcome change from my current "office" (consisting of one corner of the couch in the living room, with a bookshelf to my left and a 2'x2' "desk" which I share with the 1'x1' parrot's cage). We will share the garage for "dirty" projects (wood and metal work, mostly -- cutting, drilling, sanding, etc), and will set up a shed for the "very dirty" projects -- painting, varnishing, gluing, and welding.

There will need to be yet another space for another class of projects .. I hadn't told cobalt about all of the projects I wanted to do, and figured I'd better so we could plan for them and so it wouldn't come as a surprise to her when I got around to doing them. The turbine is a good example. I've been reading about turbine engines for the last few years. The concept seems to have a lot of untapped potential. For instance, the power generated by a turbine is influenced by many things, but is proportional to the rate of mass flowing through it. In a fuel-burning gas turbine, most of this mass is air. This necessarily means gas turbines are at their most effective (generating high power and torque for their volume) at extremely high speeds (since the only way to move a lot of air is with large and/or fast-moving blades), which dominates all other aspects of their design. Low mass, high strength, and thin blades are deemed absolutely necessary. Complex recuperators are used to squeeze as much power as possible out of the work the turbine produces, and the size and shape of air ramps, ducts, and compressors have to be carefully engineered to produce and manage high air flows and pressures.

Turbines can actually be extremely simple machines, and simple turbines can be made to work well at low speeds. Unfortunately low speed means low air flow, which means low mass flow, which means low power and torque production. I have some ideas on ways to remedy this problem, though, and would like to try them out. The new house has a fairly large property attached to it, and I was hoping to be able to set up a workshop out away from the house to work on the turbines and other similarly noisy/fiery things. Upon discussing it with cobalt, though, she'd much rather see this work done off the property altogether. We talked about perhaps finding someone else in the area who I could partner with to rent an offsite industrial space. Since my schedule really prevents me from spending a lot of time on my projects (just look at how long it takes me to write a simple journal entry!), we should be able to timeshare the space easily, without stepping on each other's toes. I'm totally game for that.

I would also like to set up a melting/molding rig for thermoplastics. Some amazing things have been published in the material engineering journals lately about plastic/ceramic composites, and I would like to see if I can reproduce the published effects using commodity (read: cheap) plastics and ceramic materials. Surface modification of ceramic granules will be necessary to replicate the described effects, so that will require some space, too, to set up a "pressure cooker" where chemicals can be introduced to granules under controlled (high) pressures and temperatures. The strength and other properties of a plastic/ceramic matrix depend not only on the properties of the plastic and of the ceramic, but also on how and how well the two adhere to one another. Vinylesters and epoxies, for instance, tend to be pretty similar in most aspects, and their adhesion strength to aluminum oxide tends to be nearly the same, but since vinylesters bond with aluminum oxide near the ends of their polymer chains while epoxies bond near the middle, the vinylester/aluminum-oxide composite tends to have significantly higher resilience and compressive strength (since the granules are slightly more free to move at the ends of the chains, they can form up mutually supporting columns, like the aggregate in concrete).

Most commodity plastics will not adhere to ceramics very well at all. Polyethylene absolutely will not. Polyethylene terephthalate tends not to, but could be coaxed into it if the ceramic is given the appropriate surface modification. Surface modification is a matter of changing the exposed surfaces of the granules (either chemically or in their physical shape). "Sharper" surfaces have more surface area, and surface area is a linear factor in determining adhesion strength, so sharpening the granules' surfaces can improve adhesion some. The real gains are to be had by convincing things like cyanates, amines, or simple raw carbon to bond tightly with the granules' surfaces. Many plastics will then bond well with these intermediary chemical coatings. I look forward to finding out how easy it will be to produce these effects, and the effectiveness of the resulting composites.

Unfortunately, the house needs a lot of work before we can move in. We're meeting with a general contractor in a couple of days, and we anticipate it will be a couple of months before we can start lugging in our stuff and making it into a home (and a workshop, and a reptile habitat, and and and and ..!).

SFGate: Yeah, We Want To Do That

Switching gears a little, now that cobalt and I have quashed all doubts that YES, we are going to live right HERE for a while, and set down roots, we'd like to start work on a mutual project we've been talking about for years. Most denizens of California's bay area are familiar with the website sfgate.com. It provides a lot of handy information about local resources, news, and events. It's a fantastic site .. if you live in or near the bay area. Unfortunately, content regarding our neck of the woods tends to be fairly sparse. Even when websites (like sfgate, yahoo local, or craigslist) have information about California's northern counties, it tends to be mixed up with an overwhelming volume of information about locales further south. Even the radio station I use to get traffic reports (KCBS) tends to cover "north bay" traffic very little. It would be really nice to have a site which focussed primarily on areas north of the bay area. So nice, in fact, that we're going to try to make it.

The site would be a mutual effort of myself, cobalt, and perhaps a few other people. We would try to gather and present information that someone would find useful if they were not already familiar with the area: restaurants, movie theaters, radio stations, local traffic patterns, and the like. Most of the information would be static (since that's easy to maintain), but we would also like to regularly add articles about local concerns (like the local economy, indian casinos, etc) and (especially!) reviews of restaurants and stores.

You'd think that Sonoma County would be rife with really good restaurants, since it relies on tourism for much of its income, and the place has a reputation for being posh and high-class, but really most of the restaurants here are terribly subpar. Cobalt and I keep trying new places, seeking to expand our small-but-growing list of places fit for dining. We've found a couple of italian places, some diners, a chinese joint, and two good mexican restaurants, but so far no good japanese food. We have to go to San Francisco to get decent sushi (Ichiraku, on the corner of 2nd and Geary, is our favorite "dive", though we also like Isobune in SF's japantown and Tsugaru in San Jose). We'll keep looking for a local place, though.

As we find places we like, they'll get reviewed and put on the site, along with our impressions and some standard metrics (for things like ambiance, price, variety, food quality, etc). Maybe it will catch on .. maybe not. But at the very least it'll be a place to organize our own interests. :-) A bit more work than pinning handwritten notes to the refrigerator door, perhaps, but more fun too.

A side rant -- why is the Bay Area called "Northern California" when it's located in the middle of the state? What do people call the 27 of California's 58 counties which lie north of "Northern California"? Oh that's right, they generally don't! Or they lump it into the Bay Area, despite its cultural, legal, and industrial differences. Maybe it's time to resurrect an old idea. It's not like the rest of the state would miss us anyway. "Huh? What? There's stuff between San Francisco and Oregon? *blank stare*" sums up most people's take on the subject.

Okay, mini-rant over.

On To Increasingly Geekly Topics

Calc3 (which I've blathered about quite a bit in my previous journal entry) is shaping up nicely. As I've found time, I've added more than four hundred lines of code to it (calc 2.5 is only 247 lines). The basic framework is nearly done. Another hundred lines or so and the essential functionality for the "perl calculator" mode should be there. Another hundred beyond that and I should have the "C calculator" mode working. The hooks are in place for the "sql" and "shell" modes, but I want to focus for now on getting the calculator modes running, so I can start actually using the thing in my day-to-day activities. The other functionality can wait, and I'll suffer in silence with tcsh, bash, and the mysql/pg clients.

One of the things I've often, *often*, often wished for, both in calc and in my shell, was a sharable history, so that I could work in one shell for a while, then switch to another shell session in a different window (perhaps running on an entirely different computer) but still have the other shell's history available for me to draw upon. Calc3 will have that, to a degree. I've written it such that every command gets appended to the history file in the user's home directory, and before evaluating a command Calc3 will look at the shared history file to see if new commands have been appended, and import new commands into its own history buffer. This at least gives me shared history on the same computer (or on multiple computers if they're using an NFS-shared /home, but right now none of the systems I use do that). Histories are heavily annotated with things like the pid of the contributing shell instance, the controlling tty, a timestamp, etc. Using tcsh most of my life has gotten me used to using the history file to audit my own activities, and this richer history format will enable me to do more of that. It will also make it possible for the user to specify subsets of the history to draw upon when performing a history-related operation (like, "!rm" -- should default to repeating the last command issued by the *current* shell that began with "rm", but the user should have the option of specifying that other shells' histories should be drawn from too).
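The shared-history scheme described above can be sketched roughly as follows. This is in Python rather than the perl Calc3 is written in, and every name here (`SharedHistory`, `import_new`, `last_matching`, the JSON-lines record format) is an assumption made for illustration, not Calc3's actual code or file format. It also assumes that short appends from several processes don't interleave, which a real version would probably want to guarantee with file locking.

```python
import json
import os
import time


def _tty_name():
    """Best-effort name of the controlling terminal, or None if there isn't one."""
    try:
        return os.ttyname(0)
    except (OSError, AttributeError):
        return None


class SharedHistory:
    """Append-only history file shared by several shell instances (hypothetical format)."""

    def __init__(self, path, pid=None):
        self.path = path
        self.pid = os.getpid() if pid is None else pid  # identifies this shell instance
        self.offset = 0      # how many bytes of the shared file we have consumed
        self.entries = []    # this instance's in-memory history buffer

    def append(self, command):
        """Record a command, annotated with pid, tty, and a timestamp."""
        record = {"pid": self.pid, "tty": _tty_name(),
                  "time": time.time(), "cmd": command}
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    def import_new(self):
        """Pull in records appended (by anyone) since the last look; return them."""
        try:
            with open(self.path, "rb") as fh:
                fh.seek(self.offset)
                data = fh.read()
        except FileNotFoundError:
            return []
        end = data.rfind(b"\n") + 1          # ignore a partially written last line
        self.offset += end
        fresh = [json.loads(raw) for raw in data[:end].splitlines()]
        self.entries.extend(fresh)
        return fresh

    def last_matching(self, prefix, all_shells=False):
        """'!rm'-style lookup: this shell's history by default, everyone's on request."""
        for rec in reversed(self.entries):
            if rec["cmd"].startswith(prefix) and (all_shells or rec["pid"] == self.pid):
                return rec["cmd"]
        return None
```

Calling `import_new()` just before evaluating each command gives the behavior described: the local buffer catches up with whatever other instances have appended, and the pid annotation is what lets a history lookup stay restricted to the current shell unless told otherwise.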

Other features I've wanted for about a decade now are the ability to rearrange shell pipes after the processes being piped together have launched, and the un-unixy ability to daemonize commands by specifying that the last process's stdout should be piped to the first process's stdin, like this:
# ssh bob@somehost | host_fiddler.pl |
.. which would pipe the stdout of ssh to host_fiddler.pl's stdin, and pipe host_fiddler's stdout to ssh's stdin. This could be attained without too much difficulty by having the shell fork() after the two processes have been created (and after their pids, fd's, etc have been pushed to the stack), and having the child process read from host_fiddler.pl's stdout and write to ssh's stdin in a loop (while also monitoring a "control" pipe to/from the parent process). And since the processes are "exposed" on the stack, manipulating them after they've been launched should be possible in ways not available under bash, tcsh, and zsh. Want to disconnect ssh from host_fiddler.pl and feed it your own keystrokes for a while before re-attaching them? Sure, why not! And if you modify host_fiddler.pl and want to use the new version without restarting the entire session, it should be easy to launch a new host_fiddler.pl process, detach ssh's stdin/stdout from the old process, attach them to the new process, and kill the old host_fiddler.pl process.
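The looped pipe can be sketched in a few lines. This is only an illustration under some loud assumptions: it is Python rather than perl, it uses a relay thread where the text describes a fork()ed child process, it leaves out the "control" pipe entirely, and the two toy child programs (`PINGER` and `BUMPER`) are hypothetical stand-ins for ssh and host_fiddler.pl.

```python
import subprocess
import sys
import threading


def run_ring(first_cmd, second_cmd):
    """Run `first | second`, then feed second's stdout back into first's stdin."""
    first = subprocess.Popen(first_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout, stdout=subprocess.PIPE)
    first.stdout.close()   # the parent no longer needs its copy of this pipe end
    relayed = []           # everything that passed through the relay, for inspection

    def relay():
        # Copy second's stdout to first's stdin until second closes its end.
        for line in iter(second.stdout.readline, b""):
            relayed.append(line)
            try:
                first.stdin.write(line)
                first.stdin.flush()
            except OSError:    # first has gone away (e.g. broken pipe)
                break

    worker = threading.Thread(target=relay)
    worker.start()
    worker.join()          # a real shell would go back to its prompt instead of waiting
    try:
        first.stdin.close()
    except OSError:
        pass
    first.wait()
    second.wait()
    second.stdout.close()
    return relayed


# Toy stand-in for "ssh": emits 0, then echoes back what it receives until it sees 3.
PINGER = (
    "import sys\n"
    "print(0, flush=True)\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    n = int(line)\n"
    "    if n >= 3:\n"
    "        break\n"
    "    print(n, flush=True)\n"
)

# Toy stand-in for "host_fiddler.pl": adds one to every number it reads.
BUMPER = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(int(line) + 1, flush=True)\n"
)


def demo():
    lines = run_ring([sys.executable, "-c", PINGER], [sys.executable, "-c", BUMPER])
    return [int(line) for line in lines]


if __name__ == "__main__":
    print(demo())
```

Because the relay sits between the two processes instead of the kernel wiring them directly together, it is also the natural place to detach one side, splice in keystrokes, or swap in a freshly launched replacement process.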

Yeah, Calc3 is going to be nice. 8-]

Flip-Flopping on DVM Yet Again

The first time I tried implementing DVM (my Distributed Virtual Machine, similar to PVM, the Parallel Virtual Machine), I tried making its messaging TCP-based. Then I switched to UDP because I wanted to take advantage of some of UDP's qualities and I thought connectionless messaging would be easier. Well, I've restarted my implementation yet again, using TCP as my transport layer. In the past I've implemented UDP-based protocols with more TCP-like reliability, and UDP-based protocols with authentication, and thought implementing a UDP-based protocol with reliability and authentication would be a piece of cake. And it was! If your cake has big hard rocks baked into it. A few broken teeth later, I'm switching back to connection-based transport.

I was very eager to take advantage of UDP's ease of broadcast transmission, which is a handy way to get the same data to a bunch of different remote nodes without having to send the same data twice. The network switches take care of the retransmission for you, which removes one of the major bottlenecks in distributed systems scalability. I also wanted to use DVM as part of an ad-hoc network, in which it is the basic assumption that unreliability and a lack of persistent connections are the common case (due to the limitations of the link layer and the supposed mobility of the communicating nodes). What I'm going to do instead is have DVM use UDP broadcast as a side-channel in a LAN environment for specific applications, and roll a completely different messaging framework for ad-hoc environments. Hopefully I can implement that as a light wrapper around DVM, or at least re-use some of DVM's code.

I'm eager to get something working. My systems will just have to continue using http-rpc and PVM in the meantime :-P

Okay, that's plenty of blather for now .. onwards and upwards!

-- TTK

User Journal

Journal: Calc v3

Journal by TTK Ciar

Back by popular demand! .. Okay, maybe not so popular.

Just as I was sitting down to write this, yardwork popped up, and now I'm exhausted. And I need to go to bed .. oh half an hour ago :-P whee!

But first, a little blather about Calc v3.

Eleven years ago, I started programming a little in Perl. I'd been reading through the "perlfunc" manual, and had successfully written the classic "Hello, World" program in perl, when my eyes lighted upon the eval() function. It occurred to me that this function would make it really easy to write a simple calculator, so I did. My second perl program ever was a simple loop that read input from the user, stuck an "$a =" in front of it, and passed it to eval(), which interpreted the data as though it were valid perl code .. which it usually was. If the user typed "2 + 3", the value "5" would get put into variable $a, and then it would display the contents of $a before repeating the loop. Perl also made it easy to push $a onto a stack and then display the top few slots of the stack, so I did that too. The resulting program was so amazingly wonderful that I've used it almost every day since then, and added new functionality to it incrementally. Over the years it's accumulated more and more code, much of it pretty bad code.
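For the curious, the same read-eval-push loop looks roughly like this in Python. It is a sketch of the idea, not a port of calc: the function names are made up for illustration, Python's eval() only accepts expressions (perl's will take whole statements), and like the original it trusts whatever the user types.

```python
def calc_step(line, stack, env):
    """Evaluate one line as an expression, store it in `a`, and push it on the stack."""
    value = eval(line, env)   # runs arbitrary user input, exactly like the perl original
    env["a"] = value          # the equivalent of sticking "$a =" in front of the input
    stack.append(value)
    return value


def show_top(stack, slots=3):
    """Return the top few stack slots, most recent first."""
    return stack[-slots:][::-1]


def repl():
    """Read lines until EOF, printing the top of the stack after each one."""
    stack, env = [], {}
    while True:
        try:
            line = input("calc> ")
        except EOFError:
            break
        if not line.strip():
            continue
        try:
            calc_step(line, stack, env)
        except Exception as err:   # a typo shouldn't kill the calculator
            print("error:", err)
            continue
        print(show_top(stack))


if __name__ == "__main__":
    repl()
```

Because the result lands in `a` as well as on the stack, a follow-up line such as `a * 2` can build on the previous answer.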

Version two-point-something (2.5? 2.6?) is here. It's become quite featureful, but any half-experienced perl programmer would cringe if they looked under the hood.

I've been wanting to rewrite it cleanly for a few years now, but have been mindful that perl is on its way out, and I need to learn a new language. Since writing calc was so instrumental in learning how to use perl (not only developing calc, but also using it as a sort of interactive perl shell, so I could test snippets of code at its prompt and see the results), I figured that rewriting it in "the new language" would be a good way to learn the new language too.

Well, I'm pretty sure "the new language" is Python, but Python would be a pretty poor language for the next version of calc. Also, writing calc in a language I barely knew was a sure-fire way to make it just as amateurish and haphazard as the perl version it was replacing. I've convinced myself, at this point, that I want to write v3 in perl. I was discussing this point with some friends on ResearchMUCK, and someone jokingly suggested I write it in C++.

It was a cheap joke, but it got me thinking. The language it's written in wouldn't have to necessarily be the language it accepted as input. I have another utility, called "c", available here which takes mixed C and perl on the command line, wraps a template C program around the input, compiles it, and runs the result. That's written in perl. Also, when I was tired of the shortcomings of PostgreSQL's interactive client, and tired of switching back and forth between PostgreSQL and MySQL clients in a heterogeneous database environment, I forked calc and rewrote its guts to understand SQL and manage connections to multiple MySQL and PostgreSQL environments. That code is the intellectual property of The Internet Archive now, but I could rewrite it from scratch pretty easily.
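The wrap-compile-run trick behind "c" is simple enough to sketch. What follows is not the real utility: it is Python instead of perl, it handles plain C only (no mixed-in perl), and the template, the function names, and the assumption that a compiler called `cc` is on the PATH are all mine.

```python
import os
import subprocess
import tempfile

# Hypothetical template; the real "c" utility's template is not shown in this entry.
C_TEMPLATE = """#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

int main(void)
{
    @BODY@
    return 0;
}
"""


def wrap_c(snippet):
    """Drop a fragment of C into the body of a complete template program."""
    return C_TEMPLATE.replace("@BODY@", snippet)


def run_c(snippet, compiler="cc"):
    """Wrap, compile, and run a C fragment; return whatever it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "snippet.c")
        exe = os.path.join(tmp, "snippet")
        with open(src, "w") as fh:
            fh.write(wrap_c(snippet))
        subprocess.run([compiler, src, "-o", exe, "-lm"], check=True)
        done = subprocess.run([exe], capture_output=True, text=True, check=True)
        return done.stdout
```

With a working compiler, `run_c('printf("%d\\n", 6 * 7);')` compiles and runs a one-line program and hands back its output.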

These interactive clients were all developed separately, but thinking about them all together made me realize that if they could be combined into a single modeful utility with a shared stack, it would be quite powerful. I could, for instance, switch it to SQL mode, select some data from a database onto the stack, then switch back to perl mode and write perl that fiddled with the selected values. I decided this idea of modes would be an integral part of calc v3's design. I would also like to try giving it a "shell" mode, so that it was appropriate to use as a login shell. The processes and shell pipes would be exposed to the perl, C, and SQL modes through the shared stack, making for some very powerful opportunities for process manipulation and dynamic shell pipe redirection.
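A toy version of the modes-plus-shared-stack idea, to make it concrete. None of this is calc v3 code: it is Python, the SQL side uses an in-memory sqlite3 database instead of MySQL or PostgreSQL, the expression mode stands in for the perl mode, and the `:mode` command and the `ModalCalc` class are invented for the sketch.

```python
import sqlite3


class ModalCalc:
    """One prompt, several input modes, one stack shared between them."""

    def __init__(self, db):
        self.db = db
        self.stack = []
        self.mode = "expr"
        self.handlers = {"expr": self._eval_expr, "sql": self._eval_sql}

    def feed(self, line):
        """Handle one line of input: either a mode switch or something to evaluate."""
        if line.startswith(":mode "):
            new_mode = line.split(None, 1)[1].strip()
            if new_mode not in self.handlers:
                raise ValueError("unknown mode: " + new_mode)
            self.mode = new_mode
            return None
        return self.handlers[self.mode](line)

    def _eval_sql(self, line):
        # Each selected row is pushed onto the shared stack as a tuple.
        rows = self.db.execute(line).fetchall()
        self.stack.extend(rows)
        return rows

    def _eval_expr(self, line):
        # Expressions can see the shared stack under the name `stack`.
        value = eval(line, {"stack": self.stack})
        self.stack.append(value)
        return value
```

The scenario from the paragraph above then becomes: switch to SQL mode, select some rows onto the stack, switch back, and compute over them with something like `sum(row[0] for row in stack)`.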

Other things high on my list for calc v3 are dynamic loading of modules and better saved history. The current version of calc has horrible history interaction. Other new features can wait. Loading too much complexity into the design from the get-go would get in the way of a clean, well-functioning rewrite (qv: Second System Syndrome). It will be better to write it with an eye towards allowing for these features, and then filling in the features incrementally later (and hopefully in a more orderly fashion than before).

I started writing it last week. So far it's going well. When I have something minimally useful I will put it up on my codecloset page.

Okay, it's way too late now. I go make the bed. One unfortunate detail of my new job is that I have to be in the Presidio every morning by 9am, which means I need to leave home by 7am, which means I have to go to sleep by 10pm if I want eight hours of sleep (and I really need eight hours of sleep to function correctly). I'll touch on other subjects in my next journal entry.

-- TTK

User Journal

Journal: Whoo boy .. much much

Journal by TTK Ciar

Woo! Finally posting!

A lot has happened, but I haven't much time, so I'll make it brief.

I (finally!) left the Internet Archive to work for a stealth startup, which was great except that it couldn't actually pay me until it received capital investment funds. Two months of that motivated me to look for employment elsewhere, and I found a gem of a place just a few blocks from Archive HQ. It's called Discovery Mining and I'm very happy there. I'm in a room with a dozen other programmers cranking out perl and wrestling with a complicated data-mining system of interconnected databases and special-purpose data servers and processing nodes. I'm learning a lot, which makes it a welcome change from The Archive. My position there hadn't exposed me to anything new for nearly a year. This new work is a happy mix of the new and the familiar -- it's familiar enough that I could get up to speed quickly and make myself useful, but new enough to stretch me and teach me useful stuff. I've been there about a month, and they seem as pleased with me as I am with them. We'll see how it goes, but I'm optimistic. They made me sign the nondisclosure agreement from hell, so there isn't much I can actually say about what I do .. so, moving right along ..

As you might remember, I've been giving Enlightenment17 a try. Several months later, I regretfully report that it just didn't work out. The developers decided to lose the third dimension of virtual desktops, and even after using E17 for all this time I miss that feature most terribly. E17 also seemed much less stable than E16, and stability is always at the top of my priority list when it comes to technology. Speaking with the developers proved fruitless. So I'm back on E16 at home and at work, and wondering what I'm going to do. E16 just isn't supported very well on newer versions of linux distributions. It relies on the freetype1 libraries, which haven't been ported/maintained for years, and the newer distributions are different enough from what freetype1 expects that the freetype1 installation process gets weirded out and refuses to proceed. E17 has been weaned off of freetype1, and uses freetype2 exclusively (later versions of E16 use both freetype1 and freetype2, which raises its own set of problems).

If I really want to stick with E16, I have a few choices:

(1) I could stick with an older linux distribution version which still supports freetype1. On one hand this means I'm just putting off the inevitable, but on the other hand it coincides with my upgrade habits anyway. I tend to stay with what works for me, incrementally installing security patches and bugfixes, until I am presented with an irresistible reason to change to a newer operating system version. Then I put off upgrading some more until I can find a version which is stable enough to be worth my while. Instability in my operating system is utterly intolerable to me, and bad versions crop up far more often than they should. My migration path since 1996 was: Slackware 3.0 to Slackware 7.1, to Slackware 8.1, to Slackware 10.2. But anyway, even though 10.2 has been working very well for me for a long time, I know eventually modern applications will stop building under 10.2 and I will have to change again. So I'd rather have a backup plan.

(2) I could take up porting/maintenance of freetype1, and make sure it will install on future systems, thus ensuring the longevity of E16 as well. This doesn't appeal to me much, as freetype1 is something of a pain in the arse. There's a reason the developers rewrote it, and I don't relish the notion of picking up someone else's can of worms.

(3) I could fork E16 and maintain my own version of it. The E17 code is available, so I could conceivably figure out what they did to shed freetype1 and apply those changes to the E16 codebase, hopefully without losing the three axes of virtual desktops in the process. Similarly I could incorporate bugfixes (if relevant) from the E17 code to the forked E16, and have the best of both worlds. This approach has a lot of appeal, but it's also a hell of a lot of work. I don't shy away from work, but I don't have a lot of time these days, and what time I do have I prefer to spend on other projects. My window manager is just supposed to *work*, and maintaining it is not something I want to do just for the sake of doing it.

So .. I don't know what I'm going to do. It's not critical yet, so I'll be pondering it some more.

House! We finally closed escrow on a house! W00t! Finally, a place of our own, a proper workshop, privacy, security, stability! I'm looking forward to it. I've got some pictures up here with more on their way. More on this later.

Aaaaaand I ran out of time! Maybe I'll get to write more tomorrow. Topics I want to touch on: calc3, dvm, and more about the house.

-- TTK

User Journal

Journal: LinkedIn, Enlightenment, and the Return of Cobalt

Journal by TTK Ciar

Getting Back Into the Swing of Things

I am happy to report that cobalt's strength is returning. She has been on the ropes for about a year now, fighting ulcerative colitis and side-effects from the drugs which fought the colitis. This weekend we dusted off the toolbox and constructed a fence around the back porch, to keep the chickens (and their slimy little poops) off of it. It was good to see her in action again -- she insisted on swinging the hammer whenever one needed to be swung, and she obviously enjoyed being out of bed and having the energy to actually do something! (And yes, I've asked her if we perhaps should just beef up the chicken run to contain the chickens better, but she said no, she wants to let them out of the run periodically anyway, so we'd still have the issue with free-roaming chickens polluting our elevated porch.)

The Remicade is definitely kicking the colitis' butt, and she's no longer on the steroids which were giving her the worst of the side effects. On the downside, she has developed some bad anemia (low red blood cell density), for which her specialist has prescribed some uber-powerful special-formulation iron supplements. If her red blood cell count drops much lower, she's going to have to get some transfusions, but hopefully we can pull her back from the brink. We're both keeping our fingers crossed. Yay modern medicine!

Professional Linkage

During my last round of job-seeking, I joined LinkedIn, one of those newfangled social networking sites. This one is optimized towards getting professionals in contact with other professionals, and it does a pretty good job. I actually got an interview out of a professional link on LinkedIn, though I didn't follow up on it (the NASA Archive job trumped other offers).

Looking at my list of "connections", I realized that this would potentially be a powerful tool for someone seeking to start up a new business. Many of the people I know are right there in my connection list, with their skills laid out for easy browsing. Were I to found (another) startup, it would be the easiest thing to run down that list saying "I need one of those, one of those, and one of those as new employees" and send them messages to that effect. Associates who were not interested but knew someone who might could then easily "link" me to someone in their own connection network. Nifty!

Enlightenment

For several years now I have enjoyed using the Enlightenment Window Manager to organize my computer workspace. It has precisely the features and behavior I need to manage a very large (200+) number of open windows. In particular, I have found its notion of "virtual desktops" very useful. A "virtual desktop" looks exactly like what you see when you sit down and use your computer -- a screenful of windows, icons, etc. Under Enlightenment, one can have many virtual desktops, like having many computers sharing the same screen. If I have Firefox, Xterm, and XPaint open in one virtual desktop, then I can switch to a different (empty) desktop, open a bunch of other windows, and then pop over to the other virtual desktop and there's my Firefox, Xterm, and XPaint again, exactly as they were before.

Enlightenment16 supports a 3D array of virtual desktops, multiple 3x3 "grids" of desktops which can be easily switched and shared. I have used Alt-F1, Alt-F2, etc to switch between different grids, and alt-arrow to navigate around the grid of desktops. This has enabled me to partition my workspace according to application.

Banks 1 and 2 (of 2D grids of virtual desktops) are for xterms related to programming, email, and system administration. These eighteen virtual desktops tend to be the most crowded. Bank 3 is for firefox windows (one full-screen firefox window per virtual desktop, for a total of nine firefox windows). Bank 4 is for realtime chat sessions and other "fun" stuff. This arrangement makes it easy for me to find the instance of the application I need with minimum fuss. If I want to find the open text editor window I was just using, I can just press Alt-F2 to switch to the second grid, and maybe navigate up/down, left/right one virtual desktop's worth to find the xterm I was looking for. Similarly, if I want a firefox window, Alt-F3 will get me to whichever one I was using most recently (though I tend to reserve the middle and bottom rows of virtual desktops for work-related firefox windows, and the top row for "fun" browsing).

Enlightenment16 has been getting harder and harder to install in newer systems, unfortunately (not Enlightenment itself, but rather some of the libraries Enlightenment depends on .. some of the older libraries are particularly problematic on 64-bit systems). Getting Enlightenment16 installed on my new work desktop system (replacing workstation20) proved very difficult. I eventually gave up and switched to Enlightenment17.

Enlightenment17 has dropped support for the 3rd dimension of virtual desktops, limiting me to a 2D grid of virtual desktops only. Alt-F1 through Alt-F12 now switch me between the first 12 virtual desktops, rather than between banks of grids of virtual desktops. Needless to say, after five years of using Enlightenment16, this is tripping me up and making me cranky.

At first I tried using a 5x4 grid of virtual desktops, with the first two rows reserved for xterms, the third row for firefox, and the fourth row for fun (thus corresponding the y-axis of the grid to the z-axis of the array I was using under E16), but this proved quite unsatisfactory -- lots of keypresses were necessary to navigate around and find my stuff, and I quickly became "cramped" with only five virtual desktops per category (whereas before I had nine).

As of today I am taking a different approach, turning my partitions ninety degrees into a 4x7 grid of virtual desktops, with each column of 7 virtual desktops representing an application category (columns 1 and 2 for xterms, 3 for firefox, and 4 for fun). This allows me to continue using my Alt-F1/Alt-F2/etc "muscle memory" to switch between categories (since Alt-F1 will take me to the top of column 1, Alt-F3 to the top of column 3, etc) and gives me seven virtual desktops per category.

To further facilitate navigation, I am going to try to start new tasks in a row corresponding to the day of the week, with Monday corresponding to the first row, Tuesday to the second row, etc. Thus on a Friday if I wanted to start writing a new program, I would hit Alt-F2 to go to the top of my second column, then alt-downarrow four times to the row corresponding to Friday. In that virtual desktop I would open all of my xterms related to that new programming project. Since I keep written notes about my daily operations and keep them annotated by date, finding that programming project should be even easier than it was under E16.
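The navigation rule is mechanical enough to write down. A small sketch, with the category names and key labels chosen by me for illustration (they are not Enlightenment's own identifiers): column number comes from the application category, row number from the day of the week.

```python
import datetime

# Columns of the 4x7 grid, one per application category (names are made up here).
CATEGORY_COLUMNS = {"xterms-1": 1, "xterms-2": 2, "firefox": 3, "fun": 4}


def keys_for(category, day):
    """Keypresses to reach the desktop for `category` on the weekday of `day`."""
    column = CATEGORY_COLUMNS[category]
    rows_down = day.weekday()        # Monday is 0, so Monday needs no down-arrows
    return ["Alt-F%d" % column] + ["Alt-Down"] * rows_down


if __name__ == "__main__":
    # September 7, 2007 was a Friday: Alt-F2, then four presses of alt-downarrow.
    print(keys_for("xterms-2", datetime.date(2007, 9, 7)))
```

Seven rows for seven days is what makes the 4x7 shape work out: Sunday lands on the bottom row, six presses down from the top.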

However, I have also gotten in touch with the Enlightenment developers, and asked why the feature was dropped from E17. Depending on how they respond, some or all of the following options may be feasible for moving forward: (1) Talk them into porting this feature back into future versions of Enlightenment; (2) Convince them to port this feature into E17 by way of monetary donation; (3) Port the feature to E17 myself; (4) Perform the necessary coding/hackery myself to get E16 working on modern operating systems and abandon E17; (5) Suck it up and put up with E17 and its less-useful virtual desktop management featureset; or (6) Abandon Enlightenment altogether and try to make FVWM2 (another window manager) do what I want.

For now I await the Enlightenment developers' response to my emails. Perhaps in the meantime this new arrangement will grow on me. Time will tell!

-- TTK

User Journal

Journal: Remicade, NASA, and the Passing of Needles

Journal by TTK Ciar

Remicade, NASA, and the Passing of Needles

Man, things have been busy lately! And it's been a month since my last journal entry .. scowl. So I'd better write something, eh?

Remicade

My wife's condition has been getting nothing but worse for a few months now, and the steroids she's been taking to control her internal ulcerations have had a little effect, but also many negative side effects (weakness, tiredness, sleeping 14+ hours a day, loss of balance .. icky things). After much discussion with doctors (one GP and two specialists!) and our insurance company, we finally got the go-ahead to put her on Remicade. Her first infusion was a little over a week ago, and it usually takes two or three weeks for it to show effects. We're crossing our fingers real hard for this, because if the Remicade doesn't work there are only a couple of options left open to us, all of which are scary and unpleasant.

I'm looking forward to her being well enough that she can come with me to a Houseness BBQ party and meet some of my new SF geek friends. They're great people, and I think cobalt would get along with them well. Also, some of my friends who have never met cobalt may be starting to think she's my imaginary friend :-) There is a reason people invite all of their friends and relatives to their wedding -- there's something to be said about showing a relationship, and demonstrating the wonderfulness of one's life partner. It will also do cobalt some good to get out of the house and meet new people.

NASA

I was determined -- determined! -- to leave The Archive for a less dysfunctional company. I was interviewing places and even got a couple of highly generous (perhaps even overly generous) offers from some cool companies, but then something really wonderful happened. The Director of Data Collections at The Archive offered me an extremely desirable role in his newly founded NASA Archive project. We are going to be digitizing, categorizing, archiving, and making available online a huge volume of NASA-produced content, and as soon as we hire someone to take over my old role as catch-all Data Collections engineer, it will be my full-time job to help make the NASA Archive happen. If I'm being a little vague on the details of what that entails, well, unfortunately there's a reason for it. All I can say is that this opportunity is like a dream come true for me. I will own several problems which intensely interest me, will be archiving vast volumes of hard scientific content (and pretty pictures, too!), and will do it while working with people I know and like. For the past few weeks I have been splitting my hours between general Data Collections tasks and NASA Archive tasks, and not really getting enough done on either, despite working even longer hours than usual.

So, please, if anyone out there knows a good software engineer who works well independently (by which I mean "submerged in utter chaos") and is excited at the notion of archiving hundreds of terabytes of new content, point them at this job description and encourage them to drop us their resume. It can be a wonderful company if you have the temperament for it, and I would work closely with my replacement for a couple of months before shifting entirely to NASA Archive tasks. The skill requirements aren't too steep -- candidates need some PHP experience (preferably PHP5), experience with XML (parsing it, generating it, and using it in PHP5), and practical knowledge of using linux remotely (connecting to remote machines via ssh, using "df" to see if disks are full, looking at process lists to see what programs are misbehaving, simple stuff). Knowledge of another language well-suited to file manipulation (especially perl or python) is a plus, but not totally necessary as long as the candidate is willing to learn some perl (there is some legacy software written in perl that the Archive Engineer will need to maintain -- or totally rewrite in a different language, if they want). About half of the day-to-day work involves writing software that translates third party metadata (which might be XML, Excel, text files, or whatever) into Archive-compliant XML metadata. The other half tends to be more interesting, comprised of many things which we can talk about in person.

Farewell, Needles

Needleclaws, my beloved cat, had been struggling with her body for a long time (dementia, respiratory problems, arthritis, blindness, and other issues brought on by old age). On August 2nd of 2007, she gave up the struggle and passed away.

When I got home about 9:30pm she was panting and intermittently gasping for air, and would not sit or stand. She was barely responsive to cobalt and me touching her. Cobalt got a towel as I held my cat, and we wrapped her loosely to keep her warm and to contain the "nature" which would surely flow. I held her in my arms, talking to her, touching her face and head and neck.

Soon her pants faded to bare whispers of breath, and her gasps became more violent but less frequent. A corner of my mind couldn't help but count the clock ticks between her gasps. Ten seconds for a while, then twelve, then sixteen, twenty-four, thirty-two for a while .. then sixty-two, once, and she stopped breathing altogether. I still held her for a long while, with cobalt next to me. We wept, and talked about her life, and sometimes what we said made us laugh. She was a goofy cat.

She had been blind for about four years. She was a bulldog of a cat, my "battleaxe cat". She was insane in a way that made the other cats give her room when she wanted her turn at the food or water. One of her signs of affection was to rub her cheek and jowl against your hand, such that her overhanging fang would scrape lightly against your flesh. Cobalt loved it, called it "tusking". I took cobalt's hand and ran it lightly over my dead cat's tusk. Surprised, she laughed, and then cried.

Cobalt looked for something "more dignified" to wrap Needles in, while I went outside to finish the casket I had mostly built for this eventuality. I had wanted to go to the hardware store to find two more pieces of wood which were not warped and were the same length for the bottom of the casket, but there was no more time. I picked the two flattest pieces of wood which were more or less the same length, and finished building the casket bottom. Meanwhile, cobalt had laid Needles down on the t-shirt I had donated to the burial shroud (my old black "Codewarrior '96" shirt; it seemed appropriate) and snipped tufts of hair from my cat's body. It's a tradition she invented two decades ago, to make keepsakes of her dead pets' hair tufts. She would cut a tuft, lay it on the sticky side of a segment of clear packing tape, then fold the other end of the tape over the top, sealing the hair tuft in. Homespun lamination. She made three for me, two for herself.

I was very sad, but wanted to make something with my hands for her. A casket seemed just perfect. Apologies to the neighbors for the four nails I had to hammer that night. The following day I put the bottom on, put my cat inside, and nailed down the lid. It wasn't until the following day that cobalt and I were up and active at the same time, and we had a dignified but informal burial. I looped two lengths of rope under the casket, and left them in place when I buried it. The other ends of the loops are showing above ground, and they will help me find the thing later and help me raise it up when we move to a different house. I want Needles to be buried on our own property. Where she is now is only temporary, like too much in our life right now.

Cobalt wrote up her own entry for this sad but inevitable passing.

I was very, very sad until I dug a four foot deep hole big enough to hold Needles' overly-large casket. The process burned away much of my grief and unhappiness. I still miss her, but not as harshly as I did before digging that hole. I surmise from this that people who hire a funeral parlor to bury their loved ones for them are doing themselves a disservice. Those who loved their deceased might find a lot of therapy in burying them themselves (or at least digging as much of the hole as they are physically able, and letting family and friends take over if necessary).

-- TTK
