User Journal

Journal Journal: Perl Revisited

Six years ago I wrote a blog entry on my thoughts on Perl. I thought the language was appalling.

I've been forced to reconsider it as my new job is full of Perl enthusiasts. They even have a Trading System host written in Perl. So what do I think now?

Perl appears to be a combination of AWK and Unix Shell programming. And it's best used in these environments. So far, so good.

The problem is there are modules for nearly anything you can think of. And these modules allow Perl to do more. In my view, it's these modules that allow Perl to be used in environments where it has no business at all.

The language is endowed with OO features. But that's trying to solve a problem it has brought on itself: application size. And of course, because it has a history of compromising speed and general efficiency for apparent ease of use, it's slow.

So my impression of the language has improved, but my impression of the users has decreased.

User Journal

Journal Journal: Whatever happened to stricmp?

Back when computers just spoke American, string processing was easy. The letters were arranged in ascending order and upper case was a fixed offset from lower case. This allowed letters and words to be sorted easily, and conversion between upper and lower case was trivial. As characters were a fixed size, strings could be processed very efficiently. This is ASCII.
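
As a minimal sketch of that fixed offset (assuming plain single-byte ASCII input), the conversion is just arithmetic:

#include <cstdio>

// ASCII only: 'a'..'z' and 'A'..'Z' are exactly 32 apart.
char to_upper_ascii(char c) {
    return (c >= 'a' && c <= 'z') ? c - ('a' - 'A') : c;
}

int main() {
    for (const char *p = "McKnight"; *p; ++p)
        std::putchar(to_upper_ascii(*p));   // prints MCKNIGHT
    std::putchar('\n');
    return 0;
}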

But the world has grown up and those Europeans and Asians want to use their old languages with their ancient quirky ways, and it doesn't work anymore. What to do? The two things that have been done are:
1. Remove case insensitive string compare, stricmp.
2. Provide knowingly broken implementations. (e.g. http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase.aspx)

So what are the issues?

The new languages combined have more characters. Conversion between upper and lower case words is no longer trivial.

Unicode supports a large number of characters, but there are a variety of encodings. One practical encoding is UTF-8. This encoding is identical to ASCII for the first 128 characters, and character lengths grow beyond that. So you get to keep all the efficient ASCII optimisations and only pay for extended character sets when you use them. Windows NT and its derivatives use UTF-16.
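
A small sketch of what that variable length means in practice (the byte counts assume the literals are UTF-8 encoded):

#include <cstdio>
#include <cstring>

int main() {
    const char *ascii = "McKnight";         // 8 characters, 8 bytes: pure ASCII
    const char *accented = "na\xC3\xAFve";  // "naive" with an i-diaeresis: 5 characters, 6 bytes
    std::printf("%zu bytes, %zu bytes\n", std::strlen(ascii), std::strlen(accented));
    return 0;
}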

"McKnight" is an example of a mixed case word that confounds conversion to upper case and lower case. They're no longer reversible. Furthermore, phrases like "War and Peace" confound title case conversions (the naïve conversion produces "War And Peace"). In some languages, there isn't a direct one-to-one conversion of upper to lower case letters. This introduces the notion of case folding.

So to sum up, we have special rules for "to upper case", "to lower case", "to title case" and "case fold". And they all depend on who's doing them; their locale.
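
A rough sketch of what locale-aware case handling looks like, assuming the ICU library is available (the calls are ICU's UnicodeString API; the German example is just an illustration):

#include <unicode/unistr.h>
#include <unicode/locid.h>
#include <iostream>
#include <string>

int main() {
    // German sharp s: "Straße" upper-cases to "STRASSE", which lower-cases
    // back to "strasse" -- the round trip is not reversible.
    icu::UnicodeString word = icu::UnicodeString::fromUTF8("Stra\xC3\x9F" "e");

    icu::UnicodeString upper(word);
    upper.toUpper(icu::Locale::getGerman());   // locale-sensitive "to upper case"

    icu::UnicodeString folded(word);
    folded.foldCase();                         // locale-independent "case fold", used for comparisons

    std::string u, f;
    upper.toUTF8String(u);
    folded.toUTF8String(f);
    std::cout << u << " / " << f << std::endl;
    return 0;
}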

Programming

Journal Journal: The Folly of Garbage Collection

C programming is tricky. You have almost complete control of the machine, but you have to rely on disciplined behaviour to keep a handle on things. What happens if you get it wrong? You get resource leaks and abuse. The resources aren't constrained to a finite list of operating system resources; they include library resources and your own program's resources too. But the biggest, most notable culprit of all is the raw resource that nearly every computing resource is made from: memory.

C with Classes addressed this by adding support for resource management directly into the language with constructors and destructors that automatically initialise and release class resources. The language has since evolved into C++ with further programmer assistance. But even with better assistance, without disciplined use, you're still in danger of committing the 'memory leak' error.

Without thinking too much about the problem, Java and C# have adopted C++'s syntax and style, but made the world a better place by solving the 'memory leak' problem by relieving the programmer of that duty. The supposed saviour? Garbage Collection.

The C virtual computer has a heap and a stack. Local variables occupy space on the stack where functions are run. When the function returns, the stack and all the variables in it are cleared down. The heap provides a store of memory where blocks can be given out on request and returned when complete. So what can go wrong with heap usage?

Handling NULL
If a request is made for a certain amount of memory, the heap can return NULL, meaning the request cannot be satisfied. Careless code will not check for this NULL and will proceed as if the allocation succeeded, writing data at the NULL location.

Help comes in many forms. The runtime memory layout is changed where possible so that the NULL address maps onto something helpful. In DOS, it maps onto a string with a helpful message, just in case that memory is ever read. In a protected mode environment, a page fault is generated. C++ provides exceptions, and throws a standard exception, bad_alloc, when an allocation fails.
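
A minimal sketch of both failure styles side by side (the sizes are arbitrary; the huge request is only there to make failure likely):

#include <cstdio>
#include <cstdlib>
#include <new>

int main() {
    // C style: malloc reports failure by returning NULL, which must be checked.
    char *buf = static_cast<char *>(std::malloc(1024));
    if (buf == NULL) {
        std::fprintf(stderr, "malloc failed\n");
        return 1;
    }
    std::free(buf);

    // C++ style: operator new reports failure by throwing std::bad_alloc.
    try {
        char *big = new char[1024UL * 1024UL * 1024UL * 1024UL];  // absurd request
        delete[] big;
    } catch (const std::bad_alloc &) {
        std::fprintf(stderr, "new failed: bad_alloc\n");
    }
    return 0;
}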

Buffer Overwrites
An application can ask for 10 bytes and write beyond this range. This typically damages the heap in some way, affecting future allocations. Debug heaps are available that assist in catching these errors, such as Microsoft C++'s debug heap, Valgrind on Linux or libumem on Solaris. But these are debug measures; in production systems these errors are hard to catch. The C++ class goes a long way towards containing these errors, as the error may be confined to a class, making it easier to detect and fix.
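
A toy sketch of confining the check to a class (not a production container), so an overrun surfaces as an error in one known place rather than as silent heap damage:

#include <cstddef>
#include <cstdio>
#include <stdexcept>

class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new char[n]) {}
    ~Buffer() { delete[] data_; }

    // Every access goes through one checked path.
    char &at(std::size_t i) {
        if (i >= size_) throw std::out_of_range("Buffer overrun");
        return data_[i];
    }

private:
    Buffer(const Buffer &);             // non-copyable, to keep the sketch simple
    Buffer &operator=(const Buffer &);
    std::size_t size_;
    char *data_;
};

int main() {
    try {
        Buffer b(10);
        b.at(3) = 'x';                  // fine
        b.at(12) = 'y';                 // caught here instead of corrupting the heap
    } catch (const std::out_of_range &e) {
        std::fprintf(stderr, "%s\n", e.what());
    }
    return 0;
}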

Dangling References
When the program is finished with a heap allocation, it should return it, making that memory available to other parts of the program. If the memory is returned to the heap but the pointer is not cleared, you can continue to use the memory inadvertently. The discipline of clearing pointers after use goes a long way towards removing this error. C++ classes again mitigate this by confining the use of pointers to smaller sections of code that typically release the memory in a destructor, clearing the pointer as well as deleting the memory.
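
The pointer-clearing discipline, in a minimal sketch:

#include <cstdio>

int main() {
    int *p = new int(42);
    std::printf("%d\n", *p);    // use the memory

    delete p;                   // return the block to the heap
    p = 0;                      // discipline: clear the pointer immediately

    if (p != 0) {               // later code checks, instead of reading freed memory
        std::printf("%d\n", *p);
    }
    return 0;
}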

And finally, Garbage
If an allocation is obtained from the heap and the pointer to the block is lost, the memory is lost to the program for the duration of the run. We call this garbage.

Garbage collected languages expect you to ask for memory and simply lose the pointer to it when you're done. The heap detects the garbage condition and marks it as ready for release. Brilliant, a whole class of problem has disappeared. So what's the problem?

The problem is that memory is the raw material that resources are made of, not the ultimate resource at all. If the actual resource is a socket connection to a server, the socket can fall into the CLOSE_WAIT state and never be released for the rest of the run (just like memory garbage). The resource could be a scarce graphics device object that takes resources from the graphics system, or a file handle that takes resources from the file system, and on, and on. Memory, of itself, is rarely the ultimate reason for obtaining memory, and so cannot be the focus of its return.

The C++ destructor is a language-defined construct for doing exactly that. Used well, C++ can eliminate all of these resource errors, as well as the heap errors.
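
A minimal sketch of the destructor doing the releasing, using a POSIX socket as the non-memory resource (the wrapper class is my own illustration, not a library class):

#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>
#include <stdexcept>

// The destructor closes the descriptor, so every exit path -- normal return,
// early return or exception -- releases the socket exactly once.
class Socket {
public:
    Socket() : fd_(::socket(AF_INET, SOCK_STREAM, 0)) {
        if (fd_ < 0) throw std::runtime_error("socket() failed");
    }
    ~Socket() { ::close(fd_); }
    int fd() const { return fd_; }

private:
    Socket(const Socket &);             // exactly one owner
    Socket &operator=(const Socket &);
    int fd_;
};

void talk_to_server() {
    Socket s;
    // ... connect, send, receive; an exception thrown here still closes s ...
}   // destructor runs here: no CLOSE_WAIT descriptor left dangling

int main() {
    try {
        talk_to_server();
    } catch (const std::exception &e) {
        std::fprintf(stderr, "%s\n", e.what());
    }
    return 0;
}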

Garbage collected languages move the responsibility of releasing resources back to the programmer. The programmer can no longer rely on the resource cleaning itself up; this is a major step back to the old procedural ways.

Garbage collected systems have no notion of ownership of objects. No one and nothing owns resources; they are created and passed around with free abandon. This makes static analysis of the lifetime of objects impossible to perform. At best, you can only identify where an object was created. You can never be sure it's deleted where or when it ought to be.

Garbage collected languages leak resources wherever the resource is not just memory, and resort to placing the burden on the programmer to release resources, reinventing the resource leak.

User Journal

Journal Journal: It's all magic

In recent times I've become extremely impressed with computing.

This morning, I docked my Palm LifeDrive, which is connected to my development computer. I noticed that I have some pictures on there that need to be transferred to my server at home. My admin computer has ssh access to the internet, but the development computer doesn't.

I put the Palm into "Drive Mode" and it shows up as a drive letter on my dev box; I share that drive letter. I connect to that share from my admin box.

From cygwin's bash on the admin box, I move to the top of the pictures directory and tar it, piping the output to ssh where I extract the tar'd stream to the local file system. All this is done in one line.

Of course, while all this is happening, I'm downloading some missing Reggae albums from the server onto the Palm, for my commute home from work this evening.

The traffic moves from the file system on my Palm to my dev box via its USB2 interface. My dev box talks to my admin box using SMB over ethernet. My admin box talks to my domain's firewall across the internet using a secure tunnel. My domain's firewall writes the data to my file server using NFS.

It all works. It's all concurrent. It's all seamless. It's all amazing. It's all magic.

User Journal

Journal Journal: Linux For Losers According To De Raadt

Theo is right about hardware support. It's hardware support that makes Windows king. PC hardware is designed and constructed to work with Windows. Without efficient, high quality graphics drivers for another operating system, your high end graphics card is wasted, your RAID card is rendered useless, and so on. To his credit, Theo has been pushing hardware manufacturers to provide drivers, with varying degrees of success.

It may be unfair to call Linux crap; Theo has high standards and we all strive to be worthy. However, the introduction of the File Alteration Monitor in Red Hat Linux 8 dragged the performance down to something akin to Windows. Questionable components implemented with varying degrees of correctness is a well-made point here.

Linus thinks Linux should be a jack of all trades rather than a master of one (http://os.newsforge.com/article.pl?sid=05/06/09/2128249). OpenBSD is a master of security, but doesn't make best use of SMP, for example (http://www.onlamp.com/pub/a/bsd/2005/01/20/smpng.html). Given that you can never have a computing system where all nodes run the same hardware or the same versions of the same software, I think you must accept that a heterogeneous computing system is more effective. You need to use whatever hardware and software is best for the place or job. As such, the most important thing is interoperability. If Linux makes a better workstation, OpenBSD makes a safer firewall, or FreeBSD drives your SMP hardware better, then use them and make it all work together.

The comment, "Does this belong here?", isn't a big deal. It flags a section of code to be looked at in the future. You can only (reliably) change so much at a time. If you bump into something odd, often the best you can do is make note of it. You have to contain changes to manage quality, costs and time.

Posted on Forbes.com 29-Jun-05 as a comment on http://www.forbes.com/intelligentinfrastructure/2005/06/16/linux-bsd-unix-cz_dl_0616theo.html

User Journal

Journal Journal: Another Paradigm Shift


It took me a long time to get OO. Even when it was obvious, it seemed to trickle in rather than sink in. Since then I've been wary of paradigms, but was resigned to never really understanding the current paradigm until a different one came along.

OO was exactly procedural programming looked at from a different point of view. All the code still had to be written, but it lived in different places. The real bonus was it supported limited reuse thru inheritance. It is perhaps ironic that it is inheritance that will prove the death of OO; we've long since given up on multiple inheritance.

I'm struggling to get my head around genetic programming. This is different from the procedural and OO paradigms in that you spend your time setting up the machine, then let it find a solution for you.

But before you do all that, you have to understand the process and the inputs. Again, new terms: genotype, phenotype, recombination ... The solutions produced are not the best solution, but a set of good solutions. I quite like that idea. It sort of mimics our efforts. We don't come up with the best solution, but stop at solutions that are good enough. It's all looking up.
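
As a toy sketch of that setup-then-search flavour (the problem, a bit string whose fitness is simply its count of ones, is my own illustrative choice):

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>

typedef std::vector<int> Genome;    // genotype: a string of bits

// The score we select on: here, just the number of ones.
static int fitness(const Genome &g) { return std::count(g.begin(), g.end(), 1); }

static bool fitter(const Genome &a, const Genome &b) { return fitness(a) > fitness(b); }

static Genome random_genome(std::size_t n) {
    Genome g(n);
    for (std::size_t i = 0; i < n; ++i) g[i] = std::rand() % 2;
    return g;
}

// Recombination: single-point crossover plus a small chance of mutation per bit.
static Genome breed(const Genome &a, const Genome &b) {
    Genome child(a.size());
    std::size_t cut = std::rand() % a.size();
    for (std::size_t i = 0; i < a.size(); ++i) {
        child[i] = (i < cut) ? a[i] : b[i];
        if (std::rand() % 100 == 0) child[i] ^= 1;
    }
    return child;
}

int main() {
    std::srand(static_cast<unsigned>(std::time(0)));
    const std::size_t bits = 32, popsize = 20;
    std::vector<Genome> pop;
    for (std::size_t i = 0; i < popsize; ++i) pop.push_back(random_genome(bits));

    for (int gen = 0; gen < 100; ++gen) {
        std::sort(pop.begin(), pop.end(), fitter);            // selection: keep the better half
        for (std::size_t i = popsize / 2; i < popsize; ++i)   // breed replacements for the rest
            pop[i] = breed(pop[std::rand() % (popsize / 2)],
                           pop[std::rand() % (popsize / 2)]);
    }
    std::sort(pop.begin(), pop.end(), fitter);
    std::cout << "best fitness: " << fitness(pop[0]) << " of " << bits << std::endl;
    return 0;
}

The result is not guaranteed to be the optimum; different runs settle on different good-enough bit strings, which is exactly the point.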

And the race is on, because I'll be forced to use Neural Networks soon, and I need to understand and use GA well so I can challenge it.

User Journal

Journal Journal: I don't get it

We learn by linking new information to information we already have in meaningful ways.

Remember learning multiplication? It's just like addition done lots of times. Subtraction, division, roots and integrals are all undoing addition, multiplication, exponents and differentiation respectively.

Using a VCR was the video equivalent of using an audio cassette recorder. Playing a DVD is equivalent to playing a CD (so it should be obvious that you wouldn't record on it).

The key is abstractions. If you understand the abstraction, then you can use it. It's a Platonic idea. Label something a chair and people will sit on it. Label it an artifact and it's not considered polite to sit on it. Check out these new fancy modern art museums. You have to learn what to sit on and what to accept as art (you'd normally dump modern art when you find it in your kitchen).

And so that inevitably brings us back to software. Abstraction is the key. You need abstractions to give you a context in which to understand things. I once read Ken Thompson say that everything in Plan 9 is a file. Everyone knows what a file is, but with objects the operations are different for each object. You need to discover the syntax and semantics of each group of operations.

The virtual world of computers is limited by our imagination, which is limited by our abstractions and conventions.

If the abstractions become too complicated, no one will understand them. And if you don't understand what you're doing, you just can't use it effectively. Even the Dalai Lama said to understand the rules properly so you know how to break them.

User Journal

Journal Journal: Perl

I've just spent the last 15 mins on a Perl tutorial. I need to do some stuff and thought a scripting language might be the way to go.

I am appalled.

I've always heard that C is too complicated, and loads of complaints about C++ too, but I never understood the basis. The following is a Perl implementation of cat taken from http://www.comp.leeds.ac.uk/Perl.

#!/usr/local/bin/perl
#
# Program to open the password file, read it in,
# print it, and close it again.
$file = '/etc/passwd'; # Name the file
open(INFO, $file); # Open the file
@lines = <INFO>; # Read it into an array
close(INFO); # Close the file
print @lines; # Print the array

Now, arrays are lovely. And in an effort to make them 'easy to use', code like "@lines = <INFO>" can be written that will read a whole file into the array.

But has it not occurred to anyone that this sort of code can pull huge content into a scarce resource? Hasn't anyone thought of running out of memory? There is never enough, you know.

Just last week I had to search a 40G file for some stuff and used "cat filename | strings | grep pattern | less". Try doing that with the Perl implementation of cat.

Here's another extract:
"It is also possible to assign an array to a scalar variable. As usual context is important. The line

$f = @food;

assigns the length of @food, but

$f = "@food";

turns the list into a string with a space between each element. This space can be replaced by any other string by changing the value of the special $" variable."

As soon as I saw the convenient serialisation of the array I wondered how delimiters might be specified. I thought there must be a default parameter somewhere, but I had only to read the next line to see that there's a global variable that specifies the delimiter. Doesn't this seem very wrong to anyone else?

The whole thing reminds me of BASIC all those years ago.

User Journal

Journal Journal: Computing: Engineering Failures

Computers exist and their capabilities are primarily limited by our imagination and secondarily by our inability to instruct them. It's instruction that I'm concerned with here.

We seem to have come full circle with the commercial programming of computers. We've gone from inline code to macros to subroutines to procedures to objects, and back to inline code and templates (macros). Subroutines of common code are looking favourable again.

I have been working on some OO code for the last few weeks and wish common code were factored into subroutines, window updates were done in one place, variables were properly initialised and resources released; and I have cursed smart pointers and singleton objects.

It seems we've spent the last decade or so training prospective computer programmers in OO and Java without teaching the basics of software engineering. Factoring of common code, management of resource scope and lifetime, efficiency and maintainability have all been forgotten.

We now have not only spaghetti code, but spaghetti objects too. All that, coupled with uninitialised variables and unreleased resources, makes for a maintenance nightmare.

Patterns. Back in 1993, patterns entered the realm of computer software engineering. The idea was simple. If you ask a novice how to do something, he'll devise a scheme for doing it. If you ask an expert, he thinks about which task he's done before that this one most resembles, and does that, thus not reinventing the solution. The notion of education is to pass on experience, and patterns, we thought, embodied this very well.

The problem, of course, is that patterns aren't often integrated with existing experience, creating a parallel existence and generating write-only code that makes no sense whatsoever to the maintainer unless it is known in advance which pattern has been used. And even then, the implementation may require a new framework to be constructed, forcing developers to learn yet another framework. Now that OWL is forgotten, isn't MFC enough? No it isn't; we've got a raft of custom ones offering different benefits (e.g. Fox Toolkit for Windows/X portability, and .NET for change's sake).

Back in 1994 I responded to a request by Scott Meyers for a comment on the effectiveness of Lint for C++. My response was that it was extremely useful, but it didn't fare well when checking code that implemented patterns. Ten years later, I can say with confidence that neither do programmers.

We need to reduce the complexity of our code and not forget lessons of the past. Remember, code is written once but modified for years after. It must be maintainable.

Maintainability can be improved in a number of ways.
1 - Don't use esoteric facilities without good cause. I spent a number of years on the first large Windows system and was involved in a number of projects. Various new technologies were used in places just so that they could legitimately be put on developers' CVs. I remember having one fight trying to keep COM out of the lowest layer of the security component and keeping Connection Points out of the system altogether. Within 6 months Connection Points were out of favour and never heard of again.

2 - Factor common code into common functions (or a common library, depending on project size). The code I am presently maintaining has a class with three almost identical functions for identifying the same state for three other parts of the same library. Surely one should suffice. If there was a problem, and there was, it would only need to be fixed in one place. Having each object on a grid update its own section of the grid may seem like a cool OO thing to do, but it contributes to maintenance hell.

3 - Initialise variables using language facilities or common practices. When using C++, don't get carried away with magical object relations and forget to initialise your variables or ignore const correctness (see the sketch after this list).

4 - Keep it simple! Use the KISS principle. I went for an interview back in 2000 and was told that the product under maintenance had over 600 COM interfaces and the next release would add another 150. I was not impressed.
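
To illustrate point 3, a small sketch (the class and its members are invented for the example): let the initialiser list and const do the work.

#include <iostream>
#include <string>

class Order {
public:
    // Every member is initialised in the initialiser list, so the object never
    // exists with indeterminate values.
    Order(const std::string &symbol, int quantity)
        : symbol_(symbol), quantity_(quantity), filled_(0) {}

    // const-correct accessors: read-only callers can't accidentally modify state.
    const std::string &symbol() const { return symbol_; }
    int outstanding() const { return quantity_ - filled_; }

    void fill(int n) { filled_ += n; }

private:
    std::string symbol_;
    int quantity_;
    int filled_;
};

int main() {
    Order o("VOD.L", 100);
    o.fill(40);
    std::cout << o.symbol() << " outstanding: " << o.outstanding() << std::endl;
    return 0;
}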

User Journal

Journal Journal: Computing: Mathematical Imprecision

The Amsterdam Derivatives Exchange relies on its Market Makers to establish prices for contracts. This is in contrast to the London Exchange, where Options Contract prices are determined by a computation based on a variety of parameters. So if an Options Contract has a particular settlement type, then so does its underlying Futures Contract, because they are traded and settled on the same exchange. So A=B and B=C, implying that A=C.

Computing modules apparently do not obey such transitive rules. Quite rightly, we may think: the application should handle Options based on the properties of the Options contract and not infer (through transitivity) that the properties of a related (underlying Futures) contract may be used instead.

Not all mathematical objects are commutative, distributive and so on. It would be nice if computer programs within a system could exhibit mathematical properties. That way, methods could be developed that could prove a system correct. (I do recall that Z and VDM attempt this.)

One might then be tempted to push the thinking back into the programming language. LISP is such a language, and there is a host of Functional Programming Languages out there. They don't hold state, though. Big deal; we're not supposed to hold state anyway, are we?

What does that mean for a Web Page? Can a Web Page be thought of as a Mathematical Object? What does it mean to update the thing with new colours via the stylesheet? Or to change the login screen? Or to change a bit of server side behaviour such as a search routine or database access procedure?

Can the notion of a complex number or a matrix, the idea of a composite mathematical entity, be used here?

I'll have to think about this a bit more.

User Journal

Journal Journal: Open Source

No one can figure out what to do about Open Source. Some of us love to produce stuff and marvel at what other stuff has been produced. gcc and Anjuta are fine examples of collaborative efforts.

There is a whole raft of "thinkers" who comment on the state of these matters and hope to influence the conditions that the Open Source movement will eventually produce.

There are negative criticisms. These are seen to come from supporters of the big vendors. The production process is criticised. The team of wild ones creating stuff (viruses too) seems intractable; after all, you can't control quality in such an environment. Cost too is mentioned. Who will produce something without getting paid? Notions of the C word come to mind. Big business is afraid of this Open Source thing because it is not just thinking, it's acting outside the box.

As the debate rages on, new releases of operating systems and software continue to trickle onto the world's servers. PrimeOS. Remember that? I do. It was a proprietary operating system for Prime computers. Who needs a proprietary OS, by the way? We just need a few standard ones that scale from digital watches to supercomputing arrays. Well, at least we're getting there. That's one effect of the Open Source movement.

And the cost of kit is coming down. Not really; it's settled to a value somewhere between a washing machine and a high end TV, for home kit anyway. And coupled with free software, the only money will be in the data. This simply cannot be avoided. So if M$ are lucky, .NET will move your documents off your box to a place you must pay to access them. Fortunately for us, they can't write a piece of network software to save their life. But someone may succeed where they fail.

Back to Open Software. Can anyone make munny from it?

RedHat's dropped it after having a really good go at it. But when they release an xmms that can't play MP3s because of licensing issues, they've just gotta call it a day and move on.

Windriver's feeling the pinch as customers are coming 'round to accepting that Open Source OS's aren't so bad. So BSDi is probably not selling that well, when the customers realise they can switch to FreeBSD or NetBSD stable releases. And of course there's the carrot of the current branches. When FreeBSD 5 goes stable, it'll be the bomb.

So much for the quality criticism. They're not 14-year-olds; they are some of the best developers in the world, who often have day jobs delivering quality proprietary software. And there are processes. Check out Debian or FreeBSD.

So the question is, what will be the computing model of the future? How will Open Source pressure change it from what it is now? Will it matter to ordinary non-technical consumers?

User Journal

Journal Journal: Distributed Systems

It has come to my attention that Distributed Systems are no longer in vogue. From my experience at The Bank, it seems that no one knows how to support a distributed system.

I saw T3 yesterday. Skynet was distributed. And Claire Danes is all grown up.

And what about Lain? ... and that AI in Dune?

It's software you see, but not all software. We can simulate P2P networks on TCP/IP networks, but we don't face all the issues.

But P2P is only one dimension.

DNS is distributed. That works quite well in practice. It's a pain to administer, too. Most DNS servers are incorrectly configured, placing unnecessary load on the 13 root servers.

Music sharing seems to function well enough in practice, but doesn't seem to be good enough for critical applications.

For example, what about a distributed financial exchange? Does that even make sense?

Dunno, just the musings of a grey engineer.

User Journal

Journal Journal: Meta-moderating 16/07/03

I did a spot of meta-moderating today. I also posted a comment.

So now I've done it once!
