Writing Apache Modules with Perl and C

Posted by Hemos
from the must-learn-to-do-better dept.
Thanks to darren chamberlain for an excellent review of the Lincoln Stein and Doug MacEachern book Writing Apache Modules with Perl and C. This is an excellent book for those considering working with Apache and mod_perl, and helpful for C programmers.
Writing Apache Modules with Perl and C
author Li
pages 724
publisher O'Reilly, ISBN: 156592567X
rating 9.5/10
reviewer darren chamberlain
summary Absolutely essential for anyone who is considering using Apache and mod_perl. C programmers may need more.

The Scenario

If you're like me, your first introduction to Perl was in the form of CGI scripts. A few years ago, I inherited a few dozen ancient CGI scripts (Perl and otherwise) that required Immediate Attention. CGI led to Perl, and to Apache; Perl and Apache led, naturally enough, to mod_perl, once I started hitting the performance bottlenecks inherent in CGI programming. After researching mod_perl, building a mod_perl-enabled Apache, and reading all the available online documentation, I got it up and running--and I was suitably impressed.

So, when O'Reilly announced a book devoted to programming Apache with Perl, I was extremely excited. The book starts with an introduction and history of web programming, introduces CGI and other types of web programming (server APIs, such as ISAPI and NSAPI; embedded processors, such as mod_perl, mod_dtcl, and mod_pyapache; FastCGI; Java servlets; ActiveX; and client-side scripting languages, such as VBScript and JavaScript), and then describes the Apache module architecture, using some simple examples ("Hello, World" in Perl and in C). Then it gets good, covering dynamically generated content; the hobgoblin of HTTP, state; and all the other stuff that gives CGI programmers nightmares (like authentication and authorization).

What's Bad?

Although the title reads '... with Perl and C', the emphasis is very obviously on Perl. The C API reference chapters (chapters 10 and 11, pages 505 through 631) are very thorough, but almost all the examples are in Perl only. In fact, the authors go so far as to recommend that almost all Apache modules be written in Perl, and not C, except for very small modules or modules that need that extra speed boost or small memory footprint of being compiled into the server (page 13: "Anything you can do with the C API you can do with mod_perl with less fuss and bother."). Their reasoning is sound: mod_perl modules and scripts require a server restart at most, and often not even that, while for C modules, Apache itself must be recompiled; but I was expecting more in this area, perhaps a larger section on using DSO. After the book was published, however, several of the Perl-only examples were ported to the C API, and are available for download.

A few of these examples have already been published, and in these cases the book is mostly redundant. Notably, the Apache::NavBar module (which Lincoln uses on the server in his lab) and the Apache::AdBlocker module (chapters 4 and 7), appeared in The Perl Journal last year (issues 12 and 11). This is not that big a deal, since both of these modules are incredibly useful and probably deserve to be published in a few more places, but two brand new modules would have been most welcome, especially since the book's target audience probably also reads The Perl Journal.

What's Good?

There's a lot to like here. Since I'm a Perl programmer by trade and disposition, I personally liked the fact that 99.9% of the examples were written in Perl. With only a few exceptions, the modules could be copied into the right locations and run immediately; the exceptions were the modules that made use of either other programs (Chapter 5's Hangman program which uses a relational database to store state information) or specialized Apache features (Chapter 7's Apache::AdBlocker module, which requires proxy functionality).

Much of the text and all of the source code is available on the web at www.modperl.com. Chapters 6, 7, 8, and 9 can be found on the web site for the book, as can all the Perl modules and some of the examples in functional form (Apache::Magic and hangman).

Chapter 9 is the key chapter, and the heart of the book. It describes in great detail all the Apache:: modules. If you use mod_perl at all, download and print this chapter. Memorize it. Use your favorite indexing script to make it searchable. Everything you need to know about mod_perl is here in this chapter.

The appendices are also excellent, although, because it is an Apache book, I would have figured that several of the sections would be regular chapters, and not relegated to the end. The appendices are divided pretty evenly between concentrating on Perl and on C, unlike most of the rest of the book.

So What's In It For Me?

Fortunately for people like me, there is a lot of information about mod_perl on the web; The Perl Journal has had several articles on it, WebMonkey has had an article or two, and so on. There is a comprehensive mod_perl developer's guide on the official Apache/Perl site. Lincoln Stein uses it a lot on his site and in his software. And, of course, we have the man pages and perldocs. So why do we need a book?

A few reasons. First and foremost, few of those sources go into the kind of detail that this book does, while still being approachable. Second, the book focuses on Apache, programming Apache, and (to a lesser extent) programming applications on the web; Perl and C are the means here, not the end. The in-depth technical discussions are about Apache: how it translates URIs to filenames, how it handles subrequests and internal redirects, how it maps files to MIME types. It then presents techniques for usurping these functions, customizing each phase of the response process, and explains when and why you would want to do this, instead of letting Apache do its own thing. Creating checksums on the fly, compressing and decompressing data, creating extremely flexible HTML preprocessors, and modifying outgoing and incoming headers are just some of the given examples.

The reference chapters are probably the single most valuable thing about the book. If you are a Perl programmer on a budget, you can download chapter 9 from the web site, but the C programmers out there have to buy the book to get the C API reference. The C reference is two chapters (126 pages) long, and covers all the functions in precise detail.

For those among you who are using Microsoft operating systems, the book pays special attention to building, installing, and configuring mod_perl and Apache on Win32 systems, where it is different from Unix and Unix-like systems. Most of the actual modules are very similar (except for the obvious ones, such as scripts that call sendmail and the scripts that access MySQL), but the installation and building of mod_perl (or ApacheModulePerl.dll) are very different. The process is described in enough detail to make it possible, without boring those readers to whom it is irrelevant.

Conclusion

Programming Apache/mod_perl without this book is like writing Perl without the camel book. It can be done, but it is much easier and more enjoyable with the book. The writing is clear, informative, straightforward, and, at times, amusing. The authors are the definitive sources for information on mod_perl and CGI programming, and this is reflected in every aspect of the book. While not as definitive for C programmers, it is still the best Apache API reference out there, other than the actual source code itself.

Table of Contents

  1. Server-Side Programming with Apache
  2. A First Module
  3. The Apache Module Architecture and API
  4. Content Handlers
  5. Maintaining State
  6. Authentication and Authorization
  7. Other Request Phases
  8. Customizing the Apache Configuration Process
  9. Perl API Reference Guide
  10. C API Reference Guide, Part I
  11. C API Reference Guide, Part II

Appendices

  A. Standard Noncore Modules
  B. Building and Installing mod_perl
  C. Building Multifile C API Modules
  D. Apache:: Modules Available on CPAN
  E. Third-Party C Modules
  F. HTML::Embperl--Embedding Perl Code in HTML

Comments:
  • by Anonymous Coward
    I was actually very disappointed with this book. I might suggest looking elsewhere.
  • I found the C API to be very well documented, and the examples I found on the web were fairly concise and illustrative. Why anyone would want to burden their server with modules written in Perl is beyond me though.
    ----
    Dave
    All hail Discordia!
  • by Ed Avis (5917) <ed@membled.com> on Tuesday September 21, 1999 @01:14AM (#1670034) Homepage
    Why anyone would want to burden their server with modules written in Perl is beyond me though.

    I think the idea is that the Perl interpreter is loaded at startup as part of the Apache process. The Perl programs are also compiled just once at startup. Once you've done this, running modules written in Perl simply involves interpreting bytecode, which although not as fast as C, is probably fast enough for most applications. Process creation overhead and loading / compiling scripts is usually the real killer for performance, not executing them.

    Besides, how much time does the machine spend in the Perl script, and how much calling Apache API functions? And how relevant is any of this, given that the biggest bottleneck is often bandwidth, not CPU time?

  • by johnnyb (4816) <jonathan@bartlettpublishing.com> on Tuesday September 21, 1999 @01:26AM (#1670035) Homepage
    This book was really good for an introduction to modules for someone (like me) who had never done anything beyond fork/exec CGI scripts. However, as you learn more, and try to do more interesting stuff, you find that the book skimmed the surface on several areas. Basically, for anything very technical or sophisticated, take the book with a grain of salt. Don't assume the book to be 100% correct on every point. They make a lot of mistakes. However, it was definitely worth the read and the money, and I use the appendices quite often when trying to find the function I need.
  • Has anyone else noticed that O'Reilly have their own web server [oreilly.com] which competes with Apache?

    I expect the publishing and software divisions are kept separate, to avoid the IBM syndrome of products being squashed / crippled to avoid 'cannibalizing' sales of products from another division. But it still seems a bit strange.

  • This book is excellent. You can learn enough from it to get way into the internals of the server, and the focus on perl is warranted. With mod_perl + apache, there is a near perfect marriage of performance and development time.

    I knew many of the things discussed in it, but the added detail of the chapters taught me many new things. If you have access to a mod_perl server to develop on, this book will fill your head with great ideas for features, design strategies, and even does a great job of cataloging "fun" CPAN modules out there for the taking.
  • Could you be more specific? What is it you didn't like?
  • Most of the actual modules are very similar (except for the obvious ones, such as scripts that call sendmail and the scripts that access MySQL)

    I'm not sure how the book examples access MySQL, but I use DBI. Scripts run on NT or unix without modification. Otherwise, DBI would be pointless.

  • by Anonymous Coward

    This summer I was an intern at Cold Spring Harbor Biological Labs where Dr. Stein works as a bioinformatician! I got some help from him a bunch of times and worked with some of his postdocs.

    We also heard a presentation from him regarding his internet interface to the DB of the C. elegans genome. He's a nice guy and something of an interesting character, and definitely knows his perl!

    Respectfully,

    Kevin Christie

    kwchri@maila.wm.edu

    PS - Perl rules!!!

  • Aqualung's post was not a troll - he has a point in that mod_perl is a slippery beast. If it's not used *just right* it leaks memory like a string vest.

    Perl is also a no-no (in mod_perl or straightforward standalone guise) for very heavily loaded sites. At Yahoo!, Perl is considered too resource hungry for use on the frontline webservers.

    This leaves you in the unenviable situation of writing leakless, bugless C or C++ code. Catch 22 time ...


    Chris Wareham
  • Any C programmer worth his salt can figure out how to write a module by looking at other modules and the Apache source code.

    What kind of hand holding is next? Apache Module Wizard integrated into bash? ;)

    Let's face it; most computer books are written purely for profit. Particularly ones about dreary, passionless, narrowly-defined topics like writing extensions to a particular application.
  • I found the online docs for writing Apache modules in C to be sorely lacking. The C functions are vaguely defined (some have no definition at all) and no examples given. This book on the other hand lists every C function call and shows an example. Yes, most of the book is mod_perl biased but the C chapters are worth the cost of the book alone.

    This book makes a great complement to online docs for C module writers. I'm also on the Apache module writers mailing list and I happen to know that most of the other people on that list refer to this book often -- it is the de facto bible for Apache module writers who use C.

  • Maybe it's just me but I find the Everything links much more distracting than helpful. I see where they're a good idea in theory (ZDNet has something similar that's pretty good) but I think Everything is more of a facetious geek manifesto than a reference source. Is it useful for someone who doesn't know what ActiveX is to learn that it's "One of Bill Gates' and Microsoft's evil minions to take over the world. Life would be much better without it."?

    Just my $0.02

  • by Jeffrey Baker (6191) on Tuesday September 21, 1999 @03:12AM (#1670046)
    If you want the authors to make a little more money when you buy this book, use one of the links on www.modperl.com [modperl.com].

    There are links to Amazon.com and O'Reilly.

    Cheers,
    -jwb

  • Are there any resources out there that DO go beneath the surface? I'd love to write some of my own Apache modules (in C) and would like to know some of the nuts&bolts without walking through all of the httpd code.

    Nevertheless, I plan on picking up a copy of this book. 8)

    --jeddz

    p.s. every time I think about moderating a topic, I end up posting to it!

  • not really a conflict of interest, since the software component is built in house - and the books are authored elsewhere.

    note that WebSite Pro only runs on NT/98/95, whereas Apache runs on whatever you can build it on. And O'Reilly use Solaris for www.ora.com, linux.ora.com and others (linux.ora.com is definitely running Apache for the webserver, and www.ora.com cannot be running WebSite); check out the Netcraft details for Ora sites [netcraft.com].

    WebSite Pro does look to be quite a nice product, and should displace IIS as a good server for these platforms (NT etc).
  • I found the C API to be very well documented, and the examples I found on the web were fairly concise and illustrative.

    Then could you share with us where on the web you have found useful documentation for the C API? I sure would like to find such a thing, and do not believe that it exists until someone proves otherwise. The parts of the API spec [apache.org] in the online manual [apache.org] that are finished are fine; but there are very large and important areas that are not covered at all. In some places, the author evidently did not get past his outline (here's [apache.org] an example).

    This is a gap that evidently needs to be filled in by the book.

    Why anyone would want to burden their server with modules written in Perl is beyond me though.

    I have written an Apache module in C and quite a few handlers in mod_perl, and the advantage of mod_perl is just the same as that of any Perl programming over C -- you get a lot more done a lot more quickly. I can get a handler done in mod_perl in an hour that would probably take me all day if I wrote it in C. A C module forces you to spend too much time on memory management, string twiddling and core-dumping, and Perl is a great relief from all that.

    To be sure, sometimes you need to squeeze every last bit of speed out of your software, and that's when you probably need C instead of Perl. That's why I wrote the Apache module. But if you need to program to the Apache API in a hurry, mod_perl's the thing.
  • by Anonymous Coward
    I think this book is even better than O'Reilly's "Definitive Guide to Apache" to learn about how Apache works internally. Of course, this is due in part to Lincoln Stein, who is a great author.
  • I'm not sure how the book examples access MySQL, but I use DBI.

    The book uses DBI in their database examples. I'm sure everyone is also aware of the Apache::DBI module, which keeps persistent database handles available for each child process of apache.

  • Seriously. This is *not* a flame.

    How do you suggest people generate dynamic web pages?

  • C versions of some of the examples are posted at http://www.modperl.com/cmodules/ .
  • Seriously. This is *not* a flame.

    How do you recommend people generate dynamic web pages?
  • by Anonymous Coward
    I also found the C documentation lacking. The top-level overview stuff is fine, but when you want to dig deeper, the detail is just not there. I end up going to the source of other modules to figure out how to do things.

    I *REALLY* disagree with the author's assertion that apache modules should be written in perl. Many apache modules end up being glue into an existing system. Most of the benefit of being an apache module goes away if it consists of perl code that calls 'system' on existing programs. For peak performance, the existing code must be glued directly into apache, which means using C.

  • How do you suggest people generate dynamic web pages?

    What, me? You're not asking me, are you?

    Well, anyway, at the moment I'm developing a web application using CGI and Perl, together with a handful of useful libraries such as CGI.pm . Later, I can move it to something like mod_perl (or in this case, PerlIIS) to increase performance.

    I think the most important decision is how you will store your data. Will you use an SQL database, a flat file, serialization (something like the Perl Storable module) or even something funky like OpenLDAP?

    If you want SQL, it might be good to go for something like PHP or ASP that has extensive SQL support 'built-in'. Of course Perl has SQL modules too, but it's probably not quite as easy (I haven't used PHP / ASP). If you don't, you have a much freer choice. For groups I really cannot say what to use, but if you're working on your own, just use your favourite high-level language. It probably isn't worth learning a new scripting language just for Web development - there are too many already.

    If you don't already know a scripting language, go out and learn Perl at once. Yes, I know Python and many others are a lot cleaner, but Perl is fun to learn, there are lots of good books on it, and you'll probably end up having to use it someday anyway.

  • being primarily a c programmer, i was initially disappointed that this book seemed to focus almost entirely on mod_perl. (while i like perl for quick and dirty work, it never fails to infuriate me when i try to use it for anything significant) however, after reading through the book, i still found it to be extremely useful. the middle chapters give a very good explanation of how the apache api can be used to do what you want. Basically, i skimmed through the rest of the book to get the basic concepts down, and since then i have lived in chapters 10 and 11, the c api reference. even without the rest of the book to explain how to use it, these two chapters are by far the most useful reference for the apache api that i have found.
  • I tend to be doing database driven stuff, mostly with MySQL, but occasionally with Oracle, Informix, etc. Apart from this it's information stored in memory mapped files, which are updated from live feeds.

    The actual web pages tend to be HTML hardcoded into C and C++ programs, with the dynamic stuff coming from the database or memmapped files. For instance, I am currently writing a reporting system. This is a C++ database load program that uploads the tables once every 24 hours. The searching is done by several C programs tailored to the individual search being performed - in other words one program for editors, another for authors. The nearest thing to 'templates' that it uses is a static library that has output routines for various headers, footers and standard menus.

    This is a little bit more laborious than using say PHP3, or mod_perl. However, it is blisteringly fast and efficient.

    One reason I tend to shy away from Perl besides the performance or resources issue, is the question of maintainability. It is very easy to get the job done quickly in Perl. It's also easy to write terribly unreadable code. One of the systems that I am replacing is simply line noise and a bunch of cron jobs. The other does absolutely no error checking, and has been missing many errors in the data feed for the last two years.

    You may argue that the issue of Perl code maintainability is down to the authors of the original systems, but Perl encourages quick hacks. When these hacks go into production they end up being a nightmare to maintain or enhance.


    Chris Wareham
  • Also note that because a perl script is interpreted its code actually goes in the data segment, not the text segment. Hence there is no memory sharing of scripts among the httpds.
  • According to Philip Greenspun's Philip and Alex's Guide to Web Publishing [photo.net], Amazon uses CGI programs written in C.

    "Amazon.com has a market capitalization of $5.75 billion (August 10, 1998). They built their site with compiled C CGI scripts connecting to a relational database. You could not pick a tool with a less convenient development cycle. You could not pick a tool with lower performance (forking CGI then opening a connection to the RDBMS). They worked around the slow development cycle by hiring very talented programmers. They worked around the inefficiencies of CGI by purchasing massive Unix boxes ten times larger than necessary. Wasteful? Sure. But insignificant compared to the value of the company that they built by focusing on the application and not fighting bugs in some award-winning Web connectivity tool programmed by idiots and tested by no one."

  • in MIPS assembly. seriously; C is way too high level, and perl is write-only line noise. MIPS assembly is the way to go. if you really must run x86 servers, you can always use the mod_mipsasm [a.joke.stupid] emulator.
  • FWIW, perl's database support is extremely good *and* easy to use. the only difference that "having it as a module" brings to your code is a "use DBI;" statement at the top.
  • woo.. now that's a joke that went way past you! and i thought that picking MIPS, *and* recommending an emulator inside apache (as if such a thing existed), *and* pointing to a broken link [slashdot.org] would make it obvious enough!
  • It does the same as a mod_perl script running under Apache::Registry, but you can use Java instead of Perl.

    So... if you know Java, but not Perl, you should use mod_jserv.