Perl Programming

Perl Domination in CGI Programming?

at0m asks: "Perl seems to be the language that most people use for CGI programming. But is there a good reason for it? Sure, it's easier to use Perl than a lower-level language, but programs would be more efficient if C/C++ were used. Programmers don't sacrifice this efficiency when programming applications, so why is it the standard to use a higher-level language for CGI, when one wouldn't use one for other programs? I've been using Perl for all my CGI projects, but for my next one I'm contemplating using C++ to make it more efficient. Did I make the correct assumption?" Most CGI work favors development speed over the complexity inherent in C/C++, which is why Perl and other scripting languages are used. Your thoughts?
  • by Anonymous Coward
    I agree with you 100 percent my brother.
    You have to remember that a programming language is just a tool. It aids oneself when implementing a solution to a problem. Different problems can be solved in different ways. So different programming languages are geared towards solving different types of problems.
    Python is a good language for medium to large cgi scripting. Perl is great for small scripts. However, once you write a large app in Python, it is a bit tough going back.
  • by Anonymous Coward
    Well, you asked a few questions, so here are the simple answers.

    Perl seems to be the language that most people use for CGI programming. But is there a good reason for it?

    CGI scripts mostly just perform text manipulation; since Perl is so good at text manipulation, it is an obvious choice. The second big advantage for Perl is CPAN... there are Perl interfaces to everything on there. So even if your CGI script has to do more than text processing, chances are that code is already written for you, and all you really need is the code to glue it together.
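
    To give a feel for that glue style, here is a minimal sketch of my own using the standard CGI.pm module (the 'name' form field is hypothetical):

    #!/usr/bin/perl -w
    use strict;
    use CGI;

    my $q    = CGI->new;                       # parse the incoming request
    my $name = $q->param('name') || 'world';   # hypothetical form field

    print $q->header('text/html');             # emit the Content-Type header
    print "<html><body>Hello, $name!</body></html>\n";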

    Sure, it's easier to use Perl than a lower level language, but programs would be more efficient if C/C++ were used.

    The simple answer comes down to one thing: network lag. When executing a CGI script, execution time is orders of magnitude faster than the network can pump the data out. So the network is the bottleneck.

    As network speeds increase, you might reach the point where execution speed becomes the bottleneck; until then, you should use whatever tool is best for the job. Since CGI scripts do a lot of text processing, Perl is a natural choice.

    Programmers don't sacrifice this efficiency when programming applications, so why is it the standard to use a higher level language for CGI, when one wouldn't use one for other programs?

    The bottleneck with a local application is execution speed, so to speed it up, you make it more efficient. That aside, some applications are written in interpreted languages. As computers continue to get faster, this will become less and less of an issue.

    I've been using Perl for all my CGI projects, but for my next one I'm contemplating using C++ to make it more efficient. Did I make the correct assumption?

    In a word, NO. You will not see increased speed, which, as I understand it, was the sole reason you wanted to try C++.
  • by Anonymous Coward
    Perl is the better tool for CGI programming over C/C++. Perl is _the_ choice tool for text manipulation - manipulating text with C is like trying to shuffle cards while wearing boxing gloves. Perl is interpreted and therefore doesn't need compilation (faster turnaround time and easier debugging and life-cycle maintenance). Perl is a simpler language with less chance of making syntax/logic/memory errors. Finally, the bottleneck in CGI programming is the CGI gateway itself. If you're getting slowdowns, switching to C/C++ will not improve the situation, because new processes are spawned for each simultaneous execution. You can get around this problem by using mod_perl, or upgrading your hardware/bandwidth ;-) You can also switch to Java servlets. Having said all the above, C/C++ is the choice tool for building regular (non-CGI) applications.
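
    To give a feel for the difference (my sketch, not the parent's code): cleaning up a form value is a couple of built-in operators in Perl, where C would mean manual buffer handling:

    #!/usr/bin/perl -w
    use strict;

    my $value = "  <b>hello</b>  ";   # hypothetical form input
    $value =~ s/<[^>]*>//g;           # strip HTML tags
    $value =~ s/^\s+|\s+$//g;         # trim surrounding whitespace
    print "[$value]\n";               # prints "[hello]"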
  • Your argument may be true for the special case of a T1-connected PC running a simple web site, but as the bandwidth increases and the web site gets more sophisticated, you will find that indeed perl does slow you down.

    We run a server farm of about 75 production boxes, and we regularly exceed 50Mbit of traffic (more than a T3). The web site is somewhat complex, with over 100,000 lines of perl code behind it, and every page view is served under mod_perl. At the busiest times, the load average on the boxes exceeds 15, but there is certainly still plenty of headroom available on each box's network card.

    Basically, it all depends on your setup. For us, the combination of perl, mod_perl, and Apache has been fabulous. But if you need a little extra performance, writing your own Apache modules in C may be the only way to go.
  • um, maybe you should read a bit about Servlets before posting...

    Servlets are server-side only; you have no control over what the server is using to build the page you are seeing.


    henri
  • wow,

    when i wrote the response no one else had posted anything, by the time i hit submit you got pounded...

    sorry bout that.

    henri
  • > I think that perl sucks. I did a quick poll here
    > at the office and the only people that like Perl
    > are People that do NOT come from a 'traditional'
    > programming background. Typically a webmaster,
    > sysadmin etc... These are people that need
    > something quick and dirty.
    >
    > The day they'll ask me to maintain PerlCode I'm
    > quitting!

    Then I suspect your office is in the minority. Here at our office, Perl is the tool-du-jour. Sure, we can do everything in C/C++ that we can do in Perl, but it takes a heck of a lot longer, especially considering how many platforms we have.

    Just because Perl is easily used for something quick and dirty doesn't mean that's all it's used for. Readability of perl code is *entirely* the result of the abilities of the programmer. Perl is a very forgiving language, and that means that while it's easier to write great code, it's also easier to write horrible code.

    Like I had said in an earlier /. thread about Perl readability... "Perl doesn't kill code... PEOPLE kill code."

    On the subject of the topic at hand, I was under the impression that a lot of the disadvantages of Perl compared to C/C++ (forking to run a CGI, time to interpret/compile the script before running) are taken care of by mod_perl (as noted in other threads). mod_perl, combined with the ability to create web applications much faster in perl would lead me to believe it's well worth it.
  • > To me the biggest speed disadvantage Perl has to
    > C/C++/whatever is that it's interpreted and not
    > compiled, but what little people realise is that
    > there is a perl compiler out there

    As far as I know, the way perlcc works is to compile the script to bytecode and then embed that bytecode in a mini perl interpreter. You may get a speed increase from skipping the runtime parse and byte-compile step, but that still wouldn't take care of the speed hit from forking and such.

    I suspect a mod_perl solution will still be faster than perlcc...
  • I currently maintain a project which has about 30k lines of perl code, all of which has been developed with very strict coding guidelines by professional (i.e., not the 'Perl for Dummies' readers) programmers, and because of my experiences with this project and many others, I have to object to your characterization of Perl.

    Perl is a tool, like any other, and can be misused like any other. Professional programming is an art, not an assembly line job, and some artists are more passionate about the artistry of their creations than others. Your suggestion that no one should use Perl just because some people create heinous Perl code is tantamount to suggesting that no one should paint pictures anymore just because your third grade cousin turns out finger paintings of "Mommy and Daddy" that look more like Rorschach tests than human figures. If you don't like Perl, there is, of course, no one holding a gun to your head, and you're free to choose another language. I suspect you will find that the same principles apply to those languages, too.

    If you really wish to understand Perl, and want to learn to use it professionally, I'd suggest that you look for examples outside of the crap you have apparently had dumped in your lap. Like with any language, there is much reward to be gained by transcending that first 'scripting' stage of Perl; rewards which are only accessible to those with more than a passing interest in it.

    Oh. One more thing:

    #!/usr/bin/perl -w
    use strict;

    Anyone that writes Perl code that will have to be maintained should have those two lines (or their analog) somewhere in their file. They will eliminate about 90% of your maintainability problems.
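
    For instance (a sketch of my own, not the parent's code), strict turns a one-character typo into a compile-time error instead of a silently wrong result:

    #!/usr/bin/perl -w
    use strict;

    my $total = 0;
    $tota1 += 1;    # typo for $total: compilation aborts with
                    # 'Global symbol "$tota1" requires explicit package name'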

  • I actually wrote a CGI with the full VB 5 once. It sucked horribly of course but it worked...
  • There are a lot of things other than CGI that can be used on HTTP servers -- most servers have either their own scripting language or a C API. The problem with most of those things is that most of the scripting languages are limited, and most of the C APIs are either complex or unsafe (they run in the server's address space and can cause all kinds of trouble if a program misbehaves). I have designed my own API for fhttpd, trying to avoid both problems, and IMHO the result [fhttpd.org] is usable.
  • Herbmaster wrote:
    If you spend a majority of your time on a project waiting for it to compile rather than writing code, you either have a really slow processor or a really small project.

    ajs replied:
    I just compiled GNOME from scratch, including glib, gtk+, ORBit, all of the other support libraries, GIMP and a number of other applications. It took all morning and some of the afternoon. I have a six-month-old machine with 64MB of RAM. Would you care to re-think that statement?

    And how many mornings and afternoons do you think it would take you to code GNOME from scratch? Sounds like you're backing up Herbmaster's comment, not undermining it.

  • The reason an interpreted language is popular is that so many people develop a website on a workstation and then upload it to a server. Perl is a lot more portable for that purpose.

    When you write a perl script, it should still work after your ISP changes servers. That's a big problem when you aren't in charge of the server. It could change at any second and you need to be ready.

  • >> The performance bottleneck is bandwidth, not
    >> performance. Usually, it's the speed of
    >> someone's modem, or the crowded internet
    >> backbones that slow down a web-page's
    >> performance. Using a faster language isn't
    >> going to help that, so typically web-folk go
    >> for the easiest solution.

    > I've had people tell me this before. This
    > assumption can be an illusion. While it's true
    > that you are limited on a per connection basis
    > in many cases, it's also true that the number
    > of requests that can be processed at a given
    > instant in time is also a bottleneck at that
    > instant. So, if you expect to be
    > processing large volumes of hits in a finite
    > window, it's important to have an optimal
    > solution. This point seems to be frequently
    > ignored or forgotten. What does this mean?
    >> ......
    >> Well, assuming you have the bandwidth,

    I see what you are trying to get at, but the point is that you *rarely* have the bandwidth yourself. Once you have maxed out your connection, that's it. You can't go any faster by saving CPU cycles. Servers still have much more CPU power than bandwidth. The original poster's argument still holds true.

    Another point I want to make is that things like DB speed can also have a big impact on DB-centric CGIs. This reduces any gain from coding in a lower-level language too.


    --
    Simon.

  • Perl does a lot of things well. A lot of other tools each do some specific thing better than Perl does.

    Reasons why Perl is dominant:

    • It's good enough
    • Everyone knows it
    • It's very well documented
    • It's easy to pick up
    • It's powerful once you get to know it
    • It's a bit like C, only nicer

    Using C or C++ for CGI scripts will not give you the performance gains you might think. We are no longer talking about mega-applications which are highly CPU-bound. This goes double if you make sensible use of the built-in functions in Perl, which are themselves written in C.

    Rather, you are more likely to come across one of these bottlenecks:

    • Forking a process or two for every single web request
    • Disk I/O
    • Network I/O

    In real life that means: the amount of runtime that could be saved by writing the script in a compiled language is dwarfed by other factors - notably the effort of context-switching an entire program in and out every time the script is called. This is a fundamental problem with CGI, and a big one if you want all or most of your site to be dynamic.

    So where do you turn from here? Well, chuck CGI and build the scripting language into the web server. This isn't as painful as it sounds. The obvious one is mod_perl for Apache - but Perl, though cool, is BIG and you do not necessarily want a half dozen copies of it resident in memory at all times.

    My favorites - and I use these all the time these days, are:

    • PHP [php.net] if I have to run Apache
    • AOLserver [aolserver.com] if I don't

    Don't be put off by the name! AOLserver used to be called NaviServer, until the company was bought out by you-know-who. It's a rock-solid web server that I've been using for years. Its embedded scripting language is Tcl, which is not fun to debug, but is extremely elegant once you are used to it.

    There are others, but speaking from my own experience these two have allowed me to turn entire sites dynamic that would not otherwise have been possible - usually hobby sites [angwels.com] that are running on old and donated hardware [slashnull.org].

    Dave

    --

  • Perl is portable. Show me another good language that allows me to write a web-based app or CGI and easily move it between Mac OS, Windows NT, and Unix servers in practically no time at all...

    Not to mention that perl is almost always available for cgi, very important for people who maintain sites on like a dozen different server configurations.

    Yeah, perl has its disadvantages, but its advantages usually outweigh them.
  • Um, yes, Java is interpreted. There is an extra step beyond pure interpretation (it's called compiling, and it is). But you still need the VM (Virtual Machine, a.k.a. the interpreter) to run the program. It doesn't exist in machine code like purely compiled code. The VM "interprets" the bytecode and runs it just like any other interpreter.

    Ok, maybe you could call it a hybrid, but I'd still say it is interpreted because of the need for a VM.

    --

  • If you're so gung-ho about calling Perl "interpreted", then I wonder what you mean by a "real compiler".

    The distinction between an interpreter and a compiler is not at all well defined. Both contemporary interpreters and compilers contain a translator as an essential component, a translator which usually outputs machine code for some (real or virtual) processor. So what's the difference? A naive definition (which should be sufficient for discussing perl) says that the output of an interpreter's translator is targeted at a hardware emulator (the virtual machine implementation). The emulator may be embedded in an executable together with the translator output. Conversely, the output of a compiler is targeted at some real chip's machine code, and you can't separate a posteriori the output into an emulator and bytecode. But this is by no means a comprehensive definition, so don't read too much into it.

    The Perl compiler gives you C code. Which you can compile with a C compiler. Which gives you a huge binary, which basically is the same as you have after perl compiles the code.

    I was told by someone who I regard as an authority on matters perl that the perl compiler can generate C code that is better than what you describe: that you cannot separate it into bytecode and its interpreter or into a trivial variation of this theme.

    Perl is "interpreted" about the same way compiled C code is - by a low-level interpreter.

    I doubt the perl interpreter is as low-level as chip hardware or microcode.

  • I'm truly amazed that the comments so far seem to have ignored Perl's primary advantage in CGI programming: Taint mode.

    In taint mode, Perl does data flow analysis, tracking what data either come from or are influenced by outside, untrusted ("tainted") data like environment variables and file contents--thus CGI form data. And any attempt to use tainted data to create files or spawn processes is automatically prevented.
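
    A minimal sketch of the untainting idiom (my example, not the parent's):

    #!/usr/bin/perl -wT              # -T turns taint mode on
    use strict;

    my $file = $ENV{QUERY_STRING};   # tainted: it came from the client
    # open(FH, "> /tmp/$file") would die here with "Insecure dependency"

    # The only way to untaint is an explicit regex capture, which
    # forces you to state exactly what "safe" looks like:
    if ($file =~ /^(\w[\w.-]*)$/) {
        my $clean = $1;              # captured text is considered clean
        open(FH, "> /tmp/$clean") or die "open: $!";
        close FH;
    }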

    Other languages could have taint modes, but they don't. So I recommend using only Perl for CGI.

  • It seems that to do anything useful in PERL these days I have to have 15 CPAN extensions which balloon the running size of PERL to ridiculous sizes. PERL has become increasingly utilities-centric around here, and we are forsaking PERL for Pike and PHP on the web server. Even mod_perl is monstrous.
  • A typical web site will have the following characteristics:
    • It will not be running any software besides the web site for security reasons
    • It will have a T1 connection to the internet.
    Given those constraints, you can pretty easily saturate the network with Perl and a Pentium 100, let alone a $1000 server with a Celeron 400 in it. So, there's no performance problem or impact on other applications.
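
    A rough back-of-the-envelope (my numbers, purely illustrative) shows why:

    T1 capacity:   1.544 Mbit/s = ~190 KB/s
    typical page:  ~10 KB
    wire limit:    ~19 pages/s

    Even a slow Perl CGI burning 50 ms of CPU per request can produce 20 pages a second on one processor, so the wire fills up before the CPU does.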
  • I think the key point is that it doesn't matter whether it's a Perl CGI, a VB WinCGI, PHP3, JSP, or some highly tuned proprietary C++ application. Badly implemented code can kill a site.

    The point is language selection is the least of your worries in terms of performance. Pick whatever makes the job easiest so a) the code won't be as likely to be screwed up and b) you'll have more time to fix the screw ups that are made.
  • Development aside, how can perl be easier on servers?

    Same work done in fewer cycles, requiring less memory; the only issue is the fork()/exec() delay, but that is a server design issue.

    You could load the CGIs into the server context as a library, saving the fork() hit. Clearly this is not a good idea from a stability point of view, however it isn't any worse than mod_perl and it would run at full compiled speed.

    The fork(), exec() hit is pretty tiny anyway.
  • Sorry, you just hit one of my favourite pet peeves. You cannot cut startup time by 1000% unless you can get negative startup times.
  • CL has several advantages over both Perl/Python and C/C++.

    • Like Perl and Python, Lisp provides an interactive environment. You can make changes to a running program without having to restart it. Plus, modern Lisps give you a real garbage collector, not a simple reference counter like in Python (although newer versions of Python may have a better GC).

    • Like C/C++, Common Lisp is compiled. Unlike C/C++, CL allows you to call the compiler interactively---again, you never have to restart your program. Compiled Lisp code is about as fast as comparable C or C++ code. In fact, most interactive environments compile code on the fly as you type expressions in!

    • Like Python and C++, Common Lisp also provides a robust and rich object system, called CLOS. I haven't done much with CLOS, although I like the idea of multiple inheritance and the ability to dispatch methods based on more than one object (Lisp methods and generic functions can dispatch on any of their arguments).

    • Unlike C and Perl, Lisp is pretty clean, syntactically. You never have to remember operator precedence or any of the funky variable naming rules. Lisp is case-insensitive, although it is pretty easy to override this.

    Several Lisp environments are available, both commercial (Franz [franz.com], Harlequin [harlequin.com]) and free (CMU Common Lisp [cons.org]). There's a complete web server written in Lisp, the Common Lisp Hypermedia Server [mit.edu]. If you want to learn more about Lisp, check out the Association of Lisp Users [alu.org] and browse through the section on tutorials and books (a good book, by David Lamkins, is called Successful Lisp [psg.com]).

    Not all is happy in Lisp-land, though. There's no archive network like CPAN or CTAN, so you'll have to go digging when you want a regexp package (although I can tell you to look at SCSH for that). While commercial Lisp environments from Franz and Harlequin are available on Windows, the only free Lisp I know of that has been ported to windows is Clisp, which "only" has a byte-compiler (like EMACS). CMU CL, the best free Lisp around, only runs under UNIX. I also don't know of an equivalent to mod_perl that embeds Lisp in Apache, although if you use CL-HTTP this isn't an issue. Still, Lisp may deserve your attention. As old as the language is, Lisp is still years ahead of its time.


    Rev. Dr. Xenophon Fenderson, the Carbon(d)ated, KSC, DEATH, SubGenius, mhm21x16
  • It's true that Perl has a lot of intrinsic features geared toward the manipulation of text. However, C++ has much better support for constructing new library features that integrate nicely into the language. With suitable classes and templates, you should be able to manipulate text just as easily. (Getting your hands on them, and deciding what is suitable, is another matter.) The standard C++ string class already hides the clumsy storage management chores associated with classic char * strings, so you can do stupid things like add two strings together using + without caring how the memory is managed: the objects take care of it. A string class like that could be extended to have features such as the ability to snarf whole files, do regexp substitutions and whatnot. The problem is that it takes a lot of expertise to develop a good library.

    The primary (only?) advantage of perl is that the extended support for text mangling is intrinsic to the language. Wherever perl is used, the perl coder can use the same skills to do the same basic tasks. You don't have to learn yet another class for doing regex substitutions. You could make the argument that a good interpreted language is better than a score of poorly designed C++ classes.

    What about robustness? How do you handle out-of-memory conditions in a perl program? Part of the perceived ease of programming in perl is that perl programs don't do sufficient error checking. C is also easy to use if you don't bother with the little details, like handling null returns from malloc() or error indications from fprintf(). What if a memory allocation request fails inside the perl interpreter? Can the script recover, or will the whole thing just die at some arbitrary point in the execution? In C++ you can use exception handling to implement a sensible strategy for such conditions and construct fault-tolerant programs without littering them with explicit conditional checks everywhere. It's often not acceptable for some low-level module to turf the program in response to an error. Bad C and C++ programs are peppered with calls to exit() which make them difficult to extend or incorporate as subsystems into a larger fault-tolerant application. In Perl, this is the modus operandi. Not everyone writes fault-tolerant C++ programs, but the support is there if you want to use it.

    I don't buy ease of use arguments in favor of Perl. The ease is largely a myth perpetrated by people who use it to write compact utility programs that don't have to be robust against failures. There is a very low startup cost to making such programs with perl, and they can be written tersely, so it looks deceptively easy. But try maintaining any significantly large perl program. Programming is more than just stringing clever operators and subroutines together. That is really sub-programming; solving a small subproblem that is part of a larger whole. Perl falls down when it comes to engineering that larger whole. Features like classes and function prototypes are pathetic in perl; it's painfully clear that they were horribly hacked in as an afterthought.

    About your reference to slashdot: would slashdot be so slow if it was written in C++? Those delays can't be entirely network related.

    What does slashdot do anyway? It's basically a glorified BBS. In the 80's, kids used to write predecessors to slashdot in BASIC (the kind with line numbers) with a dash of machine language. These things had e-mail, discussions, voting booths, user profiles with preferences, anonymous cowards, etc.
  • I was speaking purely in the context of compiled C code vs mod_perl, which is what gets rid of the parsing and byte-compiling.
  • I don't think Slashdot or Freshmeat or Deja or Valueclick or IMDB would be the same with PHP instead of mod_perl.

    I am pretty damn sure that freshmeat runs PHP...
  • The performance bottleneck is bandwidth, not performance. Usually, it's the speed of someone's modem, or the crowded internet backbones that slow down a web-page's performance. Using a faster language isn't going to help that, so typically web-folk go for the easiest solution. The easiest solution is to use an interpreted language. The reason is listed in my first bullet point.
    I want to preface this comment by saying I am not picking on you in particular; your comment is well thought out and a good argument. I feel that the bandwidth argument is only a half-truth, though.
    While the user may not see much difference in the performance of a C module extension and a perl module extension, your server will. CGIs put a very heavy load on the servers that run them. If you write your CGI to be efficient, you will be able to run more concurrent accesses on a single system.
    More and more I'm seeing people that don't think about the resources that they use in programs and how they affect the overall system performance. This is the reason that even though the hardware speed has increased tremendously in the past few years, many new versions of applications don't go any faster.
    This type of thinking is bad for the industry in general. The good news is that not all software suffers from this. gcc for one is much faster now than it used to be, even on the same hardware.
    In short, think about not just how fast your single application will run, but also the effect it will have on the overall system conditions.
    --
    Mike Mangino Consultant, Analysts International
  • The reason that I don't know the RE syntax for C is that I don't use it very often. I tend to do extremely low-level programming, like Solaris STREAMS modules and kernel programming; you can't do this in perl.

    I don't want it to look like I'm getting down on PERL, it certainly has its uses. I used it a fair bit at my previous job. I have some complaints about PERL but that doesn't belong here. I was simply saying that there are some misconceptions about C/C++. The reason that you know the PERL RE syntax is because you use it every day. I know the Solaris DDI interface well. That doesn't make it the best tool for CGI (obvious again, I know)

    In too many Slashdot threads, people say X is faster than Y without really looking at the design. Where I work, we are currently saturating some machines due to a poor program design. The tools they used were incredibly fast, but the design stunk and we are paying the price now. While the comment "The speed is based on the quality of implementation" is an impossible generalization and remarkably obvious, you would be amazed how many people forget this when writing real-world code.
    --
    Mike Mangino Consultant, Analysts International
  • Hmmm, I see a real bias in the comments. Dare I risk being flamed and suggest ISAPI and C++? OOh! A Micro$oft solution!

    It seems that most people's responses are about CGI and how C++ is no more efficient than Perl because the bottleneck is the process fork. ISAPI takes a different and practical approach to solving this problem, and is an alternative to Java Servlets.

    I know this because I was asked to use ISAPI Extensions for the project I am currently working on. I could have recommended an alternative solution, but we were already hitting a SQL Server DB running on NT, and we already had an IIS web-server. It seemed silly to me to have another web-server and another DB... Micro$oft technologies work best if everything is Micro$oft. Mixing technologies causes problems, and I'm also a firm believer in a consistent development approach to keep things simpler and easier to maintain across a team. We have deadlines and no time to consider a 100% change to something else!

    Anyway. ISAPI Extensions are trivial. They have better performance than straight CGI. The framework is straightforward (I'm no guru when it comes to Micro$oft technologies such as MFC, ATL, ODBC, etc.) ... linking the web to an ODBC data source is trivial. If you're a good C++ programmer then it's relatively easy. The only thing that I haven't liked is constantly starting and stopping the web-server to unload my ISAPI Extension DLL, and then having to re-attach to the running process in *another* instance of MSVC. But that's just a small amount of extra time that my boss is willing to pay for!

    The bottom line is that there is more to consider than just a straight choice between Perl and C/C++. Do you even need to use CGI? What platform or web-server are you on? Who are the clients that you're targeting? What do you have to do on the backend? Databases? If you're a consultant, what environments are the other projects of your employer being developed in? Somebody might have to come and maintain this after you leave, so consistency is good.
  • ...say this:

    That's a fine followup post. It's pertinent, clear, concise, and cool. Thanks.
  • Sounds like you are volunteering to give up a large portion of your paycheck to your new customer.

    Are you?

    If not, how do you expect him to pay for all the up(down?)grade costs?

  • I started with PHP last spring and really like it. Previous to that I did a lot of Perl stuff. When it comes to server side, basic types of stuff, php3 is really nice.


    I use it a lot in semi-dynamic web sites where the managers don't want to learn html. Separate the header stuff, the body and the footer, and just watch the fun begin.

    I also think that slamming between php and html is handy for some web sites. I love for loops such as : HTML CRAP . I know there is perl stuff for that too, but php is really quite nice.

    I love perl. I love php. I find perl has a lot of more powerful features that I miss in php (good regex setup is one thing, php's pseudo perl regex's are a pain), but for cheap and fast html, I can't find better than php.

    Plus, with PHP4 it will be compilable which means less of my day wasted fixing "webmasters" "improvements".
  • heheh.... my php code got wacked out. Oops.
  • I've built and deployed several commercial sites in the last 3 years, and currently I do intranet application design for a large telecommunication company, so I have a little experience in this.

    I started development on my projects in Perl and moved to C++. Frankly, if you have a good string class and a lib to provide functions such as argument parsing (I wrote my own) I see no advantage to Perl for CGI at all.

    IMHO, Perl is a great write once, read seldom language. Unfortunately the real-world model of web development is write it, change it change it change it as the client and the user community's expectations evolve. A properly designed and structured C++ code set is much easier to maintain in this type of environment.

    Recently, I've been using PHP3 for the simpler pages, and reserving C++ for the heavy-duty work. PHP is a great big hammer for routine form processing or query-results display, and I highly recommend it in those situations. However, I won't go back to Perl in any case, and I'd recommend you don't go there yourself.

    ---------
    confined though we are,
    infinity dwells within.
    --------

  • I find that most things I want to do with CGI benefit greatly from Perl's fast, simple text handling. The need to roll your own versions of the standard Perl text mangling functions really puts me off.

    That said, I did reimplement a pretty big Perl CGI in C a while back, in part for speed but mainly because it had to go in with a C based backend suite and the developers didn't know Perl and so couldn't support the old CGI.
  • Perl doesn't do it that way exactly. It compiles the entire script into bytecode and then executes that. True interpretation is significantly slower.
  • For reasons beyond my fathoming (ie. Microsoft legacy lock-in), our firm uses Active Server Pages (ASP).

    Basically server-side CGI in Visual Basic.

    Quite why a server-side programming language has to be "visual" is beyond me. Also incomprehensible is why it takes two pages of ASP to do what PERL does in two lines.

    It's a bunch of arse, I tell you.

    But if you think that's confusing, what REALLY got my goat is that some idiot company came up with a way of serving Microsoft ASPs from UNIX! I don't know what drugs ChiliSoft [chilisoft.com] are on, but I don't want any.

    --

  • > This would be the equivalent of not only
    > programming it all in C/C++, but building it in
    > as an Apache module.

    Well, not quite. "Compiling" a perl program consists of compiling it to bytecode, a pre-parsed format which the perl interpreter can execute much quicker since it doesn't have to parse the syntax again. So C/C++ code will still be faster than perl code in many cases; however, mod_perl does eliminate the need for reparsing and byte-compiling, which are probably the most significant issues with startup time.
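
    For reference, a minimal sketch of what a resident mod_perl 1.x handler looks like (the module name is hypothetical):

    package Apache::Hello;
    use strict;
    use Apache::Constants qw(OK);      # mod_perl 1.x response codes

    sub handler {
        my $r = shift;                 # the Apache request object
        $r->content_type('text/html');
        $r->send_http_header;
        $r->print("<html><body>Hello from mod_perl</body></html>\n");
        return OK;                     # compiled once, stays resident across requests
    }

    1;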
  • 1) Complexity. Most desktop apps are much more complex than your average CGI. Granted, there are some anomalous CGIs that you may consider doing as a Java servlet or some such setup because they are more complex.

    2) You own the code. It's nice to use an interpreted language because it's easy to change without a recompile. And since most CGIs are only running on a handful of machines at most, having to recompile every time you want to bugfix is annoying.

    3) Mod Perl. mod_perl CGIs are faster than C CGIs, and are easily built from freestanding perl CGIs. The execution-speed-to-ease-of-use ratio is in mod_perl's favor.

    4) Hardware is cheap. If you are getting enough hits on your mod_perl machine that the CGI is becoming a bottleneck, buy more hardware. At that point you're making enough money from ad banners to afford it.

    5) Security. Although I would venture Java as the safest CGI language, you run an extreme risk of buffer overflows in C CGIs unless you make sure your memory management is secure. Since perl has garbage collection, that is not an issue. However, you still have to worry about user input being interpreted as code, but that is easily remedied with taint mode, as well as CGI.pm's various quoting methods (a sketch of the quoting side follows below).

    ...and any other reasons I left out.
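
    On point 5, a hedged sketch of the CGI.pm quoting side (the form field is hypothetical):

    #!/usr/bin/perl -wT
    use strict;
    use CGI;

    my $q       = CGI->new;
    my $comment = $q->param('comment') || '';
    print $q->header('text/html');
    # escapeHTML neutralizes <, >, & and quotes, so user input
    # can't be interpreted as markup or script:
    print "<p>You said: ", $q->escapeHTML($comment), "</p>\n";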
  • You're a bit overoptimistic about the re module's performance; current Perl versions have all sorts of optimizations that the underlying PCRE library doesn't have yet. But this also makes the Perl regex engine's code very hard to understand, and effectively impossible to divorce from the rest of the Perl interpreter. (I'd love to see someone prove me wrong and create a perlre.a, but it seems unlikely to happen.)
    tests have proven superior scalability as well.

    What?! Tests by drunk people?! Cite a reference here -- my experience is that this statement is pure bullshit. IIS/ASP is a great RAD environment, but add COM and scale it up and you are in for a world of hurt. (BTW, the famous Mindfuck... er... Mindcraft study carefully avoided using ASP or COM in its tests... hmmmm.)

    Why do you think that Microsoft's own Hotmail runs on Solaris?

  • Other languages could have taint modes, but they don't. So I recommend using only Perl for CGI.

    Yeah, too bad that instead of "taint mode", Java is stuck with an incredibly sophisticated security model. Java security can allow fine-grained control over I/O based on who wrote the code, who invoked it, where it was loaded from -- pretty much anything.

    Perl does data flow analysis, tracking what data either come from or are influenced by outside, untrusted ("tainted") data like environment variables and file contents

    How quaint.

    P.S. For those hitting the reply button to rant about "Java Security Bugs": please read up on the difference between server-side and client-side Java, and the difference between JavaScript and Java, and don't forget to start your post with "I'm an idiot..."

  • Perl is the dominant choice for CGI among Perl programmers. Surprise, surprise.

    Among programmers still clever enough to learn new techniques, there are several "CGI" methods that are rapidly eclipsing Perl:

    • ASP. Yeah, it's Microsoft and it's not too scalable, but it's easy and fast to write and comes with some decent visual tools. Great for small-to-midsize web sites and web site prototypes.
    • Java Servlets. This technology is explained at length in other posts.
    • PHP. Fast, flexible.
    • Python. OO, easy to learn.
    I think there have been some great posts outlining Perl's shortcomings, so I won't reiterate them here. Are there a lot of sites using Perl? Yup. How many new sites are going to be Perl-based? None that I've seen.
  • Do people actually use CGI these days? I've been a professional web developer for four or so years, and it seems to me, at least, that traditional CGI scripting is pretty much dying off if you don't count small scripts people create for their homepages and such.

    I used to do Perl and CGI's but that came to an end (except in some projects where I have to build on an old code base) over a year ago; these days, Java servlets and application servers, PHP, ColdFusion, StoryServer, Zope etc are pretty much the tools that I see being used (on Unix side of things, I'm not well-informed about the things NT developers use these days.) It all comes down to three things: ease of development, ease of maintenance and performance.

    Perl is a quick language to develop small hacks in, but larger-scale projects are a different story. An old-fashioned CGI is just way too slow for pretty much anything; I'm not sure about the performance of mod_perl vs. servlets, though. For ease of development, developing scripts from the ground up can't compete with a nice application server where you can count on the product to handle DB connections, persistence, etc. for you.

    And for smaller projects, products such as PHP and ColdFusion offer in many cases an even shorter development time than Perl.

    Of course, YMMV; my views might be somewhat colored by my rather intense dislike of Perl (mind you, I hate Java too, and that is my primary development language these days :-))

    // Juri
  • Geez, why does no one ever seem to get that for building big web sites where you're going to re-use stuff, the only way to go is to have an approach whereby you decouple presentation logic from business logic? Then you can go ahead and use ASP (for Microsoft people), JSP (for Java people, but face it, it's basically the SAME thing), or some other component technology.
  • I think you've done a good job of hitting all of the relevant points here. Writing CGIs in Perl is Quick, Easy, and Good Enough. But I seem to be doing less and less real CGI work in favor of SSI-type scripting.

    Embedded scripting languages like PHP for instance tend to one-up Perl on most of the points you make:

    • Rapid Development - Perl was designed as a text processing language. That makes it great for CGIs. But PHP was designed...for scripting web pages, so most of the stuff you are gonna do in a web page is quite easy. I find it much easier to add code to a web page than to add HTML to a perl script, particularly as web pages get more complicated. And editing stuff is MUCH easier than in Perl.

    • Performance - Perl is fast enough for most things. If you run mod_perl, performance is good for higher loads. But PHP is fast out of the box. mod_perl is quick, but you have to be anal about your code, and there are all sorts of gotchas.

    • Text Processing - well, ok, PHP is not quite even with perl here, but it's acceptably close.

    Really, embedded languages are the way to go for most things.

  • Your system would run much faster under perl. There's no excuse for a search on a 2 MB file taking 5-6 seconds, even on the equivalent of a 486/66.

    I just tested a mildly complex Perl regex on 2 MB of text, 300000 words, in RAM, on my P133; it found 900 word matches in 1.34 seconds. Your described "prototype" of grepping through files on disk really doesn't have anything to do with Perl.

    If your script were written in Perl, running it with mod_perl would be easy. Once the script is persistent, you won't have to read the 2 MB file into RAM every time it runs. That will be the big speedup.
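
    A sketch of that persistence win (my code; the filename is hypothetical), using the classic package-global idiom under mod_perl/Apache::Registry:

    use strict;
    use vars qw($WORDLIST);            # a package global survives between requests

    # Read the big file once per server child, not once per hit:
    $WORDLIST ||= do {
        open my $fh, '<', '/data/words.txt' or die "open: $!";
        local $/;                      # slurp mode
        <$fh>;
    };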

    Perl's regex engine is also extremely fast. If you're doing any kind of powerful expression-matching in C, I guarantee perl not only would make the task easier to write and maintain, but would run it faster too. And if you're not doing powerful matching (prefix searching? full name searching?) Perl has other tools that will be just as fast as C: hash tables in RAM or in a simple database; b-trees; even a built-in module to find all unique prefixes from a list of words (Text::Abbrev).
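
    For instance, a quick sketch of the Text::Abbrev idea (the module ships with Perl):

    #!/usr/bin/perl -w
    use strict;
    use Text::Abbrev;

    # Build a table of every unambiguous prefix:
    my %prefix = abbrev(qw(perl python php));
    print $prefix{pe}, "\n";    # perl
    print $prefix{py}, "\n";    # python
    # $prefix{p} doesn't exist: 'p' is ambiguous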

    I'm not sure what you mean about "parsing binary databases" as opposed to "text databases." Perl has not been a text-only language since roughly version 2. If you're talking about the difference between a linear and binary search, consider that Perl's hash tables are going to be faster still. If you're talking about storing data in a format native to your language, rather than as one big text string, Perl does this excellently as well.

    I think you should look more carefully at including Perl in your toolbox.

    Jamie McCarthy

  • No way python/tcl are just as good for that purpose as perl, after all if they were any good everyone would be using them, right?

    Perhaps Python, though some people prefer to choose whether or not to program in OO. And Tcl doesn't compare: bytecode-compiled Perl and Python blow it away in execution time as well as (IMAO) syntax and features. I prefer Perl because most of the scripting I do involves regexes, and Perl has the best regex support of any language yet developed on this planet or any other that I know of. In fact, it is only over the last 6 months that I've tried my hand at any other perl5 features besides the expanded regex set and 'my' and 'use'...

    No way there are some instances where speed of program is more important than speed of development.

    There certainly are. However, for general CGI programming, speed of development is far more important than speed of execution. Typically for perl modules, any execution-critical routines are actually coded in C or assembler and simply interfaced to Perl. If it's not execution-time critical, then why be concerned?

    Not like costs per gate are going up...
    Your Working Boy,
  • It is weak in areas such as GUI development

    you mean in terms of the development environment, or creating GUI applications? Because I always considered PerlTk actually pretty easy to understand and utilize, and it's cross-platform...
    Your Working Boy,
  • I also don't know of an equivalent to mod_perl that embeds Lisp in Apache

    Doesn't AOLserver embed CL support?
    Your Working Boy,
  • I agree. It is difficult to do regular expressions in C or C++ (at least for me).

    Now, the main reason that I love perl is that it is easy to whip something up for a CGI (demo or production) and not have to worry about seg faulting when strtok() is used incorrectly or you try to pop a token off of the end of a string and it isn't there. Perl complains, but doesn't drop core.

    Plus, up till now, there wasn't an API for MySQL (I may be wrong - I know that one for C++ was released). It is a lot easier to interface perl with MySQL (or Oracle, for that matter) and not have to use embedded SQL like Oracle provides. There is just something about having to null-terminate a character array after doing a select that I just do not like.
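
    A hedged sketch of that interfacing, via DBI with the DBD::mysql driver (the table and credentials are made up):

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    my $sth = $dbh->prepare('SELECT id, name FROM widgets WHERE name LIKE ?');
    $sth->execute('%foo%');               # placeholders, no manual quoting

    while (my ($id, $name) = $sth->fetchrow_array) {
        print "$id: $name\n";             # plain Perl scalars come back,
    }                                     # nothing to null-terminate
    $dbh->disconnect;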
  • That kinda depends on the complexity of the web application, and the nature of consistency you want with the page layout and all that.

    A recent JavaReport article gave quite a good description of how you can use a servlet as the front to the sets of services underneath, which can be servlets or JSP pages, depending on which is more appropriate.

    I personally grow tired of digging through java to find html tags in order to change the page layout (or at least i did when i was building CGI in C/C++). Server-Side includes helped a bit, but when i got an embedded scripting form (w3-msql), I much preferred it. I suppose I'd be more interested in PHP for my solutions if i hadn't already latched onto JSP. I also prefer JSP for its portability. I can take the Jakarta engine pretty much anywhere. I can't guarantee I can get PHP working on a windows box. I CAN guarantee I can get Jakarta working, as long as I can put the JDK (or just the JRE) on the box.

    One definite drawback w/ JSP is that it prefers only writing text. It only gives you access to a "Writer" object, and not the OutputStream. This means that if you want to generate an image on the fly (say, a hit-counter that returns a "gif", like early perl code used to do), you HAVE to use a servlet, or bypass the Writer and go dig out the OutputStream object from the Servlet context. This is technically a standards violation, and isn't portable :(.

    Also, the Tag Library part of the new standard is to address the issues of separation of logic from presentation (a common thread in ALL the J2EE products). The logician creates the tags, the presenter creates the web pages that know how to access them. The tags remain as behind-the-scenes java classes that the presenter doesn't need to know about. It does depend on a clean black-box interface design by the implementor, which can be a rarity. Many programmers can build an application. Few can build good libraries.

  • Perl is amazingly portable, without having to recompile. This is more important than anything else.

    Additionally, it's readable in executable form; you don't have to dig out the source should you wish to change anything.
  • Personally, I think the primary reason why Perl is used so often for CGI is its text-manipulating abilities. It seems a lot of the CGI out there today has to work with text, forms, HTML, etc.

    Now that's not to say there aren't libraries out there for C/C++ which provide similar levels of power for text manipulation, but by default when doing things like parsing text, you will tend to be working on a much lower level than perl.

    Now there is a question of optimization. You can do some things to Perl to make it run faster using mod_perl or FastCGI, but they have limitations. Mod_perl unfortunately has no concept of suexec so it's not real ISP friendly. FastCGI requires code modifications which make it a bit undesirable.
    The overhead of starting a copy of perl and parsing the perl script every pass will take a real toll on your machine if you can't use mod_perl or FastCGI. You could even 'daemonize' your perl CGI to boost performance, but that adds quite a bit of complexity for simple tasks.

    In these instances, it would be wise to use C, C++ or ObjC.. or even Java w/ the new Java compiler. If you have a banner ad rotation script or some other little program which gets called a lot, writing it in one of these languages will not be much more complex than writing it in Perl.

    Really, unless you are doing a lot of text processing, C, C++ and ObjC aren't all that much more complex than Perl and I would recommend using them in place of perl.

    --
  • Oh my god! I must laugh! hahaha.

    I know a lot of people that think C is too lax. Only one way to do something? Are you kidding me? while() vs. for() vs. do/while(). Arrays vs. pointer manipulation. Pointers alone can cause severe headaches if not used properly. Want more examples of problems with C and readability? It is only recently, as basic C standards have become widespread, that C has become more readable. Java is slightly better in that it is strongly typed; otherwise it mimics C to a great extent.

    I could show you some C code that is still in use that uses the K&R style of C. And I've had other programmers ask me why the params were not in the parens of the function name.

    Ok, I love C. I love Java. I love Perl. Each has its good points and bad points. Each can be used to write horrible horrible code, or very elegant pretty code. You hit the nail on the head saying that it is the programmer's fault and inexperience. Also the people that hired them. Perl can be maintainable with some standards. C can be very ugly and unwieldy without them. Granted, Perl is the most flexible, but since we as an industry moved away from languages that have a forced structure, there has always been a problem with maintainability. And it has been up to the shop to enforce some types of standards.

    --

  • Specifically, perl is compiled to a bytecode

    Bytecode is an implementation technique for interpreters.

    Well, you can call it interpreted if you want but if you do then so is java.

    Yes, they both are interpreted languages.

    BTW, there is a real perl compiler in the latest release. It's just experimental.

  • (
    Me An advantage you might get out of using C++ is that tight loops may compile down into much faster code than you would get with Perl.
    Most of the CGI programs that I've written don't even have loops.
    ... Which is precisely why this isn't all that huge an advantage.

    And which is a reason why it might be preferable to use a language like Lisp that can compile to machine code, thereby providing the best of both worlds:

    • Efficiency, by having the parts that need to be fast be compiled, and
    • Reasonable treatment of dynamic objects like strings
  • An advantage you might get out of using C++ is that tight loops may compile down into much faster code than you would get with Perl.

    Unfortunately, you lose some abilities:

    • The ability to change a script "on the fly" whilst debugging, and have the change automagically deployed. With C++, you have to make and then go through whatever installation process is required to deploy the change.
    • Scripting languages [hex.net] like Perl [perl.org] and Python [python.org] provide built-in operators for doing all sorts of text manipulations.

      With web applications, what you're largely manipulating is text, [hex.net] which means that having the language oriented to that is extremely valuable. Furthermore, since there are powerful, well-optimized operators built-in to these languages, the interpreter disadvantage is significantly diminished.
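
      A tiny sketch of the kind of built-in operator meant (URL-decoding omitted; the input string is made up):

      #!/usr/bin/perl -w
      use strict;

      # Parsing a raw query string into a hash is two nested splits:
      my $qs = "name=at0m&lang=perl&q=cgi";
      my %param = map { split /=/, $_, 2 } split /&/, $qs;
      print "$param{lang}\n";    # perl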

  • Grandma isn't likely to be hacking either Perl or C++; I'd hazard the guess that Python [python.org] is just faintly more likely...

    It's entertaining that nobody has brought up Java, The Language of The Internet, as an option... for a while, that was all anyone could think of.

  • It seems that to do anything useful in PERL these days I have to have 15 CPAN extensions which balloon the running size of PERL to ridiculous sizes. PERL has become increasingly utilities-centric around here, and we are forsaking PERL for Pike and PHP on the web server. Even mod_perl is monstrous.

    hah! that's a good one. If you have to install 15 CPAN "extensions" as you call them (I think you mean perl modules), whoever wrote the code is module-happy. It also depends on what you're trying to do... if you want database access, you have to have DBI. If you want to do CGIs, you'd better have CGI.pm. If you want to turn on your coffee maker, you'd better have some X10 perl module. I'd like for you to point to a language that has small code size, is as fast as perl, and has EVERYTHING built in that perl can do with a module. Yeah, I thought so. Besides, installing a module is as easy as perl -MCPAN -e 'install DBI'.
    As for mod_perl being monstrous - did you pull this out of your ass, or are you just plain stupid? mod_perl couldn't be any easier, and it makes using perl CGIs in a high-traffic environment a viable solution.
  • Are you meaning that Perl scripts have a higher overhead? If this is the case, mod_perl takes care of a lot of those concerns: the perl script itself is actually executed from memory, and not simply loaded from disk each time. This would be the equivalent of not only programming it all in C/C++, but building it in as an Apache module. I WOULD, however, like to see a writeup comparing the overall processing, etc., of a native C-based CGI program versus that of an equivalent program in Perl, with AND without mod_perl.
  • I like perl, but I have to take issue with just about everything you said.

    Interpreted languages (like Perl and Python) make it easier to develop things in a hurry because you can make changes without re-compiling.

    If you spend a majority of your time on a project waiting for it to compile rather than writing code, you either have a really slow processor or a really small project. A much more relevant measure of rapid development is how quickly people can make the project work. If one person is working on a project, and can write perl code efficiently, that's great. Perl does not especially promote code readability. If a larger group of people is working on a project and they spend a large amount of time just figuring out how each other's code works, that's a problem. C does promote code readability which may be desired in such cases.

    The performance bottleneck is bandwidth, not performance. Usually, it's the speed of someone's modem, or the crowded internet backbones that slow down a web-page's performance.

    Or not. It depends entirely on the application. If it's a big database-intensive project, performance could easily be the bottleneck. Look at, say, slashdot. Slashdot runs on several machines and the load is still huge. When I read slashdot from school it's painfully obvious that it's the database which is slow, not the bandwidth. I don't have to explain to you that it's an issue of both latency and throughput. The program is the latency and the page download is the throughput, and most of the server-side processing has to be finished before the bandwidth-intensive part can even begin. If you're making a cgi project which will run on a quad-athlon, and only have several modem users (you WISH you were so lucky), sure, the speed of the language isn't going to make a difference. But if you have to buy nicer hardware, that just raises the price of the project.

  • I've seen this a whole lot and I have to respond.

    Trying to incorporate that sort of thing into C or C++ might result in a speed increase of execution, but if you're still loading entire binaries off disk each time, it's not likely to be that significant and you've got to crank out the whole API for your backend RDBMS (ie CT-lib for Sybase Open Client, ODBC for ODBC access, OCI for Oracle, whatever) which is a large development investment overhead for stuff-all performance increase.

    This is just plain wrong. You don't have to load the binary off disk any more often than you have to load the script off of disk. The OS (assuming something somewhat intelligent) will map often used binary files into a shared data segment, the same way you don't have to load large dynamic libraries every time you use them. This means that if you use the binary a lot, it will be in memory.

    There is a large difference in speed between well-written Perl and well-written C for most things; the problem is that it is difficult to find good C programmers to do this type of work quickly. C was not designed to be a RAD-like language.

    Other people have mentioned that regular expressions aren't native in C/C++, which is correct, but there are regular expression packages. You may have to say re_compile(re); re_execute(re, str); or whatever the syntax is, but it can be done relatively easily. The fact that it is not built in does not make it slower; the speed depends on the quality of the implementation.
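    For the curious, here's roughly what that looks like in practice: a minimal sketch using the POSIX regex API from <regex.h>, which ships with most Unix systems (the pattern and input string are made up for illustration):

        #include <regex.h>
        #include <stdio.h>

        int main(void) {
            regex_t re;
            regmatch_t m[2];
            const char *query = "q=perl&lang=en";

            /* compile the pattern once... */
            if (regcomp(&re, "q=([a-z]+)", REG_EXTENDED) != 0)
                return 1;

            /* ...then run it against as many strings as you like */
            if (regexec(&re, query, 2, m, 0) == 0)
                printf("matched: %.*s\n",
                       (int)(m[1].rm_eo - m[1].rm_so), query + m[1].rm_so);

            regfree(&re);
            return 0;
        }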

    In shops where CGI's are written in C, they probably already have extensive backend libraries, which speeds up development.

    I personally use Perl for quick RAD-type stuff, but if I'm going to have to maintain it or I want it to scale, I'll use C. (Note: I am not a web or CGI programmer, but a UNIX consultant. I use C because it is strongly typed and easier for me to work with; YMMV.)

    In closing, when you start talking about speed, check your facts and know what you are talking about. If you compare the execution speed of well-written Perl and well-written C, C should win every time. When you include development and testing time, the results may differ.
    --
    Mike Mangino Consultant, Analysts International
  • Some thoughts of my own. (Due to cutbacks, readers of the thoughts will need to donate the 2 cents.)

    • CGI scripts are rarely written with a specific web server or a specific platform in mind. Compiled code, such as C or C++, is so unportable (thanks to all the "extensions" OS and compiler vendors insist on providing) that you would end up with more #ifdef's than actual lines of usable code to get the same portability.
    • CGI scripts often need tweaking, even after deployment on a live server. For a script, you just go into the cgi-bin directory, edit, save and you're done. The changes are now live. For compiled code, you obviously need to throw in the compile & link time, too, which is non-trivial.
    • As others have said, string handling in C/C++ is virtually non-existent.
    • Scripts run in their own enclosed shell. Binaries don't.
    • Scripts can be used by anyone, and the interpreter is typically not that big. Compiled code requires sufficient privileges and disk space to install the compiler and libraries, which can be gigantic, and may be directory-specific. (e.g., ld.so.2 prefers /lib. Installing it in your home directory can prove entertaining.)

    Having said all that, I wrote a simple CGI text database search engine in C, and it works just fine. Not one line of Perl in it, compiled code all the way. I had the benefit of knowing something about the machine it was to be installed on, so I didn't run into a lot of the roadblocks that I've outlined above. But most people, for most CGI scripts, won't.

  • You're wrong. perlcc (CPAN's B::CC backend) makes an optimised pre-byte-compiled binary.

    In effect it saves you the startup parse step and lets you make a "compiled binary". That binary is just a copy of the parse-tree bytecode, a loader, and the Perl library. It still contains a full compile-and-interpret engine and still interprets the bytecode tree; since it can still "eval" Perl code, it has to carry a full compile engine anyway.

    mod_perl takes the same route, caching the post-parse code for speed gains and to avoid disk loads.

    Perl offers easy prototyping and quick evolutionary programming (twiddle with live code =).
    --
  • Freaky.

    I was just browsing The Quotations Page [starlingtech.com] and came across...


    "Tradition is what you resort to when you don't have the time or the money to do it right."

    -- Kurt Herbert Alder

  • Because you have to fork a process, that's why.

    The time to execute a CGI isn't just the runtime of the program. It's time to run PLUS setup/teardown costs. In a CGI context, this means forking a process, which is an expensive operation even in Unix.

    If processing time is a serious concern, then you should be using some built-in web server mechanism for processing - mod_perl, PHP, Java servlets, or even a multithreaded server rather than Apache (AOLServer, or even IIS (although NT will find other ways to screw your performance)).

    And then there is reliance on external resources. For example, many web applications are actually selecting from a relational database. Your "efficient" C/C++ program can't easily share a connection, so it will have to set up and tear down yet another connection to your RDBMS. VERY expensive. Compare this to a multithreaded server that pools database connections and doesn't have to fork anything to share them out. See where the performance goes?
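    For what it's worth, under mod_perl the usual fix is the Apache::DBI module, which transparently caches one connection per server child. A minimal sketch, assuming Apache::DBI is installed (the DSN and credentials are placeholders):

        # in startup.pl, loaded once when the server starts:
        use Apache::DBI;   # must be loaded before DBI so it can
        use DBI;           # intercept DBI->connect calls

        # later, in any handler or Apache::Registry script:
        my ($user, $pass) = ('webuser', 'secret');   # placeholders
        my $dbh = DBI->connect('dbi:Oracle:prod', $user, $pass);
        # Apache::DBI hands back this child's cached connection instead
        # of opening (and later tearing down) a brand-new one per hit.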

    Remember, 90% of the time, the performance bottlenecks are not where you think they are. And they are rarely in well-written code, regardless of the language used to write it. Bottlenecks tend to be in things like disk and network I/O, RDBMS engines, excessive swapping, excessive forking, and other infrastructure issues.

    Meditate on this, grasshopper, and learn.

    ---
    Maybe that's just the price you pay for the chains that you refuse.
  • To be fair in the comparison, you'd have to consider not just a C CGI (with the fork overhead), but also FastCGI.

    To my mind (which is one that wants to write Python rather than Perl), FastCGI is a nice *general* mechanism for avoiding one significant time overhead in CGI processing. mod_perl solves the same problem, but it only applies to CGIs written in Perl. FastCGI works with any old language.

    Of course, I've only just got my feet wet in these issues, so maybe others can add more about the comparative benefits of various Apache mods (mod_perl, mod_py, mod_fcgi, PHP, whatever).
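    Since I've only dabbled, take this as a minimal sketch of the FastCGI idiom rather than gospel; it uses the FCGI module from CPAN, and the point is that the process stays resident and loops over requests instead of being forked per hit:

        use strict;
        use FCGI;

        my $request = FCGI::Request();
        my $count   = 0;

        # Accept() blocks until the web server hands us the next request;
        # the interpreter, and anything we set up above, stays loaded.
        while ($request->Accept() >= 0) {
            print "Content-type: text/plain\r\n\r\n";
            print "Request number ", ++$count, " served by one process\n";
        }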
  • No, see, it just fired off grep once. :) I know, what I said implies otherwise. No, my script had something to the effect of:

    grep (blahblahblah) `find (topdir) -name 00index.txt`
    ---
    "'Is not a quine' is not a quine" is a quine.

  • Properly-written C/C++ code requires absolutely no changes to recompile on another system. I developed the Hobbes engine [nmsu.edu] on an x86 Linux box, and it took literally no time to port it to the RS6k AIX box it runs on.

    As far as realtime changes to a system: do you really want code which wouldn't compile to be executed at arbitrary times on a public system? On-the-fly trial-by-fire testing is no way to do a production system, and properly-done C++ systems don't take a lot of time to recompile (that's what having separate source files is for, after all). Also, with live scripts, then you have to worry about remembering to back up the realtime-tweaked versions.

    C/C++ string handling isn't non-existent, it just requires some thought behind it. There are plenty of regexp libraries; aside from that, strstr() and strchr(), what more do you need? (That said, on Hobbes I rewrote my own version of strstr(), optimized to run a little faster at the expense of returning the last match instead of the first, since probably 99% of the CPU time during a search is spent in strstr().) And if you want to get fancier, that's what std::vector is for. :)

    Scripts run in their own enclosed shell and binaries don't, but binaries don't need to either. The binary is the shell.

    And um, generally if you're the webmaster of a large website, you can install gcc yourself. I did on Hobbes without a problem... and binaries are no more directory-specific than a script is.
    ---
    "'Is not a quine' is not a quine" is a quine.

  • I used to run the Hobbes Archive [nmsu.edu]. For those who aren't in the know, it's an OS/2 shareware archive. It contains about 3.5 gigs of software in several thousand files with an incredibly deep directory structure. For it, I wrote my own custom database engine for keeping track of the files and generating the HTML. The average search takes 5-6 seconds depending on system load (it used to take only 3, but the archive has grown significantly since I wrote the engine).

    This archive runs on an ancient RS/6000, about the processor equivalent of a 486/66. And it doesn't have much memory, either.

    I wrote the entire engine and interface in highly-tuned C++. Because of the requirements of the search, it must go through every single file's entry in the archive (or at least under the topmost directory specified). I made extensive use of the nature of UNIX filesystems for the actual database (each directory has a file index called .file.idx, and builds a recursive search index called .search.idx which contains the current directory and all search indices of lower directories). If I had done things a bit more intelligently I could have probably kept a single .search.idx cached in memory instead of having to reload it on every search (which is painful, since the toplevel directory's .search.idx is something like 2 megs now), but the fact remains that it still goes relatively quickly, particularly considering the nature of the search and the antiquity of the machine it's running on.

    I had briefly considered Perl, but I quickly shot that idea down for several reasons:

    1. I had initially prototyped an earlier version of the search engine in csh using grep. (This was before I wrote the new database engine; the old, flaky, always-dying-and-losing-whole-file-databases one made it so one basically had to just parse the 00index.txt files anyway.) The average search took well over 3 minutes. Completely unacceptable. Granted, csh isn't the fastest thing around, but the script wasn't using anything which Perl could have done better (it was just iterating through a bunch of files and running grep on them).

    Mind you, the C++ I used was incredibly low-level. I used it basically as C with objects; no STL, no inheritance, etc., because I didn't need them and didn't want the overhead (though in hindsight, if I had used an STL vector instead of rolling my own dynamic arrays it would have been the same speed and involved a lot fewer bugfixes).

    But basically, what I learned from the example of the csh search engine: Parsing text databases slow. Parsing binary databases fast.

    Which reminds me, I still need to ask Dave Rocks (my boss at NMSU) if I can GPL the engine source. I'm sure it'll do someone else some good; not everyone out there wanting to run a file archive engine can afford the overhead of SQL. Hobbes had plenty of bandwidth for its purposes (10Mbit) but 32 megs of RAM and an ancient RS/6000 just aren't enough for anything SQL-based...
    ---
    "'Is not a quine' is not a quine" is a quine.

  • I've been making my living with Perl and CGI for almost three years. That's an eternity in Web-time, so I guess I'm qualified to comment on a few things.

    First, the question of 'Perl vs. C' isn't as simple as picking your favorite language. Remember that software engineering is as much a resource juggling act as anything else. The resources in this case are most often CPU time, your time, and your customer's time.

    Ways Perl saves your (programming) time:
    1) The huge library of utterly standard software on CPAN. If you feel you are re-inventing the wheel, sifting through CPAN for an hour or so will usually turn up a well-supported module that solves the problem. CGI.pm started life this way; now it ships with every version of Perl since 5.004.

    2) Programs are /much/ shorter. If Fred Brooks is right (the number of lines of code a programmer can produce in a year is approximately constant, regardless of language), this is a big one. Things that might take thousands of lines in C are only a few hundred lines of Perl.

    3) It will likely run anywhere, on NT or Linux or OS/2 or whatever. Perl has come much closer to the ideal of platform independence than Java, just because it's out there in use and solves the problem one disaster at a time, not with over-design and hard sells.

    4) It's nice to have taint-checking built in. This is not something I have ever observed in C. I almost never use eval, but taint checking can save you a lot of security breaches. Perl itself also does not rely on static buffer lengths, so user code doesn't get those nasty buffer overruns. Oh, sure, you can write C so that it doesn't have buffer overruns, but in Perl this protection is automatic.

    Ways Perl can save your customer and CPU time:
    5) Perl (and also PHP3, and even ASP) all have the advantage of super-fast support inside the web server via an extension module. This avoids the problem of forking off a new CGI program, loading Perl, loading the Perl program, interpreting, etc. Let me tell you from personal experience - the difference is night and day. For anything that gets more than a few hits a day, DON'T use straight CGI. Use mod_perl, FastCGI, or whatever. This also saves you server bloat, because that interpreter and program are shared by ALL CGI calls on the server.

    6) For the things that Perl is good at (string handling), it's not going to be slower than C. Think of Perl as the 'glue logic' and the C libraries under Perl as the things that actually do the processing. Fortunately for you, a large part of every WWW app is string handling - even just reading the parameters from the user's Post or Get. You'd end up implementing half of Perl's string handling functions anyway just to do this part in C.

    7) For the things where Perl is legitimately slower than straight C, you can extend Perl in C. This lets you use C for a math-heavy library, a new fundamental data type (like matrices or complex numbers), or calls into a system library like Motif, Tk, or libcrypt (a sketch of the glue involved follows after this list).

    8) Shorter programs are easier to change when the customer decides they want to do something else. Since it is almost impossible to get a good spec out of a customer, it is much better to have a short prototyping cycle. The prototyping cycle in Perl is much shorter than in C, so you can get more feedback from the customer in less time. This increases the probability the customer will like the finished product. You can still do the final version in C if you feel you must.
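    To make point 7 concrete, here is a minimal sketch of the XS glue involved; the module and function names are invented, and the boilerplate at the top is what h2xs generates for you:

        #include "EXTERN.h"
        #include "perl.h"
        #include "XSUB.h"
        #include <math.h>

        MODULE = MyMath    PACKAGE = MyMath

        double
        hypot2(x, y)
            double x
            double y
          CODE:
            /* the math-heavy part runs as plain compiled C */
            RETVAL = sqrt(x*x + y*y);
          OUTPUT:
            RETVAL

    After the usual perl Makefile.PL && make, your Perl code just says use MyMath; and calls MyMath::hypot2(3, 4) like any built-in.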

    And now the bad news.

    Perl's power of expression is also its problem. There are so many different ways to do things, and many of them are very, very ugly. Badly written Perl is the worst thing in the world to maintain - OK, maybe not quite as bad as badly-written TCL, but pretty bad.

    Perl will tolerate sloppiness. It takes some time to get competent enough with Perl that you don't need the sloppiness to make things happen. Experienced Perl people should be able to read each other's code; it's the bloody amateurs (eg almost everybody) that you have to watch out for.

    As a result, it's very easy to have 'write-only' Perl code. If it becomes necessary to maintain it... well, time to start over.

    This is really the only bad news, however. So it's basically 'pick your poison': do you want to get bitten in maintenance, or give up the half-dozen other ways that Perl saves you time? No language is going to be perfect, and this is the trade-off you make with Perl.

    Therefore, Perl is much more suitable for me because it lets me get the project finished to the customer's satisfaction in less time. You can document it all (and minimize the maintenance hit) while the site is up and making money, as opposed to having the customer on the phone screaming at you that the project is late.

    In the end, you have to choose for yourself. Decide if you have the time to get good at unravelling spaghetti Perl, or if you'd rather spend your time debugging spaghetti C.
  • Like he says, there's a lot of text processing in CGI programs. How would you handle a standard regular expression in C or C++? You'd call a subroutine. Once Perl gets into its regular expression engine, it's running C itself, not interpreted Perl, so there is no speed difference. Same with a lot of other Perl code. Hashes? You'd just be calling C subroutines to generate keys and manipulate the hashes. String manipulation? People who use strcpy and strcat are probably writing slower code than Perl's built-ins, because Perl tracks the length of strings and allocates in chunks, so it doesn't need as many realloc calls as some lazy C coders write.

    There is more memory overhead, and there is the compilation overhead, but using mod_perl helps bunches here.

    --
  • Q: If Perl's so hot, why hasn't it been ported to the PalmOS?

    A: Because it's a resource hog.
  • I've had people tell me this before. This assumption can be an illusion. While it's true that you are limited on a per-connection basis in many cases, it's also true that the number of requests that can be processed at a given instant is itself a bottleneck at that instant.

    This can't be stressed enough. I used to run a webserver with about 1000 virtual hosts, a hundred or so of which were high-traffic, complex sites that were big moneymakers for us and the respective customers. A badly written CGI would essentially kill performance for ALL sites on that machine, so we were very in tune with what our boxes could do at any given time, and when it came down to it, it was NOT bandwidth defining the performance of the servers. They were slow because of poorly written or badly implemented CGIs. Bad code does not discriminate between languages, either: bad code in C is just as slow as bad code in Perl or bad code in Tcl or bad code in sh.

    -Rich
  • I read some comment from web guru Philip Greenspun [photo.net] that claimed Yahoo Stores' backend code was written in Common Lisp. Just a fun bit of trivia. :-)
  • When the first of us got on the web, it was the early 90's, before there was a Netscape Corporation or an Internet Explorer, when URLs didn't appear everywhere and there weren't TV commercials for web sites. It was a time when not only did everybody and their uncle NOT have a web site, but if you said "World Wide Web", they'd say "What?!?"

    We generally had two browser choices: Lynx or Mosaic for Motif. Most of us ended up downloading and compiling the browser ourselves. Sometimes we had to hack the code to make it compile under SCO. Typically we were the system administrators and we wrote a lot of Perl anyway.

    So when we needed to choose a CGI language, Perl was obvious. Of course, back then we weren't doing transaction processing on our web sites; we were writing phone lookups and simple query screens. But Perl was it.

    Then others started to get onto the web, and they asked us what we used to write CGI. We said "Perl", so they learned Perl.

    Now it's 1999, and we're far removed from the time when there was a registered-servers list at NCSA with only 1500 servers on it.

    "Webmaster" and "web developer" are full-fledged job titles now, not just more cool stuff your system administrator knows how to do.

    Perl for CGI came from the early adopters of the WWW, who were already using Perl for their other tasks.

  • With Perl and Python, most of the meat is still in C or C++. All the regular expression stuff is C code, highly optimized C code. I haven't done any scientific benchmarks, but I have yet to see a situation where a C++ program would solidly outperform a Perl script doing common CGI stuff.

    Between the development time, the bandwidth constraints, and the library functions, you'd have to do some very specific profiling to show a C or C++ program significantly outperforming a Perl script for typical CGI applications. And a lot of that difference can probably be offset by the fact that you can embed Python and Perl interpreters into Apache, which cuts the startup time for your scripts (by as much as 1000% in some cases, or so I've heard) since they don't have to spawn an interpreter and start it up.

    Say the C++ program were twice as fast (probably not, but assume it was): how many hits would you have to get to saturate enough of your server capacity to justify porting your Perl to C++? Processors are remarkably cheap compared to bandwidth these days. Maybe if you run Yahoo or Infoseek it's justified to code all your CGIs in C or C++, but the typical web server can easily get away with Perl or Python.

  • Unless you have an unusual web-based application, you are much better off spending the time optimizing the architecture than the implementation language. CGIs take such a small fraction of the total time that, assuming you're not doing something silly, even if they're dog slow they'll still flood your network.

    The nature of the web means most of the time is spent handshaking and transmitting. Try to eliminate those as much as possible, because they're even more critical for people on a modem. For those same people, a script that takes 2 seconds to run loads as fast as a 4KB GIF.

    I believe you should use the highest-level language that will still execute quickly enough. Perl is used because it is fairly high-level while still executing quickly enough.

    Higher level languages are usually more maintainable and make for easier implementation. This is almost always vastly more important than how fast the program executes.

    Invest the extra time you get by using a higher-level language, like Perl or Python, into coming up with an efficient architecture. It'll pay off in a big way.

  • I've been in several big-scale projects requiring language decisions. And there are certainly places where Perl is just great; the string regular expression engine and the interpretation of on-the-fly generated code turn many things that are borderline multi-person projects into relatively simple one-person projects.

    But it makes sacrifices. You can't beat C/C++ and assembler for the ability to do things the way you want them done. For one project, an automated reasoner, the code was designed down to a bitwise memory manager. That simply wouldn't be possible in Perl. But we did make a web interface to the automated reasoner using Perl, thus decreasing our overall cost in terms of development time, effort and, in particular, frustration. We also modularized the project into Perl and C/C++ parts, making debugging, documentation, reviewing (and reuse) much better.

    The best part about Perl was the documentation: O'Reilly's books by Larry Wall et al. are superb. We bought the two books they publish (the "Cookbook" and the "Programming" guide) and were sufficiently versed in the language and the architecture of the interpreter to write robust, straightforward, and complex manipulation programs. Not very good for an automated reasoner (it'd probably take longer than the end of the universe to prove pigeonhole), but great for converting output between syntactical equivalents and testing validity (relatively trivial compared to generating a proof).

    I think every language has its place. I'm keen on the diagonal languages (C/C++, Perl, etc.) rather than the orthogonal languages (Python, Java, etc.). The diagonal languages let you do things in many equally correct ways, whereas the orthogonal ones reflect a single best interpretation. Perl is the living epitome of a diagonal language, which is part of its beauty and one of the reasons I like it so much, but diagonal languages are more difficult to understand the second time you come back to work on the code. :)

    Like everything, there's reason to choose one language over another, and as with every real-world choice, there's no perfect solution. Even COBOL has its place.

  • According to the Apache Module Report [e-softinc.com] at E-Soft Inc, PHP is more widespread than Perl as an Apache Module. These figures [php.net] at www.php.net might be interesting as well...
  • I'm surprised nobody has mentioned taint mode.

    If your script is running with an assumed user id because of a setuid bit, Perl will prevent the script from calling certain system facilities (notably the shell) with any values derived, directly or indirectly, from the CGI's parameters. There are some loopholes, but by and large this makes it tougher to monkey around with a CGI, especially if you don't know its internal structure.

    This alone is probably worth the price of admission.
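    A minimal sketch of what that looks like in practice (the input handling is deliberately simplified, and the file name is a placeholder; a real CGI would parse its parameters properly):

        #!/usr/bin/perl -T
        use strict;

        # Taint mode insists on a sanitized PATH before any system calls:
        $ENV{PATH} = '/bin:/usr/bin';

        # Anything that came from the outside world is tainted:
        my $user = $ENV{QUERY_STRING} || '';

        # system("grep $user somefile") here would die with
        # "Insecure dependency in system while running with -T switch".

        # Untainting is explicit: only what a regex capture lets through
        # becomes usable, so you are forced to say what "safe" means.
        my ($safe) = ($user =~ /^(\w{1,20})$/) or die "bad input\n";
        system('grep', $safe, 'somefile') == 0 or die "grep failed\n";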

    My own limited experience with monkeying around with a few different scripting options is that it's just plain faster to get things done in Perl. Partly this is the language, and partly it's because there are so many tools for the language that are widely known and well thought out (CGI.pm, etc.).

    Perl does certain very commonly useful things like pattern matching and hashing very quickly. In terms of runtime speed, once you factor in other kinds of overhead, I don't think most people would get much speed improvement from C++, and that only after a lot of effort. Of course, there are undoubtedly lots of exceptions to this.
  • If you spend a majority of your time on a project waiting for it to compile rather than writing code, you either have a really slow processor or a really small project.

    I just compiled GNOME from scratch, including glib, gtk+, ORBit, all of the other support libraries, the GIMP, and a number of other applications. It took all morning and some of the afternoon. I have a six-month-old machine with 64MB of RAM. Would you care to re-think that statement?
  • The shells are all a poor way to code anything more than very simple commands. For example, I've done some timing of roughly the scenario you describe:

    It takes about 4 minutes and 22 seconds to go over every C file in the GNOME source tree, searching for the inline keyword using find/grep (grep only once via xargs). This is about as efficient as I can get with the shell.

    It took 3 minutes and 39 seconds in Perl (depth-first, one stat per file, a simple print "$file:$.:$_" if /inline/ type statement).

    I used sort/diff on the resulting files and they were the same. Perl's just faster by about 1.2 times. Of course, that gap widens considerably as soon as you start doing anything complex, and the gap between perl and C/C++ starts narrowing.

    The reason is that the shell can only communicate by moving around and parsing strings. Perl is managing real data structures like file and directory handles, and never has to copy a filename from one process to another (e.g., the way find has to send filenames to grep).

    Now, try making it "find all strings that look like an author's name inside of a C source file, inside of a comment (either C or C++ style)". I can put money on Perl vs. shell there, and C is going to take a lot of work to beat Perl by just a little (you'd basically have to write a simple, single-purpose scanner that matches C comments). Perl can do this with its built-in regular expressions (e.g., search_for_author(\$&) while m|/\*.*?\*/|g).
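    For reference, a minimal sketch of the Perl side of that timing test, done with File::Find (which ships with Perl); the tree root is whatever you pass on the command line:

        use strict;
        use warnings;
        use File::Find;

        my $root = shift @ARGV || '.';   # tree to search, default cwd

        # depth-first walk; File::Find chdir's into each directory,
        # so $_ is just the basename and one stat per file suffices
        find(sub {
            return unless /\.c$/ && -f $_;
            open(my $fh, '<', $_) or return;
            while (<$fh>) {
                print "$File::Find::name:$.:$_" if /inline/;
            }
            close($fh);
        }, $root);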

  • And it goes back to why dynamic, typeless languages are used.

    1) Instantaneous Feedback. With Perl, you edit the "document" (script), click reload, and see the effect immediately.

    With C/C++, you edit, compile, relink, maybe even restart the server, and if it crashes, it's debugger time. C/C++ sucks as a language for doing Database->Text transformations a la Perl, Python, Tcl, and 4GLs.

    2) A syntax that freely intermixes HTML and script code.

    What's better:

    <b>$firstname $lastname</b>

    or

    out << "<b>" << firstname << " " << lastname << "</b>";

    I would argue that a document that is mostly HTML with some script fragments is far more editable and manageable than a C++ source file that is mostly printf code with HTML interspersed.

    3) great for Database->Text transformations.

    Perl blows away C/C++ at processing strings, slicing and dicing arrays, and doing hashtable tricks. Anything even remotely close in C++ requires a third-party, non-standard library, and the syntax still isn't as compact.

    The vast majority of CGI apps run DB queries and convert the dataset to a text or HTML representation.

    With scripting languages, it is much easier to perform the query and write the template that dumps the output (a sketch follows below).
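    As a minimal sketch of that style (the names and markup are invented for illustration), a Perl here-document lets the page stay mostly HTML, with variables interpolated where needed:

        my ($firstname, $lastname) = ('Ada', 'Lovelace');  # stand-ins for a DB row

        # the page is written as a template: mostly HTML, a few variables
        # (note: the terminating HTML tag below must start in column one)
        print <<HTML;
        Content-type: text/html

        <html><body>
        <b>$firstname $lastname</b>
        </body></html>
        HTML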


    CGI apps are *NOT* dominated by CPU performance. The biggest bottlenecks are memory and I/O. Except in rare cases, compiling a CGI won't get you as much performance as simply adding more RAM, more threading, and more I/O bandwidth.


    Why Perl? The question should really be "Why scripting?" The answer: fast, safe development, economical syntax, easy debugging, and the ability to blend with markup languages.

    Cold Fusion is yet another example of something tuned to do one thing (DB->HTML transformation).

    The ability to simply hit "reload" to see your changes even forced me to recode Java servlets as JSP pages, simply because my development speed was multiplied by 10.
  • One thing to consider is that CGI scripts impose a large delay in processing just because they have to fork off a process. Whether you use C or Perl may not matter if the time to invoke the script is significantly longer than the actual processing time. Perl is easier to use and faster for development (as mentioned by others), so it's better than C for scripting. Also, Perl is interpreted (rather than compiled), so bug fixes and patches are far easier to deploy. BTW, you can improve performance by abandoning CGI scripts altogether and looking at processes that use web-server threads (check out Philip and Alex's Guide to Web Publishing [photo.net] for info).
  • While I sympathize with your problems with spaghetti code, and I agree that obfuscation is laughably easy in Perl, I think your criticism is misplaced here.

    Perl is a tool, and like all tools can be misused. Specifically, Perl is a powerful tool, and so it can be really misused. :)

    I spend most of my working day writing FoxPro code, and if you think Perl is bad for spaghetti code, try maintaining old FoxBASE code. The language is huge, and old, and has at least two distinct ways of doing things. Plus, older versions of the language were weak on parameter passing, causing older code to look like one long rat's nest of CASE and WHILE statements instead of using functions.

    However, the only real problem, as always, is when comments are LEFT OUT. FoxPro is a powerful data-manipulation language and, like Perl, has its place in the toolbox. It can be cryptic, but clear code can be written if the programmer a) knows what they are doing, and b) takes the time, with comments, to let the future code-reading programmer know what they were doing.

    Don't blame Larry for creating a powerful tool just because the world has an "everyone can be a programmer" mentality right now, and Perl (and, ahem, all the M$ languages) is one of the languages being touted as such.

    Packers (see the Programmer's Stone articles here at /.) write bad code, independent of language. Perl can be and is used correctly, and can be commented. Most people just don't bother. A lot of people have the idea that it is somehow more "elite" to write difficult-to-read code. A professional programmer knows that efficiency, while desirable, is overshadowed by maintainability, because you lose money when you have to spend 4 hours figuring out what 100 lines of code does.

    As for cutting and pasting code from a book, that will get you in trouble in ANY language. Programming is not about cookbook solutions, but about problem-domain definition and problem solving.

    To sum up: if a person is writing bad code, blame the person, not the tool.

  • Try some (or all) of the following-
    1. #!/usr/bin/perl -w
    2. learn the funky stuff- it exists for a reason.
    3. use strict;
    4. don't hire dummies, demand examples of their code.
    5. #!/usr/bin/perl -w
    use strict;
    6. always define your needs up front and work out your algorithms on paper before coding.
    7. #!/usr/bin/perl -w
    use strict;
    use diagnostics;
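    And for the doubters, a two-line demonstration of what items 1, 3 and 5 buy you (the typo is intentional):

        #!/usr/bin/perl -w
        use strict;

        my $count = 10;
        print $cuont + 1;   # typo: without strict this silently prints 1;
                            # with strict, compilation aborts with
                            # 'Global symbol "$cuont" requires explicit package name'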
  • this seems like a bit of a silly question to me. i've used both Perl and C (or C++) in my CGIs, but it entirely depends on what you're trying to implement.

    Perl is phenomenal at working with text, and it's dead easy to write. you don't have to worry about what type of information is stored in your variables (strings, ints, etc.) and Perl does the conversions for you if you need to work with variables of different types. CGIs, historically, did a lot of work with text: inputting forms, searching text files and dynamically creating HTML.

    if you're working on a huge, complex CGI that would be highly speed-dependent and needed by thousands of users, then use C. if not, use Perl (or whatever else fits your program best). in the past, CGI scripts were (generally) simple routines for very specific tasks. as they become more complex, Perl may not be the best answer.

    it's a simple case of the best tool for the job. use whatever language works best for you.

    - j
    --
    "The only guys who might buy [the Apple iBook] are the kind who wear those ludicrous baggy pants with the built-in rope that's used for a belt. - John C. Dvorak, PC Magazine
  • by Jon Peterson ( 1443 ) <jonNO@SPAMsnowdrift.org> on Tuesday November 02, 1999 @09:11AM (#1569433) Homepage
    Perl people take pains to point out that Perl is not a CGI language. It is a language. It is weak in areas such as GUI development, but for networking, database work, or many other things it is a good, solid application development language.

    It is no longer, IMHO, the best web development language - if all you are doing is web development, PHP will probably serve you better. The reason people are using Perl, is because the same skill set can be applied in so many places. Since starting work at COLT (EU telco/ISP) I have used Perl for:

    1. A few one liners to sort out a messy archive of documents left on a machine

    2. Automated admin of Security Dynamics ACE server (a one time password system)

    3. Writing an idiot-proof menu driven program for updating ipfilter rules on Solaris boxen

    4. Writing an interface so that Veritas Netbackup logs are written to a database.

    5. Writing a NOCOL client to alert if an Oracle database goes down for some reason.

    Perl is simply a good language for these kinds of tasks and many others. Those familiar with Netcraft may be interested to know that their survey software, which polls 99.x% of the web servers on the web _every month_, is written in Perl (a highly modified version of LWP). Not a lightweight app.

    Goldman Sachs use Perl extensively to manage their mission critical databases.

    Perl may be a good CGI language, but it is more interesting, and provides greater benefits, when used as an application language.
  • by Improv ( 2467 ) <pgunn01@gmail.com> on Tuesday November 02, 1999 @03:36PM (#1569434) Homepage Journal
    While I imagine equally optimized code would find this claim perfectly true, Perl gives you a lot of optimization for free, optimization that's hard to do in C. Things like hashing and good string support are not inherent in C, and many developers would just use C strings as-is and linear associative arrays, and in the case of strings would need to write quite complicated things to do good text manipulation, things that very often are slower than Perl's internal string implementation (and often buggy too). So...

    Yes, Perl has added compile-time and interpretive penalties, but often the better algorithms that you end up writing in Perl will be faster than in C, and so there are many times Perl will turn out much faster unless you're really taking a lot of time to optimize your C.
  • by tilly ( 7530 ) on Tuesday November 02, 1999 @02:18PM (#1569435)
    The -w flag turns on optional run-time warnings. That is not a change in syntax; it is like adding a lot of debugging code all over the place.

    Using strict does change the syntax: it turns off the really flexible "Guess What I Mean" features. Yes, Perl has a lot of those. Yes, they are dangerous. They are fine in short scripts, but not in long ones.

    However, when 2000-line C programs turn into 75-line Perl programs, maintainability is helped. Overly verbose syntax and overly low-level syntax are both the cause of much grief. There is a trade-off, and for what I do, Perl gets it right.

    And yes, explore some odd corners. Looping over a /g match is not widely known, but if you have to take parsing to the next level it is utterly invaluable! (A short example follows.)
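    For anyone who hasn't seen the trick, a minimal sketch (the sample data is invented):

        my $text = 'name=alice; name=bob; name=carol';

        # In scalar context, /g resumes where the last match left off,
        # so the loop walks through every match in turn:
        while ($text =~ /name=(\w+)/g) {
            print "found: $1\n";
        }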

    Cheers,
    Ben
  • by fornix ( 30268 ) on Tuesday November 02, 1999 @09:13AM (#1569436) Homepage
    Precisely. I used C/C++ for about eight years before giving in to Perl for certain things.

    My advice (which with $1.50 will get you a cup of coffee)

    • Use C/C++ when you need fine grained control of memory and/or hardware, or if you need to perform numerical computations. These features make it nice for writing operating systems, device drivers, games, etc. But these capabilities become liabilities when you do not really need them.
    • Use Perl to manipulate text. After all, by design it excels at text manipulation, which is what is happening in >99% of web applications. It has the added advantage of more rapid development, but I would still use it for web stuff instead of C/C++ even if it had to be compiled. Believe me, it is much nicer to use Perl's regular expression capabilities than to do it in C/C++/lex/yacc (which I used for many previous projects that needed sophisticated string handling, before I finally learned Perl).
    • I have mixed feelings about Java. It's somewhere in the middle. You lose the fine control of memory and hardware that C/C++ has, but don't gain much in the way of higher-level language features, except for automatic garbage collection and (the claim of) greater portability. It's sort of a spayed/neutered C++, and Perl is probably more portable anyway. Java will probably succeed on the strength of its libraries.
  • by dlc ( 41988 ) <dlc@NOsPAM.sevenroot.org> on Tuesday November 02, 1999 @08:01AM (#1569437) Homepage

    Although many people point to the ease of writing Perl as the primary reason it is on top for CGI programming, there are many other reasons. Topping the list is probably the fact that more people know and use Perl regularly than most other languages, since it is such a flexible and portable language; when people start writing CGI scripts, Perl is a natural for them.

    Perl has tons of freely available libraries and modules that encapsulate almost every bit of functionality that you could ever want. Anything that takes more than a few lines of code has probably been done and encapsulated before. You just need to get it (CPAN [cpan.org]).

    Perl is a very expressive language, more so than most other languages I've seen or used. Because of this, people get very attached to Perl and what it can do.

    Perl has better support for regular expressions, parsing, and string manipulation than any other language or tool in existence. This becomes extremely important when converting data on the fly: from text files, from other formats (such as XML), from databases, or from anything else you can think of. Grabbing raw data and parameters from URLs and from POSTed forms is easier in Perl than in most other languages, thanks to this support (see the sketch below).
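    A minimal sketch of that parameter-grabbing with CGI.pm, which ships with Perl (the parameter name is invented):

        #!/usr/bin/perl -w
        use strict;
        use CGI;

        my $q = CGI->new;                        # parses GET or POSTed input
        my $name = $q->param('name') || 'world';

        print $q->header('text/html'),
              "<html><body>Hello, ", $q->escapeHTML($name), "!</body></html>";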

    What people usually bring up as downsides to using Perl as a CGI language are not specific to Perl; they're specific to CGI. "Perl is a huge executable," they say, "and it is expensive to run a Perl script." But most of the overhead comes from the webserver forking a process, not from Perl itself. Yes, Perl scripts can become bogged down with tons of modules and libraries, but so can any language that can use libraries. As far as being a huge executable goes, take a look at the binaries produced by some of the Microsoft web solutions (ahem, VB, ahem) and then we'll talk about huge executables. The Tcl binary is about the same size as the Perl binary, and many complex C programs end up very large too, once you incorporate regex parsing routines, CGI libraries, and image-processing libraries.

    darren

  • by shawnhargreaves ( 66193 ) on Tuesday November 02, 1999 @07:58AM (#1569438) Homepage
    Most of the work in a CGI script consists of reading in template HTML files, substituting a few variables in them, and writing out the results. More complex work may involve a few more sophisticated text substitutions, a bit of database access, and perhaps traversing a few internal data structures as well. In all these cases, the work is being done inside library functions rather than in your code: when you call printf() or look something up in a hashtable, the performance is controlled by the implementation of that library function, rather than any code that you wrote yourself.

    As your calculations become more sophisticated, the balance shifts. If you are writing a CGI for generating fractal images or raytracing a 3D model, Perl would indeed be a stupid option (and in fact this is when you might even find C to be too high-level, and prefer to start twiddling those bits around in asm). But if you profile something like the comments.pl that generated this page, I suspect you will find that only a tiny percentage of the time is spent executing the loops and conditionals written by Mr. Malda, and the vast majority inside the I/O routines, string handling, hash lookups, and regexp functions that were written (in C) by Mr. Wall :-)

    Also, higher-level languages speed up development and make things easier to tweak. Since websites tend to be in continual development, and new features are required on Internet time, this is a major win. In conventional application software it may be a useful tradeoff to spend an extra week coding in order to double your execution speed, but if you know you are going to have to change this program a fortnight from now, that suddenly starts to look much less sensible :-)
  • by Suydam ( 881 ) on Tuesday November 02, 1999 @07:59AM (#1569439) Homepage
    It seems that your real question pokes at something deeper: namely, why do web folk use an interpreted language for their applications when it would be faster to use a compiled language like C?

    I think I can shed some light on this. There are several reasons, which I'll outline below.

    • Rapid Development - A typical software project might last for months before hitting the shelves. Most web projects have to be spec'ed, built and online in a matter of weeks. Interpreted languages (like Perl and Python) make it easier to develop things in a hurry because you can make changes without re-compiling.
    • The performance bottleneck is bandwidth, not processing speed. Usually, it's the speed of someone's modem, or the crowded internet backbones, that slows down a web page's performance. Using a faster language isn't going to help that, so typically web folk go for the easiest solution. The easiest solution is to use an interpreted language, for the reason listed in my first bullet point.
    • Much of CGI scripting consists of text processing. A high percentage of CGI programming can be summed up as "Fetch. Parse. Print." Since Perl, Tcl/Tk, and Python have powerful built-in pattern-matching and text-processing routines, these languages lend themselves nicely to web-app design. This might be less true as a whole than 2 years ago, but I know I'm personally still doing an awful lot of $x=~s/blah/blahblahblah/g; in my web programs. You can't do that any easier than in Perl/Python/Tcl (see the sketch at the end of this comment).
    In the end, it comes down to whether or not you'll get a noticeable benefit from using something that runs faster. I think that for 99% of web apps, you don't get enough of a speed increase by switching to C/C++, so people don't even bother testing it out.
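    To make "Fetch. Parse. Print." concrete, here is a minimal sketch of the substitution-driven templating the third bullet alludes to (the file name and %placeholder% syntax are invented):

        use strict;
        use warnings;

        my %vals = (title => 'Hello', body => 'World');

        open(my $fh, '<', 'template.html') or die "can't open template: $!";
        while (<$fh>) {
            s/%(\w+)%/$vals{$1} || ''/ge;   # swap %title%, %body%, ... for values
            print;
        }
        close($fh);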

  • by Ledge Kindred ( 82988 ) on Tuesday November 02, 1999 @09:28AM (#1569440)
    I've had to maintain Perl-written CGI and whenever I have to do so, I usually wish that Larry Wall had been hit by a truck before he could have come up with the idea for Perl...

    Not that I have anything against Larry, but his TMTOWTDI (There's More Than One Way To Do It) philosophy has made it way too easy for people to write really awful spaghetti code in Perl. Much more easily than in any other language. In Perl, precisely because of TMTOWTDI, you can have four scripts that do exactly the same thing and look completely different! ARGH!

    And that is the "danger" of using Perl for CGI, and the situation I think the "web programming" industry is in: everyone and their grandmother can pick up a "Perl for Dummies" at the local bookstore and, without any concept of proper programming practices (i.e., don't write spaghetti code, comment at least what's not immediately obvious, stay structured so you can make changes more easily), start banging out CGI code. And since Perl will take just about anything, nobody knows the crap that's running until it comes time for that burger-flipper to leave this job and move on to the next.

    I've been given Perl CGIs to try to figure out why they're breaking that are hundreds of K in size and thousands of lines of Perl code, and that took tens of megs of core and way too much time to execute. When they were written, the author literally cut-and-pasted from any source that looked vaguely like it would do something like what they wanted, left all the extra cruft in, and kept appending to the existing code any time they modified the scripts, with thousands of lines doing calculations whose results were simply overwritten by later lines of code. Perl being Perl didn't care, but when it came time for someone else to figure out what the damned thing was doing: forget about it.

    Yes, I occasionally write stuff in Perl, but I don't use any of the funky streamlined grammatical constructs that make the code horrendous to try to read through if you go back and have to maintain it, which also means that I don't usually gain a whole lot by writing my code in Perl except that I can be sloppy and it doesn't care, which IMHO is A Bad Thing. If possible, I will rewrite the thing in C or as a Java servlet, which just makes it much easier to maintain since there's ONLY ONE way to do it in those languages and you have to do it THAT WAY; when you see a statement in C or Java, you can figure out what it means and it can only mean one thing. Not like Perl... *shudder*

    Personally I think Perl should be considered a "non-rewriteable" language; kind of the programming equivalent to a CD-R. Once the program is out there and has been running unmaintained for a while, don't even think about modifying it! If you need to do something else, write a new program because it'll probably be faster than trying to figure out what you did with the old program, and will almost certainly be faster than trying to find out what someone else has done in their Perl program.

    In small shops where a piece of code might have to be banged out for something quick-and-dirty and never looked at again, do it with Perl. If you're going to write something that will need to be maintained and changed and worked on by many people, you either have to make sure those people are going to be VERY CAREFUL and VERY RESTRAINED in their programming, or just don't do it with Perl at all; it will quickly become unmaintainable.

    -=-=-=-=-

THEGODDESSOFTHENETHASTWISTINGFINGERSANDHERVOICEISLIKEAJAVELININTHENIGHTDUDE
