tomhudson's Journal: NoSQL+ sprintf() == better.

Old technology doesn't die - it gets re-implemented when newer ways get too bloated and turn everything they touch into Beavis and Butthead.

In the dying days of the last century (awk! - how time flies) I used to do web cgi using c, same as a lot of people. I used malloc() and sprintf()s to insert variables into a "template" and then printf()s to output the result. It was easy to track memory allocation for such cases, so the whole "OMG you'll leak memory" issue was a non-starter.

And then along came the attack of the killer web scripting "pee" languages - php, perl, and to a lesser extent, python. The concept of a "templating language" evolved and eventually we ended up with "templating engines" - megabytes of code to make up for the shortfalls of the approach.

For example, output buffering. php includes stuff like ob_start() because even one stray newline emitted before the headers go out will prevent you from setting cookies on the client. c/c++ cgi programs didn't worry about a stray newline being output by an #include file, because only explicit output calls like printf() and putchar() would actually write stuff to stdout - so as long as you were just sprintf()ing to your format strings you were all good. In php, even one space before the opening tag or after the closing tag in index.php means you're hosed for sending cookies (which is why you should always omit the closing tag - php allows it).

Another advantage was that the ONLY character you needed to escape in any file you loaded as a template (that is, as a sprintf() format string) was the % symbol, which you write as %%. No worrying about single or double quotes, angle brackets, or whatever.
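A minimal sketch of that one escaping rule, assuming a made-up one-line template held in a string (in practice it would be read from a .tpl file). A literal percent sign is doubled, and everything else in the markup is left alone:

```php
<?php
// Hypothetical template text; the quotes and angle brackets need no escaping,
// only the literal percent sign in the width attribute is written as %%.
$tpl = '<td width="50%%">%s</td>';

// sprintf() fills the single %s placeholder and turns %% back into %.
$out = sprintf($tpl, 'hello');

echo $out, "\n";
```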

For user input, the only sanitization needed was for the left and right angle brackets (to prevent someone from entering raw html, such as script tags) and, again, the % symbol. No "escape_string", no "real_escape_string", no "really_really_escape_string", since the data was stored and read w/o needing sql.

In terms of performance and memory use, sprintf() easily beats regexes. You really can't help but notice the difference. And it sure beats the so-called "compiled templates" produced by templating engines like smarty.

Yet another advantage is portability - any language that supports sprintf() can be used w/o modifying your template files. This means that if you need the best possible performance on some really really HUGE files, you can always do it directly from a shell in c, or if you're so inclined, java.

So I decided to re-implement my old approach from scratch yesterday in a couple of hours in php. The entire code - including for variable range-checking, reading and writing data (strings and arrays), meta tag files, html, reading and parsing config files, getting and setting cookies, posts and gets along with verification and using sane defaults and coercing the values to those default types, loading templates, creating those little "go to page 1 2 3 4" clickies for larger web documents and everything else, is under 9k, including the site's index.php file.
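As an illustration of how small these pieces can be, here is a rough sketch of the "go to page 1 2 3 4" clickies built from nothing but sprintf(). The function name page_links(), the "pg" query parameter and the two link templates are all assumptions made for this example, not the code from the actual library:

```php
<?php
// Hypothetical helper: builds a row of page links from two format strings.
// $link_tpl is used for every page except the current one, $here_tpl for
// the current page. Both take the page number as argument 1.
function page_links(int $current, int $total, string $link_tpl, string $here_tpl): string
{
    $parts = [];
    for ($page = 1; $page <= $total; $page++) {
        $tpl = ($page === $current) ? $here_tpl : $link_tpl;
        // %1$d re-uses the first argument, so the page number is passed once.
        $parts[] = sprintf($tpl, $page);
    }
    return implode(' ', $parts);
}

// Single quotes keep php from treating $d as a variable.
$links = page_links(2, 3, '<a href="?pg=%1$d">%1$d</a>', '<b>%1$d</b>');
echo $links, "\n";
```

The link markup stays in the format strings, so it could just as well be loaded from template files as from string literals.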

THAT is a lot more maintainable than the 1.1 meg download for smarty templates (and smarty doesn't do the reading and type coercion from the client or the minmax range checking or some of the other stuff).

So, 130+ files for smarty, or 2 for the old way (and one is index.php, so it really doesn't count ...)? Oh, and the template files look a LOT cleaner. For example, no embedded program logic like {include file='whatever'} in the templates, so stuff like

<input name="first_name" value=$smarty.get.first_name> // no default values!!!
<input name="last_name" value=$smarty.get.last_name> // no type coercion!!!
<input name="address" value=$smarty.get.address>
<input name="city" value=$>
<input type="submit" value="Save">
<input type="reset">


<input name="%s">, etc ...

... so your template looks like this instead:

<input name="first_name" value="%s">
<input name="last_name" value="%s">
<input name="address" value="%s">
<input name="city" value="%s">
<input name="age" value="%s">
<input type="submit" value="Save">
<input type="reset">

and your index.php file looks like

$BASE = '../'; // all files live outside of public_html space
include "$BASE/php/libfoo.php";

$HTML = read_tpl("test_page"); // read_tpl automatically prepends "$BASE/tpl/", appends ".tpl" extension.

$js = "new_js_lib";
$css = "new_skin"; // assumed name - $css has to be set before the sprintf() below uses it

$head = read_tpl("head");
$meta = read_meta("test_metadata");
$desc = $meta[0];
$keywords = $meta[1];

// want to test a new skin, new javascript libs
$HEAD = sprintf($head, $desc, $keywords, $css, $js);

$form = read_tpl("junk");
// get, post, cookie, gpc_pg, etc all sanitize the %, < and > symbols.
// also use an optional default value, and coerce any entered data to that type,
// so, if you ask for an integer and specify -42 as the default, anyone entering "FOO" gets -42 back
$first_name = get('first_name', 'Enter first name here');
$last_name = get('last_name', 'Enter last name here');
$address = get('address', 'Enter address here');
$city = get('city', 'Enter city here');
$age = get('age', -1);

// do any additional validation, data manipulation, etc.
// no need to do output buffering ... it's all in memory until you do the next line.
$FORM = sprintf($form, $first_name, $last_name, $address, $city, $age);

$footer = read_tpl("footer");
$FOOTER = sprintf($footer, "have a nice day!");

//okay, now write the whole thing
printf($HTML, $HEAD, $FORM, $FOOTER);
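
The library behind get() isn't shown here, so what follows is only a guess at the idea described in the comments above: fall back to a default, coerce the input to the default's type, and neutralize the %, < and > symbols. The names get_value() and sanitize_value(), and the choice of html entities as the replacement, are assumptions for illustration:

```php
<?php
// Hypothetical sanitizer: replaces the three characters named in the post.
function sanitize_value(string $raw): string
{
    return strtr($raw, ['<' => '&lt;', '>' => '&gt;', '%' => '&#37;']);
}

// Hypothetical lookup: $source would be $_GET, $_POST or $_COOKIE.
// The type of $default decides what type comes back.
function get_value(array $source, string $name, $default)
{
    if (!isset($source[$name]) || !is_scalar($source[$name])) {
        return $default;
    }
    $raw = trim((string) $source[$name]);

    if (is_int($default)) {
        // filter_var() returns false for non-integers such as "FOO".
        $int = filter_var($raw, FILTER_VALIDATE_INT);
        return $int === false ? $default : $int;
    }
    if (is_float($default)) {
        $float = filter_var($raw, FILTER_VALIDATE_FLOAT);
        return $float === false ? $default : $float;
    }
    // String default: empty input falls back, anything else is sanitized.
    return $raw === '' ? $default : sanitize_value($raw);
}

$age  = get_value(['age' => 'FOO'], 'age', -42);      // falls back to -42
$city = get_value(['city' => '<b>Oslo</b>'], 'city', 'Enter city here');
```

One caveat with this sketch: a value that lands inside a quoted attribute such as value="%s" would also need its double quotes encoded, which php's htmlspecialchars() handles.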

There is zero programming logic in the template itself - and that's the way it should be. Templating engines like smarty fail in the "presentation should be separate from code" department.

Plus, since most templates won't include variable names, they're pretty generic, again promoting template re-use. The footer, for example, could contain the output of several other templates instead of a simple message, and you'd never touch the main page template OR the footer template.

Comments:
  • From what I can follow, looks mighty sweet.

  • I've never done CGI web work, so the idea of letting sprintf replace placeholders in order, without having to name them in the template, is novel to me.

    However, it also strikes me as about as wise as doing a SQL INSERT without specifying the column names (for professional work, that is). I.e. it's more error-prone than it has to be.

    So, while I agree that programming logic shouldn't be in the template, I do think identifiers should.

    I also don't like the idea of waiting until everything is put together before

    • You bring up some interesting points ... so here's my thoughts on them :-)

      I also don't like the idea of waiting until everything is put together before sending the page, for every page.

      You don't *have* to - you can use printf()s instead of sprintf()s at any point ... it just makes it easier to not have to worry about including a file that has a stray character not enclosed in php tags so you can't set cookies properly, and to let you "accumulate" generated code - you might even want to generate template f

      • Ah, I see, you don't have just one template file for the whole web page, you have many, so the number of unnamed placeholders in each is small and (more) manageable.

        This is why I like to hear from people who are capable of a lot of independent thought (even when they're usually wrong... just kidding!). I do heavy functional decomposition of the programming logic for a page (being a big fan of tight functions that have high cohesion (do one thing and one thing only) and fit in one screenful) but I don't usua

        • There are some CMS that take it too far - they have a separate file not just for every major component, but also for trivial things that really should be combined - but to each their own, I guess. I try to find a balance between reduced disk activity and breaking things into different parts.

          Oh - one other advantage is that this can be converted to a c/c++ cgi program with only a bit of effort, so you won't even have to worry about php bugs or security holes, or someone "borrowing" your work.
        • oh - I forgot to mention - you don't need to escape user input if you're just going to stuff it on disk and then manipulate it as raw data. Leave the single quotes, the double quotes, the % sign, as well as the "usual suspects" - ampersands, left and right angle brackets, and everything else.

          It sure makes for a LOT less head-scratching.

          Now, if you want to re-display it in a browser, obviously you'll have to encode it, but that's a lot less of a problem, since you don't have to worry about "double-encodin

  • Funny you mention this. For a long time I've wanted to move my pages from static HTML to generated HTML, but PHP and its ilk simply don't stick with me. I thought of doing exactly what you did: old style C, using simple templates (aka, HTML with the content parts left out).

    Oh, and I did program C and C++ CGIs before. Heck, one of the largest projects before I left the programming world was a C based CGI for the European Commission (DGT, the translation part). It was a pretty cool project, an
