What's in Your HTML Toolbox?
Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed-case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS-neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
HTMLKit for Windows (Score:5, Interesting)
Creating white space (Score:4, Interesting)
I create a class:
line-height:0;
font-size:0;
}
Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:
I sometimes use a span instead, if the DIVs alone cause layout issues.
Say hello to faster web pages instantly!
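For anyone who wants to batch this, here is a rough Python sketch of the s.gif replacement described above. The class name `spacer`, the function name `replace_spacers`, and the exact replacement markup are my own assumptions, since the post's markup did not survive. It only looks at numeric width/height attributes on the image tag, so treat it as a starting point and diff the results before trusting it.

```python
import re

# Hypothetical class name; pair it with the zero line-height/font-size rule from the post.
SPACER_CLASS = "spacer"

# An <img> tag whose src file name is exactly s.gif (optionally behind a path).
IMG_RE = re.compile(
    r'<img\b[^>]*\bsrc\s*=\s*["\']?[^"\'\s>]*\bs\.gif["\']?[^>]*>',
    re.IGNORECASE,
)
# Numeric width/height attributes inside that tag.
ATTR_RE = re.compile(r'\b(width|height)\s*=\s*["\']?(\d+)', re.IGNORECASE)


def replace_spacers(html: str) -> str:
    """Swap spacer images for empty DIVs that carry the same pixel size."""

    def to_div(match):
        dims = {name.lower(): value for name, value in ATTR_RE.findall(match.group(0))}
        style = "".join(f"{key}:{value}px;" for key, value in sorted(dims.items()))
        return f'<div class="{SPACER_CLASS}" style="{style}"></div>'

    return IMG_RE.sub(to_div, html)
```

Run it over each file's contents and write the result back. If the DIVs break a layout, changing the emitted tag to a span is a one-line edit.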
Re:Why use static HTML? (Score:3, Interesting)
If what you need is very simple (including footers would count as simple), here's more information about server-side includes [apache.org] (SSI). Either rename your pages
If you want something more complex, you can use SSI to include the output of a mini CGI script in the middle of your HTML. CGI scripts can be written in any language, even as a shell script:
#!/bin/sh
# CGI header, followed by the blank line that ends the headers
echo "Content-type: text/html"
echo
echo "(insert HTML here)"
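The same idea in Python, as a minimal sketch. The footer text and the function names `render_footer` and `main` are made up for illustration. With Apache's mod_include you would pull it into a page with a directive along the lines of `<!--#include virtual="/cgi-bin/footer.cgi" -->`, where the path is hypothetical.

```python
#!/usr/bin/env python3
import sys
from datetime import date


def render_footer(today):
    """Build the HTML fragment to include (placeholder content)."""
    return f"<p>Page generated {today.isoformat()}</p>"


def main(out=sys.stdout):
    # CGI header, then the blank line that ends the headers.
    out.write("Content-Type: text/html\n\n")
    out.write(render_footer(date.today()))


if __name__ == "__main__":
    main()
```

Keeping the HTML in its own function makes it easy to check the fragment without going through the webserver.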
Firefox with plugins (Score:3, Interesting)
Two tidbits (Score:3, Interesting)
Tidy is great, as others mentioned. It can also output well-formed XHTML, which lets you (if you feel confident) cherry-pick the data you want to scavenge with XSLT.
Separating grain from chaff
Does your static HTML project have numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on: files that you just aren't sure are useful?
Copy the project directory (touch all the files) and do a wget -r on the tree; the access times will then show which files are referenced internally. Alternatively, scan the webserver logfiles to see which files are actually requested.
Be sure your filesystem is configured to register access times if you pick the first method...
(As a bonus, a close look at the 404s might give you some answers about wrongly capitalized filenames.)
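A small sketch of the access-time check, in Python. The function name `unaccessed_files` and the `since` cutoff are my own choices: note the time just before you start the wget -r crawl, then list everything the crawl never read. It assumes the filesystem really records access times, as mentioned above.

```python
import os


def unaccessed_files(root, since):
    """Return files under root whose access time is older than `since`.

    `since` is a timestamp in seconds since the epoch, e.g. time.time()
    captured right before the crawl started.
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # stat() reads metadata only, so it does not disturb the access time.
            if os.stat(path).st_atime < since:
                stale.append(path)
    return sorted(stale)
```

Anything it reports is only a candidate for deletion. Files reached through scripts, forms, or external links will not be touched by a plain crawl.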
Lynx / Links / ELinks
These can dump the text of old and unmaintainable HTML documents, which is most useful when you only want to scavenge the text content to put into a database or similar.
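If you would rather stay inside a script than shell out to something like `lynx -dump`, Python's standard `html.parser` can do a crude version of the same job. This is a swapped-in technique, not what Lynx does internally, and the names `TextDumper` and `dump_text` are invented for this sketch. It emits one line per text run and skips script and style blocks.

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            # Collapse runs of whitespace inside each text run.
            self.chunks.append(" ".join(data.split()))


def dump_text(html):
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

It copes with unclosed paragraph tags because it never tries to build a tree, which suits the kind of markup described in the question.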
Re:FTW (Score:3, Interesting)
vim has excellent syntax highlighting, predictive typing, line numbers, search and replace (with regular expressions), code folding, spell-check, built-in help, and more.
Give yourself two weeks with an open mind, and you might be surprised by it. The easiest way to get started is to type vimtutor from almost any shell account.
Dreamweaver (Score:3, Interesting)