What's in Your HTML Toolbox?
Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed-case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS-neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
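For the duplicate files and the whitespace-blind 'diff', here's a minimal Python sketch of the idea (Python and all function names are my own choice for illustration, not a tool anyone in the thread named). It collapses every run of whitespace, including CR/LF line endings, before comparing: copies that differ only in indentation or line endings hash the same, and the diff runs over tag/text tokens instead of physical lines, so re-wrapped paragraphs don't show up as changes. Caveat: this also hides real differences inside <pre> blocks and treats "<b>x</b> y" the same as "<b>x</b>y".

```python
import difflib
import hashlib
import re
from collections import defaultdict


def normalize_ws(html: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, CR, LF) into one space."""
    return re.sub(r"\s+", " ", html).strip()


def tokens(html: str) -> list[str]:
    """Split whitespace-normalized HTML into tags and the text between them."""
    parts = re.split(r"(<[^>]*>)", normalize_ws(html))
    return [p.strip() for p in parts if p.strip()]


def duplicate_groups(files: dict[str, str]) -> list[list[str]]:
    """Group paths whose contents are identical once whitespace is ignored.

    `files` maps a path to its text (fill it by reading the site's pages).
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for path, text in files.items():
        digest = hashlib.sha1(normalize_ws(text).encode("utf-8")).hexdigest()
        by_digest[digest].append(path)
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)


def ws_insensitive_diff(a: str, b: str, name_a: str = "a", name_b: str = "b") -> list[str]:
    """Unified diff over tag/text tokens, so re-indenting or re-wrapping is ignored."""
    return list(difflib.unified_diff(tokens(a), tokens(b), name_a, name_b, lineterm=""))
```

Exact duplicates fall out of duplicate_groups directly; for partial duplicates, run ws_insensitive_diff on likely pairs (e.g. "page.html" vs "page_old.html") and look at how short the diff is.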
Perl (Score:2, Informative)
Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.
Tidy or Meyer (Score:5, Informative)
Good luck. You're going to need it.
HTML Tidy (Score:3, Informative)
Re:FTW (Score:3, Informative)
Obligatory (Score:2, Informative)
Or, in Emacs:
M-% (AKA Meta (usually Alt) + Shift-5)
Query Replace: ^M with [nothing]
P.S. Note that ^M is not a caret followed by M. It is a single character, the carriage return. I usually just copy it out of the file and then do it in Emacs (you can also type it at the prompt with C-q C-m).
Re:Creating white space - apologies (Score:5, Informative)
<div class="spacer" style="width:Xpx; height:Ypx;"></div>
Re:Obligatory (Score:3, Informative)
Question for you: how would you do that across multiple files in emacs?
The global search and replace command using vim on a single file would be simply:
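:%s/^M//g
where the ^M is entered as Ctrl-V Ctrl-M, not typed as a caret and an M (vim also accepts \r in the pattern: :%s/\r//g).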
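To do this across a whole site without opening an editor, here's a minimal Python sketch of a batch line-ending normalizer (my own illustration, not dos2unix itself; the *.htm* file pattern is an assumption). One design choice worth noting: rather than deleting every ^M the way the vim substitution does, it turns a lone CR into a newline, because old Mac files use a bare CR as their only line separator and deleting it would glue all their lines together. It works on raw bytes so it doesn't have to guess each file's encoding.

```python
from pathlib import Path


def to_unix_newlines(data: bytes) -> bytes:
    """Turn Windows (CRLF) and old Mac (lone CR) line endings into Unix LF.

    CRLF is handled first; otherwise every CRLF would become two LFs.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_line_endings(root: str, pattern: str = "*.htm*") -> list[Path]:
    """Rewrite matching files under `root` in place; return the ones changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()
        fixed = to_unix_newlines(original)
        if fixed != original:  # only touch files that actually need it
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Run it on a copy of the site first; the returned list doubles as a report of which files had bad line endings.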
on macosx (Score:1, Informative)
Re:HTML Tidy (Score:3, Informative)
Re:Oh, the usual (Score:2, Informative)
br is not now br
em and strong are still alive and well as of XHTML 2.0.
b and i are still available in XHTML 1.0.
There is no HTML 4.1. Presumably you meant 4.01 strict, which is pretty much XHTML 1.0 Strict.
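If the cleanup does target XHTML 1.0, empty elements such as br have to be closed, and the HTML-compatible spelling recommended for XHTML 1.0 is <br /> with a space before the slash. A minimal Python sketch of that one batch fix (hypothetical helper name; tags with attributes, like <br clear="all">, are deliberately left alone for Tidy or a manual pass):

```python
import re

# Matches <br>, <BR>, <br/>, <br  /> ... but not <br class="x"> or <breadcrumb>.
_BARE_BR = re.compile(r"<br\s*/?\s*>", re.IGNORECASE)


def xhtml_br(html: str) -> str:
    """Rewrite attribute-less line-break tags to the XHTML form <br />."""
    return _BARE_BR.sub("<br />", html)
```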
Cheat with PHP (Score:2, Informative)
tidy, web developer FF extension, search & replace (Score:4, Informative)
Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.
Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").
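For the mixed-case file name problem in the original question, this kind of regex search & replace can be scripted together with the renaming. Here's a minimal Python sketch (my own, with hypothetical function names) that lowercases site-relative href/src values in a page and renames files and directories to match. Assumptions: links live in quoted href/src attributes; URLs in CSS url(), JavaScript, or unquoted attributes are not touched; query strings and #fragments keep their case. Run it on a copy of the site.

```python
import re
from pathlib import Path

# Quoted href/src values: group 1 = "href=" prefix, 2 = quote char, 3 = the URL.
LINK_ATTR = re.compile(r"""(\b(?:href|src)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)
SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def lower_local_url(url: str) -> str:
    """Lowercase the path of a site-relative URL; leave everything else alone."""
    if SCHEME.match(url) or url.startswith(("//", "#")):
        return url  # http:, mailto:, javascript:, //host/..., #anchor
    cut = len(url)
    for sep in ("?", "#"):
        i = url.find(sep)
        if i != -1:
            cut = min(cut, i)
    return url[:cut].lower() + url[cut:]


def lowercase_links(html: str) -> str:
    """Apply lower_local_url to every quoted href/src value in a page."""
    return LINK_ATTR.sub(
        lambda m: m.group(1) + m.group(2) + lower_local_url(m.group(3)) + m.group(2),
        html,
    )


def lowercase_tree(root: str) -> int:
    """Rename every file and directory under `root` to lowercase.

    Refuses (renaming nothing) if two names in one directory would collide,
    e.g. Index.html and index.html. Returns the number of renames.
    """
    paths = list(Path(root).rglob("*"))
    seen: dict[tuple[Path, str], Path] = {}
    for p in paths:
        key = (p.parent, p.name.lower())
        if key in seen:
            raise ValueError(f"case collision: {seen[key]} and {p}")
        seen[key] = p
    count = 0
    # Deepest entries first, so each one is renamed while its parent
    # directory still has its original name.
    for p in sorted(paths, key=lambda q: len(q.parts), reverse=True):
        if p.name != p.name.lower():
            p.rename(p.with_name(p.name.lower()))
            count += 1
    return count
```

The collision check matters: pages like "Index.html" and "index.html" can coexist on the Linux server, and you want to find those by hand before anything gets renamed.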
If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).
And as someone else said, make an empty new standards-compliant template and get to cutting and pasting. It can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you eventually want the code to be. If you just want it to be standards compliant, a cleanup job will do. If you want to do it 'right,' you'll want to develop a new template and coding style that properly integrates the HTML and CSS. That means things like not wrapping everything in a DIV just to style it (a sure sign you're new to CSS). Figure out why you should be using H1 and H2 tags (and TBODY and TH tags if you're using tables for outer layout), etc., without needing a lot of unnecessary DIVs all over the place. Inline styles = bad.
Figure out why XHTML may not be a better choice than HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get IE7's new, more standards-compliant rendering without a DTD.
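A quick batch check to go with this: pages with no doctype at all are rendered in quirks mode, and IE6 also drops into quirks mode when something such as an <?xml ...?> declaration comes before the doctype. The minimal Python sketch below (hypothetical names; the *.htm* pattern is an assumption) only flags pages whose first markup isn't a doctype. Whether a particular doctype actually triggers standards mode still has to be checked against a table like the one on hsivonen.iki.fi.

```python
import re
from pathlib import Path

# Optional leading whitespace, then "<!DOCTYPE" in any letter case.
DOCTYPE_FIRST = re.compile(rb"\s*<!DOCTYPE\s", re.IGNORECASE)
UTF8_BOM = b"\xef\xbb\xbf"


def pages_without_leading_doctype(root: str, pattern: str = "*.htm*") -> list[Path]:
    """List pages whose first markup is not a doctype declaration."""
    flagged = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()
        if data.startswith(UTF8_BOM):  # a byte-order mark is not markup
            data = data[len(UTF8_BOM):]
        if not DOCTYPE_FIRST.match(data):
            flagged.append(path)
    return flagged
```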
Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.
Learn about usability and readability. Learn about typography, and why light-on-dark text should be sized differently from dark-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans-serif fonts are easier to read at screen resolution (the opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).
Oh, and learn to indent your freaking HTML!
Some nice resources:
Activating the Right Layout Mode Using the Doctype Declaration [hsivonen.iki.fi]
Quirksmode [quirksmode.org] - a GREAT resource. Awesome info here. Memorize it.
Re:Creating white space - apologies (Score:5, Informative)
Re:Creating white space - apologies (Score:4, Informative)
margin-top: 1px;
margin-right: 2px;
margin-bottom: 3px;
margin-left: 4px;
or the shorthand, which runs clockwise from the top (top, right, bottom, left): margin: 1px 2px 3px 4px;
padding-top: 1px;
padding-right: 2px;
padding-bottom: 3px;
padding-left: 4px;
or the equivalent shorthand, in the same top, right, bottom, left order: padding: 1px 2px 3px 4px;
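The shorthand forms also accept fewer than four values, with the missing sides filled in from the ones given: two values mean top/bottom then right/left, and three mean top, right/left, bottom. A small Python sketch (a hypothetical helper, handy if a batch script needs to normalize one form into the other; it assumes plain space-separated values) that expands a margin or padding shorthand into the four longhand properties:

```python
SIDES = ("top", "right", "bottom", "left")


def expand_box_shorthand(prop: str, value: str) -> dict[str, str]:
    """Expand a CSS margin/padding shorthand into its four longhand properties.

      1 value  -> all four sides
      2 values -> top/bottom, right/left
      3 values -> top, right/left, bottom
      4 values -> top, right, bottom, left
    """
    v = value.split()
    if not 1 <= len(v) <= 4:
        raise ValueError(f"{prop} shorthand takes 1 to 4 values, got {len(v)}")
    top = v[0]
    right = v[1] if len(v) > 1 else top      # right defaults to top
    bottom = v[2] if len(v) > 2 else top     # bottom defaults to top
    left = v[3] if len(v) > 3 else right     # left defaults to right
    return {f"{prop}-{side}": val for side, val in zip(SIDES, (top, right, bottom, left))}
```

For example, expand_box_shorthand("padding", "5px 10px") gives 5px top and bottom and 10px right and left.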
Re:Macros (Score:4, Informative)
q<register> to record a macro, q to finish recording. Execute the macro with @<register>, then you can execute it again with @@. Obviously the @ commands can be prefixed with a count to repeat them that many times; for example, 5@@ would repeat the last macro 5 times.
Re:Creating white space - apologies (Score:3, Informative)
Re:Obligatory (Score:4, Informative)
Web Developer and HTML Validator Extensions! (Score:5, Informative)
The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.
As for editing HTML, I generally use SciTE [scintilla.org] or one of its derivatives (e.g. Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron [sourceforge.net]. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody who sniggers at you.
Line endings - use dos2unix (Score:4, Informative)
Re:Obligatory (Score:3, Informative)
Use dired on the directory where the files are located (i.e. M-x dired), mark the files you are interested in (with 'm'), then press 'Q' (uppercase) to run what is basically a query-replace-regexp on all the marked files (the command is actually dired-do-query-replace-regexp).
See the GNU Emacs dired documentation for further details.