
What's in Your HTML Toolbox?

Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
  • Perl (Score:2, Informative)

    by hahafaha ( 844574 ) * <lgrinberg@gmail.com> on Sunday September 03, 2006 @11:58PM (#16035465)
    I know many of the geeks out there have forsaken Perl, but it is still, in my opinion, an indispensable tool. I am currently fixing up a website similar to the one you described, especially in terms of the HTML problems. Write a Perl script to fix capitalization, closing of tags, etc. But understand that if code is not written well to begin with, then in many cases it is impossible to automate the process of fixing it. You are going to have to do some things by hand.

    Depending on how bad it is, consider rewriting the HTML and CSS part of the website from scratch. It may be easier than fixing old code.
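
    As a rough illustration of the "fix capitalization" step, here is a minimal sketch in Python rather than Perl. It only rewrites relative href/src values to lowercase and leaves external URLs alone; the function name lowercase_local_links is made up for this example, and the regex assumes quoted attribute values, so unquoted attributes would be missed.

```python
import re

# Matches href="..." or src="..." with single or double quotes, in any letter case.
LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''', re.IGNORECASE)


def lowercase_local_links(html: str) -> str:
    """Lowercase the path part of relative href/src values (hypothetical helper)."""

    def fix(match):
        prefix, quote, target = match.groups()
        # Leave absolute URLs, protocol-relative URLs, mailto:/javascript:, and pure fragments alone.
        if re.match(r'[a-zA-Z][a-zA-Z0-9+.-]*:|//|#', target):
            return match.group(0)
        # Only the path names a file; keep any ?query or #fragment unchanged.
        path, sep, rest = re.match(r'([^?#]*)([?#]?)(.*)', target, re.DOTALL).groups()
        return f"{prefix}{quote}{path.lower()}{sep}{rest}{quote}"

    return LINK_ATTR.sub(fix, html)
```

    The files on disk would still need to be renamed to lowercase to match, and anything that builds links in JavaScript will have to be fixed by hand.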
  • Tidy or Meyer (Score:5, Informative)

    by hedronist ( 233240 ) * on Monday September 04, 2006 @12:01AM (#16035477)
    There are two approaches: live with it and make as few changes as possible, or bite the bullet and do a complete rebuild. To do a cleanup, check out tidy - it does a good analysis of the existing pages and can generate CSS that is OK, but not beautiful. If you want the final pages to look the same, but be standards compliant, see meyerweb.com and read his books on rebuilding pages ("Eric Meyer On CSS" and "More Eric Meyer on CSS"). Pragmatic is his keyword: lots of examples and he makes sense.

      Good luck. You're going to need it.
     
  • HTML Tidy (Score:3, Informative)

    by d3ik ( 798966 ) on Monday September 04, 2006 @12:01AM (#16035478)
    Been there, try this [sourceforge.net]
  • Re:FTW (Score:3, Informative)

    by mr_stinky_britches ( 926212 ) on Monday September 04, 2006 @12:03AM (#16035486) Homepage Journal
    You can also do batch file processing with vim. Open the files with vim *.match.files.* and then, once in vim, run :argdo %s/^M//ge | w to remove the funky Windows line endings (mind you, ^M is typed as ctrl-v ctrl-m in vim).
  • Obligatory (Score:2, Informative)

    by hahafaha ( 844574 ) * <lgrinberg@gmail.com> on Monday September 04, 2006 @12:07AM (#16035508)
    > :argdo %s/^M//ge | w this would remove the funky windows line endings (mind you, ^M = ctrl-v ctrl-m in vim).

    Or, in emacs

    M-% (AKA Meta(usually Alt)-Shift-5)
    Query Replace: ^M with [nothing] :-)

    P.S. Note that ^M is not Caret-M. It is a single character. I usually just copy it out of the file, and then do it in emacs.
  • by M0b1u5 ( 569472 ) on Monday September 04, 2006 @12:12AM (#16035538) Homepage
    Oops Sorry!

    <div class="spacer" style="width:Xpx; height:Ypx;"></div>
  • Re:Obligatory (Score:3, Informative)

    by mr_stinky_britches ( 926212 ) on Monday September 04, 2006 @12:15AM (#16035550) Homepage Journal
    Or, in emacs
    M-% (AKA Meta(usually Alt)-Shift-5)
    Query Replace: ^M with [nothing] :-)

    Question for you: how would you do that across multiple files in emacs?
    The global search and replace command using vim on a single file would be simply:
    :%s/^M//g
  • on macosx (Score:1, Informative)

    by Anonymous Coward on Monday September 04, 2006 @12:46AM (#16035667)
    TextWrangler or BBEdit Lite, vi, telnet, ftp, Photoshop CS (not CS2), GraphicConverter, Firefox, Safari.
  • Re:HTML Tidy (Score:3, Informative)

    by Anonymous Coward on Monday September 04, 2006 @01:05AM (#16035715)
  • Re:Oh, the usual (Score:2, Informative)

    by reanjr ( 588767 ) on Monday September 04, 2006 @01:22AM (#16035773) Homepage
    Nope.

    br is not now br /, one must simply write well-formed documents. Well-formed HTML (with all tags closed) also uses br /.
    em and strong are still alive and well as of XHTML 2.0.
    b and i are still available in XHTML 1.0.
    There is no HTML 4.1. Presumably you meant 4.01 strict, which is pretty much XHTML 1.0 Strict.
  • Cheat with PHP (Score:2, Informative)

    by GloomE ( 695185 ) on Monday September 04, 2006 @01:49AM (#16035890)
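    $junky_html = file_get_contents('old_page.html'); // hypothetical input file
    libxml_use_internal_errors(true); // collect parse warnings instead of printing them
    $doc = new DOMDocument();
    $doc->loadHTML($junky_html); // the parser repairs unclosed and misnested tags
    echo $doc->saveHTML();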
    Reads in your crappy HTML, turns it into compliant XML, then dumps it out as nice clean HTML.
  • by Tumbleweed ( 3706 ) * on Monday September 04, 2006 @01:51AM (#16035910)
    Tidy, as others have already mentioned, will be your very best new friend.

    Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.

    Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").

    If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).

    And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.

    Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.

    Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.

    Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).

    Oh, and learn to indent your freaking HTML!

    Some nice resources:

    Activating the Right Layout Mode Using the Doctype Declaration [hsivonen.iki.fi]

    Quirksmode [quirksmode.org] - a GREAT resource. Awesome info here. Memorize it.
  • by masklinn ( 823351 ) <slashdot.org@mCO ... t minus language> on Monday September 04, 2006 @02:33AM (#16036128)
    This is worse than image spacer, please go die in a fire
  • by Anonymous Coward on Monday September 04, 2006 @03:13AM (#16036303)
    Or you could just use the padding / margin features provided by CSS.

    margin-top: 1px;
    margin-right: 2px;
    margin-bottom: 3px;
    margin-left: 4px;
    or margin: 1px 2px 3px 4px;

    padding-top: 1px;
    padding-right: 2px;
    padding-bottom: 3px;
    padding-left: 4px;
    or padding: 1px 2px 3px 4px;
  • Re:Macros (Score:4, Informative)

    by Mad Merlin ( 837387 ) on Monday September 04, 2006 @03:22AM (#16036338) Homepage
    Can't even consider vim because of the macro capability in emacs. I have remapped Ctrl-z to be equivalent to 'Ctrl-x e' (repeat last macro -- since I don't use 'suspend', the normal Ctrl-z function). Then I can record a macro ('Ctrl-x (' and type *anything* then close with 'Ctrl-x )') and use Ctrl-z to rapid-repeat the last macro. Makes repetitive editing very efficient. Can also do 'Ctrl-u 50 Ctrl-z' to repeat a macro 50 times, etc.

    I'd move to vim if it had similar ease with macro creation / execution. Does it? Huh? Well, does it? Come on, preach it, brother! Make me a vim believer!

    q<register> to record a macro, q to finish recording. Execute the macro with @<register>, then you can execute it again with @@. Obviously the @ commands can be prefixed with a number to repeat them that many times, 5@@ would repeat the last macro 5 times, for example.

  • by julesh ( 229690 ) on Monday September 04, 2006 @03:43AM (#16036408)
    Err.. this approach just doesn't work. Images are inline elements, you can't replace them with an equivalently sized block element and expect the page layout to be the same. And setting the CSS 'width' attribute of an inline element doesn't work in Explorer, so the entire approach is flawed. Sorry.
  • Re:Obligatory (Score:4, Informative)

    by maxwell demon ( 590494 ) on Monday September 04, 2006 @05:38AM (#16036795) Journal
    I'd rather use recode. After all, there might be other Windows specific stuff in there, like replacement of certain ISO-8859-1 high-bit control characters by graphic characters. With recode, those can be handled as well (ideally convert it directly to Unicode).
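
    This is not recode itself, but the same idea can be sketched in Python, assuming the pages really were saved as Windows-1252 (that is a guess you would need to verify per file). The function name cp1252_to_utf8 is invented for this example.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode bytes as Windows-1252 and re-encode them as UTF-8 (hypothetical helper)."""
    # cp1252 leaves a handful of bytes undefined; errors="replace" turns those
    # into U+FFFD instead of raising UnicodeDecodeError.
    return data.decode("cp1252", errors="replace").encode("utf-8")
```

    Bytes such as 0x93 and 0x94 (the Windows "smart quotes") come out as proper Unicode curly quotes. Any charset declaration in the page would also have to be updated to match the new encoding.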
  • by Selanit ( 192811 ) on Monday September 04, 2006 @06:54AM (#16036970)
    My biggest web devel tool is Firefox, with the Web Developer [chrispederick.com] extension and the HTML Validator [skynet.be] extension. The former does all sorts of amazingly neat things like letting me get precise info about any element within a page (using "Display Element Information" under the "Information" menu, CTRL+SHIFT+F for short), showing me the HTTP response headers to any given page, add custom styles to a page, validate links, check for Section 508 accessibility compliance, resize the window for simulating lower screen resolutions, and on and on and on!

    The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.

    As for editing HTML, I generally use SciTE [scintilla.org] or one of its derivatives (eg Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron [sourceforge.net]. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody what sniggers at you.
  • by bcmm ( 768152 ) on Monday September 04, 2006 @07:36AM (#16037072)
    There is a small utility called dos2unix which changes MS-style line endings in text files to Unix style. /usr/bin/mac2unix is symlinked to dos2unix on my Gentoo box, so I guess it can fix Mac OS line endings too.
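
    If dos2unix is not available, the conversion is simple enough to sketch in Python. This is only an illustration of the idea, not how dos2unix is implemented; the names normalize_newlines and normalize_tree and the default file patterns are assumptions made for this example.

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so the lone-CR pass does not turn it into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_tree(root: str, patterns=("*.html", "*.htm", "*.css")) -> int:
    """Rewrite matching files under root in place; return how many were changed."""
    changed = 0
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            original = path.read_bytes()
            fixed = normalize_newlines(original)
            if fixed != original:
                path.write_bytes(fixed)
                changed += 1
    return changed
```

    Working on bytes rather than text sidesteps any guessing about character encodings. Run it on a copy of the site first, since it overwrites files in place.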
  • Re:Obligatory (Score:3, Informative)

    by ianezz ( 31449 ) on Monday September 04, 2006 @01:48PM (#16038802) Homepage
    Question for you: how would you do that across multiple files in emacs?

    Use dired on the directory where files are located (i.e. M-x dired), mark the files you are interested in (with 'm'), then use 'Q' (uppercase) to perform what basically is a query-replace-regexp on all the marked files (actually it is dired-do-query-replace-regexp).

    See the GNU Emacs dired documentation for further details.
