
What's in Your HTML Toolbox?

Posted by Cliff
from the utilities-of-the-trade dept.
Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
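One way to hunt for the duplicate pages mentioned in the question is to compare files with whitespace ignored. The sketch below is a minimal Python illustration, not a tool anyone in the thread posted: it collapses every run of whitespace, hashes the result, and groups files whose hashes match. The `*.htm*` pattern and the function names are assumptions. It only catches copies that are identical apart from whitespace; partial duplicates would need a real diff (for example Python's `difflib`) on top.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path


def normalized_digest(text: str) -> str:
    """Hash the text after collapsing every run of whitespace to one space.

    CRLF vs LF line endings and re-indentation therefore do not hide a
    duplicate. Note that whitespace inside <pre> blocks is collapsed too.
    """
    collapsed = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(collapsed.encode("utf-8")).hexdigest()


def find_duplicates(root: str, pattern: str = "*.htm*") -> list[list[Path]]:
    """Group files under root whose contents match once whitespace is ignored."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob(pattern)):
        if path.is_file():
            # errors="replace" keeps odd legacy encodings from aborting the scan
            text = path.read_text(encoding="utf-8", errors="replace")
            groups[normalized_digest(text)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("site")` on a copy of the site returns lists of paths that are probably redundant; review each group before deleting anything.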
  • by AKAImBatman (238306) * <akaimbatman@gmai ... m minus language> on Sunday September 03, 2006 @10:56PM (#16035457) Homepage Journal
    So what's in YOUR toolbox?


    CAPITAL ONE!

    [...]

    Wait, what was the question again?
  • Dreamweaver FTW! It would be a huge timesaver in this situation.

    Good luck!
    • Re: (Score:3, Informative)

      You can also do batch file processing with vim: start it with vim *.match.files.* and then, once in vim, run :argdo %s/^M//ge | w to remove the funky Windows line endings from every file (mind you, ^M = Ctrl-V Ctrl-M in vim; \r in the pattern works too).
      • Obligatory (Score:2, Informative)

        by hahafaha (844574) *
        > :argdo:%s/[^m]//ge | w this would remove the funky windows line endings (mind you, ^m = ctrl-v ctrl-m in vim).

        Or, in emacs

        M-% (AKA Meta(usually Alt)-Shift-5)
        Query Replace: ^M with [nothing] :-)

        P.S. Note that ^M is not Caret-M. It is a single character. I usually just copy it out of the file, and then do it in emacs.
        • Re: (Score:3, Informative)

          Or, in emacs
          M-% (AKA Meta(usually Alt)-Shift-5)
          Query Replace: ^M with [nothing] :-)

          Question for you: how would you do that across multiple files in emacs?
          The global search and replace command using vim on a single file would be simply:
          :%s/^M//g (again with ^M typed as Ctrl-V Ctrl-M)
          • Re: (Score:3, Informative)

            by ianezz (31449)
            Question for you: how would you do that across multiple files in emacs?

            Use dired on the directory where files are located (i.e. M-x dired), mark the files you are interested in (with 'm'), then use 'Q' (uppercase) to perform what basically is a query-replace-regexp on all the marked files (actually it is dired-do-query-replace-regexp).

            See the GNU Emacs dired documentation for further details.
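If you'd rather not drive an editor at all, the same line-ending cleanup can be scripted. A minimal Python sketch, assuming the pages match `*.htm*` and that you run it on a backup copy: it works on raw bytes, turns CRLF (Windows) and lone CR (classic Mac) endings into LF, and rewrites only files that actually changed, so files with mixed endings are fixed in one pass. The function names are hypothetical; `dos2unix`/`mac2unix` do the same job from the command line.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    # CRLF must be handled first, or its CR would become an extra newline.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_line_endings(root: str, pattern: str = "*.htm*") -> list[Path]:
    """Rewrite matching files under root in place; return the ones changed."""
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        original = path.read_bytes()  # raw bytes: no encoding guesswork needed
        fixed = normalize_line_endings(original)
        if fixed != original:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```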

    • Eek. I'd say avoid Dreamweaver at all costs. It causes exactly these kinds of problems. And, if not configured correctly, it can even put out malformed code itself.

      Debugging HTML/PHP/etc files: UltraEdit-32 [ultraedit.com]. $40 for the single best Swiss Army Editor I've ever found. In combination with Tidy (which is fully integrated into its interface), it can handle every file-related problem mentioned except for the names themselves. Out of the box, it can do everything from line ending conversion to your standard synt
      • Only UltraNoobs use UltraEdit. You need to spend some time learning VIM (http://www.vim.org [vim.org]) and you'll never look back.
        Cheers.
        • Re: (Score:3, Interesting)

          by Baricom (763970)
          Agreed. I dismissed people who kept suggesting vim as "crazy UNIX people." I still felt that way about a week into playing with it, but soon after, I realized how powerful it is once you've figured out how the keystrokes work. Since then, I've used vim on every computer I've worked with and gvim (the GUI-enhanced version of vim) is my primary editor on my Windows box.

          vim has excellent syntax highlighting, predictive typing, line numbers, search and replace (with regular expressions), code folding, spell-
          • Try Notepad++: syntax highlighting for HTML, PHP, and JS, Windows/Unix line-ending conversion, macros, a hex editor, an HTML tidy-upper, and more. Lots of nifty stuff, and it's OSS.



            Place a curse on Microsoft [i-curse.com]
      • by Jaruzel (804522)

        Debugging HTML/PHP/etc files: UltraEdit-32. $40 for the single best Swiss Army Editor I've ever found.

        Seeing as you got flamed for this opinion, I thought I'd help you out.

        UltraEdit-32 is damn good, I'm sure it's not as slick as some of the other Editors out there, but it has good syntax highlighting, tabs, and the ability to run macros or spawn sub processes and capture the output. Yes you have to put a bit of work in to get it how you like it, but overall it fits the bill, and if you can actually write co

        • by ozbon (99708)
          Rather than UltraEdit32, I always swear by TextPad [textpad.com]. It's just about the first thing I install on any computer I'm working on - hell, I even paid for a license!

          OK, most of the time it's just minor-annoyance nagware, but I figured, I use it so much, might as well pay them for some kind of use.

          The only other thing I absolutely swear by for HTML/CSS is BradSoft's "TopStyle Editor" for CSS. Yeah sure, I can use a text editor for the same thing, but TopStyle makes my life easier.
    • Dreamweaver FTW!

      What kind of advantage would using dreamweaver give you in a situation like this?

      I first started with HTML/websites in the mid 90s with AOLPress, then Adobe Pagemill, NetObjects FUSION, GoLive Cyberstudio (which was bought by adobe and turned into GoLive), and eventually, I dropped all of these studio apps in favour of vim using PERL and eventually moved on to PHP.

      I've since started using this great app called TextMate [macromates.com], and when I get a complete site that I need to work on, I pipe the code t
  • Perl (Score:2, Informative)

    by hahafaha (844574) *
    I know many of the geeks out there have forsaken Perl, but in my opinion it is still an indispensable tool. I am currently fixing up a website similar to the one you described, especially in terms of the HTML problems. Write a Perl script to fix capitalization, closing of tags, etc. But understand that if the code was not written well to begin with, then in many cases it is impossible to fully automate fixing it. You are going to have to do some things by hand.

    Depending on how bad it is, consider rewr
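The regex-driven cleanup described in this post can be sketched in a few lines. This is a minimal Python illustration (Python standing in for Perl) of one such fix, lowercasing element names, which XHTML requires; the function name is hypothetical. It is a regex shortcut rather than a parser: it changes only tag names and leaves attribute names and text alone, but it will also lowercase tag-like text such as `a<B` in an inline script and tags inside comments, so review the diff.

```python
import re

# The name in an opening or closing tag: "<TABLE", "</TD", "<Br".
# The <!-- and <!DOCTYPE openers never match because "!" is not a letter
# (markup inside comments, however, is still lowercased).
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tag_names(html: str) -> str:
    """Lowercase element names only; attribute names and text are unchanged."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)
```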
  • Tidy or Meyer (Score:5, Informative)

    by hedronist (233240) * on Sunday September 03, 2006 @11:01PM (#16035477)
    There are two approaches: live with it and make as few changes as possible, or bite the bullet and do a complete rebuild. To do a cleanup, check out Tidy: it does a good analysis of the existing pages and can generate CSS that is OK, but not beautiful. If you want the final pages to look the same but be standards compliant, see meyerweb.com and read his books on rebuilding pages ("Eric Meyer on CSS" and "More Eric Meyer on CSS"). Pragmatic is his keyword: lots of examples, and he makes sense.

      Good luck. You're going to need it.
     
  • HTML Tidy (Score:3, Informative)

    by d3ik (798966) on Sunday September 03, 2006 @11:01PM (#16035478)
    Been there, try this [sourceforge.net]
  • I know it's a huge Mickey Mouse hack and there are probably (scratch that: definitely) better ways, but when I need to do repetitive but relatively simple tasks that can be done via the command line, I use JavaScript to generate all the commands automatically, copy them into a batch file, and I'm done.
    • by Jerf (17166)
      If you're on Windows, look into using the Windows Scripting Host, which you almost certainly already have installed. Windows Scripting Host can run Javascript directly as sort of a super-batch file.

      You can start by simply directly executing the commands you generate, but you should learn the basic filesystem objects so you can manipulate files more safely and directly.

      I'm assuming you're not on UNIX, because in that case you should either learn some shell or Perl. I prefer Perl because its escaping mechanis
  • I use PHP. Server side includes are perfect for standard headers/footers. I check server variables to change behavior based on whether it's on the dev server or the final webserver.

    I'd paste an example, but slashdot seems to think PHP code is "junk characters".

    • I'd paste an example, but slashdot seems to think PHP code is "junk characters".

      It is.

      PHP was designed for about what you're describing, when there were better technologies out there to do the same thing. It really looks like it was just supposed to be a bunch of PHP tags you'd mix in with your HTML tags, so you didn't have to think too much like a programmer, and could think more like a web coder.

      This is a bad idea in the first place -- if you want to do dynamic stuff, learn to program. Worse, for som

  • HTMLKit for Windows (Score:5, Interesting)

    by SocialEngineer (673690) <{invertedpanda} {at} {gmail.com}> on Sunday September 03, 2006 @11:07PM (#16035506) Homepage
    HTMLKit [htmlkit.com] has a lot of great options for developers, and a good plugin system.
  • by grammar fascist (239789) on Sunday September 03, 2006 @11:09PM (#16035518) Homepage
    My toolbox has a little white pill that I take every time I get a hankering to work with HTML. It fixes me up right quick.
  • Creating white space (Score:4, Interesting)

    by M0b1u5 (569472) on Sunday September 03, 2006 @11:10PM (#16035527) Homepage
    The disaster that was "s.gif" (or "trans.gif" in some circles) used as a layout tool was horribly over-used - and the 'net is a worse place because of it. In most projects now, I seek to replace all instances with a "compatible" approach.

    I create a class:

    .spacer {
        line-height: 0;
        font-size: 0;
    }

    Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:



    I use a span sometimes, as required - if the DIVs alone cause layout issues.

    Say hello to faster web pages instantly!
    • by M0b1u5 (569472) on Sunday September 03, 2006 @11:12PM (#16035538) Homepage
      Oops Sorry!

      <div class="spacer" style="width:Xpx; height:Ypx;"></div>
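The bulk replacement described in this post can be scripted with a regular expression. A minimal Python sketch, assuming spacer images are named `s.gif`, `trans.gif`, or `spacer.gif` and carry plain numeric `width` and `height` attributes; tags that don't fit that shape are left alone for manual review, and the function name is hypothetical. (As replies point out, padding and margins are often a better fix than a spacer div, so treat this as the mechanical step only.)

```python
import re

# An <img> tag whose src file name is s.gif, trans.gif or spacer.gif
# (optionally after a directory path). Other images never match.
SPACER_IMG = re.compile(
    r"""<img\b[^>]*\bsrc\s*=\s*["']?(?:[^"'\s>]*/)?(?:s|trans|spacer)\.gif["']?[^>]*>""",
    re.IGNORECASE,
)
WIDTH = re.compile(r"""\bwidth\s*=\s*["']?(\d+)""", re.IGNORECASE)
HEIGHT = re.compile(r"""\bheight\s*=\s*["']?(\d+)""", re.IGNORECASE)


def replace_spacers(html: str) -> str:
    """Swap sized spacer images for the .spacer div; leave everything else."""
    def to_div(m):
        tag = m.group(0)
        w, h = WIDTH.search(tag), HEIGHT.search(tag)
        if not (w and h):
            return tag  # no explicit size: leave it for manual review
        return (f'<div class="spacer" style="width:{w.group(1)}px; '
                f'height:{h.group(1)}px;"></div>')
    return SPACER_IMG.sub(to_div, html)
```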
    • by suv4x4 (956391)
      It was called "spacer.gif". It was not abused at the time since empty div/span didn't work. In fact Netscape barely supported any div/span.

      Also same can be said for table cells with in them which in some browsers would collapse or misbehave.

      You create empty space with a "spacer div" today, which is no better than what people did back then. In fact it's worse, since they had few alternatives while you have padding, margin, and border where applicable.

      White space is rarely just a block of empty space floating arou
  • Vim, grep, and sed. I heard they make movies, too! :-)
  • Really, the only way to do a cleanup of your typical dog's breakfast collection of html is

    1. Tidy the pages (using htmltidy)
    2. Use a custom written script in whatever language (perl is good) to do as much of the task as possible automatically (things like replacing static headers with includes) - you'll need to be good with regex
    3. Open the pages manually, and finish the job - I like Dreamweaver for this particularly if it's a complicated table based layout

    whatever the case, it's going to take you a lot of
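Step 2 above (replacing static headers with includes) might look something like the following minimal Python sketch. It assumes, hypothetically, that the shared header is fenced by `<!-- header start -->` and `<!-- header end -->` comments and that the server has Apache-style server side includes enabled (`mod_include`, typically only for `.shtml` files unless configured otherwise); the markers, the `/includes/header.html` path, and the function names are placeholders. Pages without markers will need a pattern matched against the actual header HTML.

```python
import re
from pathlib import Path

# Hypothetical markers around the shared header; adjust to your pages.
HEADER_BLOCK = re.compile(
    r"<!--\s*header start\s*-->.*?<!--\s*header end\s*-->",
    re.DOTALL | re.IGNORECASE,
)
# Apache-style SSI directive; the include path is a placeholder.
INCLUDE = '<!--#include virtual="/includes/header.html" -->'


def swap_header(html: str) -> tuple[str, int]:
    """Replace the first marked header block; return (new_html, replacements)."""
    # A function replacement avoids any backslash escapes in INCLUDE being interpreted.
    return HEADER_BLOCK.subn(lambda m: INCLUDE, html, count=1)


def process_tree(root: str) -> list[Path]:
    """Apply swap_header to every .htm/.html file under root, in place."""
    touched = []
    for path in sorted(Path(root).rglob("*.htm*")):
        if not path.is_file():
            continue
        # Latin-1 with newline="" round-trips every byte, so UTF-8 text and
        # existing line endings survive untouched.
        with path.open(encoding="latin-1", newline="") as f:
            text = f.read()
        new_text, count = swap_header(text)
        if count:
            with path.open("w", encoding="latin-1", newline="") as f:
                f.write(new_text)
            touched.append(path)
    return touched
```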
  • Firefox with plugins (Score:3, Interesting)

    by bhav2007 (895955) <bhav2007&houston,rr,com> on Sunday September 03, 2006 @11:29PM (#16035605)
    Firefox with the IE Tab (or IE View), Web Developer, View Formatted Source, and HTML Validator extensions.
    • I'll second all that, though I have no experience with View Formatted Source. I'll make a note to check it out - in debugging page display and layout issues, those other three extensions are absolutely indispensable.
    • by deek (22697)
      Web Developer gets the two thumbs up from me. An absolutely essential plugin for html creators.

      I'd also recommend Live HTTP Headers. OK it's not a html tool, strictly speaking, but it is extremely useful for debugging any web server issues. There's no other way to track HTTP issues down, I believe, unless you telnet to the webserver. Error/access logs on the webserver don't often contain enough info, unfortunately.
      • by an_mo (175299)
        Other than the Web Developer plugin, I recommend the color picker plugin; it is not just for colors: it displays the DOM path in the status bar, which is priceless combined with the Web Developer plugin's tools.
    • I think this is great advice, regardless of the other tools you like to use. Firefox has some remarkably powerful and mature plug-ins available for web developers now. With the right combination, you can navigate your document's object model and see how the displayed page is being built from it, view the effective CSS on any element in your document (including any calculated values), view annotated source that shows standard compliance problems, and more, all from within your browser.

      Rather than listing s

  • Vim for the editing, Emacs for the web server, interpreted language, games, database, web browser to check it with, source code management, image editor, vector graphics editor, e-mail client, e-mail server, ...
  • Actually... Frontpage.

    No, really, stop laughing.

    Frontpage, once you convince it to stop the WYSIWYG crap, has three tools that will make fixing a non-technical user's webpage easy. (Never, ever, let a non-technical user use Frontpage without supervision. It's worse than Word.)
    1. "Site Management", where you can let Frontpage check for dead files, orphan files, broken links, and do mass re-names of all HTML-based links. (No script correction here, but non-techies don't do that.)
    2. Regular Expressions (or a workable subset thereof)
    3. VBA, to invoke things like "optimize HTML" and "standardize name"

    I'd be shocked if there aren't better tools out there -- but by and large either they don't do as much, or they cost a significant chunk of change.

    (Hey, you, with the laughing -- point me to an app that can do #1 with compatible replacements for #2 and #3, and, er, you'll get good karma for being so mean and laughing.)
    • by masklinn (823351)

      I know that DreamWeaver has very strong "Site Management" features but I don't know if it can check for dead files/broken link and do mass renames (I tend not to use dreamweaver when I can avoid it). It also has a very good (PC)RE support, and you can build "extensions" to the software by using CSS & the DOM to manipulate your documents.

  • Leave it as a giant tangled mess and secure your job for the next 3 years. When they threaten to lay you off, tell them you need at least 1 more year of work before you can straighten up the code and 'hand off' the job to the new webmaster.
  • Two tidbits (Score:3, Interesting)

    by ptaff (165113) on Monday September 04, 2006 @12:03AM (#16035708) Homepage

    Tidy is great, as others mentioned. If you feel confident, it will even let you convert the pages to XHTML so you can cherry-pick the data you want to scavenge with XSLT.

    Separating grain from chaff

    A static HTML project has numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on - files that you just aren't sure are useful?

    Copy the project directory, touch all the files, serve the copy from a web server, and run wget -r against it; by looking at the access times afterwards, you'll know all the internally referenced files. Alternatively, scan the webserver logfiles to see which files are actually requested.

    Be sure your filesystem is configured to register access times if you pick the first method...

    (As a bonus, a close peek on the 404s might give you some answers on mis-used capitalization of filenames.)
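The wget trick can be approximated by a static link walk that needs no web server. A minimal Python sketch under these assumptions: the site root contains an `index.html` entry page, links are plain `href`/`src` attributes (links built in JavaScript are invisible to it), and directory links like `foo/` are not expanded to `foo/index.html`. The function names are hypothetical. Anything on disk that is never reached is reported as a candidate orphan. Because path matching is case-sensitive on Linux, a link with the wrong capitalization also makes the real file show up as orphaned, which is a useful hint in itself.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class LinkCollector(HTMLParser):
    """Collects href/src attribute values from one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def reachable_files(root: str, start: str = "index.html") -> set[Path]:
    """Follow local links from start and return every existing file reached."""
    base = Path(root).resolve()
    seen: set[Path] = set()
    queue = [base / start]
    while queue:
        path = queue.pop()
        if path in seen or not path.is_file():
            continue
        seen.add(path)
        if path.suffix.lower() not in (".html", ".htm"):
            continue  # images, CSS, etc. are leaves
        parser = LinkCollector()
        parser.feed(path.read_text(encoding="utf-8", errors="replace"))
        for link in parser.links:
            parts = urlsplit(link)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external URL, mailto:, or a pure #fragment
            rel = unquote(parts.path)
            target = base / rel.lstrip("/") if rel.startswith("/") else path.parent / rel
            queue.append(target.resolve())
    return seen


def orphans(root: str, start: str = "index.html") -> list[Path]:
    """Files under root that no chain of links from start ever reaches."""
    base = Path(root).resolve()
    reached = reachable_files(root, start)
    return sorted(p for p in base.rglob("*") if p.is_file() and p not in reached)
```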

    Lynx / Links / ELinks

    Can be used to dump the text data of old and unmaintainable HTML documents; most useful when trying to scavenge only the text contents to put in a database or so.
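For this text-scavenging case, here is a rough Python stand-in for a Lynx-style text dump, built on the standard library's `html.parser`. It is a sketch, not a replacement for a real browser's renderer: it drops `script` and `style` contents, starts a new line at an assumed handful of block-level tags, and squeezes runs of whitespace, which is usually enough to get text ready for a database import. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose content is not page text.
SKIP = {"script", "style"}
# Tags that should start a new line in the dump (an assumed, partial list).
BLOCK = {"p", "div", "br", "li", "tr", "table", "ul", "ol",
         "h1", "h2", "h3", "h4", "h5", "h6"}


class TextDumper(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive decoded
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text, one non-empty, whitespace-squeezed line per block."""
    dumper = TextDumper()
    dumper.feed(html)
    dumper.close()
    lines = (" ".join(line.split()) for line in "".join(dumper.parts).splitlines())
    return "\n".join(line for line in lines if line)
```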

  • First, before bitching about something, you should take a moment to learn about it.

    "It has all the usual problems: paragraph tags with no ending tag"

    There's no end tag required for paragraphs, as per the official spec: http://www.w3.org/TR/REC-html40/index/elements.html [w3.org]

    HTML is not XML. Closing tags are optional for some elements, and forbidden for several others. And putting a slash at the end of a tag that doesn't have a closing tag, just so it looks "xml-y", is an affectation and a waste of bytes.

    • Hey, 1998 is calling, it wants its post back.

      HTML is not XML.


      It is now. [w3.org]
      • by masklinn (823351)
        no it's not [w3.org] HTML is still SGML, and still alive and well.
        • Re: (Score:3, Insightful)

          by Bazman (4849)
          The great thing about web standards is... there's so many of them!
      • by tomhudson (43916)
        No, it's not. The valid doctypes listed by W3C include the one I link to. What you link to is NOT an HTML specification, or did you take the short bus to school?

        This specification defines the Second Edition of XHTML 1.0, a reformulation of HTML 4 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4.

        So no, XHTML != HTML.

    • Although you're right, flamewar in 5..4..3..2..1.. *ducks*

  • Actual Search and Replace [divlocsoft.com]: $30, Windows only. But Lord have mercy, if you need to do massive replacements in text files it is worth every cent. It does Perl regex searches and plain old English pattern matching. Good customer service.....yada yada yada, and no, it is not mine; I'm just a satisfied customer.

    Sera

  • Previous posts have mentioned Perl and PHP; seconding those for high-intensity search-and-destroy missions. As for software, you can't go wrong with TextPad [textpad.com], WinSCP [winscp.net], and PuTTY [greenend.org.uk].

    For best practices (separation of content from structure from behavior, mostly) keep an eye on what's listed in and around A List Apart [alistapart.com] and the Web Standards Project [webstandards.org]. And if you're looking for several sets of outstanding presentation and behavior tools, check out the YUIBlog [yuiblog.com] and the Yahoo! Developer Network [yahoo.com]. (Hint: their page grid l [yahoo.com]
  • Cheat with PHP (Score:2, Informative)

    by GloomE (695185)
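    // Keep libxml from emitting a warning for every bit of broken markup it repairs
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($junky_html);
    echo $doc->saveHTML();
    Reads in your crappy HTML, turns it into a well-formed DOM tree (closing unclosed tags along the way), then dumps it out as nice clean HTML. Call $doc->saveXML() instead if you actually want XML.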
  • by Tumbleweed (3706) * on Monday September 04, 2006 @12:51AM (#16035910)
    Tidy, as others have already mentioned, will be your very best new friend.

    Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.

    Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").

    If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).

    And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.

    Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.

    Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.

    Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).

    Oh, and learn to indent your freaking HTML!

    Some nice resources:

    Activating the Right Layout Mode Using the Doctype Declaration [hsivonen.iki.fi]

    Quirksmode [quirksmode.org] - a GREAT resource. Awesome info here. Memorize it.
  • When I clicked the page I was so sure I would see at least one "What's in My HTML Toolbox? vi!" comment, modded Funny of course, but no...

    Maybe I should check again later...

    • by Ashtead (654610)

      Of course vi is used. I use it myself, for everything, and I've seen a couple others above mentioning it.

      I've also made me a little utility that would take a string from the document I'd be writing in, and generate a link <a href="..."> ... </a> style, and put it back in, courtesy of vi. Useful when writing up documentation about programming and configuration.

      Then there is the matter of getting rid of carriage-returns and get the text file into the true format, with lines separated by newlin

  • CSSEdit by Macrabbit.

    Awesome program and worth checking out if you use a Mac.
  • A hammer for hitting myself over the head, and a bottle of whiskey to numb the pain... of dealing with HTML.
  • Changing old code to new code can rarely be automated. It's not a simple syntax change, it's a paradigm shift, and computers are not yet smart enough to figure out the semantics of old code and rewrite it as an HTML/CSS combo.

    HTML Tidy is something free and available which will do the very basic work of cleaning up and fixing the HTML where possible.
  • I use Adobe GoLive for this, and it's served me well. It detects errors like broken links, and offers batch fixing.

    Failing that, perl is probably your best bet.
  • Dreamweaver (Score:3, Interesting)

    by Leroy_Brown242 (683141) on Monday September 04, 2006 @02:48AM (#16036425) Homepage Journal
    As much as WYSIWYG editors sometimes suck, Dreamweaver is alright. I like that it helps with the organization but also lets me get as geeky as I'd like.
  • jEdit (www.jedit.org) - best editor in existence, unmatched functionality
    Dreamweaver 8 (on OS X) - DW is an outdated way to do things, but it is still very powerful
    Quanta (Quanta Gold for Win or OS X -> http://www.thekompany.com/products/quanta/ [thekompany.com]; Quanta Plus for Linux -> http://quanta.kdewebdev.org/ [kdewebdev.org])
    PHPEclipse (has annoyances but very good PHP tools)

    For a redo of that old site of yours I recommend simply installing a CMS and migrating the content by hand if necessary. That's probably faster and more ef
  • mixed case file names that work on Windows but not on a Linux webserver

    In my experience, mixed case is respected by Linux but not Windows (e.g. if you're testing links on your Windows box, FILE.HTML is the same as File.html; but on Linux they're distinct), which I assume is what is meant. I use a tiny DOS app called tolow [programmersheaven.com] which makes all filenames lowercase, as Windows apps often capitalise names spontaneously.
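The question also asks for a batch renamer that fixes the links along with the files, which a lowercase-only renamer does not do. A minimal Python sketch of that idea, assuming all links live in quoted or unquoted `href`/`src` attributes and that all-lowercase names are the policy you want; the function names are hypothetical. It lowercases the path part of relative links (absolute URLs are left alone, and so are `?query` and `#fragment` parts, since anchors are case-sensitive), then renames files and directories deepest-first. Links inside JavaScript or CSS are not touched. Run it on a copy; it skips, and prints, any rename that would collide with an existing file.

```python
import re
import string
from pathlib import Path

# href= or src= followed by an optionally quoted value without spaces.
LINK_ATTR = re.compile(r"""(\b(?:href|src)\s*=\s*)(["']?)([^"'\s>]+)\2""", re.IGNORECASE)
SCHEME = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)
# Lowercase ASCII only, so bytes of non-ASCII names read as Latin-1 stay intact.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def lowercase_links(html: str) -> str:
    """Lowercase the path of relative links; ?query and #fragment are kept."""
    def fix(m):
        prefix, quote, url = m.groups()
        if SCHEME.match(url):
            return m.group(0)  # http:, mailto:, javascript: etc. left alone
        cut = min((i for i in (url.find("?"), url.find("#")) if i >= 0),
                  default=len(url))
        return prefix + quote + url[:cut].translate(ASCII_LOWER) + url[cut:] + quote
    return LINK_ATTR.sub(fix, html)


def lowercase_tree(root: str) -> None:
    """Rewrite links in .htm/.html files, then lowercase every name under root."""
    # Deepest paths first, so renaming a directory never invalidates
    # paths still waiting to be processed inside it.
    paths = sorted(Path(root).rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in paths:
        if path.is_file() and path.suffix.lower() in (".htm", ".html"):
            # Latin-1 with newline="" round-trips every byte and line ending.
            with path.open(encoding="latin-1", newline="") as f:
                text = f.read()
            with path.open("w", encoding="latin-1", newline="") as f:
                f.write(lowercase_links(text))
        target = path.with_name(path.name.lower())
        if target == path:
            continue
        if target.exists() and not target.samefile(path):
            print(f"skipped {path}: {target.name} already exists")
            continue
        path.rename(target)
```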

  • I will no doubt get replies that "Scripting Language X would be better", but I have the most experience with Perl. So if time was of the essence, that's what I'd use. Perl is a Swiss Army Knife in this kind of situation, and you can easily get just about any kind of blade or tool you might want to deal with files and formatting via CPAN.

    You can use Perl to fix the file names, restructure the directories, extract the content, put it into a database, and even drive the new site if you'd like. No matter what t
  • On all of my Windows machines, I keep a copy of CSamp [netreach.net] running in the systray at all times. It's a tiny little app that will grab the RGB/Hex values for any pixel on the screen. Great for matching colors in images, or if, like me, you're too lazy to view source and go digging for a color attribute.
  • by Selanit (192811) on Monday September 04, 2006 @05:54AM (#16036970)
    My biggest web devel tool is Firefox, with the Web Developer [chrispederick.com] extension and the HTML Validator [skynet.be] extension. The former does all sorts of amazingly neat things like letting me get precise info about any element within a page (using "Display Element Information" under the "Information" menu, CTRL+SHIFT+F for short), showing me the HTTP response headers for any given page, adding custom styles to a page, validating links, checking for Section 508 accessibility compliance, resizing the window to simulate lower screen resolutions, and on and on and on!

    The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.

    As for editing HTML, I generally use SciTE [scintilla.org] or one of its derivatives (eg Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron [sourceforge.net]. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody what sniggers at you.
  • by bcmm (768152) on Monday September 04, 2006 @06:36AM (#16037072)
    There is a small utility called dos2unix which changes MS-style line endings in text files to Unix style. /usr/bin/mac2unix is symlinked to dos2unix on my Gentoo box, so I guess it can fix MacOs line endings too.
  • I prefer to use vi (of the elvis variety) unless I'm editing a page some a$$hole has used dreamweaver or frontpage to create. I can't stand "^M"! If I'm doing some heavy php work then I use Bluefish.
  • http://validator.w3.org/ [w3.org]

    http://jigsaw.w3.org/css-validator/ [w3.org]

    Along with awk, sed, vi/pico/nano, and occasionally perl for really complex alterations.
  • Perl, especially Template Toolkit, with Emacs takes care of most things.
  • I had to do some extraction of text from HTML, so I wrote a program for it. It may or may not be useful in this case, and it doesn't always do 100% of the job (but I've found the 98% it does do to be very useful). It is (OF COURSE!) open source so you are free to tinker and improve it for your own use. Download from:

    http://jsoftco.8m.com/download.html [8m.com]

  • Keep the old version around to review with... then rebuild the whole thing in a CMS.

    - Set up your stylesheet to cover all the examples in the old version... just click through the old site and pick out consistent examples of HTML elements... don't forget to scope your elements by providing IDs around such areas as menus, masthead, sidebars, advertising, etc.

    - Ignore anything that is similar enough to look almost the same, no one will complain if you resolve inconsistencies... but will if you make unilateral
