What's in Your HTML Toolbox?
Milo_Mindbender asks: "I've just ended up in charge of cleaning up an old and rather large website created by some non-technical people. It has all the usual problems: paragraph tags with no ending tag; mixed case file names that work on Windows but not on a Linux webserver; files with mixed Windows/Linux/Mac line endings; duplicates or partial duplicates of files created when working on pages; and the list goes on. I'm wondering what tools you guys keep in your HTML/website toolboxes that work well for cleaning up this sort of mess. Things like pretty-printers, HTML 'lint' programs, dead file detectors, batch renamers (that change links and the files they point to into OS-neutral names), and 'diff' programs that ignore HTML whitespace. I'm particularly interested in batch processing tools that actually fix problems (not just report them) because I've got a lot of files to deal with and don't have the time to edit every one by hand. So what's in YOUR toolbox?"
What's in your... (Score:4, Funny)
CAPITAL ONE!
[...]
Wait, what was the question again?
FTW (Score:2)
Good luck!
Re: (Score:3, Informative)
Obligatory (Score:2, Informative)
Or, in emacs
M-% (AKA Meta(usually Alt)-Shift-5)
Query Replace: ^M with [nothing]
P.S. Note that ^M is not Caret-M. It is a single character (a carriage return). I usually just copy it out of the file and then do it in emacs; you can also type it at the prompt with C-q C-m.
Re: (Score:3, Informative)
Question for you: how would you do that across multiple files in emacs?
The global search-and-replace command in vim for a single file would simply be:
:%s/^M//g
(type the ^M as Ctrl-V Ctrl-M; :%s/\r//g does the same thing)
Re: (Score:3, Informative)
Use dired on the directory where files are located (i.e. M-x dired), mark the files you are interested in (with 'm'), then use 'Q' (uppercase) to perform what basically is a query-replace-regexp on all the marked files (actually it is dired-do-query-replace-regexp).
See the GNU Emacs dired documentation for further details.
Re:Obligatory (Score:4, Informative)
Re: (Score:2)
Debugging HTML/PHP/etc files: UltraEdit-32 [ultraedit.com]. $40 for the single best Swiss Army editor I've ever found. In combination with Tidy (which is fully integrated into its interface), it can handle every file-related problem mentioned except for the names themselves. Out of the box, it can do everything from line ending conversion to your standard synt
Re: (Score:2)
Cheers.
Re: (Score:3, Interesting)
vim has excellent syntax highlighting, predictive typing, line numbers, search and replace (with regular expressions), code folding, spell-
windows... (Score:2)
Place a curse on Microsoft [i-curse.com]
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re:Macros (Score:4, Informative)
q<register> to record a macro, q to finish recording. Execute the macro with @<register>, then you can execute it again with @@. Obviously the @ commands can be prefixed with a number to repeat them that many times, 5@@ would repeat the last macro 5 times, for example.
Re: (Score:2)
My main complaint about emacs (I tried it for about a month) was the key structure. I didn't like holding down Ctrl whenever I wanted to do something; I prefer vim's modal command system, though I can see how that could annoy some people.
I honestly haven't found the need for particularly sophisticated macros while I'm editing. The . (repeat last command) and ! (pipe) keys have always been enough for what I need.
I'm still learning vim, but I like wh
Re: (Score:2)
This is why editor-vs-editor arguments get so silly sometimes. People often fail to realise that requirements differ.
Clever use of emacs keyboard macros (and presumably vim too, I wouldn't know) makes a huge difference to a lot of the common tasks I perform. For example one common task is to take an API document and turn it into a class file (missing only the code in the method bodies). There was a time when I use
Re: (Score:2)
Seeing as you got flamed for this opinion, I thought I'd help you out.
UltraEdit-32 is damn good. I'm sure it's not as slick as some of the other editors out there, but it has good syntax highlighting, tabs, and the ability to run macros or spawn subprocesses and capture the output. Yes, you have to put a bit of work in to get it how you like it, but overall it fits the bill, and if you can actually write co
Re: (Score:2)
OK, most of the time it's just minor-annoyance nagware, but I figured, I use it so much, might as well pay them for some kind of use.
The only other thing I absolutely swear by for HTML/CSS is BradSoft's "TopStyle Editor" for CSS. Yeah sure, I can use a text editor for the same thing, but TopStyle makes my life easier.
Re: (Score:2)
I use the WebDeveloper toolbar in Firefox all the time, and the edit css piece of it is great, but it does have limitations.
Re: (Score:2)
I think that you've been partaking too much of the crack-pipe, dear boy.
Seriously, though, it's your choice - it's a program I find useful, and others may. It's their choice too. Like I said it's free/nagware, so they can try it and like it or ditch it.
Just my two pence/cents.
Re: (Score:2)
what's so good about dreamweaver? (Score:2)
What kind of advantage would using dreamweaver give you in a situation like this?
I first started with HTML/websites in the mid-90s with AOLPress, then Adobe PageMill, NetObjects FUSION, GoLive CyberStudio (which was bought by Adobe and turned into GoLive), and eventually I dropped all of these studio apps in favour of vim, using Perl, and later moved on to PHP.
I've since started using this great app called TextMate [macromates.com], and when I get a complete site that I need to work on, I pipe the code t
Perl (Score:2, Informative)
Depending on how bad it is, consider rewr
Tidy or Meyer (Score:5, Informative)
Good luck. You're going to need it.
HTML Tidy (Score:3, Informative)
Re: (Score:2)
And if you're of the OSX persuasion there's a port here [balthisar.com].
Re: (Score:3, Informative)
JavaScript and batch files (Score:2)
Re: (Score:2)
You can start by simply directly executing the commands you generate, but you should learn the basic filesystem objects so you can manipulate files more safely and directly.
I'm assuming you're not on UNIX, because in that case you should either learn some shell or Perl. I prefer Perl because its escaping mechanis
Why use static HTML? (Score:2)
I use PHP. Server side includes are perfect for standard headers/footers. I check server variables to change behavior based on whether it's on the dev server or the final webserver.
I'd paste an example, but slashdot seems to think PHP code is "junk characters".
Re: (Score:2)
It is.
PHP was designed for about what you're describing, when there were better technologies out there to do the same thing. It really looks like it was just supposed to be a bunch of PHP tags you'd mix in with your HTML tags, so you didn't have to think too much like a programmer, and could think more like a web coder.
This is a bad idea in the first place -- if you want to do dynamic stuff, learn to program. Worse, for som
Re: (Score:3, Interesting)
If what you need is very simple (including footers would count as simple), here's more information about server side includes [apache.org] (SSI). Either rename your pages
If you want something more complex, you can use SSI to include a mini-CGI script into the middle of your HTML. CGI scripts can be written in any language, even a sh
Re: (Score:2)
HTMLKit for Windows (Score:5, Interesting)
MY toolbox... (Score:4, Funny)
Creating white space (Score:4, Interesting)
I create a class:
.spacer {
line-height:0;
font-size:0;
}
Then I replace all those hundreds (and sometimes THOUSANDS) of references to s.gif with the following:
I use a span sometimes, as required - if the DIVs alone cause layout issues.
Say hello to faster web pages instantly!
Re:Creating white space - apologies (Score:5, Informative)
<div class="spacer" style="width:Xpx; height:Ypx;"></div>
Re:Creating white space - apologies (Score:5, Informative)
Re:Creating white space - apologies (Score:4, Informative)
margin-top: 1px;
margin-right: 2px;
margin-bottom: 3px;
margin-left: 4px;
or margin: 1px 2px 3px 4px;
padding-top: 1px;
padding-right: 2px;
padding-bottom: 3px;
padding-left: 4px;
or padding: 1px 2px 3px 4px;
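For reference, the four-value shorthand goes clockwise from the top, and CSS also accepts one, two or three values (one: all sides; two: vertical, then horizontal; three: top, horizontal, bottom). A small Python sketch of that mapping, handy if you're scripting a conversion of inline spacer styles; the function name expand_box_shorthand is just illustrative:

```python
from __future__ import annotations

SIDES = ("top", "right", "bottom", "left")


def expand_box_shorthand(prop: str, value: str) -> dict[str, str]:
    """Expand a margin/padding shorthand value into its four longhand sides.

    1 value  -> all four sides
    2 values -> top/bottom, right/left
    3 values -> top, right/left, bottom
    4 values -> top, right, bottom, left (clockwise)
    """
    parts = value.split()
    if len(parts) == 1:
        top = right = bottom = left = parts[0]
    elif len(parts) == 2:
        top = bottom = parts[0]
        right = left = parts[1]
    elif len(parts) == 3:
        top, right, bottom = parts
        left = right
    elif len(parts) == 4:
        top, right, bottom, left = parts
    else:
        raise ValueError(f"expected 1-4 values for {prop}, got {len(parts)}")
    return {f"{prop}-{side}": v for side, v in zip(SIDES, (top, right, bottom, left))}


if __name__ == "__main__":
    # Prints the four margin-* longhand declarations listed above.
    for name, val in expand_box_shorthand("margin", "1px 2px 3px 4px").items():
        print(f"{name}: {val};")
```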
Re: (Score:3, Informative)
Re: (Score:3, Insightful)
Re: (Score:2)
Also same can be said for table cells with in them which in some browsers would collapse or misbehave.
You create empty space with a "spacer div" today, which is no better than what people did back then. In fact it's worse, since they had few alternatives while you do: padding/margin/border where applicable.
White space is rarely just a block of empty space floating arou
Ohh! (Score:2)
Tidy, script, then manually clean (Score:2)
1. Tidy the pages (using htmltidy)
2. Use a custom written script in whatever language (perl is good) to do as much of the task as possible automatically (things like replacing static headers with includes) - you'll need to be good with regex
3. Open the pages manually, and finish the job - I like Dreamweaver for this particularly if it's a complicated table based layout
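Step 2 might look something like this in Python (Perl would do just as well). It's a sketch under some assumptions: you paste the static header markup exactly as it appears in one page, whitespace and tag case may differ between pages, and the replacement is an Apache SSI include (which needs mod_include; pages usually get renamed to .shtml or served with XBitHack). The include path /includes/header.html and all function names are hypothetical:

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical include path; this is Apache SSI syntax and needs mod_include.
INCLUDE = '<!--#include virtual="/includes/header.html" -->'


def whitespace_tolerant_pattern(snippet: str) -> re.Pattern:
    """Regex matching `snippet` case-insensitively, allowing any run of
    whitespace (including CRLF line breaks) wherever the snippet has some."""
    tokens = snippet.split()
    if not tokens:
        raise ValueError("header snippet is empty")
    return re.compile(r"\s+".join(re.escape(t) for t in tokens), re.IGNORECASE)


def replace_header(html: str, pattern: re.Pattern, include: str = INCLUDE) -> tuple[str, int]:
    """Return (new_html, number_of_replacements). A function replacement is
    used so backslashes in `include` are not treated as regex escapes."""
    return pattern.subn(lambda _m: include, html)


def process_tree(root: str, header_snippet: str) -> int:
    """Replace the header in every .htm/.html file under `root`.
    Files are decoded as latin-1 so arbitrary bytes round-trip unchanged."""
    pattern = whitespace_tolerant_pattern(header_snippet)
    total = 0
    for path in Path(root).rglob("*"):
        if not (path.is_file() and path.suffix.lower() in (".htm", ".html")):
            continue
        text = path.read_bytes().decode("latin-1")
        new_text, count = replace_header(text, pattern)
        if count:
            path.write_bytes(new_text.encode("latin-1"))
            total += count
    return total
```

Run it on a copy of the site and diff the result; a header that varies slightly between pages (a different highlighted menu item, say) simply won't match and is left for the manual pass in step 3.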
whatever the case, it's going to take you a lot of
Firefox with plugins (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2)
I'd also recommend Live HTTP Headers. OK it's not a html tool, strictly speaking, but it is extremely useful for debugging any web server issues. There's no other way to track HTTP issues down, I believe, unless you telnet to the webserver. Error/access logs on the webserver don't often contain enough info, unfortunately.
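If you'd rather script it than click around, a bare-bones Python stand-in for looking at the raw headers is below. It sends one request with the standard library's http.client and deliberately does not follow redirects, so a 301/302 and its Location header stay visible. fetch_headers is a hypothetical name, and HEAD is the default only because it skips the body:

```python
from __future__ import annotations

import http.client
import sys
from urllib.parse import urlsplit


def fetch_headers(url: str, method: str = "HEAD") -> tuple[int, str, list[tuple[str, str]]]:
    """Send one request and return (status, reason, headers).

    Redirects are NOT followed, so a 301/302 and its Location header
    show up exactly as the server sent them."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=10)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    try:
        conn.request(method, target)
        resp = conn.getresponse()
        return resp.status, resp.reason, resp.getheaders()
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    status, reason, headers = fetch_headers(sys.argv[1])
    print(status, reason)
    for name, value in headers:
        print(f"{name}: {value}")
```

Saved as a script, something like python3 fetch_headers.py http://www.example.com/Index.HTML prints the status line and every header the server sent back.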
Re: (Score:2)
Re: (Score:2)
I think this is great advice, regardless of the other tools you like to use. Firefox has some remarkably powerful and mature plug-ins available for web developers now. With the right combination, you can navigate your document's object model and see how the displayed page is being built from it, view the effective CSS on any element in your document (including any calculated values), view annotated source that shows standard compliance problems, and more, all from within your browser.
Rather than listing s
Vim and Emacs (Score:2)
Actually... Frontpage (Score:3, Funny)
No, really, stop laughing.
Frontpage, once you convince it to stop the WYSIWTG crap, has three tools that will make fixing a non-technical user's webpage easy. (Never, ever, let a non-technical user use Frontpage without supervision. It's worse than Word.)
I'd be shocked if there aren't better tools out there -- but by and large either they don't do as much, or they cost a significant chunk of change.
(Hey, you, with the laughing -- point me to an app that can do #1 with compatible replacements for #2 and #3, and, er, you'll get good karma for being so mean and laughing.)
Re: (Score:2)
I know that Dreamweaver has very strong "Site Management" features, but I don't know if it can check for dead files/broken links and do mass renames (I tend not to use Dreamweaver when I can avoid it). It also has very good (PC)RE support, and you can build "extensions" to the software using JavaScript and the DOM to manipulate your documents.
Dude... (Score:2)
Two tidbits (Score:3, Interesting)
Tidy is great, as others have mentioned. If you feel confident, it even lets you cherry-pick the data you want to scavenge with XSLT (have Tidy output XHTML first).
Separating grain from chaff
A static HTML project has numerous index2.old.html, index2.html, index_2.html, project2.html.old and so on - files that you just aren't sure are useful?
Copy the project directory, touch all the files, serve the copy and do a wget -r on it; afterwards, the access times tell you which files are referenced internally (the ones the crawl read). Alternatively, scan the webserver logfiles to see which files actually get requested.
Be sure your filesystem is configured to register access times if you pick the first method...
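Once the crawl is done, listing the files nobody fetched takes a few lines of Python. This sketch assumes you recorded time.time() after touching everything and before starting wget, and that access times are being updated (Linux relatime is fine here, since touch leaves atime equal to mtime; noatime is not). The function name is hypothetical:

```python
from __future__ import annotations

from pathlib import Path


def unreferenced_files(root: str, crawl_started: float) -> list[Path]:
    """List files under `root` that have not been read since `crawl_started`.

    `crawl_started` is a time.time() value recorded after touching every
    file and before running the crawl; anything the crawl fetched will
    have an access time at or after that moment. stat() itself does not
    update atime, so scanning does not disturb the result."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_atime < crawl_started
    )
```

Treat what it returns as deletion candidates, not a verdict: pages reachable only from JavaScript, form targets or links from other sites won't be hit by a crawl.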
(As a bonus, a close look at the 404s might give you some answers about misused capitalization in filenames.)
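For the logfile route, a rough Python sketch that counts 404s in an Apache access log (Common or Combined Log Format) and flags requests that differ from an existing file only in letter case. The regex and function names are assumptions; adjust the pattern if your log format is customised:

```python
from __future__ import annotations

import re
from collections import Counter

# Request line and status code in Apache Common/Combined Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2006:13:55:36 -0700] "GET /Index.HTML HTTP/1.0" 404 209
REQUEST = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')


def count_404s(lines) -> Counter:
    """Count how often each path was requested and answered with 404."""
    hits: Counter = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if m and m.group("status") == "404":
            hits[m.group("path")] += 1
    return hits


def case_mismatches(missing, existing) -> dict[str, str]:
    """Map each 404'd path to an existing path that differs only in case."""
    by_lower = {p.lower(): p for p in existing}
    return {
        m: by_lower[m.lower()]
        for m in missing
        if m.lower() in by_lower and by_lower[m.lower()] != m
    }
```

For example, count_404s(open("access_log")).most_common(20) shows the worst offenders; pass the result to case_mismatches together with a list of the site's paths (like "/index.html") to spot capitalisation bugs.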
Lynx / Links / ELinks
Can be used to dump the text data of old and unmaintainable HTML documents; most useful when trying to scavenge only the text contents to put in a database or so.
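If Lynx isn't handy (lynx -dump page.html does the job when it is), Python's standard html.parser module can do a rough version. This is a swapped-in technique, not what Lynx does internally: it skips script and style contents, starts new lines at a handful of block-level tags, and collapses whitespace. TextDumper and html_to_text are hypothetical names:

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collect visible text; tag names arrive lowercased from HTMLParser."""

    BLOCK = {"p", "div", "br", "li", "tr", "table", "title",
             "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # &amp; -> &, &nbsp; -> U+00A0
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.chunks)
    # Collapse whitespace (including non-breaking spaces) and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```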
First, you better learn HTML before complaining .. (Score:2)
First, before bitching about something, you should take a moment to learn about it.
"It has all the usual problems: paragraph tags with no ending tag"
There's no end tag required for paragraphs, as per the official spec: http://www.w3.org/TR/REC-html40/index/elements.html [w3.org]
HTML is not XML. Closing tags are optional for some elements and forbidden for several others. And putting a slash at the end of a tag that doesn't have a closing tag, so it looks "xml-y", is an affectation and a waste of bytes.
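If you do decide to close everything anyway (say, for an XHTML migration), it helps to separate legal omissions from real breakage. A rough Python sketch using the standard html.parser module: it counts start and end tags per element, ignores the HTML 4.01 empty elements whose end tag is forbidden, and flags whether a missing end tag is one the spec makes optional. It does not check nesting, so treat the output as a to-do list rather than a validator; the names are hypothetical:

```python
from __future__ import annotations

from collections import Counter
from html.parser import HTMLParser

# HTML 4.01 elements whose end tag is forbidden (empty elements).
EMPTY = {"area", "base", "basefont", "br", "col", "frame", "hr", "img",
         "input", "isindex", "link", "meta", "param"}
# HTML 4.01 elements whose end tag is optional.
OPTIONAL_END = {"body", "colgroup", "dd", "dt", "head", "html", "li",
                "option", "p", "tbody", "td", "tfoot", "th", "thead", "tr"}


class TagCounter(HTMLParser):
    """Count start and end tags per element (tag names arrive lowercased)."""

    def __init__(self):
        super().__init__()
        self.opened = Counter()
        self.closed = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in EMPTY:
            self.opened[tag] += 1

    def handle_endtag(self, tag):
        self.closed[tag] += 1


def unclosed(html: str) -> dict[str, tuple[int, bool]]:
    """Map element -> (missing end tags, is the end tag optional in 4.01?)."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    report = {}
    for tag, n in counter.opened.items():
        missing = n - counter.closed[tag]
        if missing > 0:
            report[tag] = (missing, tag in OPTIONAL_END)
    return report
```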
Re:First, you better learn HTML before complaining (Score:2)
It is now. [w3.org]
Re: (Score:2)
Re: (Score:3, Insightful)
Re: (Score:2)
So no, XHTML != HTML.
Re:First, you better learn HTML before complaining (Score:2)
Although you're right, flamewar in 5..4..3..2..1.. *ducks*
Re: (Score:2)
This has got to qualify as the WTF of the Day:
"One would assume, seeing as it's 2006 and all, that he intends to rebuild the site as a modern standards compliant site. Even if he chose html 4.01 instead of xhtml it's still best practice to close all your tags."
HTML 4.01 IS the current HTML (as opposed to XHTML) standard. And some of those bullshit "best practices", like "closing all tags", are forbidden by that very standard. It's not like it's hard to read. I linked to the specific page on the W3C site.
Search and Replace (Score:2)
Sera
WebdevTML Survival Kit (Score:2)
For best practices (separation of content from structure from behavior, mostly), keep an eye on what's listed in and around A List Apart [alistapart.com] and the Web Standards Project [webstandards.org]. And if you're looking for several sets of outstanding presentation and behavior tools, check out the YUIBlog [yuiblog.com] and the Yahoo! Developer Network [yahoo.com]. (Hint: their page grid l [yahoo.com]
Cheat with PHP (Score:2, Informative)
tidy, web developer FF extension, search & rep (Score:4, Informative)
Install the 'Web Developer' extension for Firefox, and use some of the HTML/CSS validators in the Tools submenu.
Get a good handle on regex searching & replacing (if you're doing this from Windows, I suggest Funduc's "Search & Replace").
If you're migrating your GIFs to PNG (which I would recommend), then you need to get yourself pngout, to compress them to their smallest possible size (Photoshop SUCKS at this).
And as someone else said, make an empty new standards compliant template, and get to cutting and pasting; it can be a *brutal* initial process, but you'll probably save yourself time in the long run, depending on how clean you want to eventually get the code. If you just want it to be standards compliant, then you can just do a clean up job. If you want to do it 'right,' you'll want to develop a new template and coding style to properly integrate the HTML and CSS. Things like not putting everything in a DIV (a sure sign you're a newbie to CSS), just to style something. Figure out why you should be using H1, H2 tags (& TBODY & TH tags if you're using tables for outer layout), etc, without having to use a lot of unnecessary DIVs all over the place. Inline styles = bad.
Figure out why XHTML may not be the best choice over HTML. Know which DTDs to specify. Know the difference in IE6 between standards mode and quirks mode, and which DTD to use to make IE6 behave. Know that IE7's quirks mode is supposedly identical to IE6's; you supposedly won't get the new 'more-standards compliancy' in IE7 without a DTD.
Oh yeah - the guy who posted about replacing spacer gifs with 'spacer DIVs'? Don't do that to yourself, okay? Yikes.
Learn about usability and readability. Learn about typography, and how light-on-black text should be sized differently from black-on-light. Thinking about grey text on black or grey text on white? Don't be stupid. Make the stuff readable! Learn that sans serif fonts are more easily read at screen density (opposite of print). Learn why Verdana is usually not your friend (go for Trebuchet MS or even Arial).
Oh, and learn to indent your freaking HTML!
Some nice resources:
Activating the Right Layout Mode Using the Doctype Declaration [hsivonen.iki.fi]
Quirksmode [quirksmode.org] - a GREAT resource. Awesome info here. Memorize it.
vi anyone? (Score:2)
When I clicked the page I was so sure I would see at least one "What's in My HTML Toolbox? vi!" comment, modded Funny of course, but no...
Maybe I should check again later...
Re: (Score:2)
Of course vi is used. I use it myself, for everything, and I've seen a couple others above mentioning it.
I've also made me a little utility that would take a string from the document I'd be writing in, and generate a link <a href="..."> ... </a> style, and put it back in, courtesy of vi. Useful when writing up documentation about programming and configuration.
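The original utility isn't shown, but a filter in that spirit is easy to sketch in Python (linkify.py is a hypothetical name): it reads lines on stdin and wraps each one, taken as a bare URL, in an anchor tag, escaping & and quotes along the way:

```python
import html
import sys


def linkify(line: str) -> str:
    """Wrap a bare URL (the whole stripped line) in an <a href> tag.
    Blank lines are passed through unchanged."""
    url = line.strip()
    if not url:
        return line
    escaped = html.escape(url, quote=True)
    return f'<a href="{escaped}">{escaped}</a>'


# Acts as a filter only when input is piped in (as vi does), never on a bare terminal.
if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    for raw in sys.stdin:
        print(linkify(raw.rstrip("\n")))
```

Since vi can pipe lines through any external command, typing !! followed by python3 linkify.py on the line holding the URL replaces it with the finished link.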
Then there is the matter of getting rid of carriage-returns and get the text file into the true format, with lines separated by newlin
CSSEdit (Score:2)
Awesome program and worth checking out if you use a Mac.
What's in my toolbox? (Score:2)
Paradigm shift (Score:2)
HTML Tidy is something free and available which will do the very basic work of cleaning up and fixing the HTML where possible.
It's not FOSS, but... (Score:2)
Failing that, perl is probably your best bet.
Dreamweaver (Score:3, Interesting)
My HTML Toolbox (IAAWD - I am a web developer) (Score:2)
Dreamweaver 8 (on OS X): DW is an outdated way to do things, but it is still very powerful
Quanta (Quanta Gold for Win or OS X - > http://www.thekompany.com/products/quanta/ [thekompany.com]; Quanta Plus for Linux -> http://quanta.kdewebdev.org/ [kdewebdev.org])
PHPEclipse (has annoyances, but very good PHP tools)
For a redo of that old site of yours I recommend simply installing a CMS and migrating the content by hand if necessary. That's probably faster and more ef
mixed case? (Score:2)
In my experience, mixed case is respected by Linux but not Windows (e.g. if you're testing links on your Windows box, FILE.HTML is the same as File.html, but on Linux they're distinct), which I assume is what is meant. I use a tiny DOS app called tolow [programmersheaven.com] which makes all filenames lowercase, as Windows apps often capitalise names spontaneously.
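The asker also wanted the links rewritten along with the files. A Python sketch of that, under stated assumptions: it lowercases ASCII letters only, rewrites href/src attribute values in .htm/.html/.shtml files (CSS url(...) references and links built in JavaScript are not handled), leaves external URLs, queries and fragments alone, and refuses to run if two names would collide once lowercased. All names are hypothetical; run it on a copy:

```python
from __future__ import annotations

import re
import string
from pathlib import Path

# Lowercase ASCII letters only, so non-ASCII bytes (decoded as latin-1) survive intact.
ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
LINK_ATTR = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["']?)(?P<url>[^"'\s>]+)(?P=q)""",
    re.IGNORECASE,
)
SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*:", re.IGNORECASE)
HTML_EXTS = (".htm", ".html", ".shtml")


def ascii_lower(s: str) -> str:
    return s.translate(ASCII_LOWER)


def lower_link(url: str) -> str:
    """Lowercase the path of a relative or site-absolute URL; leave external
    URLs, mailto: links, ?queries and #fragments untouched."""
    if SCHEME.match(url) or url.startswith(("#", "//")):
        return url
    cut = min((i for i in (url.find("?"), url.find("#")) if i != -1), default=len(url))
    return ascii_lower(url[:cut]) + url[cut:]


def rewrite_links(html: str) -> str:
    return LINK_ATTR.sub(
        lambda m: m["attr"] + m["q"] + lower_link(m["url"]) + m["q"], html
    )


def find_collisions(root: Path) -> list[tuple[str, str]]:
    """Pairs of paths that would end up with the same lowercase name."""
    seen: dict[str, str] = {}
    clashes = []
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root).as_posix()
        key = ascii_lower(rel)
        if key in seen:
            clashes.append((seen[key], rel))
        else:
            seen[key] = rel
    return clashes


def lowercase_site(root: str) -> None:
    base = Path(root)
    clashes = find_collisions(base)
    if clashes:
        raise RuntimeError(f"names collide when lowercased: {clashes}")
    # 1. Rewrite href/src values in HTML files; bytes round-trip via latin-1.
    for p in base.rglob("*"):
        if p.is_file() and p.suffix.lower() in HTML_EXTS:
            text = p.read_bytes().decode("latin-1")
            new = rewrite_links(text)
            if new != text:
                p.write_bytes(new.encode("latin-1"))
    # 2. Rename deepest paths first so parent directories stay valid meanwhile.
    for p in sorted(base.rglob("*"), key=lambda q: len(q.parts), reverse=True):
        new_name = ascii_lower(p.name)
        if new_name != p.name:
            p.rename(p.with_name(new_name))
```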
My Toolbox == Perl (Score:2)
You can use Perl to fix the file names, restructure the directories, extract the content, put it into a database, and even drive the new site if you'd like. No matter what t
CSamp Color Picker (Score:2)
Web Developer and HTML Validator Extensions! (Score:5, Informative)
The latter does instantaneous HTML validation using Tidy and displays any errors or warnings on the "view source" page. It also gives me LINE NUMBERS in the view source window, which is a blessing. The beta version (which I prefer) lets you pick between the Tidy algorithm and the W3C's SGML parser. The SGML parser version gives the same errors as the W3C's own online validator, but without any need to submit the page through an online form.
As for editing HTML, I generally use SciTE [scintilla.org] or one of its derivatives (eg Notepad2). Sadly, those aren't available under Mac OS X, so when I need to work on a Mac box I use Smultron [sourceforge.net]. THAT, however, is just an editor. People get religious about their editors, so my advice is just to pick one that suits you and ignore anybody what sniggers at you.
Line endings - use dos2unix (Score:4, Informative)
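dos2unix converts CRLF endings, and the same package usually ships mac2unix for old Mac files that use a bare CR. If you'd rather handle all three styles in one pass over a tree, a minimal Python sketch (function names and the extension list are assumptions) works on raw bytes so the text encoding never matters:

```python
from __future__ import annotations

from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (lone CR) line endings to Unix LF.
    CRLF is replaced first so it becomes one LF, not two."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_tree(root: str, exts=(".htm", ".html", ".css", ".js", ".txt")) -> list[Path]:
    """Normalize line endings in matching files; return the files changed."""
    changed = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in exts:
            data = path.read_bytes()
            fixed = normalize_newlines(data)
            if fixed != data:
                path.write_bytes(fixed)
                changed.append(path)
    return changed
```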
elvis (Score:2)
My primary tools for fixing sites: (Score:2)
http://jigsaw.w3.org/css-validator/ [w3.org]
Along with awk, sed, vi/pico/nano, and occasionally perl for really complex alterations.
Perl and Emacs (Score:2)
HTML to Text (Score:2)
http://jsoftco.8m.com/download.html [8m.com]
Just rebuild..... (Score:2)
- Set up your stylesheet to cover all the examples in the old version... just click through the old site and pick out consistent examples of html entities... don't forget to scope your entities by providing IDs around such areas as menus, masthead, sidebars, advertising, etc.
- Ignore anything that is similar enough to look almost the same, no one will complain if you resolve inconsistencies... but will if you make unilateral
Re: (Score:2)
Re: (Score:2)
Re: (Score:2, Informative)
br is not now br
em and strong are still alive and well as of XHTML 2.0.
b and i are still available in XHTML 1.0.
There is no HTML 4.1. Presumably you meant 4.01 strict, which is pretty much XHTML 1.0 Strict.
Re: (Score:2)
Re: (Score:2)
WTF? Not a single one of these is gone in either XHTML or HTML 4.01... TT is still in, I is still in, B is still in, BIG is still in and SMALL is still in; the only ones of these font-style elements deprecated since HTML 3.2 are STRIKE, S and U... And "<br>" is now not "<br/>": BR is an element, <br> is an empty HTML tag (its end tag is forbidden) and <br/> is an empty XML tag. The former is semantics, the latter two are grammar.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Brief aside: during the dot com webpage crunch we hired loads of "people who know html". We tried to keep enough knowledgeable web developers reviewing their work, but some odd ideas still slipped by. I watched somebody -- by hand -- converting all tags to lowercase because they "were smaller" and "would make the file size smaller and make the page load quicker". The guy was very proud of his "hand edited code".
Re: (Score:2)
Anyway, thank God for HTML Tidy -- no more doing that by hand, even if you still think it makes your filesize smaller.
Re: (Score:2)
I may install it at some point, but I'll probably go to something like Dojo first, so I let someone else worry about browser issues.
You might like this, if you don't have it already (Score:2)
Initialisation:
$XHTML_COMPATIBILITY = 1
# set to 0 if you don't want XHTML like "
" (simply "
" instead)
define TagEnd {
start = search("= 0) && (search_string(tag, ">",0) == -1)) {
# it really looks like an HTML tag: " (otherwise, comparison operators in PHP are hard to type)
newtag = replace_in_string(tag, "^\\= 0) {
# If this is a tag without content (like
or
Steve Irwin in your toolbox?!? (Score:2)
To fix the holes in your code?
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)