"147 line of code" which does not cover most of what we are talking about.
It does some of what someone (maybe you) said couldn't be done with this approach, thus proving its possibility...
To use the ingredient analogy: if wget is equivalent to a tomato and you change the wget code, it is no longer a tomato but a genetically modified tomato that can only be used in that one recipe.
Again, so what? (And who says it can only be used in that recipe? I use the feature I added to wget all the time.)
Does your system handle hundreds of sites without hand editing a config file or script? Does your system monitor runs to see if they complete and figure out what to do if they do not? Does your system tell the difference between a no data timeout and a slow data timeout? Have you solved the problem of coordinating multiple wgets with host spanning?
I already answered the last question (don't use host spanning). With regard to the others, it doesn't matter. More requirements mean more coding, but the general approach of starting with wget instead of coding from scratch is going to get shit done as quickly as possible without reinventing the wheel. You can come up with features X Y and Z that aren't already simple switches (although, notice that others in this thread are listing features that are already simple switches) -- but that in itself is a poor argument for coding A, B, C... from scratch, when there's a lot already done for you.
Now, frankly, the issues you are listing seem pretty damn trivial to me. I just don't see what the big problem is. Still, I don't want to address them point by point in this thread. (I also recognize that, in principle, more difficult features to implement could be thought up.)
I think the real point underlying the article (IIRC...) is that perfectionism (and implementing everything yourself is a form of this) sure can waste a lot of time. If you're trying to make the most of your effort -- instead of trying to make the best piece of software possible -- you need an attitude that searches for a lazy "good enough" solution. If your business model depends on having a better web scraper than anyone else, then you might write one -- but if you're not selling a proprietary web scraper, then it probably doesn't, and you're wasting your time, losing sight of the big picture.
BTW, the code I'm talking about retries failures infinitely, but only when specifically instructed to do so. Certainly good enough for my purposes at the time I wrote it. It would be trivial in that code to continually devote, say, x% of processes to retrying errors, if you wanted more automation. Just need to decide on x.
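To make the "x% of processes" idea concrete, here is a minimal sketch of how the slots could be divided each round. This is not the code I wrote back then; the function name plan_batch and its parameters are made up for illustration, and it assumes fresh and failed URLs are kept in two simple queues.

```python
from collections import deque

def plan_batch(fresh, failed, workers, retry_fraction):
    """Pick up to `workers` URLs for the next round (hypothetical helper).

    Roughly `retry_fraction` of the slots go to previously failed URLs.
    Slots that fresh URLs cannot fill are handed back to retries.
    """
    retry_slots = min(len(failed), round(workers * retry_fraction))
    fresh_slots = min(len(fresh), workers - retry_slots)
    # Not enough fresh URLs? Give the spare slots to retries instead.
    retry_slots = min(len(failed), workers - fresh_slots)
    batch = [failed.popleft() for _ in range(retry_slots)]
    batch += [fresh.popleft() for _ in range(fresh_slots)]
    return batch

if __name__ == "__main__":
    fresh = deque(f"http://example.com/page{i}" for i in range(10))
    failed = deque(["http://example.com/flaky1", "http://example.com/flaky2"])
    print(plan_batch(fresh, failed, workers=5, retry_fraction=0.2))
```

Whatever comes back failed from a round just gets appended to the failed queue again, so picking x really is the only decision left.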
The original poster posited that everything can be done using generic unix functions with a little glue. That is patently false considering that there are many features that are part of system requirements that are not covered by standard Unix calls.
(NB. when you say OP, you mean the article; when I say OP, I mean the OP on Slashdot, who disagreed.) My point is much more specific: that wget can do a lot more than OP said. I agree that 'xargs' won't suffice to drive wget for this purpose -- unless it does. Depending on your purpose, maybe you should just run it and use the results you get, accepting limitations -- at least you would get results, and without debugging any code.
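For what "a little glue" around wget might look like once xargs runs out, here is a rough sketch, under assumptions. It runs several wgets in parallel, lets wget's own --read-timeout handle the "no data" case, and uses a wall-clock limit on the whole process as a stand-in for the "slow data" case. The function names, the default limits, and the outcome labels are all invented for this example; the wget options (--no-verbose, --tries, --read-timeout) and exit status 4 for network failure are the documented ones.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def wget_command(url, idle_timeout=30, tries=3):
    # --read-timeout is wget's idle ("no data") timeout; --tries caps retries.
    return ["wget", "--no-verbose", f"--tries={tries}",
            f"--read-timeout={idle_timeout}", url]

def fetch(url, wall_clock=600, run=subprocess.run):
    """Run one wget and classify the outcome (labels are illustrative)."""
    try:
        result = run(wget_command(url), timeout=wall_clock)
    except subprocess.TimeoutExpired:
        # The job overran its total time budget without wget giving up,
        # which most likely means data was arriving, just too slowly.
        return url, "too slow"
    if result.returncode == 0:
        return url, "ok"
    if result.returncode == 4:
        # wget's documented exit status for network failures.
        return url, "network failure"
    return url, f"wget exit {result.returncode}"

def fetch_all(urls, workers=8, run=subprocess.run):
    # One thread per wget process, much like `xargs -P`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda u: fetch(u, run=run), urls))
```

The `run` parameter is only there so the classification can be exercised without touching the network; in normal use you would call fetch_all(list_of_urls) and feed anything not marked "ok" back into a retry list.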
Anyway, like I say above, although the limitations you list here are real, they still seem surmountable to me, and not with all that much effort. I certainly don't find the possibility of doing so "patently false", although the definition of "a little" glue is of course arbitrary. I originally got into the thread because I saw people saying things were not possible which I had already seen done.
The biggest failing of the Taco Bell analogy is that no matter how you combine the eight ingredients you still come up with crappy pseudo Mexican food; you do not create French food, Italian food, Chinese food, etc. The same thing applies to Unix utilities; they do almost what you want but rarely everything.
It's not a failing, it's the point: you can make a lot of money (i.e., succeed in your goal as the proprietor of a corporate restaurant chain) by being content to produce crappy pseudo Mexican food, instead of trying some expensive gourmet menu (with so much more opportunity to fail).
(Is your goal to make the best food or sell the most food?)