Comment There is an easy fix for this!!!! (Score 3, Interesting) 136
Instead of crawling websites, why don't Amazon and other companies just require you to provide a formatted index of all the links on your website? It could be amazon.xml in the root, and the file could be dynamic or hand-typed...
http://www.yourwebsite.com/amazon.xml http://www.somewebsite.com/~yoursite/amazon.xml
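For what it's worth, here is a rough sketch of what such a file might look like and how a company could read it instead of crawling. The comment above only names the file, so the `<links>`/`<link>` layout, the `href` attribute, the sample page URLs, and the `read_link_index` helper are all assumptions made up for illustration, not an actual Amazon format.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents of amazon.xml. The element and attribute names
# are invented for this sketch; no real format is implied.
SAMPLE_INDEX = """<?xml version="1.0"?>
<links site="http://www.yourwebsite.com/">
  <link href="http://www.yourwebsite.com/index.html" />
  <link href="http://www.yourwebsite.com/reviews/book1.html" />
</links>
"""


def read_link_index(xml_text):
    """Return every href listed in the index, in document order."""
    root = ET.fromstring(xml_text)
    # Skip <link> elements that have no href attribute.
    return [link.get("href") for link in root.iter("link") if link.get("href")]


if __name__ == "__main__":
    for url in read_link_index(SAMPLE_INDEX):
        print(url)
```

The point is that the consumer only has to fetch and parse one small file per site, whether that file was hand-typed or generated dynamically.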