I can't comment in more detail on Alexa's bulk crawl strategy because it is documented to the public (and to us at the Internet Archive) only in general terms: it is a broad survey crawl of the public web, weighted by Alexa's internal measures of site/page importance and legitimacy (which are at least partially based on the same toolbar data that drives their site rankings). While we expect to continue receiving the Alexa donations indefinitely, a growing proportion of the public archive will likely come from other sources in the future, including the IA's own crawling and other outside donors.
The Archive is funded by a combination of private donations from individuals and foundations (sometimes for general operations and sometimes for specific projects) and fees for services provided to our partners, who are themselves public libraries and archives. With an 11+ year history and long partnerships with customers and funding sources, we're pretty stable in the world of technology nonprofits.
I wasn't directly involved in the Ubuntu choice, but it's been nice to have our developer desktops in close sync with the cluster servers.

- Gordon @ IA