Actually, mip-maps improve both visual quality and performance, provided you have the memory for them. The quality gain comes from reduced aliasing. The performance gain comes from caching.
You are correct that swapping textures can be a performance problem (even now it's best to sort your primitives to reduce state changes), but when you're using mip-maps, the texels (texture elements, i.e. the pixels of a texture) that a distant surface needs are packed into a much smaller memory footprint. Video cards have caches much the same as CPUs do. A large texture sampled at full resolution won't fit in that cache, whereas the first mip level is a quarter the size of the base texture, and each level after it is a quarter the size again.
The great thing about graphics workloads is that they have a lot of spatial coherence. If you select the correct mip-map level, adjacent pixels are highly likely to sample texels from within a small area of the texture. Without mip-mapping, a large texture drawn small on screen forces the rasterizer to jump all over the texture, and there goes cache coherency.
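To put rough numbers on the footprint argument, here is a minimal sketch in Python. It is not how any particular GPU or driver does it: `mip_chain` and `select_mip_level` are hypothetical helper names, the 4 bytes per texel is an assumed format, and the level selection is a simplified log2 rule rather than the exact hardware computation.

```python
import math

def mip_chain(width, height, bytes_per_texel=4):
    """Return (level, width, height, bytes) for every level down to 1x1.

    bytes_per_texel=4 assumes an uncompressed RGBA8-style format.
    """
    levels = []
    level = 0
    while True:
        levels.append((level, width, height, width * height * bytes_per_texel))
        if width == 1 and height == 1:
            break
        # Each level halves both dimensions, so it holds a quarter of the texels.
        width = max(1, width // 2)
        height = max(1, height // 2)
        level += 1
    return levels

def select_mip_level(texels_per_pixel, num_levels):
    """Pick the level whose texels map roughly 1:1 onto screen pixels.

    texels_per_pixel is how many base-level texels one screen pixel spans
    along one axis. This is an assumed input for illustration; real hardware
    derives it from screen-space derivatives of the texture coordinates.
    """
    if texels_per_pixel <= 1.0:
        return 0  # magnified or 1:1, so use the full-resolution level
    level = int(math.floor(math.log2(texels_per_pixel)))
    return min(level, num_levels - 1)

chain = mip_chain(1024, 1024)
base_bytes = chain[0][3]
total_bytes = sum(entry[3] for entry in chain)

for level, w, h, size in chain[:4]:
    print(f"level {level}: {w}x{h}, {size} bytes")
print(f"whole chain / base level = {total_bytes / base_bytes:.3f}")
```

The whole chain costs only about a third more memory than the base texture alone, which is the "if you have the memory" part. The payoff is that a surface spanning, say, 8 base texels per pixel gets sampled from level 3, where neighbouring pixels touch neighbouring texels instead of striding across the full-size image.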
And it's not just GPUs. Back in Quake 1 and 2, the software renderer maintained a surface cache in which the base texture was mixed with a lightmap. Keeping the texture resolution as low as possible meant each surface took less space in the surface cache and less time to mix. You can read a bit about that here.
I hope that's all clear.