I need to give you credit; you are right. The issue isn't really PAE; it's how the kernel manages memory on 32-bit x86 architectures with more than 1GB of memory installed. PAE simply exacerbates the problem. Here's an explanation of the complaint:
On IA-32 systems, the kernel splits memory into three zones: DMA, NORMAL, and HIGHMEM.
ZONE_DMA is the first 16MB of memory, and is generally avoided unless needed (due to lack of available higher memory, or for DMA mappings.) The kernel tries to reserve this address range for devices that use DMA mapping.
ZONE_NORMAL is the address range that is directly accessible to the kernel, and extends from 16MB to 896MB. Kernel data structures are stored in this space, including the kernel page tables. The more memory is installed, the more of ZONE_NORMAL is consumed by the structures the kernel needs to track and map it, and thus PAE on IA-32 with a lot of installed memory can cause out-of-memory issues even when plenty of physical memory is free. User data can be allocated in ZONE_NORMAL, but the kernel prefers to place it in ZONE_HIGHMEM to keep ZONE_NORMAL free for kernel data structures.
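To put rough numbers on that pressure, here is a back-of-the-envelope sketch of just one contributor: the per-page bookkeeping array the kernel keeps for every physical page. The 4KB page size is standard for IA-32, but the 44-byte size of each per-page descriptor is an assumption I'm plugging in for illustration; the real figure depends on kernel version and configuration, so treat the output as an order-of-magnitude estimate only.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

PAGE_SIZE = 4096                       # 4KB pages on IA-32
STRUCT_PAGE_BYTES = 44                 # ASSUMPTION: per-page descriptor size, varies by kernel
ZONE_NORMAL_BYTES = (896 - 16) * MIB   # 16MB..896MB, per the description above


def page_array_bytes(ram_bytes, struct_page_bytes=STRUCT_PAGE_BYTES):
    """Bytes needed to hold one descriptor per physical page of RAM."""
    return (ram_bytes // PAGE_SIZE) * struct_page_bytes


def share_of_zone_normal(ram_bytes, struct_page_bytes=STRUCT_PAGE_BYTES):
    """Fraction of ZONE_NORMAL taken by the per-page descriptor array."""
    return page_array_bytes(ram_bytes, struct_page_bytes) / ZONE_NORMAL_BYTES


if __name__ == "__main__":
    for gib in (1, 4, 16, 64):
        used = page_array_bytes(gib * GIB)
        share = share_of_zone_normal(gib * GIB)
        print(f"{gib:>2} GB RAM: {used // MIB:>4} MB of bookkeeping, "
              f"{share:.0%} of ZONE_NORMAL")
```

Under these assumptions the bookkeeping grows linearly with installed RAM while ZONE_NORMAL stays fixed at 880MB, which is the heart of the complaint. Page tables and other kernel structures come on top of this.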
ZONE_HIGHMEM is memory above the 896MB barrier. This address range is not directly accessible to the kernel. For the kernel to access anything in this zone, a temporary mapping must be made into the kernel's directly addressable space. These mappings take up scarce kernel address space, and setting them up and tearing them down carries a performance hit. User space processes can access these pages directly (handled by the virtual memory manager, of course).
Generally, memory will be allocated to ZONE_HIGHMEM, ZONE_NORMAL, or finally ZONE_DMA in that order of preference.
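As a toy model of the layout and the fallback order described above (not how the kernel actually implements it), the zones and the preference list can be sketched like this. The function names and the `free_pages` dictionary are made up for illustration; only the 16MB and 896MB boundaries and the HIGHMEM, NORMAL, DMA ordering come from the text.

```python
MIB = 1024 * 1024

DMA_END = 16 * MIB       # ZONE_DMA: first 16MB
NORMAL_END = 896 * MIB   # ZONE_NORMAL: 16MB up to 896MB

# Order of preference for allocations, most preferred first.
FALLBACK_ORDER = ["ZONE_HIGHMEM", "ZONE_NORMAL", "ZONE_DMA"]


def zone_of(phys_addr):
    """Return the zone a physical address falls into on 32-bit x86."""
    if phys_addr < DMA_END:
        return "ZONE_DMA"
    if phys_addr < NORMAL_END:
        return "ZONE_NORMAL"
    return "ZONE_HIGHMEM"


def pick_zone(free_pages, highest_allowed="ZONE_HIGHMEM"):
    """Pick the first zone with free pages, starting at highest_allowed.

    free_pages is a hypothetical dict of zone name -> free page count.
    A request that must be directly addressable by the kernel would pass
    highest_allowed="ZONE_NORMAL" and so never land in ZONE_HIGHMEM.
    """
    start = FALLBACK_ORDER.index(highest_allowed)
    for zone in FALLBACK_ORDER[start:]:
        if free_pages.get(zone, 0) > 0:
            return zone
    return None  # nothing usable left: allocation fails


if __name__ == "__main__":
    free = {"ZONE_HIGHMEM": 100000, "ZONE_NORMAL": 0, "ZONE_DMA": 500}
    print(pick_zone(free))                 # user data: HIGHMEM has room
    print(pick_zone(free, "ZONE_NORMAL"))  # kernel data: falls through to DMA
```

The second call shows the failure mode in miniature: ZONE_HIGHMEM has plenty of free pages, but a request that needs kernel-addressable memory can't use them and has to dip into ZONE_DMA, or fail outright once that is gone too.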
The x86_64 architecture eliminates the need for ZONE_HIGHMEM. ZONE_NORMAL extends all the way from 16MB to the end of physical memory. This approach simplifies memory management, improves performance, and is generally more flexible.
You're correct that there was a major issue with my original post... My memory of the kernel architecture had garbled HIGHMEM with PAE, and I was thinking that PAE required mapping pages above 4GB into lower memory. This would of course cause a huge performance penalty for any process consuming memory above 4GB. I deserve downmods for the technical inaccuracy.
Here's a very brief summary of the problems with HIGHMEM:
http://linux-mm.org/HighMemory
Here's a bunch of links used to refresh my memory:
http://www.makelinux.net/ldd3/chp-15-sect-1
https://www.kernel.org/doc/gorman/html/understand/understand005.html
http://unix.stackexchange.com/questions/5143/zone-normal-and-its-association-with-kernel-user-pages