Due to analog noise, real-world converters are limited to about 21 bits of dynamic range. 32-bit and even 64-bit floats can be useful for in-memory representations, but provide no benefit for storage.
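For reference, the usual way to translate between bits and dynamic range is the ideal-quantizer rule of about 6.02N + 1.76dB for a full-scale sine wave. The sketch below uses made-up function names and goes both ways. By this rule, 16 bits corresponds to about 98dB, and a converter measuring roughly 128dB of SNR is delivering about 21 effective bits.

```python
import math

# Exact constants behind the familiar "6.02 dB per bit + 1.76 dB" rule.
DB_PER_BIT = 20 * math.log10(2)      # ~6.02 dB
SINE_OFFSET_DB = 10 * math.log10(1.5)  # ~1.76 dB for a full-scale sine

def ideal_snr_db(bits: float) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    return DB_PER_BIT * bits + SINE_OFFSET_DB

def effective_bits(snr_db: float) -> float:
    """Effective number of bits (ENOB) implied by a measured SNR."""
    return (snr_db - SINE_OFFSET_DB) / DB_PER_BIT

for n in (16, 21, 24):
    print(n, "bits ->", round(ideal_snr_db(n), 1), "dB")
```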
For a final mix, properly dithered 44.1kHz/16-bit audio will cover everything from a whisper to about as loud as you can listen without incurring permanent hearing loss, and that includes golden-eared listeners. No one can ABX a high-quality 16-bit ADC-DAC pair against a straight wire at, say, 40-90dB recording and listening levels.
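As an illustration of what "properly dithered" can mean, here is a minimal sketch assuming TPDF (triangular) dither, which is one common choice and not necessarily what any particular tool uses; the function name and the 32767 scaling are hypothetical. The dither is the sum of two uniform random values of +/-0.5 LSB each, and its effect is that the average quantized output tracks signals smaller than one LSB instead of rounding them away to silence. Noise-shaped dither, which pushes the noise toward less audible frequencies, is not shown.

```python
import random

def quantize_to_int16_tpdf(samples, rng=None):
    """Convert float samples (nominally -1.0..1.0) to 16-bit ints with TPDF dither.

    Hypothetical helper: 1.0 is mapped to 32767 and results are clipped to
    the int16 range.
    """
    rng = rng or random.Random()
    scale = 32767.0
    out = []
    for x in samples:
        # Sum of two independent uniforms in +/-0.5 LSB -> triangular, +/-1 LSB.
        dither = rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = round(x * scale + dither)
        out.append(max(-32768, min(32767, q)))  # clip to int16
    return out

# A constant signal of 0.25 LSB: plain rounding would give all zeros,
# but the mean of the dithered output stays close to 0.25.
quiet = quantize_to_int16_tpdf([0.25 / 32767.0] * 20000, random.Random(1))
print(sum(quiet) / len(quiet))
```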
At the recording and mixing stages, the extra precision and dynamic range of a "24-bit" recording is useful, mainly for leaving enough headroom to avoid clipping. There are also some slight gains when mixing tens to hundreds of sources together, since every source's noise floor adds to the total.
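To put a number on the noise that accumulates in a big mix, here is a minimal sketch assuming every source has the same, uncorrelated noise floor and is summed at unity gain (real mixes pull most faders down, so this is closer to a worst case). Uncorrelated noise powers add, so N sources raise the floor by 10*log10(N) dB. The function names are made up for illustration.

```python
import math

def summed_noise_rise_db(num_sources: int) -> float:
    """Noise-floor rise from summing N sources with equal, uncorrelated noise."""
    return 10 * math.log10(num_sources)

def noise_rise_bits(num_sources: int) -> float:
    """The same rise expressed in bits (about 6.02 dB per bit)."""
    return summed_noise_rise_db(num_sources) / (20 * math.log10(2))

for n in (10, 100):
    print(n, "sources:", round(summed_noise_rise_db(n), 1), "dB,",
          round(noise_rise_bits(n), 2), "bits")
```

With 100 sources this comes out to 20dB, or about 3.3 bits, which a 24-bit recording absorbs easily and a 16-bit one would feel.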
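Nominally, going to 96kHz instead of 44.1/48kHz is worth only about half a bit of extra precision within the audible band. Assuming the noise is white, it is spread over twice the bandwidth, so filtering back down to the audible band removes half of its power, a 3dB improvement. That comes at the cost of doubling the storage required.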
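The oversampling arithmetic can be sketched as follows, assuming white noise spread evenly from 0Hz up to half the sample rate (converters with noise shaping behave differently). The function name is hypothetical.

```python
import math

def oversampling_gain(base_rate_hz: float, new_rate_hz: float) -> tuple[float, float]:
    """In-band noise reduction from a higher sample rate, assuming white noise.

    Returns (gain in dB, gain in bits).
    """
    ratio = new_rate_hz / base_rate_hz
    gain_db = 10 * math.log10(ratio)               # in-band noise power drops by 1/ratio
    gain_bits = gain_db / (20 * math.log10(2))     # same as 0.5 * log2(ratio)
    return gain_db, gain_bits

print(oversampling_gain(48_000, 96_000))   # about 3.01 dB, 0.5 bit
print(oversampling_gain(44_100, 96_000))   # about 3.38 dB, 0.56 bit
```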
You'd really be best off recording at 96kHz/24-bit and then immediately resampling to 48kHz/24-bit. You'd keep what little extra information within the human hearing range the 96kHz recording provides, without doubling the storage space.
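Going from 96kHz to 48kHz is the simplest resampling case: low-pass filter below the new Nyquist frequency (24kHz), then keep every other sample. The sketch below is a short Hann-windowed-sinc FIR with hypothetical function names and an arbitrary cutoff of about 22kHz; its stopband attenuation (roughly 40-odd dB) is nowhere near 24-bit quality, so treat it as an illustration of the technique rather than a usable resampler.

```python
import math

def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Windowed-sinc low-pass FIR with a Hann window.

    cutoff is in cycles per input sample (0.25 is the Nyquist after
    decimating by 2). Taps are normalized for unity gain at DC.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def decimate_by_2(samples: list[float], num_taps: int = 101) -> list[float]:
    """Low-pass filter, then keep every other sample (e.g. 96kHz -> 48kHz)."""
    taps = lowpass_taps(num_taps, 0.23)  # 0.23 * 96kHz ~ 22kHz, under the new 24kHz Nyquist
    out = []
    for i in range(0, len(samples), 2):  # only compute the samples we keep
        acc = 0.0
        for k, h in enumerate(taps):
            j = i - k
            if 0 <= j < len(samples):
                acc += h * samples[j]
        out.append(acc)
    return out

# A 1kHz tone at 96kHz passes through; a 40kHz tone is strongly attenuated.
tone_1k = [math.sin(2 * math.pi * 1_000 * n / 96_000) for n in range(4_000)]
print(max(abs(v) for v in decimate_by_2(tone_1k)[100:]))
```

Production resamplers use much longer filters designed for far deeper stopband rejection, but the structure (filter, then discard samples) is the same.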
Of course, in reality, a 21-bit converter may perform worse at 96kHz than it does at 44.1/48kHz, so you may not gain anything at all. That said, I think most high-quality converters these days can deliver 20-ish bits at up to 192kHz. But even at 192kHz, you could resample back down to 44.1/48kHz and keep all the audible precision. Since each doubling of the sample rate buys only about half a bit (assuming white noise), a converter with 21 bits of base precision would need roughly 64 times oversampling (about 2.8-3.1MHz) before it could capture anything in the human hearing range beyond what a 44.1/48kHz 24-bit recording can contain.
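Inverting the oversampling rule gives the rate needed to close the gap between a converter's native precision and a 24-bit file. This sketch assumes white noise (half a bit per doubling, so a factor of 4 per bit) and uses a made-up function name.

```python
def required_oversampling(base_bits: float, target_bits: float) -> float:
    """Oversampling ratio needed to raise in-band precision from base_bits
    to target_bits, assuming white noise (each factor of 4 buys one bit)."""
    extra_bits = target_bits - base_bits
    return 4 ** extra_bits

ratio = required_oversampling(21, 24)
print(ratio, ratio * 44_100, ratio * 48_000)  # 64x: about 2.82MHz or 3.07MHz
```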
These days there's virtually no difference between 44.1 and 48kHz. Modern computers can resample between the two nearly perfectly, and modern (circa 1990s and later) oversampling converters do a nearly perfect job with either.
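Converting between 44.1kHz and 48kHz is a rational-ratio resample: upsample by one integer, low-pass filter, then downsample by another. The small sketch below (hypothetical function name) finds those integers with Python's `fractions.Fraction`, which reduces to lowest terms. SciPy's `scipy.signal.resample_poly(x, up, down)` is one existing implementation that takes exactly these two factors.

```python
from fractions import Fraction

def resampling_ratio(src_rate: int, dst_rate: int) -> tuple[int, int]:
    """Smallest integer (up, down) factors for a rational resampler."""
    ratio = Fraction(dst_rate, src_rate)  # automatically reduced to lowest terms
    return ratio.numerator, ratio.denominator

print(resampling_ratio(44_100, 48_000))  # (160, 147)
print(resampling_ratio(96_000, 48_000))  # (1, 2)
```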
In the late '70s and early '80s, 48kHz let you get slightly better quality out of slightly cheaper analog anti-aliasing and reconstruction filters, and it also fit more neatly into some early digital tape format. Not to mention, it let component manufacturers extract more money from professionals by offering a "pro" model with about 4kHz more. Either way, it doesn't really matter which one you use these days.
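The filter advantage comes down to how much room the analog filter has to roll off. This sketch makes the conservative assumption that the filter must pass everything up to 20kHz and reach full attenuation by the Nyquist frequency; the function name and the 20kHz default are illustrative.

```python
def transition_band_hz(sample_rate_hz: float, passband_edge_hz: float = 20_000) -> float:
    """Width of the band between the top of the passband and Nyquist."""
    return sample_rate_hz / 2 - passband_edge_hz

print(transition_band_hz(44_100))  # 2050.0 Hz
print(transition_band_hz(48_000))  # 4000.0 Hz
```

At 48kHz the transition band is nearly twice as wide, so a gentler, cheaper analog filter can do the same job.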
To sum up: as a storage format, with modern audio hardware, 44.1kHz/24-bit can hold everything within the human hearing range that a microphone can capture. 96kHz workflows exist because of marketing, not science.