AI analysing video material for concepts/objects/ideas.

How much description would the AI need to produce to get across the intended concept/object/idea?

I don't know. You could test it yourself by reading a book and imagining its contents: did it feel like a movie playing in your head?

Never say never. There are ways around this problem.
For example, if memristors, holographic memory or some other magic form of storage is invented, we could put a 1 PB universal dictionary in every computer. Compressed material would then always reference this dictionary instead of adding its own dictionary to the result file.
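zlib already supports a small-scale version of this idea through a preset dictionary (the `zdict` argument). The sketch below assumes a hypothetical `SHARED_DICT` of a few bytes standing in for the imagined 1 PB dictionary; the point is only that the dictionary is shared by both sides and never written into the compressed output.

```python
import zlib
from typing import Optional

# Hypothetical stand-in for the "universal dictionary": the idea above imagines
# ~1 PB preinstalled on every machine; here it is a short byte string that the
# encoder and decoder are simply assumed to share.
SHARED_DICT = (
    b"the quick brown fox jumps over the lazy dog, "
    b"shared between encoder and decoder, "
    b"universal holographic memory storage"
)

def compress(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    # With zdict, DEFLATE can back-reference the preset dictionary; only a
    # 4-byte checksum of it goes into the stream, never the dictionary itself.
    c = zlib.compressobj(9, zdict=zdict) if zdict else zlib.compressobj(9)
    return c.compress(data) + c.flush()

def decompress(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    # The decoder must hold the exact same dictionary to resolve the references.
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

message = b"the lazy dog and the quick brown fox, shared between encoder and decoder"
plain = compress(message)
with_dict = compress(message, SHARED_DICT)
print(len(message), len(plain), len(with_dict))
assert decompress(with_dict, SHARED_DICT) == message
```

The gain only appears when the data actually resembles the dictionary, which is why the idea needs a dictionary that is huge and universal rather than a few sentences.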
Another idea is a strong (very stronk) AI analysing video material for concepts/objects/ideas. Instead of compressing the picture's RGB values, you send a description of the scenes: object locations and motion vectors, intents and moods. A decoder AI puts it back together and dreams up a video sequence according to the script.
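A toy version of the "send the script, not the pixels" idea can be sketched without any AI. Everything here is hypothetical and for illustration only: the JSON script format, the `SceneObject` fields, and a decoder that "dreams up" each object as a coloured square moving along its motion vector, where a real decoder AI would imagine realistic frames from the same description.

```python
import json
from dataclasses import dataclass, asdict

WIDTH, HEIGHT = 64, 48  # tiny frame size, chosen only for illustration

@dataclass
class SceneObject:
    label: str    # what the object "is"; a real decoder AI would use this
    x: int        # starting position in pixels
    y: int
    dx: int       # motion vector in pixels per frame
    dy: int
    size: int     # side of the square this toy renderer draws
    color: tuple  # (R, G, B) used by this toy renderer

def encode(objects, n_frames):
    """The 'script': a scene description instead of pixel values."""
    return json.dumps({"frames": n_frames,
                       "objects": [asdict(o) for o in objects]}).encode()

def decode(script: bytes):
    """Toy decoder: renders every object as a square moving along its vector."""
    scene = json.loads(script)
    frames = []
    for t in range(scene["frames"]):
        frame = bytearray(WIDTH * HEIGHT * 3)  # black RGB frame
        for o in scene["objects"]:
            ox, oy = o["x"] + o["dx"] * t, o["y"] + o["dy"] * t
            # Clip the square to the frame borders.
            for yy in range(max(oy, 0), min(oy + o["size"], HEIGHT)):
                for xx in range(max(ox, 0), min(ox + o["size"], WIDTH)):
                    i = (yy * WIDTH + xx) * 3
                    frame[i:i + 3] = bytes(o["color"])
        frames.append(bytes(frame))
    return frames

objects = [SceneObject("ball", 0, 10, 2, 1, 6, (255, 0, 0)),
           SceneObject("car", 60, 30, -3, 0, 10, (0, 0, 255))]
script = encode(objects, n_frames=24)
video = decode(script)
raw_size = sum(len(f) for f in video)  # what sending raw RGB would cost
print(len(script), raw_size)
```

The script is a few hundred bytes against hundreds of kilobytes of raw frames; the hard part the idea hinges on, extracting such a script from real footage and rendering it back convincingly, is exactly what this sketch leaves out.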

So it sounds like a simple FFT to pick the one with the biggest high-frequency coefficients. You can do it yourself by snapping pictures at different focus settings and simply comparing file sizes: better focus = finer detail = more high-frequency content = more information to store. Think of the JPEG quantization table and quality setting.
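Both tests can be sketched in plain Python. Assumptions, all for illustration: a random-noise texture plays the "in focus" picture, a 3x3 box blur plays defocus, the high-frequency cutoff is set arbitrarily at half the Nyquist frequency, and zlib stands in for the JPEG encoder when comparing file sizes.

```python
import cmath
import math
import random
import zlib

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(img):
    """2-D FFT: transform every row, then every column."""
    rows = [fft(r) for r in img]
    cols = [fft(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # back to row-major order

def high_freq_ratio(img, cutoff=0.5):
    """Share of non-DC spectral energy above `cutoff` (fraction of Nyquist)."""
    spec = fft2(img)
    n, m = len(spec), len(spec[0])
    high = ac = 0.0
    for ky in range(n):
        fy = min(ky, n - ky) / (n / 2)  # 0 at DC, 1 at Nyquist
        for kx in range(m):
            if ky == 0 and kx == 0:
                continue  # skip DC (overall brightness)
            fx = min(kx, m - kx) / (m / 2)
            e = abs(spec[ky][kx]) ** 2
            ac += e
            if max(fx, fy) > cutoff:
                high += e
    return high / ac

def box_blur(img):
    """3x3 box blur with wrap-around edges: a crude stand-in for defocus."""
    n, m = len(img), len(img[0])
    return [[sum(img[(y + dy) % n][(x + dx) % m]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) // 9
             for x in range(m)] for y in range(n)]

def compressed_size(img):
    """zlib stands in for JPEG here; both spend more bytes on fine detail."""
    return len(zlib.compress(bytes(v for row in img for v in row), 9))

rng = random.Random(0)
sharp = [[rng.randrange(256) for _ in range(64)] for _ in range(64)]  # "in focus"
soft = box_blur(sharp)                                               # "out of focus"
for name, img in (("sharp", sharp), ("soft", soft)):
    print(name, round(high_freq_ratio(img), 3), compressed_size(img))
```

Picking the best-focused shot then just means taking the one with the largest ratio or the largest compressed size; real photos also change with exposure and noise, so this is only a rough heuristic.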

