That's been doable for 20 years.
It doesn't work on really complicated music (e.g. vocals, chorus, orchestration), but it works for the vast majority of low-effort music, which includes AI music, guitar and piano solos, drum solos, anything with a very solid melody. You can see this on YouTube when its AI identifies music "melodies" without claiming the actual track.
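For anything monophonic, classic signal processing from before the deep-learning era already does this. As a minimal sketch, here's melody extraction with librosa's pyin pitch tracker; the filename and frequency range are placeholders, not anything canonical:

```python
import librosa

# load a mono recording of a solo instrument ("solo.wav" is a placeholder)
y, sr = librosa.load("solo.wav", sr=22050, mono=True)

# pyin is a probabilistic pitch tracker -- no neural net involved
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)

# keep only the frames the tracker considers voiced, as note names
melody = [librosa.hz_to_note(f) for f, v in zip(f0, voiced_flag) if v]
print(melody[:20])
```

On a clean guitar or piano solo this recovers the melody line fine; throw an orchestral mix at it and it falls apart, which is exactly the limitation above.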
Anyone who is really musically inclined can completely reverse engineer any song just from listening to it. I can do that if I think about it, but I don't have the tools or time to reverse engineer a song, because I have no incentive to. If someone said, "Here's a million dollars, I need you to re-create my song from 30 years ago that has never been on CD," I could probably do that.
But the effort to do that would be substantial, because part of it is identifying the musical instruments, and part of it is identifying the human characteristics. Humans operate in the analog spectrum; computers do not. So if a song is F, A, D, F, A, D, E, G#, C# (that's "The Phantom of the Opera"), there are two things happening at the same time. The main instrument is an organ, so F A D is held down for that, while the left hand plays the "beat," a D under that same run of notes. When a human plays it on an organ, the instrument itself is responsible for the connection between the notes. If you play it as a MIDI file, it can't match what the actual 1986 song sounds like. In fact, a MIDI version usually sounds rushed because it's played on an instrument that is not an organ. So any time you come across a MIDI of a song, it usually sounds like instruments are missing.
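You can see the grid problem in miniature with the mido library. Below, the same phrase is written twice: once dead on the MIDI grid, once with a few ticks of random jitter on each onset to fake the "analog" looseness of a human player. The octave choices and tick values are my own toy numbers:

```python
import random
import mido

# F A D F A D E G# C#, with octaves picked arbitrarily for the demo
PHRASE = [65, 69, 74, 65, 69, 74, 76, 80, 85]

def write_phrase(filename, jitter=0):
    mid = mido.MidiFile(ticks_per_beat=480)
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for note in PHRASE:
        # a human never lands exactly on the grid; jitter simulates that
        offset = random.randint(-jitter, jitter) if jitter else 0
        track.append(mido.Message("note_on", note=note, velocity=90,
                                  time=max(0, 120 + offset)))
        track.append(mido.Message("note_off", note=note, velocity=0, time=240))
    mid.save(filename)

write_phrase("quantized.mid")             # the rigid, "rushed" MIDI feel
write_phrase("humanized.mid", jitter=30)  # +/- 30 ticks of human slop
```

The quantized file is what most MIDI renditions sound like; the jittered one is a crude stand-in for what the organ player actually did.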
Hence the idea of an AI being able to transcribe music back into sheet music/MIDI. MIDI files are just digital representations of sheet music; in theory you should be able to listen to anything and produce sheet music for it. But for a lot of really practical reasons there are different sheet music representations, usually one per instrument, and when you have an orchestra, the slight latency of the humans playing together is what isn't reproducible in MIDI form, and an AI can neither transcribe nor recreate it.
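Here's where that nuance dies, as a sketch: once a transcriber collapses a frame-level pitch track into discrete notes, every onset and duration gets rounded off. The toy frame data and hop size below are invented for illustration:

```python
# toy per-frame pitch track (MIDI note numbers, ~23 ms per frame, 0 = unvoiced),
# roughly what a tracker like pyin gives you after rounding to semitones
frames = [65] * 9 + [69] * 11 + [74] * 10 + [0] * 3 + [65] * 10

def frames_to_notes(frames, hop_s=0.023):
    """Collapse a per-frame pitch track into (note, onset_s, duration_s)."""
    notes, start, prev = [], 0, 0
    for i, n in enumerate(frames + [0]):   # trailing 0 closes the last note
        if n != prev:
            if prev:
                notes.append((prev, start * hop_s, (i - start) * hop_s))
            start, prev = i, n
    return notes

for note, onset, dur in frames_to_notes(frames):
    # snapping these onsets to a sixteenth-note grid for sheet music is
    # exactly the step that throws the players' micro-timing away
    print(f"note {note}: onset {onset:.3f}s, duration {dur:.3f}s")
```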
All present machine learning, when it comes to producing output, averages its training input. The result is that AI-generated music is a gross facsimile of real music. Sure, it sounds like music, but it's an average of what thousands of other songs sound like. It works like "inpainting" for music: it guesses the most common next sound, without any understanding of music theory.
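A first-order Markov chain is the crudest possible version of that "most common next sound" logic, but it makes the point. Everything below (the toy corpus especially) is invented; note that nothing in it knows what a key or a chord is:

```python
import random
from collections import Counter, defaultdict

# a made-up "training set" of melodies as MIDI note numbers
corpus = [
    [60, 62, 64, 65, 67, 65, 64, 62, 60],
    [60, 64, 67, 64, 60, 62, 64, 62, 60],
    [67, 65, 64, 62, 60, 62, 64, 65, 67],
]

# count which note most often follows each note across the corpus
transitions = defaultdict(Counter)
for melody in corpus:
    for a, b in zip(melody, melody[1:]):
        transitions[a][b] += 1

def continue_melody(seed, length=8):
    out = [seed]
    for _ in range(length):
        followers = transitions[out[-1]]
        if not followers:
            break
        # weighted pick of the statistically likely next note -- pure averaging
        out.append(random.choices(list(followers), weights=followers.values())[0])
    return out

print(continue_melody(60))
```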
It's been possible for decades to procedurally generate music as well, which, unlike "AI", actually does take music theory into account. If you played Portal/Portal 2, No Man's Sky, or Deep Rock Galactic, some of that music was procedural. You'd likely never have really noticed, because the procedural stuff doesn't stick out. Yet "Still Alive" from Portal does, because it's actually sung.
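For contrast, a procedural generator starts from hard-coded theory rather than statistics. This toy sketch is nowhere near what those games actually ship (real systems layer hand-authored material and audio middleware on top), but it shows the rules-first idea: every note is constrained to the current chord or the scale:

```python
import random

SCALE = [60, 62, 64, 65, 67, 69, 71]  # C major, one octave of MIDI notes
PROGRESSION = [                       # a stock I-V-vi-IV in C
    [60, 64, 67],  # C
    [67, 71, 62],  # G
    [69, 60, 64],  # Am
    [65, 69, 60],  # F
]

def bar(chord):
    notes = []
    for beat in range(4):
        if beat % 2 == 0:
            notes.append(random.choice(chord))  # strong beats: chord tones only
        else:
            notes.append(random.choice(SCALE))  # weak beats: any scale tone
    return notes

melody = [note for chord in PROGRESSION for note in bar(chord)]
print(melody)  # never leaves the key, always lands on chord tones
```

It's trivially simple, but by construction it can't produce a wrong note, which is the opposite guarantee of the averaging approach above.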