A monad is not a specific data structure (like an array) or a language feature (like a loop or conditional). It is not something that is put together by the compiler (like a compiled function) and it is not magic, either.
"Being a monad" is a mathematical property, like "being commutative" or "being symmetric". It has to do with values following a particular pattern, having operations defined on them that interact in a particular way. (These are called "monad laws", and they're the same sort of thing as the "identities" you get in high school algebra.)
Any sort of thing that follows the monad laws is a monad. In Haskell, we say that any type that follows the monad laws can be made a member of typeclass Monad. Casually, we say that "lists are a monad" and "IO actions are a monad".
What does it mean to be a monad? Roughly, it means that you can chain operations together. You can map a function over a list, or filter the list's elements to get a new list, or more generally "fold" a list by recursively applying an operation to collect a result. (Like the factorial function repeatedly applies multiplication.) Also you can join a list of lists together into a single big list.
It turns out that monads can represent the idea of "sequence" quite well. Lists are, after all, a sort of sequence. But so are IO actions -- all those "non-functional" things a program has to do, like read a file from disk, or write its contents to the terminal. IO actions form a sequence in time: if you want to write the file's contents to the terminal, you have to read them from the disk first.
The way that this works, under the hood, is to express the "later" actions as having the "earlier" actions' results as function arguments. Since a function's arguments have to be evaluated before the function can be applied, the disk gets read before the terminal-writing happens.
So monads are not a loophole that lets you cheat and do non-functional things inside a functional language. Rather, they are a model that lets you describe IO actions as just another sort of functional value. The only bit of "magic" necessary is that the main function of a program has the IO type.