So they are a bit different, hardware-wise. A big difference is unified memory: there is only one pool of memory which both the CPU and GPU access. That makes sense since the CPU and GPU are on the same silicon, but it is a difference in the way you program. Also, in the case of the Xbone they decided to use DDR3 RAM instead of GDDR5, which is slower for graphics operations, but the APU (what AMD calls their CPU/GPU combo chips) has 32MB of high-speed embedded RAM on it to try and compensate for that.
Ok, so there are some differences. That aside, why the problem with hitting the target? Visual quality. Basically, a video card can only do so much in a given time period. It can only push so many pixels/texels, only run so many shaders, etc. So any time you add more visual flair, it eats into the available power. There's no hard limit, no point where it just stops working; rather, you have to choose what kind of performance you want.
For example, if I can render a scene with X polygons in about 16ms, then I can output that at 60fps. It also means I can render a scene with roughly 2X polygons in about 33ms, or 30fps.
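To put rough numbers on that frame-budget math, here's a quick sketch (assuming render time scales roughly linearly with scene complexity, which is a simplification; the function name is just for illustration):

    # Frame time vs frame rate: the GPU gets a fixed time slice per frame.
    def fps_from_frame_time(frame_time_ms):
        return 1000.0 / frame_time_ms

    base_frame_ms = 16.7                             # budget for one frame at 60fps
    print(fps_from_frame_time(base_frame_ms))        # ~60 fps
    print(fps_from_frame_time(base_frame_ms * 2))    # ~30 fps: double the work, half the frame rate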
So FPS is one tradeoff you can make. You don't have to render at 60fps; you can go lower, and indeed console games often target 30fps. That means each frame can have more in it, because the hardware has longer to generate it.
Another tradeoff is resolution. Particularly when you're talking about texture-related and other per-pixel work, lowering the output resolution lowers the demand on the hardware and thus lets you do more.
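Just to show how much that buys you, count the raw output pixels at each resolution (ignoring overdraw, anti-aliasing, and so on, so this is only a rough measure):

    # Pixels the GPU has to shade for one output frame.
    pixels_1080p = 1920 * 1080    # 2,073,600
    pixels_720p  = 1280 * 720     #   921,600
    print(pixels_1080p / pixels_720p)   # 2.25 -> 720p has 2.25x fewer pixels to shade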
So it is a tradeoff in what you think looks best. Yeah, you can design a game that runs at a solid 1080p60. However, it may not look as good overall as a game that runs at 720p30, because that game, despite being lower FPS and resolution, has more detail in its scenes. It is a choice you have to make with limited hardware.
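Combining the two knobs, the 720p30 game shades roughly 4.5x fewer pixels per second than the 1080p60 game, and that difference is the budget it can spend on extra detail per pixel (same caveats as above, it's a very rough measure):

    # Pixels shaded per second for the two targets.
    throughput_1080p60 = 1920 * 1080 * 60   # ~124.4 million pixels/sec
    throughput_720p30  = 1280 * 720 * 30    #  ~27.6 million pixels/sec
    print(throughput_1080p60 / throughput_720p30)   # 4.5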
On the PC, we often solve it by throwing more hardware at the problem, but you can't do that on a console.