Oh, it definitely helps. The main problem is the fresnel... fresnels have artifacts when you look through them off-axis (i.e., not directly through the optical center). The more you shift the image (while leaving the lens unchanged), the more off-axis your view of the image becomes.
A better design (within the constraints of "one screen with two rectangles" and "fresnel lens") would allow the left & right images to shift independently on the screen. Combine that with 3D-printable carriers that can be customized to shift the lens itself up, down, left, or right, and to adjust its tilt & wrap (relative to the eye & screen), so each eye is mainly looking through the fresnel lens along its least-compromised optical path.
With Google Cardboard + on-screen image shift (via custom-config barcode I made) to converge the images, I saw fresnel halos & the image was a little "fuzzy", but it at least converged and wasn't *painful* to use. I used it with +0.25 lenses (putting my distance far-plane just slightly beyond the apparent distance of the screen) and no prism correction (at the time, I didn't know I needed it... and shifting the images on-screen made it unnecessary anyway).
With my Quest2 and prism-free +0.50 lenses, I can only fuse the images if I tilt the headset *just* right... into a position it slips from within moments. It "sort of" works if I tilt my head, but then the rendering becomes fuzzy & my neck hurts from both the tilt & added headset weight.
With Quest2 and +0.50 lenses with prism correction, there's a tiny keyhole of sharp, fused clarity when the headset is positioned *exactly* right... but really, it's almost never *in* that position, and everything looks awful.
The key to accommodating vertical heterophoria with a headset is to center each image, and the optical center of its lens, on the eye looking at it. In the real world, glasses require prism to shift the image relative to the eye looking at it. In a virtual world, you can shift the image *itself* to match the position of the eye looking at it.
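To make the prism-vs-image-shift equivalence concrete, here's a minimal sketch (not from any headset SDK; the function name and all example values are hypothetical). It uses two standard facts: 1 prism diopter deviates the line of sight by 1 cm at 1 m (≈0.01 radian), and with the screen sitting near the lens's focal plane, an angular shift of θ corresponds to a physical on-screen shift of θ × focal length.

```python
# Hypothetical sketch: estimating the on-screen vertical image shift that
# substitutes for a vertical prism prescription in a lens-plus-screen HMD.
# Assumptions: screen sits near the lens focal plane; 1 prism diopter
# deviates the line of sight by ~0.01 rad (the standard definition).

def prism_to_pixel_shift(prism_diopters: float,
                         focal_length_mm: float,
                         pixels_per_mm: float) -> float:
    """On-screen shift (in pixels) roughly equivalent to a vertical prism."""
    angle_rad = prism_diopters / 100.0       # 1 prism diopter ~= 0.01 rad
    shift_mm = angle_rad * focal_length_mm   # screen near the focal plane
    return shift_mm * pixels_per_mm

# Made-up example: 2 prism diopters, 40 mm focal length,
# ~508 ppi panel (~20 px/mm).
print(prism_to_pixel_shift(2.0, 40.0, 20.0))  # -> 16.0 (pixels)
```

A shift on the order of a dozen pixels is trivial for software to apply per-eye, which is exactly why exposing X/Y image-shift settings costs the platform so little.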
Cardboard ironically "got it (more) right" by assuming the images on the single screen required shift adjustment along both the x and y axes. Quest fucked it up in the name of "simplification" by taking away the option to shift both eyes' images along X and Y axes.
The thing about vertical heterophoria is, human eye muscles simply *can't* independently shift one eye up & one eye down. The nerve bundle just isn't wired/coded to allow it. So even a *small* amount of vertical heterophoria multiplies & compounds any other issues that might be going on. By the same token, accommodating it makes any other issues a lot easier to compensate for.
Think of it this way: your brain has finite abilities to fuse misaligned images. Fixing one "at the source" (by shifting left/right images vertically) gives you more spare processing power to compensate for others (like lateral convergence).
Binocular fusion works until you exceed the misconvergence limit, then falls apart. The first sign it's at the breaking point is when the image you're seeing seems to "vibrate". At that point, your brain is randomly ignoring one eye or the other, and the vibration is caused by the constant left/right perspective shift.
Push the misconvergence a little more, and individual elements (like words on a page) look like they're drifting around. What actually happens is that one eye stays locked onto the overall "scene" while the other eye locks onto a detail and drifts. Your brain edits the two together into a seemingly normal scene in which the element that has your attention appears to drift around.
Finally, your brain just mostly ignores one eye... but uses the capacity normally spent on binocular fusion to instead synthesize fake 3D you won't even notice isn't "real" until you stub your toe on something "in plain sight". Except, it wasn't... it was on the other side of your nose, and the eye that "should" have seen it was being ignored until the moment the pain registered. Your brain & visual system trick you like that.