Each device's clock drifts at its own rate, mostly because of temperature changes. More expensive devices use oven-controlled crystal oscillators (OCXOs) to keep their clocks more stable.
Sources that are only synchronized at the start will typically drift apart quickly. Unfortunately, the drift is often not linear. So if you sync them at the start and then stretch or shrink one source by resampling, they may line up at the start and the end, but not in the middle.
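To put a rough number on that, here is a small sketch with made-up figures: a hypothetical drift that grows quadratically to 50 ms over an hour. The helper `residual_after_linear_fit` is my own name for illustration; it just subtracts the straight line through the first and last drift values, which is what a single stretch-to-fit resample amounts to.

```python
import numpy as np

def residual_after_linear_fit(drift):
    """Subtract the straight line joining the first and last drift values.

    This models a single global stretch/shrink that makes both ends line up.
    """
    line = np.linspace(drift[0], drift[-1], len(drift))
    return drift - line

# Assumed, purely illustrative drift curve: 0 ms at the start, 50 ms after one hour.
t = np.linspace(0.0, 3600.0, 3601)        # seconds
drift_ms = 50.0 * (t / 3600.0) ** 2       # non-linear (quadratic) drift

residual_ms = residual_after_linear_fit(drift_ms)
worst = np.max(np.abs(residual_ms))       # largest leftover error, in ms
```

With these assumed numbers the ends match exactly, but the middle of the recording is still off by about 12.5 ms, which is easily audible as an echo or flam between two microphones.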
A good solution for the OP's problem would compare the audio streams, identifying common reference points. The sources would then be adaptively corrected, matching those reference points. I've always thought that would be a very fun project, and useful to many people, but haven't been able to align my clocks to make it happen.
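For what it's worth, here is a minimal sketch of how that idea might look, not a finished tool. It assumes two mono NumPy arrays at the same nominal sample rate, finds reference points by cross-correlating short windows, and then warps one stream with a piecewise-linear time map. The function names and the `window`, `hop`, and `max_lag` defaults are all my own choices for illustration, and the linear-interpolation resampling is crude; a real implementation would want a proper band-limited resampler.

```python
import numpy as np

def local_lag(ref, other, start, window, max_lag):
    """Lag (in samples) at which `other` best matches ref[start:start+window]."""
    seg = ref[start:start + window]
    search = other[start - max_lag:start + window + max_lag]
    # 'valid' mode slides seg across search; output index k means lag k - max_lag.
    corr = np.correlate(search, seg, mode="valid")
    return int(np.argmax(corr)) - max_lag

def find_anchors(ref, other, window=512, hop=1024, max_lag=64):
    """Return matching sample positions (ref_pos, other_pos) along both streams."""
    n = min(len(ref), len(other))
    ref_pos, other_pos = [], []
    for start in range(max_lag, n - window - max_lag, hop):
        lag = local_lag(ref, other, start, window, max_lag)
        ref_pos.append(start)
        other_pos.append(start + lag)
    return np.array(ref_pos, dtype=float), np.array(other_pos, dtype=float)

def warp_to_reference(other, ref_pos, other_pos, out_len):
    """Resample `other` onto the reference timeline through the anchor points."""
    # Hold the first/last measured offset constant outside the anchored range.
    ref_pts = np.concatenate(([0.0], ref_pos, [out_len - 1.0]))
    other_pts = np.concatenate((
        [other_pos[0] - ref_pos[0]],
        other_pos,
        [other_pos[-1] + (out_len - 1.0 - ref_pos[-1])],
    ))
    t = np.arange(out_len)
    src = np.interp(t, ref_pts, other_pts)            # piecewise-linear time map
    return np.interp(src, np.arange(len(other)), other)  # linear-interp resample
```

Usage would be roughly `ref_pos, other_pos = find_anchors(ref, other)` followed by `aligned = warp_to_reference(other, ref_pos, other_pos, len(ref))`. Because the map is rebuilt between every pair of anchors, it can follow drift that speeds up and slows down, which a single global stretch cannot. The obvious weak spots are silent or repetitive passages, where the correlation peak is ambiguous, so a real tool would need to reject low-confidence anchors.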