Obviously you realize there are differences in how people send CW. While I applaud your drive to make a smarter decoder, the reality is that you need to make sure it works on live traffic. In that respect, you should hook it into some kind of SDR software like HRD, or even write your own that can decode multiple streams of CW. If you don't have a radio, I'd suggest something like a SoftRock receiver. My reasons:
1. It gives you actual live conversations with all the mistakes and alterations. Not everyone uses computer-generated CW. In fact, most brass pounders dislike it because it's boring to listen to and dry to copy.
2. There are sanctioned CW events all the time: QSO parties, commemorative stations, and, at the beginning of next month, straight key nights where people put the paddles away and break out the straight key.
3. I'm going to assume the end goal is to have this listening to live feeds anyway. You should work toward that goal now, because then you can write code to compensate for QRM, QRN, and fading.
Having people feed you 'tapes' won't accomplish your goal. You need to have it work with the real source.
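To make the point about sending differences concrete, here is a rough sketch of why fixed timing breaks on hand-sent CW and what adapting to the operator might look like. It assumes you have already pulled key-down durations (in milliseconds) out of the audio. The function name `classify_marks` is made up for illustration, and the two-cluster split is just one simple approach, not a claim about how your decoder works. It does nothing about QRM, QRN, or fading.

```python
def classify_marks(durations_ms, rounds=10):
    """Label key-down durations as dits or dahs by splitting them into two clusters.

    Hypothetical helper: a 1-D two-means split, so the dit/dah threshold
    follows the operator's actual fist instead of a fixed WPM setting.
    Returns (symbols, estimated_dit_ms).
    """
    if not durations_ms:
        return "", None

    # Start the two cluster centers at the shortest and longest marks seen.
    short_center = min(durations_ms)
    long_center = max(durations_ms)

    for _ in range(rounds):
        threshold = (short_center + long_center) / 2
        shorts = [d for d in durations_ms if d <= threshold]
        longs = [d for d in durations_ms if d > threshold]
        if not longs:
            # Every mark looks the same length; nothing to separate.
            break
        short_center = sum(shorts) / len(shorts)
        long_center = sum(longs) / len(longs)

    threshold = (short_center + long_center) / 2
    symbols = "".join("." if d <= threshold else "-" for d in durations_ms)
    return symbols, short_center


# Sloppy straight-key timing: dits around 60 ms, dahs anywhere from 160 to 190 ms.
symbols, dit_ms = classify_marks([62, 171, 58, 190, 70, 65, 160])
print(symbols, dit_ms)
```

On clean computer-generated CW this is overkill, since every dit is identical. On live traffic the clusters smear, drift over the course of a QSO, and sometimes overlap, which is exactly what you only find out by listening to the real thing.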