I also haven't looked at the study in detail, but I agree that there are certainly a lot of people who don't fit the generalizations one might draw from the article and summary (which are both perfect examples of scientific reporting - i.e. they are shit). For example, I (over)use emoticons and exclamation points myself, depending on the tone I'm trying to convey and how important I feel it is to get the tone right. Maybe it is a "feminine" trait to care about tone and to try to express it clearly, but obviously having that trait doesn't require someone to be female. Don't conflate statistical averages (i.e. more women than men exhibit feminine traits, hence their designation as feminine) with absolute accuracy in judging an individual (i.e. because a person exhibits feminine traits she must be a woman). The former is useful when you're targeting ads at millions of people and hoping to improve your click-through rate, but the latter is socially dangerous.
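To put rough numbers on the average-vs-individual distinction, here is a minimal sketch using Bayes' rule. The rates are made up for illustration (they are not from the study), and `posterior_female` is a hypothetical helper name.

```python
def posterior_female(p_trait_given_f, p_trait_given_m, prior_f=0.5):
    """P(female | trait) via Bayes' rule.

    All inputs are hypothetical illustration values, not figures from the study.
    """
    numerator = p_trait_given_f * prior_f
    denominator = numerator + p_trait_given_m * (1.0 - prior_f)
    return numerator / denominator


# Assumption: 60% of women and 40% of men use emoticons, equal priors.
p = posterior_female(0.6, 0.4)
print(f"P(female | uses emoticons) = {p:.2f}")
```

With those assumed rates the trait really is "more common among women", yet a guess of "woman" for any one emoticon user is wrong about 4 times in 10. Good enough for ad targeting, bad for judging a person.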
You do raise some interesting points about whether their model would be stable over time. For example, perhaps women are just early adopters w.r.t. figuring out how to express tone in tweets via emoticons, and stupid men will eventually figure it out and start using them as well, destroying the predictive power of that particular feature of their model. On the other hand, it would seem they determined their features automatically by data mining gender-tagged data, so presumably they can keep feeding new gender-tagged data to their system and let the model evolve to handle shifts in writing style over time, e.g. lowering the weight of the emoticon feature as men use it more and discovering a new feature to replace it.
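A toy sketch of that re-estimation idea, assuming a simple smoothed log-odds weight per feature. This is not the study's actual learner, and the batches, counts, and function names are all hypothetical.

```python
import math


def feature_log_odds(samples, feature):
    """Return log(P(feature | F) / P(feature | M)) with add-one smoothing.

    samples: list of (label, set_of_features), label is "F" or "M".
    """
    with_feature = {"F": 0, "M": 0}
    totals = {"F": 0, "M": 0}
    for label, feats in samples:
        totals[label] += 1
        if feature in feats:
            with_feature[label] += 1
    p_f = (with_feature["F"] + 1) / (totals["F"] + 2)
    p_m = (with_feature["M"] + 1) / (totals["M"] + 2)
    return math.log(p_f / p_m)


def make_batch(n_each, f_with, m_with):
    """Build a made-up labeled batch: n_each users per gender,
    f_with women and m_with men using the 'emoticon' feature."""
    batch = []
    for label, k in (("F", f_with), ("M", m_with)):
        batch += [(label, {"emoticon"})] * k
        batch += [(label, set())] * (n_each - k)
    return batch


# Hypothetical data: men catch up on emoticon use between the two batches.
early = make_batch(100, f_with=60, m_with=20)
later = make_batch(100, f_with=60, m_with=50)

w_early = feature_log_odds(early, "emoticon")
w_later = feature_log_odds(later, "emoticon")
print(f"early weight: {w_early:.3f}, later weight: {w_later:.3f}")
```

Re-fitting on the later batch shrinks the emoticon weight toward zero without anyone hand-tuning it, which is the kind of drift handling described above. It only works as long as fresh gender-tagged data keeps coming in.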
Simply identifying all tweets from a sample as 'male' would yield a higher success rate.
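For reference, an "always guess male" baseline is just the share of the majority class in the sample. A minimal sketch with made-up labels follows; whether it actually beats the model depends on the real class balance, which is not given here, and `majority_baseline` is a hypothetical helper name.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (majority_label, accuracy) for a classifier that
    always predicts the most common label in `labels`."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Hypothetical sample: 3 male-tagged users, 1 female-tagged user.
label, acc = majority_baseline(["M", "M", "M", "F"])
print(label, acc)
```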
Great contribution, but how do you propose you might improve performance beyond 72.8%?
See Figure 9, "Performance increases with more tweets from target user." Guess they thought about this for more than 5 seconds.