I don't think the word "tokens" means what you think it means.
There are several ways to tokenize sentences in languages like English that use spaces and/or punctuation to separate words. One is to tokenize on whitespace alone. In that case, the sentence
John, Bill, and Mary went to the store.
has 8 tokens, which happens to be the same as the number of words. But the first token is "John," (with the comma, i.e. not a word as we usually think of words). Another way is to tokenize on whitespace and punctuation, in which case that sentence has 11 tokens (the extra tokens are the two commas and the final period). And yet another way is to tokenize on whitespace and punctuation and then throw away the punctuation tokens, in which case we're back to 8 tokens for that sentence, the first of which is "John" (no comma).
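A quick sketch of those three schemes in Python; the regex here is just an illustrative choice, not the only way to do it:

```python
import re

sentence = "John, Bill, and Mary went to the store."

# 1. Tokenize on whitespace alone: 8 tokens, the first is "John," (comma attached).
ws_tokens = sentence.split()
print(ws_tokens)       # ['John,', 'Bill,', 'and', 'Mary', ...]

# 2. Tokenize on whitespace and punctuation: 11 tokens, punctuation kept.
wp_tokens = re.findall(r"\w+|[^\w\s]", sentence)
print(wp_tokens)       # ['John', ',', 'Bill', ',', 'and', ...]

# 3. Same split, then discard the punctuation tokens: back to 8, and "John" is bare.
word_tokens = [t for t in wp_tokens if not re.fullmatch(r"[^\w\s]", t)]
print(word_tokens)     # ['John', 'Bill', 'and', 'Mary', ...]
```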
There are complications, like the apostrophes in "won't" and "John's", hyphens, and so forth. Other languages with alphabetic writing systems (or abjads, like Arabic and Hebrew) have other issues. And while Vietnamese uses an alphabetic writing system, its spaces separate syllables rather than words, so splitting on whitespace doesn't really give you word boundaries. But by and large, tokenization works OK, and gives you something resembling words.
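To make the apostrophe and hyphen complications concrete, here is what the naive whitespace-and-punctuation splitter from above does to them (again, the regex is just one illustrative choice):

```python
import re

# Splitting on punctuation mangles contractions, possessives, and hyphenated words:
for s in ["won't", "John's", "state-of-the-art"]:
    print(re.findall(r"\w+|[^\w\s]", s))
# "won't"  ->  ['won', "'", 't']   (neither piece is a word)
# "John's" ->  ['John', "'", 's']  (is this one token or two?)
# "state-of-the-art" -> ['state', '-', 'of', '-', 'the', '-', 'art']
```

Whether those should count as one token, two, or four is exactly the kind of decision a real tokenizer has to make.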
There are languages with other writing systems where tokenization becomes more difficult, like Chinese (there are no delimiters between words in written Chinese, except where the characters come on either side of a sentence break, numeral, etc.).
Perhaps what you're thinking of is character N-grams, i.e. overlapping sequences of N characters. In that case, the sentence has 3-grams that include "#Bi", "Bil", "ill", "ll," etc. (I'm using '#' to represent a whitespace character. Things get a little more complicated if you're doing PDFs, where the spaces between words are not really characters, but I digress.) If you're using N-grams, it would indeed be difficult (though not impossible) to count the words in a text.
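A minimal sketch of character N-gram extraction, with '#' standing in for spaces as above (the function name and the choice of marker are mine):

```python
def char_ngrams(text, n=3, space_mark="#"):
    # Make spaces visible so grams that straddle word boundaries are easy to read.
    s = text.replace(" ", space_mark)
    # Slide a window of width n across the string, one character at a time.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

grams = char_ngrams("John, Bill, and Mary went to the store.")
print(grams[:10])
# The grams spanning "Bill" include '#Bi', 'Bil', 'ill', 'll,' — note that none
# of them is a word, which is why counting words from N-grams alone is hard.
```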