From an optimal information-theory viewpoint:

1000 bits, 168 of which are true (the 168 primes below 1000) and 832 are false.

Given this model of the data, the optimal encoding of the 1's should use 2.573 bits each, while the optimal encoding of the 0's should use 0.265 bits each.:

ln(1000/168) / ln(2) = 2.573... bits

ln(1000/832) / ln(2) = 0.265... bits

The space required for the entire sequence:

(168 * ln(1000/168) + 832 * ln(1000/832)) / ln(2) = 653.109... bits

The model would have to be significantly better than this. Improvements to the model must produce a better probability estimate than the raw statistics. The most obvious single improvement to the model is to treat the even numbers as special, where a simple rule can give you 100% model accuracy and thus use 0 bits to encode all the even positions.

To get the entire sequence down to *only 10 bits* implies a very large computational overhead, because even with the prime probability all the way down to 1/1000 the list would still be:

(1 * ln(1000/1) + 999 * ln(1000/999)) / ln(2) = 11.407... bits.

So i'm thinking the O(N^1/3) doesnt apply to small values of n, but instead only to enormous values of n.