Here is my paper on developing a list of function words for use in corpus linguistics. The list consists of 387 word types and covers 43.3% of all tokens (function and content words together), out of the 43.4% of tokens that are function words (or 99.9% coverage and differentiation).
I am currently working on a slimmer list of 196 words, which will hopefully cover all the same words as well. The difference is in the way content words are defined: this time I am attempting a stricter definition while still covering 99.9 percent of all tokens.
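
To make the coverage figures concrete, here is a minimal sketch of how the token coverage of a word list might be computed. It is not the procedure from the paper: the tokenizer is a naive assumption (lowercased runs of letters and apostrophes), and the function name `token_coverage`, the toy word list, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter


def token_coverage(text, word_list):
    """Return the share of tokens in `text` that appear in `word_list`."""
    # Naive tokenizer (an assumption): lowercase, then runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum token frequencies of every word type that is on the list.
    covered = sum(n for word, n in counts.items() if word in word_list)
    return covered / len(tokens)


# Toy data, purely illustrative; NOT the 387-word list from the paper.
toy_function_words = {"the", "of", "a", "and", "in", "on", "is"}
sample = "The cat sat on the mat and the dog is in a box"
share = token_coverage(sample, toy_function_words)
print(f"{share:.1%} of tokens are covered by the list")
```

Run over a real corpus with the full list, the same ratio would give the "percent of all tokens" figure; dividing it by the share of tokens that are function words would give the coverage within function words.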