I hadn’t worked with the web as a corpus before. That is, until now.
People say that linguistic data drawn from the web as a corpus is unreliable. But they said the same thing when the first corpora were compiled all those years ago. Undoubtedly, the quality of the sample is an important factor, but the same can be said of any scientific experiment with a small sample size. Thus the choice of sample matters as much as its size.
All language is language. We can use literature as the yardstick, or some other medium. So why not the web?
Martin Weisser was kind enough to tell me about his work on ICEWeb, a program for web corpus analysis. It has an easy-to-use interface with a simple help menu that explains the basics. How one chooses and analyses a web corpus is another matter, and one that I have yet to master.
I recommend that you try it if you are interested in studying language and the web.