ICEWeb – the web as corpus

I had never worked with the web as a corpus before. That is, until now.

People say that the web is unreliable as a source of corpus linguistic data. But they said the same thing when the first corpora were compiled all those years ago. Undoubtedly, the quality of the sample is an important factor. One can say the same about any scientific experiment with a small sample size. Thus the choice of sample matters as much as its size.

All language is language. We can use literature as the yardstick, or some other medium. So why not the web?

Martin Weisser was kind enough to tell me about his work on ICEWeb, a program for web corpus analysis. It has an easy-to-use interface with a simple help menu that explains the basics. How one chooses and analyses a web corpus is another matter, and one that I have yet to master.

I recommend that you try it if you are interested in studying language and the web.
