Playing with the ix500

Bought a Fujitsu ix500 scanner last week. Wow! I don’t know how I lived without this incredible machine for so long. The scanning is so quick: 30 double-sided pages in a minute. By default it saves as PDF, but with one click the scan is converted into a Word document. I scanned a novel…


How words (collocates) relate to a particular word (keyword or node). In a corpus, this usually means within a certain distance from the node: for example, the words within ±5 positions of the node are collated and counted for quick comprehension. Words often come together with greater-than-chance regularity. This can either be within the…
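As a rough illustration of the window idea described above, here is a minimal Python sketch that counts the words falling within a fixed span of a node word. The function name `collocates`, the simple regex tokenizer, and the sample sentence are all assumptions made for this example; a concordance program such as AntConc will tokenize and score collocates in its own way.

```python
import re
from collections import Counter

def collocates(text, node, span=5):
    """Count words occurring within `span` tokens to either side of `node`."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]      # up to `span` words before the node
            right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
            counts.update(left + right)
    return counts

# Invented sample text, purely for demonstration.
sample = "Strong tea is better than weak tea, but strong coffee beats strong tea."
result = collocates(sample, "tea", span=2)
print(result.most_common(3))
```

This only produces raw co-occurrence counts. Deciding whether a pairing is "greater than chance" needs an association measure on top of these counts, which is outside this sketch.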

ICEWeb – the web as corpus

I hadn’t done web as corpus before. That is, until now. People say the web as a source of corpus linguistic data is unreliable. But then they said that too when the first corpora were made all those years ago. Undoubtedly the quality of the sample is an important factor. One can say the same thing about any…


Short for Key Word In Context. It is a way of looking at a search term (type) in a concordance program with the keyword centred so as to see the patterns created by the other words, its context. Below is an example of a concordance search of the term ‘violence’ in a corpus. The words…
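To make the centred-keyword layout concrete, here is a small Python sketch that prints KWIC-style lines. It is not how any particular concordancer works internally; the function name `kwic`, the character-based context `width`, and the sample text are assumptions for illustration only.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with the keyword aligned in the same column."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]   # context before the hit
        right = text[m.end():m.end() + width]               # context after the hit
        # Right-align the left context and left-align the right context
        # so that every keyword starts at column width + 1.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Invented sample text, purely for demonstration.
sample = "The report links violence to poverty. Critics say violence on screen is overstated."
concordance = kwic(sample, "violence", width=20)
for line in concordance:
    print(line)
```

Because every hit lands in the same column, the words immediately to the left and right of the keyword line up vertically, which is what makes recurring patterns easy to spot.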


The unique form of the tokens (words) in a corpus. Often accompanied by frequency data. Meaning is treated as secondary. Corpus linguistic analysis does not directly reveal the various meanings of a word. This must be inferred from its usage. In corpus linguistics this is usually done by concordancing, collocations, clusters, etc.


The individual forms (words) of a corpus. The sum of the tokens is the size of the corpus. The term contrasts with type in order to distinguish how we are observing the form, whether as one instance in the corpus (token), or as combined instances relating to its frequency within a corpus (type).
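The type/token distinction can be sketched in a few lines of Python. The tokenizer here is a deliberately crude assumption (lowercased runs of letters and apostrophes), and the function name `type_token_counts` and the sample sentence are made up for this example; real corpus tools make their own choices about case, punctuation, and hyphenation, which changes the counts.

```python
import re
from collections import Counter

def type_token_counts(text):
    """Return the list of tokens and a frequency table of types."""
    tokens = re.findall(r"[a-z']+", text.lower())  # every running word is a token
    types = Counter(tokens)                        # each distinct form is a type
    return tokens, types

# Invented sample text, purely for demonstration.
tokens, types = type_token_counts("The cat sat on the mat. The mat was flat.")
print(len(tokens), "tokens;", len(types), "types")
print(types.most_common(2))
```

Here the corpus size is the number of tokens, while the frequency table has one entry per type: "the" is a single type that accounts for three of the tokens.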

AntConc Tutorials page updated

I have finally updated the AntConc Tutorials page. It is now called AntConc Basics. It summarizes the mechanics of using AntConc without unnecessary detail on how to analyse a corpus (I leave that up to you). It also comes with a two page PDF version. The online version also comes with a quick-reference guide to…

Short Book Review – Corpus Linguistics: Method, Theory and Practice

Corpus Linguistics: Method, Theory and Practice (Cambridge Textbooks in Linguistics) by Tony McEnery and Andrew Hardie, Cambridge University Press (2012). ISBN 9780521547369. 294 pages. As part of the Cambridge Textbooks in Linguistics series, this book stays true to its title and doesn’t disappoint. Broken down into nine chapters on 1) a basic definition of the…

Exposure to Language

“Reading is more important than writing.” — Roberto Bolaño

Without exposure to a language one will never master it. That exposure can come in many forms, but the best form is culture. Culture and language are essentially the same thing. There will be no language if there is no culture; the opposite is also true.…

A Simple Guide to Using AntConc now in French

A Simple Guide to Using AntConc is now available in French! A big thanks to Stefania Solofrizzo for doing the translation of her own volition and for being nice enough to send it to me to share with you. A Simple Guide to Using AntConc (English) Un Guide Simple Pour Utiliser AntConc (français) translation by Stefania Solofrizzo…