AntWordProfiler short review

I have been playing with Laurence Anthony’s AntWordProfiler for a bit now. It is a corpus linguistics tool for “profiling” the vocabulary coverage of texts as a measure of comprehensibility, particularly for reading in a second language. To understand how it works, one must understand its predecessor, Range, by Paul Nation.

Paul Nation is a researcher in vocabulary acquisition, a subfield of applied linguistics. His interest was mainly in how much coverage of a text is needed before vocabulary can be acquired from reading, without the aid of dictionaries and from textual context alone. To this end he created the Range program. The Range program has two main functions: 1) to show the distribution of words across multiple files or texts, and 2) to show how much of the text is covered by carefully designed wordlists based on frequency or knowledge. It does this by number crunching and reports the results as statistics.
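To make the two functions concrete, here is a minimal sketch in Python. It is not how Range or AntWordProfiler is actually implemented. The tokenizer (runs of ASCII letters, lowercased), the function names and the tiny `base_list` are all my own assumptions for illustration, and real wordlists are usually organised by word family rather than by single word form.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a token is a run of ASCII letters, lowercased.
    return re.findall(r"[a-z]+", text.lower())


def word_range(texts):
    """Map each word type to the number of texts it occurs in."""
    counts = Counter()
    for text in texts:
        # Use a set so each type counts at most once per text.
        counts.update(set(tokenize(text)))
    return counts


def coverage(text, wordlist):
    """Fraction of the tokens in text that appear in wordlist."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in wordlist)
    return known / len(tokens)


texts = ["The cat sat on the mat.", "The dog chased the cat."]
base_list = {"the", "cat", "dog", "on"}  # hypothetical wordlist

ranges = word_range(texts)           # e.g. "cat" occurs in both texts
cov = coverage(texts[0], base_list)  # 4 of the 6 tokens are on the list
```

In the first text four of the six tokens are on the list, so its coverage is about 67%. Changing the regular expression in `tokenize` changes both figures, which is why the word definition matters when comparing results across tools.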

AntWordProfiler essentially does the same thing but is an upgrade in terms of functionality. Instead of just statistics, we can now look at coverage in the text itself. And with a little tweaking of the word definition, you can make AntWordProfiler mirror the Range program’s (and AntConc’s) results.

Again, Mr Anthony has come up with a slick and easy-to-use product. The controls are less intuitive than AntConc’s (but more so than the Range program’s), but it still does not take much to figure out the functions.

The selling points, for me, are:

  1. the ability to create results identical to those of other products, thus making research results compatible and comparable;
  2. transferability of its plain-text results to other platforms;
  3. speed (not as fast as AntConc, especially the old versions, but still fast); and
  4. the ability to process large volumes of text (Range crashes at about 250,000 tokens).

This is the tool to use if you need to profile texts or look at type occurrence over multiple files.

[Update] Mr Anthony has kindly pointed out two omissions to me – that AntWordProfiler is free (yes, free!) to use, and that it is available on different platforms (Windows, Mac OS X and Linux) which Range is not.

Type-token ratio

The type-token ratio (or TTR) is used to compare two corpora in terms of lexical complexity. The formula is the number of types (distinct word forms) divided by the number of tokens (running words). The closer the ratio is to 1, the greater the complexity; the closer to 0, the greater the repetition of words. No specific ratio can be said to be ideal. One corpus can only be said to be more or less complex than another, and only when the two are of similar size, because TTR tends to fall as a text gets longer. Approximate size equivalence is therefore an important criterion in using TTR.
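As a rough illustration of the formula, here is a small Python sketch. The function name and the tokenizer (runs of ASCII letters, lowercased) are my own assumptions. A concordancer may define a word differently and so report slightly different figures.

```python
import re


def type_token_ratio(text):
    """Number of distinct word forms divided by number of running words."""
    # Assumption: a token is a run of ASCII letters, lowercased.
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Two samples of equal length (six tokens each), so the ratios are comparable.
ttr_varied = type_token_ratio("the cat sat on the mat")      # 5 types / 6 tokens
ttr_repeated = type_token_ratio("the the the the the the")   # 1 type / 6 tokens
```

The first sample comes out at about 0.83 and the second at about 0.17. Both have six tokens, which is what makes the comparison fair.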