a glossary of corpus types

There are many types of corpus, depending on their use, and a single corpus may belong to more than one type. Below is a list of some of the main types.


diachronic – a corpus built to show how a language changes across a timeframe.

learner – a corpus of L2 learner writing or speech.

monitor – a type of diachronic corpus which may continue to grow with new texts added over time.

monolingual – includes only one language.

multilingual – a corpus with two or more languages.

parallel – a corpus of texts in one language aligned with their translations in another, for example a first language (L1) and a target language (L2).

reference – a corpus against which other corpora are compared, usually through statistical analysis.

synchronic – a corpus constructed to represent a language at a particular point in time (like a snapshot).

raw – a corpus with no annotation.

tagged – a corpus with annotation (for example, part-of-speech tags).

target – a corpus that is compared to a reference corpus.
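
To make the reference/target comparison more concrete, here is a minimal sketch of one way it could be done. It assumes the log-likelihood keyness statistic, which is a common choice but only one of several; the function names and the toy token lists are hypothetical and are there purely for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood keyness score for one word.

    The score is 0 when the word is equally frequent (relative to
    corpus size) in both corpora, and grows as the frequencies diverge.
    """
    total = freq_target + freq_ref
    combined_size = size_target + size_ref
    expected_target = size_target * total / combined_size
    expected_ref = size_ref * total / combined_size

    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # treat 0 * ln(0) as 0
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_tokens, ref_tokens, top_n=5):
    """Words that are unusually frequent in the target corpus (hypothetical helper)."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(ref_tokens)
    size_target, size_ref = len(target_tokens), len(ref_tokens)

    scored = []
    for word, freq in target_counts.items():
        ref_freq = ref_counts[word]
        # Keep only words relatively more frequent in the target corpus.
        if freq / size_target > ref_freq / size_ref:
            score = log_likelihood(freq, ref_freq, size_target, size_ref)
            scored.append((word, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy data only: a real target or reference corpus would be far larger.
target = "the learner writed the essay and the learner writed again".split()
reference = "the writer wrote the essay and the editor read the essay again".split()

for word, score in keywords(target, reference, top_n=3):
    print(f"{word}\t{score:.2f}")
```

With corpora this small the scores mean very little. The point is only the shape of the comparison: count each word in both corpora, work out the frequency expected if the two corpora behaved alike, and rank words by how far the target departs from that expectation.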
