A glossary of corpus types

Corpora come in many types, depending on their use. Below is a list of some of the main types.


diachronic – a corpus used to study how language changes across a period of time.

learner – a corpus of L2 learner writing or speech.

monitor – a type of diachronic corpus which may continue to grow with new texts added over time.

monolingual – includes only one language.

multilingual – a corpus with two or more languages.

parallel – a corpus containing the same texts in two or more languages, for example texts in a first language (L1) paired with their translations in a target language (L2).

reference – a corpus against which other corpora are compared, usually through statistical analysis.

synchronic – a corpus constructed to represent a language at a particular point in time (like a snapshot).

raw – a corpus with no annotation.

tagged – a corpus with annotation (for example, Parts-Of-Speech tags).
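
To make the raw/tagged distinction concrete, here is a minimal sketch in Python. It assumes the tagged text uses a `word/TAG` layout with Penn Treebank-style tags; that layout is only one of several conventions, and the function names are made up for illustration.

```python
def parse_tagged(line):
    """Split a 'word/TAG word/TAG' line into (word, tag) pairs.

    Assumes each token carries exactly one tag after its last slash.
    """
    pairs = []
    for token in line.split():
        # rsplit keeps any slashes that belong to the word itself
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


def strip_tags(line):
    """Recover the raw (unannotated) text from a tagged line."""
    return " ".join(word for word, _ in parse_tagged(line))


tagged_line = "The/DT cat/NN sat/VBD"
print(parse_tagged(tagged_line))
print(strip_tags(tagged_line))
```

The same text can therefore exist as both a raw and a tagged corpus; the annotation is a layer added on top of the words.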

target – a corpus that is compared to a reference corpus.
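
As a rough illustration of how a target corpus can be compared with a reference corpus, the sketch below scores each word with the log-likelihood statistic that is often used for keyword analysis. This is only one possible measure, and the function names and toy token lists are invented for the example.

```python
import math
from collections import Counter


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Log-likelihood score for one word.

    freq_* are the word's counts in each corpus; size_* are the
    total token counts of each corpus.
    """
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    score = 0.0
    # A zero count contributes nothing (and log(0) is undefined)
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * score


def keyness(target_tokens, reference_tokens):
    """Rank the target corpus's words by how far their frequency
    departs from what the reference corpus would predict."""
    target_counts = Counter(target_tokens)
    ref_counts = Counter(reference_tokens)
    n_target, n_ref = len(target_tokens), len(reference_tokens)
    scores = {
        word: log_likelihood(target_counts[word], ref_counts[word], n_target, n_ref)
        for word in target_counts
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: a real comparison needs far larger corpora
target = ["corpus"] * 3 + ["the"] * 2
reference = ["the"] * 5
for word, score in keyness(target, reference):
    print(word, round(score, 2))
```

Note that the score alone does not say whether a word is overused or underused in the target corpus; comparing the relative frequencies shows the direction.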

Published by

signature103

Language teacher and researcher. Object Philosopher. Buddhist.

4 thoughts on “A glossary of corpus types”

  1. May I suggest another corpus type? Developmental: a corpus of texts produced by speakers/writers in the process of acquiring and/or developing their first language, like Lucy, Solar or Doeste (https://doeste.ufersa.edu.br/).

    I congratulate you for spreading linguistics!
