Corsis corpus tool

Corsis (formerly Tenka Text) is now an open-source project hosted on SourceForge. It is freeware designed to look and feel like WordSmith Tools, the commercial software that is the standard among corpus tools. For more details and downloads, see the Corsis main page.

Wildcards

This tutorial will look at ways to make more specific search terms by using wildcards in AntConc.

So you have found an interesting word to focus on… a verb, perhaps. The problem is that verbs have more than one form. The verb “play”, for example, also has forms such as “plays” (the third person singular), “playing” (the present participle, used in continuous tenses), and “played” (the past tense and past participle). So how do you find all of them in a single search?

The asterisk (*) wildcard
If I now enter play with an asterisk immediately after it, as in play*, into the search term box, it will search for any word which begins with play, including plays, playing and played. However, another problem occurs: the wildcard also finds player, playboy, playful, etc. if they are in the corpus. This problem can be solved by deleting the unwanted lines (please read the “readme.txt” file to learn how to do this).

It is also possible to put the asterisk in a different location in the word. For example, if you want to search for verbs ending with ~ing then you can use the wildcard like this: *ing. This will bring up all words that end in ~ing.
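If you are curious about what the asterisk is doing, here is a minimal sketch in Python. It is not how AntConc itself is implemented; it simply assumes that * stands for zero or more letters within a single word, and the function name wildcard_to_regex and the sample sentence are made up for illustration.

```python
import re

def wildcard_to_regex(term):
    """Turn a search term containing * into a whole-word regular expression."""
    # Escape each literal piece, then let every * stand for zero or more word characters.
    parts = [re.escape(part) for part in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

text = "She plays well. The player played a playful tune while playing."

# play* finds the verb forms, but also player and playful.
print(wildcard_to_regex("play*").findall(text))

# *ing finds every word ending in ing.
print(wildcard_to_regex("*ing").findall(text))
```

Running this shows the same problem described above: play* picks up player and playful along with the verb forms.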

The vertical bar (|) wildcard
What if the example verb you are looking at is irregular (for example: eat, eats, eating, ate and eaten)? This can be solved with the vertical bar wildcard, which separates alternative search terms. If you enter “eat*|ate|eaten”, you will get all of these forms, plus “eatable” if it is in the corpus.
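The vertical bar can be sketched in Python in much the same spirit. This is only an illustration under the assumption that | means "or" and * means zero or more letters; the function name search_term_to_regex and the sample sentence are hypothetical.

```python
import re

def search_term_to_regex(term):
    """Translate a term such as eat*|ate|eaten into one whole-word regular expression."""
    alternatives = []
    for alternative in term.split("|"):
        # Within each alternative, * stands for zero or more word characters.
        pieces = [re.escape(piece) for piece in alternative.split("*")]
        alternatives.append(r"\w*".join(pieces))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

text = "He ate what I had eaten; nobody eats eatable things while eating. I hate waiting."
pattern = search_term_to_regex("eat*|ate|eaten")
print(pattern.findall(text))
```

Note that “hate” is not matched, because the search looks at whole words only. Note also that eat* already covers “eaten”, so listing it separately does no harm but is not strictly needed.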

Please read the “readme.txt” file to learn about the other wildcards available in the program.

Congratulations! You now know how to make more sophisticated searches.

Concordancing

This tutorial will look at some of the ways to enhance your concordance searches and views in AntConc.

So you have made concordance lines. But you cannot see any “pattern” in the lines. What you need to do is sort them.

Sort
Under the Search Term box are some controls labeled “KWIC sort”. KWIC stands for “Key Word In Context”: the search word is centred in each concordance line, with its surrounding context on either side. This layout allows you to find word patterns more easily.

Returning now to the KWIC sort feature, to the right of the checked first digit box are up and down arrows. Click the up arrow once. The “0” in the digit box should now read “1R”, which means “one word to the right”. Click the “Sort” button. The words to the right of the KWIC word should now be red (the default setting) and sorted in alphabetical order.

In general, sorting more than three or four words to the left or right of the search term is unnecessary, since the further away a word is, the less it is connected to the search term.

Another feature of AntConc is that it allows up to three levels of sorting. Play with this feature and see what kind of patterns emerge.
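To make the idea of sort levels concrete, here is a rough Python sketch of a multi-level KWIC sort. It is an illustration only, not AntConc's own code: the functions kwic and sort_key, the sample sentence, and the convention that positive numbers mean words to the right (1 for 1R) and negative numbers mean words to the left (-1 for 1L) are all assumptions made for this example.

```python
import re

def kwic(tokens, keyword):
    """Return (left context, keyword, right context) token lists for each hit."""
    return [(tokens[:i], tokens[i], tokens[i + 1:])
            for i, token in enumerate(tokens) if token == keyword]

def sort_key(line, positions):
    """Build a sort key from non-zero positions such as 1 (1R), 2 (2R) or -1 (1L)."""
    left, _, right = line
    key = []
    for position in positions:
        # Count outward from the keyword: right context as is, left context reversed.
        context = right if position > 0 else left[::-1]
        index = abs(position) - 1
        key.append(context[index] if index < len(context) else "")
    return key

text = "we play games and they play chess but you play badly so I play along"
tokens = re.findall(r"\w+", text.lower())

lines = kwic(tokens, "play")
# Three levels of sorting: 1R first, then 2R, then 1L.
lines.sort(key=lambda line: sort_key(line, [1, 2, -1]))

for left, word, right in lines:
    print(f"{' '.join(left[-3:]):>20}  {word}  {' '.join(right[:3])}")
```

After sorting, the words immediately to the right of “play” appear in alphabetical order, which is what makes recurring patterns stand out.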

Concordance Plot
Concordance lines do not show us where they are in the corpus. In order to see this, we need to look at the Concordance Plot. Click on the “Concordance Plot” tab. If you have multiple files open, you should see several barcode-like graphics, one for each file. The lines in each bar show where the search term occurs in that particular file. You can click on a line in the bar to see the word in the original file (the File View).
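The idea behind the plot can be sketched in a few lines of Python. This is only a text-based approximation, assuming that each hit is placed according to its relative position in the file; the function plot_bar and the two file names and their contents are invented for the example.

```python
import re

def plot_bar(text, keyword, width=40):
    """Return a text 'barcode' marking where keyword occurs in text."""
    tokens = re.findall(r"\w+", text.lower())
    bar = ["-"] * width
    for i, token in enumerate(tokens):
        if token == keyword:
            # Scale the token's position in the file to a column in the bar.
            bar[i * width // len(tokens)] = "|"
    return "".join(bar)

# Hypothetical files: one with hits at both ends, one with a hit in the middle.
files = {
    "a.txt": "play " + "word " * 18 + "play",
    "b.txt": "word " * 10 + "play " + "word " * 9,
}

for name, text in files.items():
    print(f"{name}  {plot_bar(text, 'play')}")
```

Each printed bar corresponds to one file, and each vertical mark to one occurrence of the search term.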

Congratulations! You can now sort your concordance lines and see where they are in your files.

Opening files; word list; concordance

This tutorial will teach you the basics of using the AntConc concordancer.

Run the AntConc program.

In order to analyse a corpus we need to 1) open a file or files, 2) make a word list from them, and 3) make concordance lines. Strictly speaking, making a word list is not a necessary step in AntConc. But since, more than likely, we do not know what we are looking for, it is the norm to do so.

Opening a file or files
To open a file, open the “File” menu and click “Open File(s)”.

A dialogue box will appear. Choose a text file (with the extension .txt). To choose more than one file, select the desired files while holding down the “Ctrl” key, then click “Open”. To open all the files in one folder, open the “File” menu and click “Open Dir”. All files within the chosen folder, including those in its subfolders, will be opened.
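For readers who like to see what “open all the files in a folder, including subfolders” amounts to, here is a small Python sketch. It is an assumption-laden illustration rather than anything AntConc does internally: the function name load_corpus is made up, and the files are assumed to be UTF-8 text.

```python
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file under folder (subfolders included) into a dictionary."""
    corpus = {}
    for path in sorted(Path(folder).rglob("*.txt")):
        # errors="replace" keeps going even if a file is not valid UTF-8.
        corpus[str(path)] = path.read_text(encoding="utf-8", errors="replace")
    return corpus
```

Calling load_corpus with the name of your own corpus folder would return a dictionary that maps each file path to the text of that file.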

Making a word list
Once your files have been selected, click on the “Word List” tab. Check the “Treat all data as lowercase” box (this is optional), then click the “Start” button.
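A word list is essentially a frequency count of every word in the corpus. The sketch below shows the idea in Python, with a lowercase switch that plays the same role as the “Treat all data as lowercase” box. The function name word_list and the simple definition of a word (any run of letters, digits or underscores) are assumptions for this example.

```python
import re
from collections import Counter

def word_list(text, lowercase=True):
    """Count how often each word occurs, optionally ignoring case."""
    if lowercase:
        text = text.lower()
    return Counter(re.findall(r"\w+", text))

sample = "The cat saw the other cat. The Cat ran."

# With lowercasing, "The" and "the" are counted as the same word.
for word, count in word_list(sample).most_common():
    print(count, word)
```

With the lowercase option switched off, “The” and “the” are counted separately, which is why checking the box usually gives a tidier list.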

Making concordance lines
Click on any word in the “Word” column, or type in a search term (if you know what you are looking for) in the search box, to make concordance lines.
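If you would like to see how concordance lines are put together, the following Python sketch builds them by hand. It is a simplified illustration, not AntConc's method: the function name concordance is hypothetical, the context is measured in characters rather than words, and only exact whole-word matches are found.

```python
import re

def concordance(text, term, width=30):
    """Return one line per hit, with the term starting at the same column in each."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    lines = []
    for match in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so that every hit lines up.
        lines.append(f"{left:>{width}}{match.group()}{right}")
    return lines

sample = "I play. You play too."
for line in concordance(sample, "play", width=10):
    print(line)
```

Because the left context is padded to a fixed width, the search word forms a single column down the page.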

Congratulations! You have just made your first concordance lines in AntConc.

How to convert Word Document files into plain-text files

In order to use the contents of a Word Document (“.doc” or “.docx” extension) in a concordancer, it must be converted to, or saved as, a plain text file (“.txt” extension). I will outline two different ways you can do this below.

Method 1 (recommended)

  1. open the document in Word,
  2. do a “select all” (ctrl+A),
  3. “copy” (ctrl+C),
  4. open Notepad (found in Start > All Programs > Accessories),
  5. “paste” (ctrl+V) the content into Notepad,
  6. save the file.

Method 2

  1. open the document in Word,
  2. do a “Save as” in Word (go to File > Save as),
  3. select “Save as type” (see image) as “plain text”,
  4. click “Save”,
  5. when the dialogue box appears (for non-English OSs), check “allow character substitution” and then click “OK”.

However, this can be tedious if you have many files to convert. There are freeware programs that can automate the task, but please be careful: some of the programs available may be malicious, that is, adware, malware or spyware.
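If you are comfortable running a short script, batch conversion can also be sketched in Python using only its standard library. This is a rough sketch under several assumptions: it handles “.docx” files only (not the older “.doc” format), it keeps just the paragraph text from the main body of the document (no headers, footnotes, tabs or manual line breaks), and the function names docx_to_text and convert_folder are made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used for the main document part inside a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(docx_path):
    """Extract the paragraph text from a .docx file (which is a zip archive)."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # Join the text runs of one paragraph into a single line.
        paragraphs.append("".join(node.text or "" for node in paragraph.iter(W + "t")))
    return "\n".join(paragraphs)

def convert_folder(folder):
    """Write a .txt file next to every .docx file in folder and return the new paths."""
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(docx_to_text(docx_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling convert_folder on a folder of “.docx” files leaves the originals untouched and saves a UTF-8 “.txt” copy of each one beside it. It is worth checking a few of the results against the originals before analysing them.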
