How to convert Word Document files into plain-text files

To use the contents of a Word document (“.doc” or “.docx” extension) in a concordancer, you must first convert it to, or save it as, a plain-text file (“.txt” extension). I outline two ways to do this below.

Method 1 (recommended)

  1. open the document in Word,
  2. “select all” (Ctrl+A),
  3. “copy” (Ctrl+C),
  4. open Notepad (found in Start > All Programs > Accessories),
  5. “paste” (Ctrl+V) the content into Notepad,
  6. save the file (Notepad saves it with a “.txt” extension by default).

Method 2

  1. open the document in Word,
  2. choose “Save as” in Word (go to File > Save as),
  3. set “Save as type” (see image) to “plain text”,
  4. click “Save”,
  5. when the dialogue box appears (for non-English OSs), check “allow character substitution” and then click “OK”.

This can be tedious, however, if you have many files to convert. There are freeware programs that can automate this task, but please be careful: some of the programs available may be malicious (adware, malware or spyware).
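
For readers comfortable with a little scripting, converting many files at once can also be sketched in Python. This is a minimal sketch under stated assumptions, not a polished tool: it assumes the files are “.docx” (the older binary “.doc” format will not work), it uses only Python's standard library, and it does not need Word to be installed. The function names docx_to_text and convert_folder are illustrative, not part of any existing program. It reads only the main document body, so headers, footers and footnotes are left out, and each paragraph inside a table cell ends up on its own line.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the WordprocessingML elements inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one paragraph per line."""
    # A .docx file is a zip archive; the main text lives in word/document.xml.
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        parts = []
        for node in para.iter():
            if node.tag == W + "t":            # a run of text
                parts.append(node.text or "")
            elif node.tag == W + "tab":        # a tab character
                parts.append("\t")
            elif node.tag in (W + "br", W + "cr"):  # a manual line break
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


def convert_folder(folder):
    """Write a UTF-8 .txt file next to every .docx file in `folder`."""
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        # Skip the temporary lock files Word creates while a document is open.
        if docx_path.name.startswith("~$"):
            continue
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(docx_to_text(docx_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling convert_folder("C:/corpus") (a hypothetical folder name) would write one “.txt” file beside each “.docx” file in that folder, overwriting any existing “.txt” file of the same name. The output is saved as UTF-8, so check that your concordancer is set to read that encoding.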


39 thoughts on “How to convert Word Document files into plain-text files”

  1. The results of Method 1 and Method 2 are not the same. Only Method 1 gives true plain text (try saving a table with Method 2 to see the difference).

  2. wow!

    just kidding.

    Now, for real, how do you convert 50 Word documents into .txt files at once?

    • You can try something like SpiceLogic. It is the only one I have tried that worked, but that was a while back. The document format has also changed since then, so it may not have kept pace.

      Otherwise do a “doc-to-txt” search on your favourite search engine to find the latest. Good luck.

  3. I worked as a proposal desktop publisher for years, through all the different versions of Word. Lots of version compatibility problems! It seems that, as soon as I learn one version and all the tricks about how to handle it, here comes another updated Word. Sigh.

  4. I would like to have some pseudo-formatting in the text file, such as empty lines between paragraphs, underlines as a second line of dashes, and a character in front of each line of a list. Does anybody know a tool that does this? Example:

    This is an underlined Header
    ----------------------------

    See this list:

    * entry 1
    * entry 2
    * entry 3

    • You are talking about ‘regex’, or ‘regular expressions’. These deal with characters pertaining to the layout and formatting of texts. Search for these terms and you will find your answer. It can be done in Microsoft Word, but it takes a bit of getting used to.

  5. So many thank you’s. This doesn’t help if you have received a .docx file but you don’t have Word on your computer. “Open Word.” God must hate me. Everywhere I look on the internet gives that as the first step. […]
