How to convert Word Document files into plain-text files

To use the contents of a Word document (“.doc” or “.docx” extension) in a concordancer, you must first convert or save it as a plain-text file (“.txt” extension). I will outline two ways to do this below.

Method 1 (recommended)

  1. open the document in Word,
  2. do a “select all” (ctrl+A),
  3. “copy” (ctrl+C),
  4. open Notepad (found in Start > All Programs > Accessories),
  5. “paste” (ctrl+V) the content into Notepad,
  6. save the file (Notepad adds the “.txt” extension by default).

Method 2

  1. open the document in Word,
  2. do a “Save as” in Word (go to File > Save as),
  3. set “Save as type” to “Plain Text”,
  4. click “Save”,
  5. when the dialogue box appears (for non-English OSs), check “allow character substitution” and then click “OK”.
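
If you would rather clean the text after saving, the effect of character substitution can be approximated in a few lines of Python. This is only a sketch: the mapping table and the function name `substitute_characters` are my own assumptions, not what Word does internally, and the table covers only a handful of common punctuation marks.

```python
# Hypothetical mapping from "smart" punctuation to plain ASCII equivalents.
# Illustrative only; this is not the table Word uses.
SUBSTITUTIONS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2026": "...",  # horizontal ellipsis
}


def substitute_characters(text: str) -> str:
    """Replace each character listed in SUBSTITUTIONS with its ASCII stand-in."""
    return text.translate(str.maketrans(SUBSTITUTIONS))
```

Characters that are not in the table pass through unchanged, so accented letters are left alone.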

This can be tedious, however, if you have many files to convert. Freeware programs can automate the task, but please be careful: some of the programs available may be malicious (adware, malware or spyware).
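
If you are comfortable running a short script, a folder of “.docx” files can also be converted without installing anything beyond Python itself. The sketch below assumes Python 3 and relies on the fact that a “.docx” file is a zip archive whose main text lives in `word/document.xml`. The function names `docx_to_text` and `convert_folder` are made up for this example. It does not handle the older “.doc” format, and it ignores headers, footnotes and formatting, so treat it as a starting point rather than a finished tool.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by the main WordprocessingML elements in a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        pieces = []
        for node in para.iter():
            if node.tag == W_NS + "t" and node.text:
                pieces.append(node.text)      # a run of ordinary text
            elif node.tag == W_NS + "tab":
                pieces.append("\t")           # a tab character
            elif node.tag in (W_NS + "br", W_NS + "cr"):
                pieces.append("\n")           # a manual line break
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def convert_folder(folder, encoding="utf-8"):
    """Write a .txt file next to every .docx file found in `folder`."""
    written = []
    for docx_path in sorted(Path(folder).glob("*.docx")):
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(docx_to_text(docx_path), encoding=encoding)
        written.append(txt_path)
    return written
```

Calling `convert_folder("my_corpus")` (a hypothetical folder name) would leave the Word files untouched and return the list of text files it wrote. The output is saved as UTF-8 here, which is an assumption on my part. Pass a different `encoding` if your concordancer expects something else.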

39 thoughts on “How to convert Word Document files into plain-text files”

  1. To preserve international characters (with accents, etc.), save as Unicode, not as ASCII (“Text”). ASCII is the original coding dating back to the dawn of the computer age; it uses one byte per character (7 of the 8 bits), for a maximum alphabet of 128 possible characters. (Mac and Windows each started using the other 128 for differing sets of special characters, which is why “curly quotes” on one machine will come up as odd characters on the other.)

    Unicode, in its common two-byte form (UTF-16), stores about 65,000 characters in two bytes each and reaches the rest of its more than a million code points with four bytes, ample for all the written languages in the world. (That’s a lot of characters for a typeface designer!)
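
The difference is easy to see in Python. This is a small sketch, assuming Python 3. The sample word and the variable names are only for illustration.

```python
text = "caf\u00e9"  # "café": the final letter is outside ASCII

# ASCII has no code for the accented letter, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

utf16 = text.encode("utf-16-le")  # two bytes for each of these four characters
utf8 = text.encode("utf-8")       # one byte per ASCII letter, two for the accented one

# The same byte value means different things in the old 8-bit
# Windows (cp1252) and Mac (mac_roman) character sets.
byte = b"\x93"
windows_char = byte.decode("cp1252")    # a curly opening quote on Windows
mac_char = byte.decode("mac_roman")     # a different character on classic Mac

print(ascii_ok, len(utf16), len(utf8), windows_char, mac_char)
```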

    • David,
      Thanks for this.

      When I wrote this I was talking about English corpus linguistics. Many of the problems people had were caused by unwanted Japanese Unicode-encoded punctuation (apostrophes were a big problem), which is why I showed this method. So I guess I need to write another post to clarify this point.

    • If you need access to Word or Excel files, you can use the software suite from openoffice.org. It lets you open them and save them as .txt or .csv files, which you can then read in a text editor like Notepad.

      Hope that helps.

  2. Thank you so so so much!!! Mwah.. it really helped me.. it’s funny but it took me 2 days to figure this out!! hehe ;D

    • Sorry, I don’t know anything about C. But there is a program I have been using recently called Zilla Word to Text which can convert multiple files simply. The output is usable, but some characters don’t convert correctly for my purposes.

      Hope that helps.

  3. It’s a really good tip for me, because when I copy something from a Word file into WordPress it gets really messy.

    Thanks for sharing it.
