How to convert Word Document files into plain-text files

In order to use the contents of a Word document (“.doc” or “.docx” extension) in a concordancer, you must first convert it to, or save it as, a plain-text file (“.txt” extension). I outline two ways to do this below.

Method 1 (recommended)

  1. open the document in Word,
  2. do a “select all” (ctrl+A),
  3. “copy” (ctrl+C),
  4. open Notepad (found in Start > All Programs > Accessories),
  5. “paste” (ctrl+V) the content into Notepad,
  6. save the file (Notepad saves it with a “.txt” extension by default)

Method 2

  1. open the document in Word,
  2. do a “Save as” in Word (go to File > Save as),
  3. set “Save as type” (see image) to “Plain Text”,
  4. click “Save”,
  5. when the dialogue box appears (on non-English operating systems), check “allow character substitution” and then click “OK”.
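The “allow character substitution” option in step 5 is meant to replace characters that the chosen plain-text encoding cannot hold with similar-looking ones. Word's actual substitution list is not reproduced here. The following is only a minimal Python sketch of the same idea, assuming a small hand-picked table (SUBSTITUTIONS and substitute_characters are illustrative names, not part of Word or any library), with anything left over replaced by “?”.

```python
# Hypothetical substitution table: typographic and full-width punctuation
# mapped to plain ASCII look-alikes. This is NOT Word's own list.
SUBSTITUTIONS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (curly apostrophe)
    "\uff07": "'",    # full-width apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}
TABLE = str.maketrans(SUBSTITUTIONS)


def substitute_characters(text: str) -> str:
    """Replace known typographic characters with ASCII look-alikes, then
    turn anything that is still not ASCII into '?'."""
    ascii_bytes = text.translate(TABLE).encode("ascii", errors="replace")
    return ascii_bytes.decode("ascii")


if __name__ == "__main__":
    # Prints: "It's done" -- caf?...
    print(substitute_characters("\u201cIt\u2019s done\u201d \u2014 caf\u00e9\u2026"))
```

Note that accented letters such as “é” have no entry in the table, so they fall through to “?”. Extending the table, or saving in a Unicode encoding instead, avoids losing them.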

This can be tedious, however, if you have many files to convert. There are freeware programs that can automate the task, but please be careful: some of the programs available may be malicious (adware, malware or spyware).
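For .docx files, batch conversion can also be scripted without any third-party program. The following is a minimal Python sketch. It assumes the standard .docx layout: a ZIP package whose main text lives in word/document.xml. The function names docx_to_text and convert_folder are illustrative and do not come from any library. The sketch reads only paragraphs in the main body, and table cells come out one per line. Headers, footers, footnotes and text boxes are not handled reliably, and the older binary .doc format cannot be read this way at all.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the WordprocessingML elements inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path: Path) -> str:
    """Return the body text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        pieces = []
        # Only look inside runs (w:r), so tab-stop definitions in the
        # paragraph properties are not mistaken for real tab characters.
        for run in para.iter(W + "r"):
            for child in run:
                if child.tag == W + "t":
                    pieces.append(child.text or "")
                elif child.tag == W + "tab":
                    pieces.append("\t")
                elif child.tag in (W + "br", W + "cr"):
                    pieces.append("\n")
        lines.append("".join(pieces))
    return "\n".join(lines)


def convert_folder(folder: Path) -> int:
    """Write a UTF-8 .txt file next to each .docx in folder; return how many."""
    count = 0
    for docx_path in sorted(folder.glob("*.docx")):
        if docx_path.name.startswith("~$"):
            continue  # skip the lock files Word leaves next to open documents
        txt_path = docx_path.with_suffix(".txt")
        txt_path.write_text(docx_to_text(docx_path), encoding="utf-8")
        count += 1
    return count


if __name__ == "__main__":
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    print(f"Converted {convert_folder(folder)} file(s)")
```

Saved under a name of your choice (for example the hypothetical docx2txt.py), it would be run as `python docx2txt.py C:\path\to\folder`. UTF-8 is chosen so that non-English characters survive; check that your concordancer is set to read UTF-8.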


39 thoughts on “How to convert Word Document files into plain-text files”

  1. It is so simple, yet I couldn’t figure out how to do it, and I didn’t want to pay for some cheesy program to do it for me. Thanks!

  2. Thank you so much! I had spent hours trying to figure out how to convert a document to plain text. Method 2 worked; I did not have the option in the first method. Again, thanks!

  3. A thank you from me too. I had to put a Word doc into a folder for someone and it had to be in .txt format. I didn’t know how; now I do. Thanks again.

  4. Thank you so much. I have been trying to figure this out for months and your explanation made it as easy as possible. Again my thanks!

  5. I thank you so much for this info; I can’t convey to you how frustrating it is to see a job you are fully qualified for and not be able to send a “readable” resume to a prospective employer.
    Your quick and easy tutorial really helped me out!

    Steve

  6. Hey, this is so simple. Thank you very, very much. I wasted so much of my time without knowing this. Once again, thank you.

  7. I am glad this page is of help. I never expected it to be. But this page is the most popular page on Corpora by a long way.

    To be honest, I only use the copy-and-paste method.

    The reason I originally recommended the Word “Save as” method is that I thought most people who use Word would find it easier and more understandable, as it involves fewer programs and less know-how (how many people can’t fathom Ctrl+A, Ctrl+C and Ctrl+V?).

    I was wrong.

    The truth is that the copy-and-paste method is more accurate and, in my opinion, better.

    I am sure you have also found it more accurate. This is because Notepad cannot and does not handle meta-information, which is the root of the problem.

    It goes to show I should have stayed with my initial judgement.

    I have changed my recommendation, as I now understand that other people have also found (as I had) the Word method error-ridden and annoying … and really not better at all.

  8. Neither method works for me, because I have a very large Word document of more than 6,600 pages (Unicode text, moreover). The problem is that if I use Method 1, MS Word just hangs. Method 2 also does not work: when I try to paste into a text file, it pastes nothing. I think my data is too big for the clipboard. I can copy and paste the text in parts, but it is taking far too long. Do you have any idea how to handle this?

    • Nayyara,
      Pasting into Notepad will work … if you wait long enough. From personal experience, I once waited a whole night for a file to paste. Your computer may seem to have hung, but it hasn’t: it is working hard to paste everything. To check that it is working, open Task Manager (press ‘Ctrl’, ‘Alt’ and ‘Delete’ simultaneously ONCE ONLY; Vista requires you to click the link to Task Manager). Click the ‘CPU’ column header to sort the running processes by CPU use. You should see Notepad and/or Word working hard. This is a good sign, which means your wait will not be in vain.

      Pasting into Notepad will give you a cleaner result than saving in Word. Pasting is also much faster in Vista than in XP.

      Good luck.

  9. It’s a really good tip for me, because when I copy something from a Word file into WordPress it gets really messy.

    Thanks for sharing it.

    • Sorry, I don’t know anything about C. But there is a program I have been using recently called Zilla Word to Text, which can convert multiple files easily. The output is usable, but some characters don’t convert correctly for my purposes.

      Hope that helps.

  10. Thanks so much! With the Newgrounds redesign, all of my stories were looking ridiculous, because it kept changing my special characters to jumbled up nonsense!

  11. Thank you so, so, so much!!! Mwah.. it really helped me. It’s funny, but it took me 2 days to figure this out!! Hehe ;D

    • If you need access to Word or Excel files, you can use the free software suite from openoffice.org, which will let you open them and then save them as .txt or .csv files that you can read in a text editor like Notepad.

      Hope that helps.

  12. To preserve international characters (with accents, etc.), save as Unicode, not as ASCII (“Text”). ASCII is the original coding dating back to the dawn of the computer age; it uses one byte per character (7 of the 8 bits), for a maximum alphabet of 128 possible characters. (Mac and Windows each started using the other 128 for differing sets of special characters, which is why “curly quotes” on one machine will come up as odd characters on the other.)

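    Unicode, in the UTF-16 form that Word’s “Unicode” option uses, stores most characters in two bytes, which is enough for about 65,000 different characters, ample for all the alphabetic languages in the world. (That’s a lot of characters for a typeface designer!) Rarer characters take four bytes.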

    • David,
      Thanks for this.

      When I wrote this, I was writing about English corpus linguistics. Many of the problems people had were caused by unwanted Japanese Unicode-encoded punctuation (apostrophes were a big problem), which is why I showed this method. So I guess I need to write another post to clarify this point.
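To make the byte counts in David’s comment concrete, here is a small Python sketch that measures how many bytes the same text needs in a few encodings. The name encoded_sizes is illustrative. The sketch uses Python’s utf-16-le codec to match the two-bytes-per-character form described above; the plain utf-16 codec would also prepend a 2-byte byte-order mark. cp1252 is the Windows Western European code page, one of the 256-character extensions of ASCII.

```python
ENCODINGS = ("ascii", "cp1252", "utf-8", "utf-16-le")


def encoded_sizes(text: str) -> dict[str, int]:
    """Bytes needed to store `text` in each encoding; -1 means it cannot be stored."""
    sizes = {}
    for encoding in ENCODINGS:
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            sizes[encoding] = -1  # this encoding has no code for some character
    return sizes


if __name__ == "__main__":
    for sample in ("A", "caf\u00e9", "\U0001F600"):
        print(repr(sample), encoded_sizes(sample))
```

For “café” this gives -1 for ASCII (it cannot store “é”), 4 bytes in cp1252, 5 in UTF-8 and 8 in UTF-16. An emoji such as U+1F600 lies outside the two-byte range, so it needs 4 bytes even in UTF-16.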

  13. Method 1 and method 2 results are not the same. Only method 1 gives true plain text (try to save table with method 2 to see the difference).

  14. wow!

    just kidding.

    Now, for real, how do you convert 50 Word documents into .txt files at once?

    • You can try something like SpiceLogic. It is the only one I have tried that worked, but that was a while back. Document formats have also changed since then, so it may not have kept pace.

      Otherwise do a “doc-to-txt” search on your favourite search engine to find the latest. Good luck.
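Another free route for converting many files at once is LibreOffice, which grew out of OpenOffice.org. It can convert documents from the command line in headless mode. Below is a minimal Python sketch that assumes LibreOffice is installed and its soffice program is on the PATH. On Windows, soffice usually sits in the LibreOffice program folder and may need to be added to the PATH. The functions build_convert_command and convert_all are illustrative names. `txt:Text` requests LibreOffice’s plain-text export filter. If you need a particular output encoding, check the LibreOffice documentation for that filter’s options rather than relying on this sketch. This approach handles both .doc and .docx files.

```python
import shutil
import subprocess
import sys
from pathlib import Path

WORD_SUFFIXES = (".doc", ".docx")


def build_convert_command(sources: list[Path], outdir: Path,
                          soffice: str = "soffice") -> list[str]:
    """Build a LibreOffice command line that converts every source file to .txt."""
    return [
        soffice,
        "--headless",                # run without opening a window
        "--convert-to", "txt:Text",  # target extension and export filter
        "--outdir", str(outdir),     # where the .txt files are written
        *[str(path) for path in sources],
    ]


def convert_all(folder: Path, outdir: Path) -> None:
    """Convert every .doc/.docx file in folder to a .txt file in outdir."""
    soffice = shutil.which("soffice")
    if soffice is None:
        sys.exit("soffice not found: install LibreOffice or add it to PATH")
    sources = sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() in WORD_SUFFIXES and not p.name.startswith("~$")
    )
    if not sources:
        return  # nothing to convert
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice reports a failure.
    subprocess.run(build_convert_command(sources, outdir, soffice), check=True)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python convert_all.py <input folder> <output folder>")
    convert_all(Path(sys.argv[1]), Path(sys.argv[2]))
```

All files are passed to one soffice call, so LibreOffice starts only once for the whole batch. If LibreOffice is already open, closing it first may help.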

  15. I worked as a proposal desktop publisher for years, through all the different versions of Word. Lots of version compatibility problems! It seems that, as soon as I learn one version and all the tricks about how to handle it, here comes another updated Word. Sigh.

  16. I would like to have some pseudo-formatting in the text file, like empty lines between paragraphs, headings underlined by a second line of dashes, and a character in front of each line of a list. Does anybody know a tool that does this? Example:

    This is an underlined Header
    ----------------------------

    See this list:

    * entry 1
    * entry 2
    * entry 3
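Once the document’s structure is known, producing this kind of pseudo-formatting is straightforward. The following is a minimal Python sketch. It assumes the text has already been split into (kind, text) pairs, where the labels "heading", "paragraph" and "item" are hypothetical. Getting those labels out of a Word file is not shown here.

```python
def render_plain_text(blocks: list[tuple[str, str]]) -> str:
    """Render (kind, text) pairs as plain text with light pseudo-formatting.

    kind is one of the hypothetical labels "heading", "paragraph" or "item".
    """
    out: list[str] = []
    previous_kind = None
    for kind, text in blocks:
        # Empty line between blocks, except between consecutive list items.
        if out and not (kind == "item" and previous_kind == "item"):
            out.append("")
        if kind == "heading":
            out.append(text)
            out.append("-" * len(text))  # underline as a second line of dashes
        elif kind == "item":
            out.append("* " + text)      # marker character in front of list items
        else:
            out.append(text)
        previous_kind = kind
    return "\n".join(out)


if __name__ == "__main__":
    print(render_plain_text([
        ("heading", "This is an underlined Header"),
        ("paragraph", "See this list:"),
        ("item", "entry 1"),
        ("item", "entry 2"),
        ("item", "entry 3"),
    ]))
```

With the input from the comment above, this prints the heading with 28 dashes beneath it, an empty line, “See this list:”, another empty line, and the three “* entry” lines.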

  17. So many thank-yous. This doesn’t help, though, if you have received a .docx file but you don’t have Word on your computer. “Open Word”? God must hate me. Everywhere I look on the internet, that is the first step. [...]
