4 thoughts on “What is meta-text information … and how to get rid of it?”

  1. signature103 Post author

    Cutting and pasting is very tedious. I use a freeware program called Zilla Word to Text Converter, but its output is in an Indian encoding, so the quotation marks and apostrophes are not read correctly by the concordancer I use. This problem has annoyed me for some time now. I cannot correct it in the Perl programs I have written either.

    Perhaps you can help me understand what the problem is (I see you are from Pakistan and may be able to help).
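
The quotation-mark problem described in this comment can be approached by mapping typographic (curly) quotes to plain ASCII before the text reaches the concordancer. The sketch below is in Python rather than Perl, and it rests on assumptions: that the converter's output contains curly quotes, and that the file can be decoded as Windows-1252 (`cp1252`), which is a guess, since the comment does not name the actual encoding. The function names are hypothetical.

```python
# Map typographic (curly) quotation marks to plain ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ones."""
    return text.translate(QUOTE_MAP)


def read_and_normalize(path: str, encoding: str = "cp1252") -> str:
    """Read a text file and normalize its quotes.

    The default encoding is an assumption; pass the real one if known.
    Bytes that cannot be decoded are replaced rather than raising an error.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return normalize_quotes(handle.read())
```

If the output still shows stray characters, the assumed encoding is probably wrong, and trying another value for `encoding` is the first thing to check.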


  2. Muhammad Shakir Aziz

    I understood your point about meta-text. As far as I know, OOo Writer uses a scheme similar to HTML, i.e. angle brackets, to mark bold, headings, etc. I noticed this while working on ODT and DOC files in OmegaT.
    And the idea of eliminating these tags using Notepad is fairly simple. I do the same myself when building a corpus: copy the text from a DOC file, paste it into Notepad, save, go to the next file….


  3. signature103 Post author

    Hello Muhammad,
    Thank you for your comment. I am fully aware of what you are trying to say. Please see the link to understand the meaning of the term meta-text in this context. The problem many of my readers are facing is how to get rid of the information hidden in Word documents, not meta-tags (HTML, XML, etc.).


  4. Muhammad Shakir Aziz

    Corpus files should be in plain text: rule no. 1.
    Want to remove meta-text, i.e. text within angle brackets? Use regular expressions (rule no. 2).
    I’ve learned these two rules in the last few years while working with corpora.
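
Rule no. 2 in the last comment can be sketched in Python with the standard `re` module. This is a minimal illustration, not the commenter's own method: the pattern and the function name `strip_tags` are assumptions, and the approach only handles tags that are visible as text in a plain-text file. It does nothing about information hidden inside a Word document's binary format.

```python
import re

# Match one angle-bracket tag: "<", then one or more characters that are
# not angle brackets, then ">".
TAG_PATTERN = re.compile(r"<[^<>]+>")


def strip_tags(text: str) -> str:
    """Remove every angle-bracket tag from text, keeping the text between tags."""
    return TAG_PATTERN.sub("", text)


sample = "<b>Corpus</b> files should be in <i>plain</i> text."
print(strip_tags(sample))
```

A pattern like this also deletes legitimate text that happens to sit between a `<` and a later `>`, such as a comparison written as `a < b and c > d`, so it is worth checking the output on a sample before running it over a whole corpus.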

