# java - How to convert some limited cases of *.tex files to plain text *.txt

So I've tried using tokenizers, but I can only figure out how to replace or remove single delimiters in Java.

Like for this input:

\box { Boxed words }

{\boldface This line in bold. }

I want to be able to remove \box, and also to apply some other guidelines I have to follow, which are:

The rules that we are going to apply are very simple.

1. Remove all commands: a backslash followed by one or more lowercase letters and terminated with a blank.

2. Remove all braces: } or {.
3. Substitute all math display (the characters between a pair of $ signs) by the words FORMULA 1, FORMULA 2, etc.

4. The environment (a special command)

\begin{enumerate}

\item First item, \fer and only this.

\item Second line \iterate and maybe more. \item Third.

...

\end{enumerate}

puts everything between each \item in a new paragraph with a number. So the above should look like:

1. First item and only this.
2. Second line and maybe more.
3. Third.
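For reference, the four rules can be sketched with `java.util.regex` instead of a tokenizer. This is only a minimal sketch under some assumptions: the class name `TexToText` and its helper methods are made up for illustration, a "blank" is taken to mean a single space, math is assumed to sit between one pair of `$` signs with no nesting, and the rules are applied in the order math, enumerate, commands, braces. Applying rule 1 literally removes `\fer ` but leaves the comma before it, so the first item comes out as "First item, and only this."

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TexToText {
    // Rule 3: anything between a pair of dollar signs (no nesting assumed).
    private static final Pattern MATH = Pattern.compile("\\$[^$]*\\$");
    // Rule 4: a whole enumerate environment, possibly spanning several lines.
    private static final Pattern ENUMERATE = Pattern.compile(
            "\\\\begin\\{enumerate\\}(.*?)\\\\end\\{enumerate\\}", Pattern.DOTALL);
    // Rule 1: backslash, one or more lowercase letters, then one blank.
    private static final Pattern COMMAND = Pattern.compile("\\\\[a-z]+ ");

    public static String convert(String tex) {
        String text = replaceMath(tex);                  // rule 3
        text = replaceEnumerate(text);                   // rule 4
        text = COMMAND.matcher(text).replaceAll("");     // rule 1
        return text.replace("{", "").replace("}", "");   // rule 2
    }

    // Replaces the n-th formula with "FORMULA n".
    private static String replaceMath(String text) {
        Matcher m = MATH.matcher(text);
        StringBuffer out = new StringBuffer();
        int n = 0;
        while (m.find()) {
            n++;
            m.appendReplacement(out, "FORMULA " + n);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Turns each \item of an enumerate environment into a numbered paragraph.
    private static String replaceEnumerate(String text) {
        Matcher m = ENUMERATE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder items = new StringBuilder();
            int n = 0;
            // Text before the first \item is dropped; each later piece is one item.
            String[] pieces = m.group(1).split("\\\\item\\b");
            for (int i = 1; i < pieces.length; i++) {
                String item = pieces[i].trim().replaceAll("\\s+", " ");
                if (item.isEmpty()) {
                    continue;
                }
                n++;
                items.append(n).append(". ").append(item).append("\n\n");
            }
            // quoteReplacement keeps backslashes in the items from being read as escapes.
            m.appendReplacement(out, Matcher.quoteReplacement(items.toString().trim()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String tex = "\\box { Boxed words }\n"
                + "{\\boldface This line in bold. }\n"
                + "\\begin{enumerate}\n"
                + "\\item First item, \\fer and only this.\n"
                + "\\item Second line \\iterate and maybe more. \\item Third.\n"
                + "\\end{enumerate}\n";
        System.out.println(convert(tex));
    }
}
```

The order matters in this sketch: formulas are replaced first so that commands inside them are not touched, and the enumerate environment is handled before rule 1 so that `\item ` is still present when the items are split.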

The (IMO) sensible way is to use a stand-alone TeX-to-text (or TeX-to-HTML) converter. That should:

• Save you a lot of work in implementing your own converter.
• Do a better job ... assuming you pick a decent converter.
• Insulate you from having to deal with a stream of special cases where your heuristic / pattern-based approach fails.
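If you still need to drive the conversion from Java, one option is to call the converter as an external process. The sketch below assumes pandoc as the converter (my choice for illustration; any other TeX-to-text tool would do) and assumes it is installed and on the PATH. The class name `ExternalTexConverter` and its methods are hypothetical. Note that a general-purpose converter will not produce the exact "FORMULA 1" style output that the assignment rules ask for.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class ExternalTexConverter {
    // Builds the command line: read LaTeX, write plain text to txtFile.
    static List<String> buildCommand(Path texFile, Path txtFile) {
        return Arrays.asList(
                "pandoc",                  // assumed to be on the PATH
                "-f", "latex",
                "-t", "plain",
                "-o", txtFile.toString(),
                texFile.toString());
    }

    // Runs the converter and fails loudly if it reports an error.
    static void convert(Path texFile, Path txtFile)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(texFile, txtFile))
                .inheritIO()               // show the converter's messages on the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("converter exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // File names are placeholders for this example.
        convert(Paths.get("input.tex"), Paths.get("output.txt"));
    }
}
```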