Doc to HTML Converter

ModelText's Doc to HTML converter is a utility which converts Microsoft Word documents to clean XHTML.

The conversion discards all style and font information, leaving only clean XHTML. This can help you to republish a Word document as a web page, if you add your own CSS.

Output

The output from the converter is in XHTML format. The following elements of the document are preserved:

The following is an example of the output from the converter, from a simple document:

Input

The converter works with the Word 2003 XML Document format. Documents that are already in this format can be converted without Microsoft Word being present. The converter understands the Word 2003 XML schema (which is called "wordml"): it opens the file using a generic XML reader, extracts elements and text, and saves them as XHTML.

The converter can also work with any of the other Word document formats. It does this by using Microsoft Word 2007 to open the document and save it again as a temporary file in the Word 2003 XML Document format. Working with these document formats therefore requires Microsoft Office 2007 to be installed on the machine (this is not required if the documents are already in the Word 2003 XML Document format).
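
If you want to know in advance whether a given document will need Word, one option is to check whether it is already a Word 2003 XML Document. The Python sketch below is an assumption about how you might do that yourself, not part of the converter: it looks at the root element, which in this format is w:wordDocument in the wordml namespace. The function name is_word2003_xml is hypothetical.

```python
import xml.etree.ElementTree as ET

# Root element of a Word 2003 XML Document, in ElementTree's {namespace}tag form.
WORDML_ROOT = "{http://schemas.microsoft.com/office/word/2003/wordml}wordDocument"


def is_word2003_xml(path) -> bool:
    """Return True if the file's root element is w:wordDocument."""
    try:
        with open(path, "rb") as handle:
            # Only the first "start" event (the root element) is needed.
            for _event, elem in ET.iterparse(handle, events=("start",)):
                return elem.tag == WORDML_ROOT
    except ET.ParseError:
        # Binary .doc files and zipped .docx files are not plain XML.
        return False
    return False
```

Files for which this returns False are the ones that would go through the Word 2007 step described above.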

Command Line

The utility is a console application, named tidywordcmd.exe. It requires two parameters:

  1. The first parameter is the filename of the input file to be converted.
  2. The second parameter is the filename of the output file.

The following is an example of the command line, to convert an input file named test.docx to an output file named test.html:

tidywordcmd.exe test.docx test.html
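
Because the utility takes exactly one input file and one output file, converting a whole folder means running it once per document. The following Python sketch shows one way to script that. The helper names build_command and convert_all are hypothetical, the sketch assumes tidywordcmd.exe can be found on the PATH, and it assumes nothing about the utility's exit codes beyond the usual convention that non-zero means failure.

```python
import subprocess
from pathlib import Path


def build_command(input_file, output_file=None, exe="tidywordcmd.exe"):
    """Build the two-parameter command line: input file, then output file.

    If no output file is given, reuse the input name with an .html extension.
    """
    src = Path(input_file)
    dst = Path(output_file) if output_file else src.with_suffix(".html")
    return [exe, str(src), str(dst)]


def convert_all(folder, pattern="*.docx"):
    """Run the converter once for every matching document in a folder."""
    for src in sorted(Path(folder).glob(pattern)):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(build_command(src), check=True)
```

For example, build_command("test.docx") produces the same command line as the example above.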

License

License to use the ModelText Doc to HTML Converter (version 1.1) is as follows.

Copyright 2008-2012, Christopher Wells <info@modeltext.com> ("Licensor")

Permission to use without fee

Permission to use, copy, and/or distribute this software for any purpose with or without fee is hereby granted to you, provided that you accept all the terms of this license.

Transferable

You may copy and distribute this software to other parties ("third parties"), provided that the above copyright notice and this permission notice appear in all copies, and that third parties are bound by the terms of this license.

Closed source, no modification

This is closed source, proprietary software. The software's source code (except for some sample code) has not been released. Although permission is hereby granted to write software which uses this software component, and to use this software as a component within other software, permission is not granted to modify this software component, nor to use nor to distribute modified copies.

No warranties

THE SOFTWARE IS PROVIDED "AS IS" AND THE LICENSOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE LICENSOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

Download

Click here to download the most recent version of the ModelText Doc to HTML Converter.

Installation

After you download the zip file which contains the release, you can simply run the tidywordcmd.exe executable using the command line parameters specified above.

The tidywordcmd.exe executable depends on version 2.0 (or later) of the .NET Framework, which must be installed before you run the utility (it is probably installed on your machine already).

If you want to convert Word documents that are not already in the Word 2003 XML Document format, then you should also:

Contact Us

Please post suggestions, and any bug reports and support issues, to the ModelText discussion group.

You can also contact the author by sending email to info@modeltext.com.

News Summary

August 2012
New Product Roadmap.
July 2010
First release of the ModelText Doc to HTML Converter.