Now, here comes the most difficult bit. At this point, you’ll have fixed the layout and typography and you’ll have fixed the structure of your document in your word processor (or decided you’ll just edit the code).
Here are some options.
Option 1
CKEditor is an open-source online HTML editor: I’ve installed a copy here.
If your book’s fairly simple (like a novel) and you don’t foresee having to make many more changes, then I’d give this a go. I’m reluctant to recommend it for intensive editing, because I don’t know how stable it is for long sessions and I haven’t tested whether it can cope with very big documents. But for pasting in your text and extracting the code, it works very well.
So, copy all of your text from your word processor and press the ‘Paste from Word’ button in CKEditor. Paste your text into the dialog box and press OK. Then press the Source button to display the code, and copy all of it.
Paste the code into Notepad++, then select TextFX > TextFX HTML Tidy > Tidy, which will add some standards-compliant code and put your copy between the <body></body> tags.
When you save the document as HTML, the tags will become colour-coded, making them easier to read.
The resulting file will have the icon of your default browser; double-click on it to open it in the browser and see if everything looks okay.
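If you’d like a quick sanity check alongside looking at the page in the browser, the following is a minimal sketch in Python that counts the tags in a page. The sample markup is made up for illustration, and it assumes that your headings come through as <h1> tags and your body text as <p> tags, which may not match your own file exactly.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count opening tags so a page's structure can be checked at a glance."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <P> and <p> are counted together.
        self.counts[tag] += 1


def count_tags(html_text):
    """Return a Counter mapping each tag name to how often it is opened."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical sample standing in for your tidied file.
sample = """<html><head><title>Emma</title></head>
<body>
<h1>Chapter 1</h1>
<p>First paragraph.</p>
<p>Second paragraph.</p>
</body></html>"""

counts = count_tags(sample)
print(counts["body"], counts["h1"], counts["p"])  # prints: 1 1 2
```

To run it on your own file, read the file’s contents into a string and pass that to count_tags in place of the sample. One <body> tag and one <h1> per chapter heading would suggest the structure has survived.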
An example
Here’s an MS Word 2003 file containing chapters one and two of Jane Austen’s Emma. It’s been formatted to the guidelines I laid out here – if you click on the headings, they’re all in the Heading 1 style and the body text is in the Paragraph style.
If you follow the instructions in this section using this file, it will produce perfect HTML. The copy is still fairly readable even as markup, although you’ll notice several punctuation characters have been converted into codes (HTML character entities): a curly apostrophe, for example, may appear as &rsquo;.
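To see what those punctuation codes stand for, here is a small sketch in Python that uses the standard library’s html.unescape to turn them back into ordinary characters. The sample sentence is invented for illustration, and the particular codes in your own file may differ.

```python
import html

# A made-up line of markup using named and numeric character codes.
marked_up = "&ldquo;I don&rsquo;t know,&rdquo; said Emma. &#8216;Quite so.&#8217;"

# html.unescape converts both named and numeric codes back into characters.
readable = html.unescape(marked_up)
print(readable)
```

There is no need to convert the codes back in your HTML file, since the browser displays them as the proper punctuation. This is only a way of checking what a given code means.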
I’ve also copied text over from OpenOffice/LibreOffice using ‘Paste from Word’ and had good results. Pasting straight from Google Docs was less successful; however, Google Docs can export files in OpenDocument format (.odt), so you could still do it that way if you prefer to use Google Docs for composition. I have yet to try this out using something like iWork on the Mac – if anyone would like to try, and write about it in the comments, I’d be very grateful.
Option 2
The second option is to use an HTML editor that can show code and layout at the same time (or just hide the code entirely, letting you work on the visuals). A good, free open-source editor is Amaya (although I’m sure others are available).
Unlike the first option, there’s no specific ‘Paste from Word’ function. Pasting from Word brings the text across reasonably well, but you lose all the formatting, including any defined headings and italics. So it might be wise to use the first option.
Once you’ve converted your document with the first option, however, you may want to open your HTML file in Amaya to work on your text further. This is useful if you find you need to make changes after converting.
Option 3
In MS Word, even if you use Styles & Formatting very strictly, the Save as HTML feature still outputs horrible HTML. OpenOffice/LibreOffice, on the other hand, outputs fairly straightforward HTML if you use strict formatting, giving results comparable to option 1. It still needs some work, but it’s essentially a matter of using Find & Replace to swap some long HTML tags for simpler ones, and replacing the styles between the <style></style> tags.
For instance, in one of my attempts, what should have been plain <p> tags came out as:
<P CLASS="frame-contents-western">
This is easily fixed using Find & Replace. Remember that OpenOffice/LibreOffice can open Word (.doc) files.
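If there are a lot of these to fix, the same clean-up can be scripted. What follows is a rough sketch in Python, not a tested recipe for any particular version of OpenOffice/LibreOffice. The function names, the sample markup and the replacement CSS are all made up for illustration. Note that simplify_paragraph_tags strips every attribute from every <p> tag, which is only what you want if the paragraphs don’t need classes.

```python
import re


def simplify_paragraph_tags(html_text):
    """Turn tags such as <P CLASS="..."> and </P> into plain <p> and </p>."""
    # \b stops the pattern from matching other tags that begin with p, e.g. <pre>.
    html_text = re.sub(r"<p\b[^>]*>", "<p>", html_text, flags=re.IGNORECASE)
    return re.sub(r"</p\s*>", "</p>", html_text, flags=re.IGNORECASE)


def replace_style_block(html_text, new_css):
    """Swap the contents of the first <style>...</style> block for new_css."""
    pattern = re.compile(r"(<style\b[^>]*>).*?(</style\s*>)",
                         re.IGNORECASE | re.DOTALL)
    # A function is used as the replacement so that backslashes in new_css
    # are inserted literally rather than treated as regex escapes.
    return pattern.sub(
        lambda m: m.group(1) + "\n" + new_css + "\n" + m.group(2),
        html_text,
        count=1,
    )


# Hypothetical sample resembling exported markup.
exported = (
    '<STYLE TYPE="text/css">P.frame-contents-western { margin: 0 }</STYLE>\n'
    '<P CLASS="frame-contents-western">She was the youngest.</P>'
)

cleaned = simplify_paragraph_tags(exported)
cleaned = replace_style_block(cleaned, "p { text-indent: 1.5em; }")
print(cleaned)
```

Keep a copy of the original file before running anything like this over it, and check the result in the browser afterwards, just as you would after a manual Find & Replace.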