eBook Conversion

eBook conversion for Kindle and ePub readers

Styles & Formatting and document conversion

I can’t stress enough how much time you save by using styles in your word processor document, and it is good practice too. Styles give you consistent formatting throughout and make conversion a whole lot easier.

In both MS Word and Open Office/Libre Office, you do this via the Styles and Formatting pane (go to Format > Styles and Formatting). If you are unfamiliar with this function, go to this site for an explanation.

For a lot of novels, you probably only need three styles – one for the chapter headings, one for the body text (preferably with an indent) and one for the first paragraph of a chapter (without an indent). If you use spaced paragraphs with no indents, then you might only need two.

Even on the more technical publications I’ve worked on, I’ve rarely needed more than eight or nine: mainly extra heading levels plus styles for numbered lists, bulleted lists and captions.

A typical layout

basicstyles.doc

This Word 2003 file will open without problems in Word and Open/Libre Office. Once open, display the Styles and Formatting dialog – in Word, select ‘Formatting in Use’ from the ‘Show’ drop-down box in the Styles and Formatting pane; in Open/Libre Office, select ‘Applied Styles’ from the drop-down box at the bottom of the Styles and Formatting box.

This sample document uses the following styles:

  • Normal – normal (indented) paragraph text
  • First – same as normal, but without an indent, for the first paragraph after a heading, indent or list
  • Heading 1 – chapter heading
  • Heading 2 – second-level heading
  • Heading 3 – third-level heading
  • Bulleted
  • Numbered
  • Indent – an indented paragraph for quotes

To be honest, that’s probably all the styles you’ll need, unless you need one for photo captions or another level of headings. Remember, the Kindle isn’t capable of displaying much more (see CSS paragraph styles).

Also, press the Show/Hide ¶ button and you’ll notice there is no manual formatting at all: there are no page breaks, tab stops or double paragraph marks, and all the spaces are single ones (see this page for common formatting problems).

It may be easier to paste your document into this one and apply these styles than to set them up yourself from scratch. However, you need to be careful, because simply pasting the text, choosing Select All and applying a style can make your bold and italic formatting disappear. Try the following:

  • Delete the dummy text. Go to Edit > Select All and click on the style that will be most abundant in the document (e.g. ‘Normal’). If another style appears in the list, delete it by hovering over the style, opening its drop-down list and selecting Delete
  • Go to Edit > Paste Special and select ‘formatted text’ from the list. Press OK
  • You’ll then need to go through the document and apply styles for headings and first paragraphs. Rather than selecting the whole paragraph and applying the style, put the cursor within the paragraph and apply it; this should keep any bold or italics in the copy

Converting word processor files to HTML

CKEditor

If the formatting in your book is simple, then I’d recommend CKEditor, as I outlined in this post. It produces very clean, easy-to-understand HTML that doesn’t need tidying up. It has a couple of downsides:

  1. Custom paragraph styles – such as indented quotes and first paragraphs – are not exported; they have to be restored manually in the HTML code. Therefore if your book is highly formatted, it’s probably not the best choice
  2. It doesn’t save internal links (although it does honour footnotes)

If either of these matters for your book, you’re better off doing the conversion in Open/Libre Office.

Word processors

Although doing your composition in MS Word is fine, its HTML export – at least on Word 2003, which many people (including me) are still using – is awful.

Fortunately, the HTML export from Open/Libre Office is fairly good, provided you’ve been strict about applying the styles. So, even if you don’t want to write using this program, you can use it to open your finished Word file and do the HTML export. Here’s a step-by-step guide based on this sample file:

  1. In Word, check for unwanted styles by selecting ‘available formatting’ in the Styles and Formatting pane. To get rid of any, select the drop-down list of the style and select ‘Select All’ in the list. Then click on the style you want and the unwanted one should disappear
  2. In Word, ensure all the text has the same language – select all the text and go to Tools > Language > Select Language and pick the preferred one from the list. Sometimes if you don’t do that, the converter will apply language tags to every paragraph
  3. Save and close your document, then open it in Open/Libre Office. If the program prompts you to save in other format, select ‘keep current format’; it doesn’t make much difference to the conversion process
  4. In Open/Libre Office, text with the style ‘normal’ in Word needs to be ‘text body’. You can do this using Find and Replace – open the dialog box, press the More Styles button and select Styles. Replace ‘normal’ with ‘text body’
  5. To save the document, go to File > Save As > HTML. Close the document
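
Step 5 can also be run without opening the program. The following is a minimal sketch, assuming a LibreOffice installation whose soffice executable is on your PATH; the function names and the file name book.doc are placeholders made up for this example. It does not perform the ‘normal’ to ‘text body’ replacement from step 4, so the exported tags may differ from those of a manual export.

```python
import subprocess
from pathlib import Path


def build_export_command(doc_path, out_dir="."):
    """Return the soffice command line that exports one document to HTML."""
    return [
        "soffice",               # assumption: the executable is on PATH
        "--headless",            # run without opening a window
        "--convert-to", "html",  # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def export_to_html(doc_path, out_dir="."):
    """Run the export and return the path where the HTML file is expected."""
    subprocess.run(build_export_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".html")
```

Calling export_to_html("book.doc") should leave a book.html file in the current folder, ready for the tidying described next.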

Tidying things up

At the top of the page, delete every line starting with <META … >. The style rules that go with the replacement tags below are:

h1 { text-align: center; page-break-before: always; text-decoration: underline; font-weight: bold; margin-bottom: 2em; }
h2 { text-align: left; font-weight: bold; margin-top: 1em; margin-bottom: 0em; }
h3 { text-align: left; font-style: italic; margin-top: 1em; margin-bottom: 0em; }
p.first { text-indent: 0; }
p { text-indent: 1em; margin-top: 0; margin-bottom: 0; }
p.indented { text-indent: 0; margin-left: 2em; margin-right: 0em; }
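
If you would rather not delete the <META … > lines by hand, the step can be scripted. This is only a sketch, assuming each META tag sits on a single line of the exported file; drop_meta_lines is a name invented for this example.

```python
def drop_meta_lines(html):
    """Remove every line that starts with <META, ignoring case and indentation.

    A trailing newline at the end of the input, if any, is not preserved.
    """
    kept = [
        line for line in html.splitlines()
        if not line.lstrip().upper().startswith("<META")
    ]
    return "\n".join(kept)
```

You would read the exported file into a string, pass it through drop_meta_lines and write the result back out.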

Now you need to do some Find and Replace work. The export generates some complicated tags, but they’re all consistent and can be fixed in five minutes or so. Refer back to your original Word or Open/Libre Office file to work out which style each tag corresponds to.

Below is a list of the tags I got in my output. Yours may differ slightly depending on your configuration, but the list should illustrate the idea. Remember that these tags always come at the start of a paragraph.

Remember, in Notepad++ if you highlight some text then press Find (Ctrl-F), the text will appear automatically in the Find box.

From Notepad++, you can keep checking how everything looks in the browser by going to Run > Launch in … (and selecting the browser you’re using).

| Type of paragraph | Tag output by the software | Replace with |
|---|---|---|
| Normal (indented) paragraph | <P CLASS="western"> | <p> |
| Paragraph without indent | <P STYLE="text-indent: 0cm; margin-bottom: 0cm"> | <p class="first"> |
| Chapter heading | <H1 CLASS="western" STYLE="margin-top: 0cm"> or <H1 CLASS="western"> | <h1> |
| Second-level heading | <H2 CLASS="western"> | <h2> |
| Third-level heading | <H3 CLASS="western"> | <h3> |
| Indented text | <P STYLE="margin-left: 1cm; text-indent: 0cm; margin-top: 0.21cm"> | <p class="indented"> |
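
The replacements in the table can also be scripted rather than done one by one in Notepad++. Here is a minimal sketch, assuming your export produced exactly the tags listed above (if yours differ, edit the list); tidy_tags and tag_pattern are names made up for this example. Whitespace inside a tag is matched loosely in case the exporter wraps a long tag across lines. Closing tags such as </P> are left untouched.

```python
import re

# (tag as exported, replacement) pairs copied from the table above.
REPLACEMENTS = [
    ('<P CLASS="western">', '<p>'),
    ('<P STYLE="text-indent: 0cm; margin-bottom: 0cm">', '<p class="first">'),
    ('<H1 CLASS="western" STYLE="margin-top: 0cm">', '<h1>'),
    ('<H1 CLASS="western">', '<h1>'),
    ('<H2 CLASS="western">', '<h2>'),
    ('<H3 CLASS="western">', '<h3>'),
    ('<P STYLE="margin-left: 1cm; text-indent: 0cm; margin-top: 0.21cm">',
     '<p class="indented">'),
]


def tag_pattern(tag):
    """Build a regex for `tag` in which any run of whitespace
    (spaces, tabs, line breaks) matches any other run."""
    parts = [re.escape(part) for part in tag.split()]
    return re.compile(r"\s+".join(parts))


def tidy_tags(html):
    """Apply every replacement from the table to the exported HTML."""
    for old, new in REPLACEMENTS:
        html = tag_pattern(old).sub(new, html)
    return html
```

As with the manual approach, open the result in your browser afterwards to check that each paragraph type still looks right.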

Converting your text to HTML

Now, here comes the most difficult bit. At this point, you’ll have fixed the layout and typography and you’ll have fixed the structure of your document in your word processor (or decided you’ll just edit the code).

Here are some options.


An introduction to HTML and CSS

I’ll admit it: HTML can be scary. Somewhere in your browser menu is a command called View Source or View Page Source – if you select that, you get a load of indecipherable code that looks very intimidating.

However, at the level of a file for an eBook, it really isn’t so complicated. HTML is a plain text format, so you can write and edit any web page using Notepad or TextEdit. The basic idea is to surround every element with a container, which takes the form of an opening <> tag and a closing </> tag.

For instance, to make a sentence bold, you would use the <strong> tag:

<strong>This text is entirely bold.</strong>
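
To see the open/close pairing in action, here is a small sketch using Python’s standard html.parser module. The class and function names are invented for this example, and the check is stricter than HTML itself, which allows some closing tags to be left out.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list for this sketch).
VOID_TAGS = {"br", "hr", "img", "meta", "link"}


class TagPairChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # tags opened but not yet closed
        self.stray = []      # closing tags that did not match the last opener

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # a self-closing form such as <br/> needs no partner

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.stray.append(tag)


def check_pairs(html):
    """Return (tags left open, closing tags without a matching opener)."""
    checker = TagPairChecker()
    checker.feed(html)
    checker.close()
    return checker.open_tags, checker.stray
```

For the bold sentence above, check_pairs returns two empty lists, meaning every opening tag found its closing partner.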


© Paul Brookes, 2015.