Version: 0.9
Copyright (c) 2005 Robert Bamler <Robert dot Bamler (a) gmx dot de>
An encyclopodia eBook stores all articles (chapters) of the eBook in one file. The file consists of three parts:
The following specifies in detail the layout of an article in the second section of the eBook file.
eBook-Files are built of four different classes of data structures:
Three different special characters are used to mark the beginning of a block or control sequence, the beginning of a string chunk, and the end of a block. To make the format more error tolerant, a fourth special character (the escape character) is used to eliminate all occurrences of the special characters in the content of strings. Note that the escape character would in fact not be necessary if it were certain that neither the file nor the reader application contained any errors. However, this would be a very optimistic assumption.
The following special characters are used:
A string is a sequence of one or more string chunks and represents an array of characters, usually to be displayed on the screen. A string chunk has the following layout:
When we refer to an entity, this simply means that a particular integral value is written to the eBook file without any metadata. The following entity types are defined:
Type name | length | Description |
---|---|---|
uint1 | 1 | There are 8 different types of integers. Integer types starting with "uint" are interpreted as unsigned integers, integer types starting with "sint" are interpreted as signed integers. The number (e.g. "2" in "uint2") specifies the length of the integer in bytes. The first byte contains the most significant bits and the last byte contains the least significant bits (big endian). For example, a uint2 that represents the value "42" has a length of two bytes and is stored as: 00 2a |
sint1 | 1 | See uint1. |
uint2 | 2 | See uint1. |
sint2 | 2 | See uint1. |
uint3 | 3 | See uint1. |
sint3 | 3 | See uint1. |
uint4 | 4 | See uint1. |
sint4 | 4 | See uint1. |
Timestamp | 8 | This is merely a sequence of several integer types, each describing a part of a date and time. The parts appear in the following order: |
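The big-endian integer layout described above can be sketched in Python as follows. The function names `encode_int` and `decode_int` are hypothetical, and the use of two's complement for the "sint" types is an assumption, since the specification only says that these values are interpreted as signed.

```python
def encode_int(value: int, length: int, signed: bool) -> bytes:
    """Encode value as a [u/s]int<length> entity, most significant byte first."""
    if not 1 <= length <= 4:
        raise ValueError("integer entities are 1 to 4 bytes long")
    # int.to_bytes raises OverflowError if the value does not fit.
    # Assumption: signed values use two's complement.
    return value.to_bytes(length, byteorder="big", signed=signed)


def decode_int(data: bytes, signed: bool) -> int:
    """Decode a [u/s]int<len(data)> entity, most significant byte first."""
    if not 1 <= len(data) <= 4:
        raise ValueError("integer entities are 1 to 4 bytes long")
    return int.from_bytes(data, byteorder="big", signed=signed)


# The uint2 example from the specification: the value 42 is stored as 00 2a.
print(encode_int(42, 2, signed=False).hex(" "))  # 00 2a
```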
Control sequences appear within the contents of blocks. There are various types of control sequences, and each type has a well-defined amount of stored data and thus a well-defined size. For example, control sequences of the type "Paragraph Break" do not store any data, because they only denote the beginning of a new paragraph at the position where they appear. Thus, control sequences of this type have a well-defined size of two bytes (special char + type ID). In contrast, control sequences of the type "Change Font Family" do store data and have a well-defined size of 4 bytes (special char + type ID + 2 bytes for font family).
The general layout of all control sequences is defined as follows:
field | size | value |
---|---|---|
special char: begin of block or control sequence | 1 | 0xFF - Denoting the beginning of a block or control sequence (in this case: control sequence) |
type ID | 1 | Identifies the type of the control sequence. In order to discriminate blocks and control sequences, all control sequences have a type ID lower than 128. Type IDs equal to or higher than 128 are reserved for blocks. |
contents | ? | Depending on the type of the control sequence, any entities that store the data associated with the control sequence. All entities have to appear in the order specified for the particular control sequence type. |
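As a minimal sketch of this layout, the following Python snippet builds the two control sequences mentioned above, "Paragraph Break" (type ID 19) and "Change Font Family" (type ID 16), and reproduces their sizes of 2 and 4 bytes. The helper name `encode_control_sequence` is hypothetical; the type IDs and the font family value are taken from the list of control sequences in this specification.

```python
BEGIN = 0xFF  # special char: begin of block or control sequence


def encode_control_sequence(type_id: int, contents: bytes = b"") -> bytes:
    """Return special char + type ID + contents for one control sequence."""
    if not 0 <= type_id < 128:
        # Type IDs of 128 and above are reserved for blocks.
        raise ValueError("control sequence type IDs must be lower than 128")
    return bytes([BEGIN, type_id]) + contents


paragraph_break = encode_control_sequence(19)
# Font family 3 (monospace font) is stored as a uint2, i.e. two bytes, big endian.
change_font = encode_control_sequence(16, (3).to_bytes(2, "big"))

print(paragraph_break.hex(" "))  # ff 13
print(change_font.hex(" "))      # ff 10 00 03
```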
More concretely, the following control sequences are defined:
type | value |
---|---|
ID 0: Dummy | Sometimes needed for disambiguation. Does not have any meaning. (empty) |
ID 16: Change font family | Changes the current font family. |
uint2 | The ID of the font family to change to: 0 = default font, 1 = default sans serif font, 2 = default serif font, 3 = monospace font. |
ID 17: Change font size | Changes the current font size. |
uint1 | The size of the font to change to. Currently, the following sizes are supported: TODO. The default font size is 10, but it is not yet defined how large this actually is. |
ID 18: Line break | Causes the following text to start on a new line. (empty) |
ID 19: Paragraph break | Terminates the previous paragraph and begins a new one. (empty) |
ID 20: Hline | Creates a horizontal line. This control sequence is typically surrounded by two paragraph breaks. (empty) |
ID 21: Begin italic | Text after this control sequence will be displayed in italics. (empty) |
ID 22: End italic | Text after this control sequence will no longer be displayed in italics. (empty) |
ID 23: Begin bold | Text after this control sequence will be displayed in bold. (empty) |
ID 24: End bold | Text after this control sequence will no longer be displayed in bold. (empty) |
ID 25: Begin underlined | Text after this control sequence will be displayed underlined. (empty) |
ID 26: End underlined | Text after this control sequence will no longer be displayed underlined. (empty) |
ID 27: Table Row | Indicates the end of the previous and the beginning of the next row of a table. This control sequence is only allowed between table rows, i.e. neither at the beginning nor at the end of a table. Thus, a valid "Table" block contains exactly one fewer "Table Row" control sequence than table rows. |
ID 28: Table Cell | Indicates the beginning of a table cell. The contents of the cell are the chain of elements that come after this control sequence and before the next "Table Row", "Table Cell", or "Table Header Cell" control sequence or the end of the "Table" block. |
uint1 | The number of table rows this cell covers. Must not be zero. You will want to use the value "1" for most table cells. (Similar to the "rowspan" attribute in HTML, but required.) |
uint1 | The number of table columns this cell covers. Must not be zero. You will want to use the value "1" for most table cells. (Similar to the "colspan" attribute in HTML, but required.) |
ID 29: Table Header Cell | Same as control sequence "Table Cell", except that the contents of this cell are assumed to contain header data and should be displayed in a highlighted style by a reader application. |
uint1 | rowspan; see description for control sequence "Table Cell". |
uint1 | colspan; see description for control sequence "Table Cell". |
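Because every control sequence type has a fixed size, a reader can skip over or extract one without any end marker. The sketch below illustrates this in Python. The names `CONTENT_SIZES` and `read_control_sequence` are hypothetical, and treating "Table Row" (ID 27) as having empty contents is an assumption based on the fact that no entities are listed for it.

```python
from typing import Tuple

# Content size in bytes for each control sequence type ID listed above.
CONTENT_SIZES = {0: 0, 16: 2, 17: 1, 28: 2, 29: 2}
# IDs 18 to 27 store no data (assumption for ID 27: no entities are listed).
CONTENT_SIZES.update({type_id: 0 for type_id in range(18, 28)})


def read_control_sequence(data: bytes, pos: int = 0) -> Tuple[int, bytes, int]:
    """Return (type ID, raw contents, position after the control sequence)."""
    if len(data) - pos < 2 or data[pos] != 0xFF:
        raise ValueError("no control sequence at this position")
    type_id = data[pos + 1]
    if type_id not in CONTENT_SIZES:
        raise ValueError(f"unknown control sequence type ID {type_id}")
    end = pos + 2 + CONTENT_SIZES[type_id]
    if end > len(data):
        raise ValueError("truncated control sequence")
    return type_id, data[pos + 2:end], end


# A "Table Cell" with rowspan 1 and colspan 2, followed by a "Paragraph break".
stream = bytes.fromhex("ff 1c 01 02 ff 13")
type_id, contents, pos = read_control_sequence(stream)
print(type_id, contents.hex(" "), pos)     # 28 01 02 4
print(read_control_sequence(stream, pos))  # (19, b'', 6)
```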
Blocks are designed to hold complex data of varying size. There are several different types of blocks. Blocks can contain other blocks (subblocks), control sequences, entities, and/or strings.
The general layout of all blocks is defined as follows:
field | size | value |
---|---|---|
special char: begin of block or control sequence | 1 | 0xFF - Denoting the beginning of a block or control sequence (in this case: block) |
type ID | 1 | Identifies the type of the block. In order to discriminate blocks and control sequences, all blocks have a type ID equal to or higher than 128. Type IDs lower than 128 are reserved for control sequences. |
contents | ? | Depending on the type of the block, any other blocks (subblocks), control sequences, entities, and/or strings that store the data associated with the block. All elements have to appear in the order specified for the particular block type. |
special char: end of block | 1 | 0xFE - Denoting the end of a block |
type ID | 1 | Same identifier as used at the beginning of the block. This is for error tolerance only. |
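The block layout above can be sketched in Python as follows. The helper names `encode_block` and `check_block_frame` are hypothetical. The check only inspects the first and last two bytes of a byte string that is already known to hold exactly one block; finding the end of a block inside a larger stream requires parsing its contents and is not shown here.

```python
BEGIN = 0xFF  # special char: begin of block or control sequence
END = 0xFE    # special char: end of block


def encode_block(type_id: int, contents: bytes) -> bytes:
    """Wrap already encoded contents in the begin and end markers of a block."""
    if not 128 <= type_id <= 255:
        # Type IDs lower than 128 are reserved for control sequences.
        raise ValueError("block type IDs must be in the range 128 to 255")
    return bytes([BEGIN, type_id]) + contents + bytes([END, type_id])


def check_block_frame(block: bytes) -> int:
    """Verify the outer frame of one complete block and return its type ID."""
    if len(block) < 4:
        raise ValueError("a block is at least 4 bytes long")
    if block[0] != BEGIN or block[-2] != END:
        raise ValueError("missing begin or end marker")
    if block[1] < 128:
        raise ValueError("type ID belongs to a control sequence, not a block")
    if block[1] != block[-1]:
        # The repeated type ID exists for error tolerance only.
        raise ValueError("type IDs at the beginning and end do not match")
    return block[1]


empty_list = encode_block(130, b"")  # a "List" block without items
print(empty_list.hex(" "))            # ff 82 fe 82
print(check_block_frame(empty_list))  # 130
```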
More concretely, the following blocks are defined:
category | type | amount | value |
---|---|---|---|
ID 128: Article | | | Contains an article (chapter) of the eBook. |
string | string | 1 | The title of the article, in the form it should be displayed on the screen. |
control sequence | dummy | 1 | To avoid ambiguities, this control sequence makes it possible to find the end of the list of string chunks. |
entity | Timestamp | 1 | The date and time of the last modification of the article. |
subblock | int list | 1 ... ∞ | A list of the IDs of all authors that contributed to this article. If the article comes from a wiki, only authors who were logged in when submitting their contribution are listed here. A program that displays the users who contributed to the article should also display a note that other (anonymous) users might have contributed as well, if the article came from a source where anonymous users were able to contribute (such as Wikipedia). If there is more than one "int list" subblock, a reader application concatenates the items of the lists into one large list. |
control sequence | dummy | 1 | This control sequence makes the format more error tolerant and also makes reading eBooks much easier. |
various | In any order and repetition, any of: String, Change font family, Change font size, Line break, Paragraph break, Hline, [Begin/End] [italic/bold/underlined], List, Text link, Anchor point, Table | 0 ... ∞ | The contents of the article. |
ID 129: Int list | | | Contains a list of 0 to 255 integers of the same type. |
entity | uint1 | 1 | The number of list items. Because this is a uint1, the list cannot contain more than 255 integers. However, whenever an int list may appear as a subblock of some parent block, the parent block is specified to allow an arbitrary number of consecutive int lists. Thus, int lists of arbitrary length are possible. |
entity | sint1 | 1 | Specifies the length (in bytes) of each integer value in the list. If the value of this entity is positive, then all list items are unsigned integers with the given length. If the value of this entity is negative, then all list items are signed integers with a length equal to the absolute value of this entity. |
entity | [u/s]intX (as specified above) | 0 ... 255 (as specified above) | The list items. There must be exactly as many list items as specified by the first entity, and they all have to be of the type specified by the second entity of this block. |
ID 130: List | | | Contains a list of items to be displayed on the screen. |
subblock | List item | 0 ... ∞ | The items of the list. |
ID 131: List item | | | An item of a list (appears as subblock of a list block). |
entity | uint1 | 1 | The depth or level of this list item. Starting with zero, nested list items have a higher depth than their parent list items. |
entity | uint2 | 1 | The number of the list item. If set to zero, an unordered list (bulleted list) is assumed for this list level. |
various | Same as for last elements in "Article" | 0 ... ∞ | Any text of the list item. |
ID 132: Text link | | | A link to another page and/or anchor point. Links are automatically highlighted by a viewer application. Thus, you should not surround links with "Begin underlined" and "End underlined" control sequences. |
string | string | 0 | The text to be displayed on the screen in a highlighted style. |
control sequence | dummy | 0 or 1 | To avoid ambiguities, this control sequence makes it possible to find the end of the list of string chunks. This control sequence must be present if a link href is given separately from the link text (see below) and it must not be present if no separate link href is given. |
string | string | 0 or 1 (same as above) | The title of the article to jump to when this link is clicked. You can specify an anchor point in the article by using a #-symbol, or even let this string begin with a #-symbol for links within the same article. If this string is given, then the dummy control sequence above must also be present. If this string is not given, then the dummy control sequence above must not be present either, and the link target defaults to the text given in the previous string. |
ID 133: Anchor point | | | An (invisible) mark that can be referenced by Text links using the #-symbol. |
string | string | 1 | An ID that can be used by Text links. This ID does in general not start with a #-symbol. However, Text links use the #-symbol to specify that the substring after the #-symbol refers to an anchor ID. The values "top" and "bottom" should not be used as they are reserved for the top and bottom of the article. |
ID 134: Header | | | A header (title line). You normally want to surround "Header" blocks with "Paragraph break" control sequences. |
entity | uint1 | 1 | The depth of the header. A value of zero will create the largest header possible and should be used at the very beginning of an article. |
various | Same as for last elements in "Article" | 0 ... ∞ | Any text of the header. |
ID 135: Table | | | A block containing data that should be laid out as a table. |
entity | uint1 | 1 | The width of the table border, in pixels. However, you can't be sure how a reader application actually handles this attribute (some applications might ignore it because they don't have enough screen space to draw a border, others might draw a border with a maximum width of 1px, etc.). |
string | string | 0 or 1 | A header text for the table (optional). |
control sequence | dummy | 1 | To avoid ambiguities, this control sequence makes it possible to find the end of the list of string chunks. |
various | Same as for last elements in "Article", plus: Table Row, Table Cell, Table Header Cell | 0 ... ∞ | The contents of the table. The order of the control sequences "Table Row", "Table Cell", "Table Header Cell" must make sense, i.e. the sum of colspans of all cells in a row must be the same for each row and the sum of rowspans of all cells in a column must be the same for each column. |
ID 136: Math | | | Contains LaTeX code that might be rendered by a sophisticated reader application. |
string | string | 1 | The LaTeX code that describes the content of this block. This text will be typeset as if it were between "$$" and "$$" in a LaTeX file. A reader application should allow macros from the following LaTeX packages: ucs, inputenc, amsmath, amsfonts, amssymb, empty. |
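To illustrate how the "Int list" block (ID 129) combines the block layout with the entity types, the following Python sketch encodes a list of integers and splits it into several consecutive blocks when it has more than 255 items. The function name `encode_int_lists` is hypothetical, and two's complement for signed values is an assumption. Entity bytes are written as they are, so a reader relies on the specified sizes to interpret them.

```python
from typing import Sequence


def encode_int_lists(values: Sequence[int], length: int, signed: bool) -> bytes:
    """Encode values as one or more consecutive "Int list" blocks (ID 129)."""
    int_list_id = 129
    # sint1 length field: positive for unsigned items, negative for signed items.
    length_field = (-length if signed else length).to_bytes(1, "big", signed=True)
    # A single block holds at most 255 items; an empty list yields one empty block.
    chunks = [values[i:i + 255] for i in range(0, len(values), 255)] or [[]]
    out = bytearray()
    for chunk in chunks:
        out += bytes([0xFF, int_list_id, len(chunk)])  # begin, type ID, item count
        out += length_field
        for value in chunk:
            out += value.to_bytes(length, "big", signed=signed)
        out += bytes([0xFE, int_list_id])              # end, type ID
    return bytes(out)


# Two author IDs stored as uint2 values.
print(encode_int_lists([1, 2], 2, signed=False).hex(" "))
# ff 81 02 02 00 01 00 02 fe 81
```

A reader application that encounters several such blocks in a row concatenates their items, as described for the "Article" block.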
Copyright (c) 2005 Robert Bamler <Robert dot Bamler (a) gmx dot de>
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is available here.
"Apple" and "iPod" are registered trademarks of Apple Computer, Inc.