The PreTeXt Guide

Subsection 4.1.4 Characters in Paragraphs

Some keyboard characters are unambiguous, for example, the percent sign, %. Other keyboard characters are poor replacements for several different characters. Is a slash, /, being used to separate information/ideas, or is it a solidus being used to form a fraction such as 3⁄4? Other characters, such as per-mille, ‰, are not present on keyboards at all. We organize this section according to these types of distinctions.

Subsubsection Unambiguous Keyboard Characters

The keyboard characters `, ~, !, @, #, $, %, ^, *, (, ), _, =, +, [, ], {, }, \, |, :, ;, and the comma (,) are entered as-is and are rendered only one way. Easy.
Of course, the fifty-two Latin letters (upper- and lowercase) and the ten decimal digits are also in this category. If you have an international, bilingual, or country-specific keyboard, then common accented versions of Latin letters (as used in Europe and the Western Hemisphere) may also be entered directly from your keyboard.

Subsubsection Exceptional Keyboard Characters

XML is a markup language, which in part means that some keyboard characters are co-opted to signal the start of markup. For XML this character is the less-than symbol, <. It signals the start of a tag, and then an opening tag ends with a greater-than symbol, >, while a closing tag has an extra / right after the <.
This raises the question: if a < is used in our XML source to signal the start of a tag, then how did we get one to appear here in this sentence without mistakenly starting a tag? Once a markup language gives some characters special meanings, there needs to be an escape character. For XML the escape character is the ampersand, &. So to author the < and > symbols, we type escaped versions: &lt; and &gt;.
I hear you say, “But now we have just taken the & out of the running and given it a special meaning. How do we get an ampersand?” Easy: use the escaped version, &amp;.
So the short answer is: never, ever type the < or & keyboard characters on their own. The very first stage of processing XML (and hence PreTeXt) will fail fatally on these characters. Instead, always use the sequences &lt; and &amp;, and very early on the XML processing will convert them to the characters themselves, without interpreting them as signals for markup.
It does not seem necessary to author > as &gt;, though there is no real harm in doing so. The two other characters with escaped versions are the single and double quotes, ' and ", whose escaped versions are &apos; and &quot; (respectively). These are only necessary within attribute values, and we have been careful to design PreTeXt so that they are not necessary.
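These rules can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; it is not part of PreTeXt, and the `<p>` fragments are schematic rather than complete PreTeXt documents.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
raw = "if x < 3 & y > 2"
escaped = escape(raw)
print(escaped)  # if x &lt; 3 &amp; y &gt; 2

# The parser converts the escaped sequences back into plain characters.
assert ET.fromstring("<p>" + escaped + "</p>").text == raw

# A bare > is accepted as-is, which is why &gt; is optional.
assert ET.fromstring("<p>3 > 2</p>").text == "3 > 2"

# A bare & or < is a fatal error, raised before any later processing could begin.
for bad in ("<p>salt & pepper</p>", "<p>3 < 4</p>"):
    try:
        ET.fromstring(bad)
    except ET.ParseError as err:
        print(f"{bad!r}: {err}")
```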

Remark Excessive Escaped Characters.

If you know another markup language, such as TeX, LaTeX, Markdown, JSON, or PGML, think about how many characters have been given special meanings, and the consequent need for escaped versions. And if you want to write about computer languages, realize that each such language also gives certain keyboard characters special meanings.
XML only has five exceptional characters, and in your daily use, PreTeXt really only requires you to be aware of two, the minimum necessary for a markup language.

Best Practice A CDATA Section is Never Necessary.

We hate to mention it, but sooner or later we need to have an uncomfortable discussion about the misunderstood CDATA section, at the risk of confusing the rest of this subsubsection. This is the place, but you can come back later if you wish.
You may read elsewhere about very special markup known as a CDATA section. The name stands for character data, which means “all characters, no markup”. Think of it as switching off the XML processing for a while, so in particular &, <, and > no longer have any special meaning at all. That could be nice, but realize that there is then no way to include any markup using XML syntax, since it is ineffective.
A CDATA section is always a convenience and is never necessary.
When would it be convenient? Maybe you have some LaTeX inside an <md> with a large matrix that uses lots of ampersands to separate the entries. Inside a CDATA section you can author it with bare & rather than a plethora of &amp; or \amp. But you lose the ability to include an <xref> in that CDATA section, so you need to be surgical about its scope. Perhaps a TikZ diagram in a <latex-image> has a multitude of <->, or a chunk of Sage code in an <input> has a lot of finitely-generated algebraic structures authored as R.<x> = ... (which is not even legal Python syntax!). These places, where there is little or no markup, could be convenient places to use a CDATA section. Be sure to read the warning at Item 7 in Section 5.10 before you go all-in.

Subsubsection Ambiguous Keyboard Characters

Some keyboard characters have a primary interpretation, and are imitations of other typographic characters. Your output will be of higher quality if you understand these distinctions and employ the proper variant.
Table Ambiguous Keyboard Characters and Alternatives
Keyboard Primary Notes
/ (forward) slash <solidus/> is a fraction bar, ⁄
' apostrophe <rsq/> is a right single quote, ’
` backtick <lsq/> is a left single quote, ‘
. period abbreviations and end-of-sentence
- hyphen See dashes, and arithmetic
" upright double quote <lq/> is “, <rq/> is ”
Note that the four quote marks (left/right, single/double) are meant for the actual characters. Always use the grouping constructions described above (i.e. <q> and <sq>) when grouping a phrase with quote marks. Note, too, that there is never a good reason to use the keyboard quote character (") unless you are creating some sort of verbatim text, such as a program listing or describing literal keyboard input.
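The keyboard characters in the table and their typographic counterparts are genuinely different Unicode characters, which is why the choice matters in output. A short sketch with Python's standard `unicodedata` module prints the code point and official name of each pair; the code points are chosen to match the glyphs shown in the table, and the helper `describe` is just for illustration. Note that Unicode names the keyboard slash SOLIDUS and the fraction bar FRACTION SLASH, so PreTeXt's element name and Unicode's name do not coincide.

```python
import unicodedata

# Keyboard character -> typographic character, per the table above.
pairs = {
    "/": "\u2044",  # <solidus/>
    "'": "\u2019",  # <rsq/>
    "`": "\u2018",  # <lsq/>
    '"': "\u201C",  # <lq/>
}

def describe(ch: str) -> str:
    """Format a character as its code point and official Unicode name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

for keyboard, typographic in pairs.items():
    print(f"{describe(keyboard):<26} vs  {describe(typographic)}")
print(describe("\u201D"), "(<rq/>)")
```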
When creating print or PDF via LaTeX, a period may get different trailing space depending on location and context, generally according to whether it is used in an abbreviation or to conclude a sentence. We do not yet have this dual use under control.

Subsubsection Extraordinary Characters

Some characters or symbols are typically not available on a keyboard, so we provide empty elements. Many of these may be entered directly into your source as Unicode characters, and they will do well in your HTML output. However, these may fail entirely if you create print or PDF via LaTeX using the pdflatex engine. Furthermore, even for HTML output, there may be several Unicode characters that are very similar.
So again, for the best quality output be aware of these elements and use them. Please suggest additions if you do not find what you need and are resorting to Unicode characters.

<ellipsis/>, …, ellipsis.

Typically three low dots with no intervening space, to indicate a continuation. This will always perform better than three consecutive periods.

<midpoint/>, ·, midpoint.

A small, vertically centered dot, which can be used to separate pieces of information, especially in displayed text (i.e. outside of paragraphs). Not to be confused with a bullet preceding a list item, or with multiplication in mathematics.

<swungdash/>, ⁓, swung dash.

Another decorative separator, not to be confused with the keyboard tilde character since it is wider and thicker.

<permille/>, ‰, per mille.

Like percent, but a quantity is expressed as its product with \(1000\) (rather than with \(100\)), so \(0.005\) is 5‰.

<pilcrow/>, ¶, pilcrow, paragraph mark.

Mark used historically to indicate the start of an internal paragraph, and in a more modern use, to indicate a permalink.

<section-mark/>, §, section mark.

Used to prefix the number of a section, or other division. (So the word section is being used generically here.)

<copyright/>, ©, copyright.

The symbol used in publishing, legal, or business contexts. For a PreTeXt project, copyright information can be specified within the <colophon> portion of the <frontmatter>.

<trademark/>, ™, trademark.

The symbol used in legal or business contexts.

<registered/>, ®, registered.

The symbol used in legal or business contexts.
Table Extraordinary Characters and Their Empty Elements
Character Name Element
… ellipsis <ellipsis/>
· midpoint <midpoint/>
⁓ swung dash <swungdash/>
‰ per-mille <permille/>
¶ pilcrow <pilcrow/>
§ section-mark <section-mark/>
© copyright <copyright/>
™ trademark <trademark/>
® registered <registered/>
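The mapping in this table can be written down as a small lookup. The sketch below is a hypothetical Python preview helper, not how PreTeXt itself converts these elements: it replaces each known empty element with a character whose code point was chosen to match the glyph in the table, and leaves any other element untouched. Since similar-looking alternatives exist, PreTeXt's actual output may use different code points.

```python
import re
import unicodedata

# Hypothetical lookup: empty element name -> character matching the table's glyph.
EXTRAORDINARY = {
    "ellipsis": "\u2026",
    "midpoint": "\u00B7",
    "swungdash": "\u2053",
    "permille": "\u2030",
    "pilcrow": "\u00B6",
    "section-mark": "\u00A7",
    "copyright": "\u00A9",
    "trademark": "\u2122",
    "registered": "\u00AE",
}

# Matches an empty element such as <ellipsis/> or <section-mark />.
EMPTY_ELEMENT = re.compile(r"<([a-z-]+)\s*/>")

def preview(text: str) -> str:
    """Replace known empty elements by characters; leave anything else alone."""
    return EMPTY_ELEMENT.sub(
        lambda m: EXTRAORDINARY.get(m.group(1), m.group(0)), text)

for name, char in EXTRAORDINARY.items():
    print(f"<{name}/>  {char}  U+{ord(char):04X}  {unicodedata.name(char)}")

print(preview("Tax: 5<permille/> (see <section-mark/>4)<ellipsis/>"))
# Tax: 5‰ (see §4)…
```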

Subsubsection Accented Characters

The second 128 Unicode characters (hex 80 to FF) contain many of the most frequently-used accented characters in Western languages, along with niceties such as the German eszett, ß, or the Scandinavian æsc, æ, an a-e ligature. Like the fifty-two Latin letters (part of the first 128 Unicode characters), these may be used as-is. They may be present on your keyboard, or you may need to learn keyboard shortcuts or specifics of your operating system to enter them as Unicode characters. In a pinch, you can often cut-and-paste a few characters from web pages.
This table is indexed by the Unicode number, in hexadecimal notation. The first 32 of these 128 (U+0080–U+009F) are control codes, so the table begins at U+00A0. That code point is a non-breaking space, so it is invisible, while U+00AD is a soft hyphen (which we have not implemented, so it is excluded); both cells are left blank.
Table Latin-1 Supplement, Unicode U+00A0–U+00FF
     0 1 2 3 4 5 6 7 8 9 A B C D E F
00A_   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬   ® ¯
00B_ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
00C_ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
00D_ Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
00E_ à á â ã ä å æ ç è é ê ë ì í î ï
00F_ ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
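Reading a cell of the table amounts to joining the row label and the column heading into one hexadecimal number. A small sketch with Python's `unicodedata` module (the helper `latin1_cell` is hypothetical, for illustration only) looks up a few cells, including the two blank ones, and confirms that the 32 code points preceding the table are control codes.

```python
import unicodedata

def latin1_cell(row: str, column: str) -> str:
    """Character at a table cell, e.g. row "00D_" and column "F" give U+00DF."""
    return chr(int(row.rstrip("_") + column, 16))

for row, column in [("00D_", "F"), ("00E_", "6"), ("00A_", "0"), ("00A_", "D")]:
    ch = latin1_cell(row, column)
    print(f"{row} {column}: U+{ord(ch):04X} {unicodedata.name(ch)}")

# The 32 code points before the table (U+0080 to U+009F) are control codes ("Cc").
assert all(unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0))
```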

Subsubsection Arithmetic

If you are writing about technical subjects, then you will want to avail yourself of PreTeXt’s extensive support for mathematics. Otherwise, you may wish to write really simple arithmetic within sentences without extra formatting. Notice that there is no provision for preventing line-breaks in the midst of these expressions.
So you can author (2×6)÷3+10−15 = -1, but that is about the limit of the complexity you should attempt without the extensive capabilities designed for mathematics, rather than arithmetic. Note that the spaces around the equals sign have been supplied in the source, but no spaces have been provided around the operators. Also, the subtraction sign and the negative sign look slightly different, because the subtraction uses the <minus/> element, while the negative answer uses a plain keyboard hyphen.
Using the <m> element instead, the above is \((2\times 6)\div 3+10-15=-1\text{.}\) Note the more careful spacing, and the appropriate symbols for subtraction and negation, with no special care required in the source syntax.
Note also that the plus sign, +, and the equals sign, =, can be provided in text as unambiguous keyboard characters.
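The arithmetic example above can be checked mechanically. Below is a Python sketch (standard library only) that starts from the rendered expression, with the glyphs listed in this subsubsection standing in for <times/>, <obelus/>, and <minus/>. It shows that the subtraction sign and the negative sign are different characters, and that Python needs the ASCII operators back before it can evaluate the expression. The exact code points are an assumption based on the displayed glyphs.

```python
import unicodedata

# Rendered form of (2<times/>6)<obelus/>3+10<minus/>15 = -1
rendered = "(2\u00D76)\u00F73+10\u221215 = -1"
lhs, rhs = rendered.split(" = ")

# Subtraction uses <minus/>; the negative answer is the keyboard hyphen.
print(unicodedata.name("\u2212"))  # MINUS SIGN
print(unicodedata.name(rhs[0]))    # HYPHEN-MINUS

# Python only understands ASCII operators, so translate the typographic ones.
to_ascii = str.maketrans({"\u00D7": "*", "\u00F7": "/", "\u2212": "-"})
expression = lhs.translate(to_ascii)
print(expression)  # (2*6)/3+10-15
value = eval(expression)  # acceptable here: a fixed, trusted string
print(value)  # -1.0
assert value == float(rhs)
```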
The <degree/>, <prime/>, and <dblprime/> elements support simple geographic coordinates (degrees, minutes, seconds), temperatures, and distances in feet and inches. “We parked the car at 36°16′0.83″N, 122°35′47.27″W, and since it was 93°F, we walked 505′3.6″ so we could swim in the bay.”

<minus/>, −, minus, subtraction, negation.

For simple arithmetic expressions in text, this symbol may be used. Note that the keyboard hyphen (or dash) might be acceptable for your purposes, but the two characters are different.

<times/>, ×, times, multiplication.

For simple arithmetic expressions in text, this symbol may be used. Or it may be used to specify dimensions, as in “I bought a 2×4 at the lumber yard.”

<solidus/>, ⁄, solidus, virgule, fraction bar.

For simple arithmetic expressions in text, this symbol may be used to form a fraction. It should appear to have a significantly shallower slope than the forward slash, /.

<obelus/>, ÷, obelus, division sign.

For simple arithmetic expressions in text, this symbol may be used to indicate division.

<plusminus/>, ±, plus-minus sign.

For simple arithmetic expressions in text, this symbol may be used to indicate a tolerance or a choice of two values, one the negative of the other.

<degree/>, °, degree symbol.

A raised open circle for temperature or for angles used in coordinates.

<prime/>, ′, prime symbol.

A straight mark placed like an exponent. For use in coordinates or in statements of linear measure in feet and inches. Not an apostrophe, and not for mathematics (for example, not to denote a derivative).

<dblprime/>, ″, double prime symbol.

Two straight marks placed like an exponent. For use in coordinates or in statements of linear measure in feet and inches. Not a quotation mark, and not for mathematics (for example, not to denote a second derivative).

Subsubsection Separators

<ndash/>, –, en dash.

A dash the width of a lowercase ‘n’, which is half the width of the em dash. This is typically used to express a range, such as 1955<ndash />1975, with no intervening spaces. It is often typed as two hyphens. Bringhurst suggests an en dash surrounded by spaces – thusly – when setting off phrases.

<mdash/>, —, em dash.

A dash the width of a lowercase ‘m’, which is twice the width of the en dash. This is typically used to set off a secondary part of a phrase, much like a semicolon or parentheses.
Some style guides call for no space before or after an em dash, while others allow a “thin” space on either side. You should always leave no space around an <mdash/> element in your PreTeXt source. Then a publisher file entry can be used to elect the automatic addition of a thin space, should your publisher so desire. See Subsection 44.1.5 for the syntax of the publisher file entry.
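If your source was converted from plain text or LaTeX, typed hyphens may be standing in for these dashes. Here is a rough sketch of a hypothetical clean-up helper in Python (not a PreTeXt tool): it rewrites one or two hyphens between digits as <ndash/>, and a triple hyphen (the LaTeX habit for an em dash) as <mdash/>, removing any surrounding spaces as recommended above. It is only a heuristic, since a hyphen between digits might belong to a phone number or a subtraction, so review every change it makes.

```python
import re

def suggest_dashes(source: str) -> str:
    """Hypothetical helper: replace typed hyphens with PreTeXt dash elements."""
    # A triple hyphen becomes <mdash/>, with the spaces around it removed.
    source = re.sub(r"\s*---\s*", "<mdash/>", source)
    # One or two hyphens between digits, as in 1955-1975, become <ndash/>.
    source = re.sub(r"(?<=\d)--?(?=\d)", "<ndash/>", source)
    return source

print(suggest_dashes("Pages 12--19 cover this --- and more."))
# Pages 12<ndash/>19 cover this<mdash/>and more.
```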

<nbsp/>, non-breaking space.

A space that ties two words together, discouraging a line break between them when formatted, such as Summer<nbsp />1967. This can also be used to keep a period in an abbreviation from being interpreted as the end of a sentence, such as C.R.<nbsp />Darwin.
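To see the effect of a tie, Python's standard `textwrap` module can serve as a stand-in for a typesetter's line breaker: it breaks lines only at ASCII whitespace, so a U+00A0 NO-BREAK SPACE keeps its neighbors together. Treating U+00A0 as the character behind <nbsp/> is an assumption for this sketch; other output formats may represent the tie differently.

```python
import textwrap

plain = "We first met in Summer 1967."
tied = "We first met in Summer\u00a01967."  # assumed: U+00A0 for <nbsp/>

# textwrap only breaks at ASCII whitespace, so U+00A0 behaves as a tie.
print(textwrap.wrap(plain, width=22))  # ['We first met in Summer', '1967.']
print(textwrap.wrap(tied, width=22))   # ['We first met in', 'Summer\xa01967.']
```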

<midpoint/>, <swungdash/>, ·, ⁓, midpoint, swung dash.

These can be used·as more decorative⁓separators.