“Normie” space characters:

U+0020
The humble space character that we all know and love;
U+0009
Character tabulation, commonly known as a “tab” (\t);
U+000A
New line character (\n), also known as end-of-line character and line feed (LF);
U+000D
Carriage return (CR; \r); commonly seen alongside \n (i.e., CRLF);
U+00A0
The classic non-breaking space, used everywhere (HTML &nbsp;), essentially telling the renderer not to insert a line break at its position. This is commonly used between numbers and their units, as many style guides recommend.
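As a small illustration of the number-and-unit use, here is a minimal Python sketch. The helper name with_unit is made up for this example, and whether the pair actually stays on one line is up to whatever renderer displays the string. It also shows a gotcha: Python's argument-less split() counts U+00A0 as whitespace.

```python
NBSP = "\u00A0"  # no-break space


def with_unit(value: str, unit: str) -> str:
    """Join a number and its unit with U+00A0 (hypothetical helper)."""
    return f"{value}{NBSP}{unit}"


label = with_unit("10", "kg")

# Splitting on a literal U+0020 leaves the pair intact ...
pieces_on_space = label.split(" ")

# ... but split() with no argument splits on anything str.isspace()
# accepts, and that includes U+00A0.
pieces_on_any_whitespace = label.split()

print(pieces_on_space, pieces_on_any_whitespace)
```

So "non-breaking" is a hint for line wrapping only; string tokenisers are free to treat the character as ordinary whitespace.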

“Indie” space characters:

U+000B
Line tabulation (\v), a legacy control character from typewriter/teletype days; moves the cursor down one “tab stop” vertically. Rarely used in modern text, but still recognised as whitespace in Unicode and for compatibility;
U+000C
Form feed (\f and ^L), historically used to denote page breaks in long documents;
U+0085
Next line (NEL), used in EBCDIC to represent new lines;
U+1680
Ogham space mark; belongs to the ancient Irish Ogham script, not used in modern text;
U+2000
En quad, a space roughly the width of a capital letter N;
U+2001
Em quad, a space roughly the width of a capital letter M;
U+2002
En space, a space roughly the width of a capital letter N;
U+2003
Em space, a space roughly the width of a capital letter M;
U+2004
Three per em space, a space roughly the width of 1⁄3 em;
U+2005
Four per em space, a space roughly the width of 1⁄4 em;
U+2006
Six per em space, a space roughly the width of 1⁄6 em;
U+2007
Figure space; exactly as wide as a digit (0–9), for aligning numbers in columns;
U+2008
Punctuation space; as wide as a full stop or comma, for alignment;
U+2009
Thin space, a space roughly the width of 1⁄5–1⁄6 em. It can be used in nested quotation marks, or to separate glyphs that interfere with one another. It is not as narrow as the hair space. The thin space is also used as a thousands separator when writing numbers in groups of three digits; this makes long numbers easier to read and avoids the ambiguity of the comma, which can be read as either a thousands separator or a decimal point. It can also be used on either side of a dash, but a hair space might be preferable for this use case;
U+200A
Hair space, the thinnest of the horizontal whitespace characters; thinner than a thin space, it can be used on either side of a dash to improve readability/aesthetics;
U+202F
Narrow no-break space (NNBSP); similar to the classic non-breaking space but narrower, it can be used, e.g., on the interior side of French guillemets (« and ») and before certain punctuation;
U+205F
Medium mathematical space, a space roughly the width of 4⁄18 em; used in math notation between symbols, such as a + b;
U+3000
Ideographic space, a full-width space used in CJK (Chinese, Japanese, and Korean) text, matching the standard width of a CJK character;
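To see how a programming language classifies the characters listed so far, the following Python sketch looks each one up with the standard unicodedata module. The names LISTED_SPACES and describe are invented for this example, and str.isspace() is Python's own notion of whitespace, which is close to but not formally the same thing as the Unicode White_Space property.

```python
import unicodedata

# Code points from the two lists above, in the order they appear.
LISTED_SPACES = [
    0x0020, 0x0009, 0x000A, 0x000D, 0x00A0,
    0x000B, 0x000C, 0x0085, 0x1680,
    *range(0x2000, 0x200B),  # U+2000 .. U+200A
    0x202F, 0x205F, 0x3000,
]


def describe(code_point: int) -> tuple[str, str, bool]:
    """Return (name, general category, str.isspace() result)."""
    ch = chr(code_point)
    # Control characters have no formal name, so supply a placeholder.
    name = unicodedata.name(ch, "<control>")
    return name, unicodedata.category(ch), ch.isspace()


for cp in LISTED_SPACES:
    name, category, is_space = describe(cp)
    print(f"U+{cp:04X}  {category}  isspace={is_space}  {name}")
```

The visible spaces should come out with category Zs (space separator), while the tab, line feed and friends are Cc (control) characters that Python nevertheless treats as whitespace.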

Honourable mentions (not classified as spaces in the Unicode standard, but they have earned their place among the “real” spaces):

U+FEFF
Zero-width non-breaking space (ZWNBSP); has no visible width but prevents a line break (see also NBSP, U+00A0). Also used as a Byte Order Mark (BOM) at the start of a file to signal encoding and byte order, which has become its primary modern use. This dual purpose has made its use somewhat ambiguous, so it is best practice to avoid using this character for its non-breaking function in favour of the word joiner character, U+2060. This character is famously found at zerowidthspace.me;
U+2060
Word joiner; has no visible width but prevents a line break at its position. Functionally identical to the original use of U+FEFF;
U+200B
Zero-width space (ZWSP); has no visible width but marks a valid line-break opportunity, particularly useful in languages/scripts (e.g., Chinese, Thai) that don’t use spaces between words to hint where wrapping can occur;
U+200D
Zero-width joiner (ZWJ); forces adjacent characters to join or form a ligature. It is most famously used in emoji sequences; e.g., 👩‍⚕️ is actually two emojis (👩 and ⚕️) joined by a ZWJ;
U+200C
Zero-width non-joiner (ZWNJ); the opposite of ZWJ, preventing two adjacent characters from joining/ligating; important in some scripts (e.g., Arabic, Persian) where letters normally connect but sometimes need to be visually separated without a visible space;
U+2800
Braille blank character; functionally a blank/space character, but technically used as a Braille pattern with no dots raised.
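The practical consequence of these characters not being classified as spaces can be sketched in Python. The variable names below are made up for illustration; the behaviour shown is that of Python's str methods and its built-in utf-8 and utf-8-sig codecs, and other languages may draw the line differently.

```python
HONOURABLE_MENTIONS = [0xFEFF, 0x2060, 0x200B, 0x200D, 0x200C, 0x2800]

# None of them counts as whitespace for str.isspace() ...
not_whitespace = [not chr(cp).isspace() for cp in HONOURABLE_MENTIONS]

# ... so str.strip() leaves them in place.
padded = "\u200Bhello\u200B"
stripped = padded.strip()

# The health-worker emoji written out as code points:
# woman, ZWJ, staff of Aesculapius, emoji variation selector.
health_worker = "\U0001F469\u200D\u2695\uFE0F"
parts = health_worker.split("\u200D")

# U+FEFF as a byte order mark: the plain utf-8 codec keeps it as a
# character, while utf-8-sig drops it when it is the first thing decoded.
raw = "\uFEFFhello".encode("utf-8")
kept = raw.decode("utf-8")
dropped = raw.decode("utf-8-sig")

print(all(not_whitespace), len(stripped), len(parts), len(kept), len(dropped))
```

This is why an invisible character pasted into a form field or a file name tends to survive the usual trimming step.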

N.B.—: Throughout this page and the dashes page, we talk about “en” and “em” to mean roughly the width of the capital letters N and M respectively. This was historically true in metal type, and it is a useful mnemonic in practice, but modern typography has redefined these units. Technically, 1 em is a typographic unit equal to the font’s point size. For example, 1 em is 12 pt in a 12 pt font (with the standard typographic unit of 1 pt being 1⁄72 inch), and 1 en is exactly 1⁄2 em. Note also that in practice, the designers of a typeface have some creative liberties, so widths may vary from font to font, even at the same point size.
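The arithmetic in this note can be sketched in Python with exact fractions. The table and function names are invented for this example, and the widths are the nominal ones quoted on this page; as the note says, a real font may deviate from them.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # 1 pt = 1/72 inch

# Nominal widths in em, as quoted in the list above.
NOMINAL_WIDTH_IN_EM = {
    "em space (U+2003)": Fraction(1),
    "en space (U+2002)": Fraction(1, 2),
    "three-per-em space (U+2004)": Fraction(1, 3),
    "four-per-em space (U+2005)": Fraction(1, 4),
    "six-per-em space (U+2006)": Fraction(1, 6),
}


def width_in_points(width_in_em, font_size_pt):
    """1 em equals the font size, so scale the em fraction by it."""
    return Fraction(width_in_em) * font_size_pt


def points_to_inches(points):
    return Fraction(points) / POINTS_PER_INCH


for name, em in NOMINAL_WIDTH_IN_EM.items():
    print(f"{name}: {width_in_points(em, 12)} pt in a 12 pt font")
```

In a 12 pt font this gives 12, 6, 4, 3 and 2 pt respectively, and the full em works out to 1⁄6 inch.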

N.B.—: A quad and a space are effectively the same, but in the Unicode standard they are treated as distinct for historical compatibility reasons. A quad was a blank metal block used for spacing in metal type, whereas a space is the modern, digital equivalent.

N.B.—: In mathematical typography, the widths of spaces are usually given in multiples of 1⁄18 em.
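A quick sketch of what those eighteenths mean in practice, again in Python. The 4⁄18 em figure is the one given above for U+205F; the 3⁄18 and 5⁄18 values are the nominal thin and thick math spaces of TeX, added here as a point of comparison rather than something this page defines. The names UNITS_PER_EM and eighteenths_to_em are hypothetical.

```python
from fractions import Fraction

UNITS_PER_EM = 18  # math spacing is counted in eighteenths of an em


def eighteenths_to_em(units: int) -> Fraction:
    return Fraction(units, UNITS_PER_EM)


thin = eighteenths_to_em(3)    # TeX thin space (assumption, see above)
medium = eighteenths_to_em(4)  # medium mathematical space, U+205F
thick = eighteenths_to_em(5)   # TeX thick space (assumption, see above)

font_size_pt = 10
for label, em in [("thin", thin), ("medium", medium), ("thick", thick)]:
    print(f"{label}: {em} em = {float(em * font_size_pt):.2f} pt at {font_size_pt} pt")
```

Note that 4⁄18 em reduces to 2⁄9 em, a little under the 1⁄4 em of the four per em space.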


See also: empty form, dashes
