Normally Markdown’s default of using em for emphasis makes sense.
Emphasis is what you use italics for most of the time. When you need
cite, you can type <cite> and when you need a more generic italic, for
example for foreign language words or Linnaean names, you can type
<i>. But on user-writable forums that disable fallthrough HTML, i is
the better default. It’s never “wrong”, per se (em is just more
specific) and it matches how many people use * and _ on these forums,
when they think “I want to make this look cursive” rather than “I need
to semantically emphasize this”.
Same goes for strong and b.
I like that it gives you a choice between _ and *, that they mean the
same thing, that you can mix and match them, and that double means
stronger. That was a good design decision that was a lot easier to
remember than the other markup formats I was using before. So
solutions where “_ is cite and * is em” are no good, especially when
they don’t fix things like Linnaean names.
I suggest that Markdown installations that do allow fallthrough HTML should stick with em and strong as the default, since those are the most common cases and fallthrough HTML can handle the special cases. Markdown installations on online forums that disable fallthrough HTML (i.e. where Markdown is used in lieu of bbcode or Wiki syntax) should generate i and b instead of em and strong. It’s less wrong.
Here is an example from CommonMark’s own spec which demonstrates how wack it can get:
<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn.
<em>Asclepias physocarpa</em>)</strong></p>
That is not correct HTML, which should be:
<p><strong>Gomphocarpus (<i>Gomphocarpus physocarpus</i>, syn.
<i>Asclepias physocarpa</i>)</strong></p>
or, depending on the context, maybe even:
<p><b>Gomphocarpus (<i>Gomphocarpus physocarpus</i>, syn.
<i>Asclepias physocarpa</i>)</b></p>
Linnaean names, like that Latin name for balloon plant used in that
example, are always marked italics, cursive, oblique, or otherwise
text-decorated, but not emphasized; <em> is semantically wrong for
them. I cannot correctly write that plant’s Latin name here on GitHub
since there is currently no way, that I know of, to emit <i>.
<i> and <b> are good fallbacks. They only indicate style. <em>,
<cite>, <strong> are for when you specifically wanna indicate the
semantics of emphasis, citation, or strong emphasis.
Referring to a poodle as “a dog” is slightly weird but not that bad and
it’s technically correct (and that’s what we’re doing when we’re using
<i> when we mean <em>).
Referring to a collie as “a poodle” is, on the other hand, quite wack
(but that’s what we’re doing when we’re using <em> for a Linnaean
name or for a citation or for a foreign-language phrase).
And before someone asks: “But we should express semantics, not style.
I heard someone back in the nineties say that a lot of the time people
are wrongly using <i> when we should be more precise and use <em>”.
Yes, that’s true. Em is more specific than i and is better to use, but
only when we know for sure that we mean emphasis.
Yes, it’s true that <em> is the most common one. 90% of the time
it’s what you mean. But just because you are in a town where 90% of
the dogs are poodles it doesn’t turn a collie into a poodle. A collie
is still a non-poodle dog, just like a citation or a foreign phrase is
still a non-emphasis use of italics.
I backed off from this argument a few years ago because of this
counterargument: “We support raw HTML so people can type out <i> or
<cite> or <b> when they mean <i> or <cite> or <b>, and they can use
the shorthand * or _ for the most common one, which is <em>, and **
and __ for the second-most common one, which is <strong>.”
But two things are becoming clear to me.
1A. People are using CommonMark-derived converters in places where raw
HTML is (and should be) turned off, like on public forums and comment
sections.
1B. Implementors of those public forums are referring to this spec
saying “I’m just doing what CommonMark says”.
2. Not everyone is, wants to be, or needs to be a linguistics nerd.
People shouldn’t have to learn the specific minutiae of when to use
em, cite, or i. They just want the text to look slanted so they jam
stars or underscores around it. Making * and _ be <i> matches their
expectations.
That’s why my recommendation is this:
Sites where raw HTML is turned off (as it should be, for public text
inputs) should emit <i> for * and for _, and <b> for ** and for __.
Installations where Markdown is used as a tool for writers, where it’s
a shortcut for HTML as opposed to a replacement for it, and raw HTML
is allowed, may optionally continue to emit <em> and <strong> or have
a flag for that behavior.
That’s what I would use for my own blog where I can type out <i>,
<cite>, or <b> manually, as needed, and most of the time I would get
the default, <em>. I just checked, and I use <em> 70% of the time,
<cite> 20% of the time, and <i> 10% of the time, so it’s appropriate
for me to have * and _ emit em since I know how to get the others when
I need them (I even have a shortcut that I bolted on to Emacs
markdown-mode to get them as raw HTML). But even then, that’s not
necessarily the best for all installations, depending on how nerdy the
users of that tool are expected to be.
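To make the recommendation concrete, here is a minimal sketch in Python. It is a toy regex-based converter, not a real CommonMark parser, and the names render_emphasis, raw_html_allowed, SEMANTIC_TAGS and PRESENTATIONAL_TAGS are made up for illustration. It only shows the idea that the same * / _ syntax can map to different tags depending on whether the installation lets raw HTML through.

```python
import html
import re

# Hypothetical tag tables; neither name comes from a real Markdown library.
SEMANTIC_TAGS = {"single": "em", "double": "strong"}
PRESENTATIONAL_TAGS = {"single": "i", "double": "b"}

# Toy patterns: a delimiter, the shortest possible content, then the same
# delimiter again. Real CommonMark delimiter rules are far stricter.
DOUBLE = re.compile(r"(\*\*|__)(.+?)\1", re.DOTALL)
SINGLE = re.compile(r"(\*|_)(.+?)\1", re.DOTALL)


def render_emphasis(text: str, raw_html_allowed: bool) -> str:
    """Convert */_ and **/__ spans, choosing tags by installation type."""
    if raw_html_allowed:
        # Writers can type <i>, <cite> or <b> by hand, so keep em/strong.
        tags = SEMANTIC_TAGS
    else:
        # No fallthrough HTML: escape user markup and fall back to i/b.
        tags = PRESENTATIONAL_TAGS
        text = html.escape(text, quote=False)

    def wrap(kind):
        tag = tags[kind]
        return lambda m: f"<{tag}>{m.group(2)}</{tag}>"

    # Doubles first, so ** is not consumed as two single delimiters.
    text = DOUBLE.sub(wrap("double"), text)
    return SINGLE.sub(wrap("single"), text)


source = "**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**"
print(render_emphasis(source, raw_html_allowed=False))
print(render_emphasis(source, raw_html_allowed=True))
```

With raw_html_allowed=False the balloon plant line comes out wrapped in <b> and <i>; with raw_html_allowed=True it comes out as <strong> and <em>, and the writer is expected to type <i> by hand for the Linnaean names.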
Not everyone should have to learn this stuff but that doesn’t mean
it’s OK that the web is littered with wrong semantics like
<em>Gomphocarpus physocarpus</em>.
That’s more wrong than I'm <i>really</i> tired.
It should be i and b instead of em and strong (at least on most of the websites out there, like GitHub, Reddit, Stack Exchange, wikis, etc.).
I like that * and _ both mean the same thing, that * can be used
intraword and _ can’t, etc. That’s all good. I just don’t want to
call collies “poodles”.