Emphasis vs Italics

Normally Markdown’s default of using em for emphasis makes sense.

Emphasis is what you use italics for most of the time. When you need a citation, you can type <cite>, and when you need a more generic italic, for example for foreign-language words or Linnaean names, you can type <i>. But on user-writable forums that disable fallthrough HTML, i is the better default. It’s never “wrong”, per se (em is just more specific), and it matches how many people use * and _ on these forums: they think “I want to make this look cursive” rather than “I need to semantically emphasize this”.

Same goes for strong and b.

I like that it gives you a choice between _ and *, that they mean the same thing, that you can mix and match them, and that doubling them means stronger. That was a good design decision, and a lot easier to remember than the other markup formats I was using before. So solutions where “_ is cite and * is em” are no good, especially when they don’t fix things like Linnaean names.

I suggest that Markdown installations that do allow fallthrough HTML stick with em and strong as the default, since those are the most common cases and fallthrough HTML can handle the special cases. But Markdown installations on online forums that disable fallthrough HTML (i.e. where Markdown is used in lieu of BBCode or wiki syntax) should generate i and b instead of em and strong. It’s less wrong.

Here is an example from CommonMark’s own spec which demonstrates how wack it can get:

<p><strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn.
<em>Asclepias physocarpa</em>)</strong></p>

That is not semantically correct HTML. It should be:

<p><strong>Gomphocarpus (<i>Gomphocarpus physocarpus</i>, syn.
<i>Asclepias physocarpa</i>)</strong></p>

or, depending on the context, maybe even:

<p><b>Gomphocarpus (<i>Gomphocarpus physocarpus</i>, syn.
<i>Asclepias physocarpa</i>)</b></p>

Linnaean names, like the Latin name for balloon plant used in that example, are always set in italics, cursive, or oblique, or are otherwise text-decorated, but they are not emphasized; <em> is semantically wrong for them. I cannot correctly write that plant’s Latin name here on GitHub, since there is currently no way, that I know of, to emit <i>.

<i> and <b> are good fallbacks. They only indicate style. <em>, <cite>, and <strong> are for when you specifically wanna indicate the semantics of emphasis, citation, or strong emphasis.

Referring to a poodle as “a dog” is slightly weird but not that bad, and it’s technically correct (that’s what we’re doing when we use <i> when we mean <em>).

Referring to a collie as “a poodle”, on the other hand, is quite wack (but that’s what we’re doing when we use <em> for a Linnaean name, a citation, or a foreign-language phrase).

And before someone asks: “But we should express semantics, not style. I heard someone back in the nineties say that a lot of the time people wrongly use <i> when they should be more precise and use <em>.” Yes, that’s true. <em> is more specific than <i> and is better to use, but only when we know for sure that we mean emphasis.

Yes, it’s true that <em> is the most common one. 90% of the time it’s what you mean. But being in a town where 90% of the dogs are poodles doesn’t turn a collie into a poodle. A collie is still a non-poodle dog, just like a citation or a foreign phrase is still a non-emphasis use of italics.

I backed off from this argument a few years ago because of this counterargument: “We support raw HTML so people can type out <i>, <cite>, or <b> when that’s what they mean, and they can use the shorthand * or _ for the most common one, which is <em>, and ** and __ for the second-most common one, which is <strong>.”

But two things are becoming clear to me.

1A. People are using CommonMark-derived converters in places where raw HTML is (and should be) turned off, like on public forums and comment sections.
1B. Implementors of those public forums are referring to this spec saying “I’m just doing what CommonMark says”.

2. Not everyone is, wants to be, or needs to be a linguistics nerd. People shouldn’t have to learn the specific minutiae of when to use em, cite, or i. They just want the text to look slanted, so they jam stars or underscores around it. Making * and _ emit <i> matches their expectations.

That’s why my recommendation is this:

Sites where raw HTML is turned off (as it should be, for public text inputs) should emit <i> for * and _, and <b> for ** and __.

Installations where Markdown is used as a tool for writers, as a shortcut for HTML rather than a replacement for it, and where raw HTML is allowed, may optionally continue to emit <em> and <strong>, or have a flag for that behavior.

That’s what I would use for my own blog, where I can type out <em>, <cite>, or <i> manually as needed, and most of the time I would get the default, <em>. I just checked: I use <em> 70% of the time, <cite> 20% of the time, and <i> 10% of the time. So it’s appropriate for me to have * and _ emit em, since I know how to get the others when I need them (I even have a shortcut that I bolted on to Emacs markdown-mode to insert them as raw HTML). But even then, that’s not necessarily the best choice for every installation; it depends on how nerdy the users of that tool are expected to be.
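To make the recommendation concrete, here is a toy sketch in Python. It assumes a hypothetical renderer whose only relevant setting is raw_html_allowed; render_inline, SEMANTIC_TAGS, and PRESENTATIONAL_TAGS are made-up names, not the API of CommonMark or of any real Markdown library. The regexes ignore most of CommonMark’s emphasis rules (flanking delimiter runs, the intraword restriction on _, mixed nesting), so treat it as an illustration of the tag choice only, not as a parser.

```python
import html
import re

# Hypothetical renderer setting (not an option of any real Markdown library):
# which tags single (* or _) and double (** or __) delimiters produce.
SEMANTIC_TAGS = {"single": "em", "double": "strong"}  # what CommonMark emits today
PRESENTATIONAL_TAGS = {"single": "i", "double": "b"}  # proposed for HTML-less sites

# Toy patterns, NOT CommonMark-conformant: no flanking rules, no intraword
# restriction on _, no mixed nesting such as *a **b** c*.
DOUBLE = re.compile(r"(\*\*|__)(.+?)\1")
SINGLE = re.compile(r"(\*|_)(.+?)\1")


def render_inline(text: str, raw_html_allowed: bool) -> str:
    """Turn * / _ emphasis in one line of text into HTML (toy sketch)."""
    if raw_html_allowed:
        # Writers can type <i>, <cite>, <b> by hand for the special cases.
        tags = SEMANTIC_TAGS
    else:
        # Public input: typed HTML is escaped, so the delimiters are the only
        # way to get slanted or bold text, and i/b are the least-wrong output.
        tags = PRESENTATIONAL_TAGS
        text = html.escape(text, quote=False)

    def wrap(tag: str):
        return lambda m: f"<{tag}>{m.group(2)}</{tag}>"

    # Double delimiters first, so ** is not read as two single *.
    text = DOUBLE.sub(wrap(tags["double"]), text)
    return SINGLE.sub(wrap(tags["single"]), text)


src = "**Gomphocarpus (*Gomphocarpus physocarpus*, syn. *Asclepias physocarpa*)**"
print(render_inline(src, raw_html_allowed=False))
# <b>Gomphocarpus (<i>Gomphocarpus physocarpus</i>, syn. <i>Asclepias physocarpa</i>)</b>
print(render_inline(src, raw_html_allowed=True))
# <strong>Gomphocarpus (<em>Gomphocarpus physocarpus</em>, syn. <em>Asclepias physocarpa</em>)</strong>
```

With raw HTML allowed, the output matches the CommonMark spec example quoted earlier (apart from its line break); with raw HTML turned off, the same source yields the <b>/<i> version. The only thing the flag changes is which pair of tags the delimiters map to; * and _ still mean the same thing either way.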

Not everyone should have to learn this stuff, but that doesn’t mean it’s OK that the web is littered with wrong semantics like <em>Gomphocarpus physocarpus</em>.

That’s more wrong than writing I’m <i>really</i> tired.

It should be i and b instead of em and strong (at least on most of the websites out there, like GitHub, Reddit, Stack Exchange, wikis, etc.).

I like that * and _ both mean the same thing, that * can be used intraword and _ can’t, etc. That’s all good. I just don’t want to call collies “poodles”.