HTML and captions

If you have encountered images on the Web (and also if somehow you have not) then you might have heard of concepts like alt-text, or perhaps title text, or the general idea of captioning images.

When it comes to images and image-like elements in HTML5, there are actually several ways of attaching things that could be called captions. They have slightly different purposes, and present differently in user agents. This is a brief overview of what they are, and what they are meant for.

The title attribute

title is a global attribute (meaning it is available on all HTML elements) which allows for adding what is termed advisory information to HTML elements. Listing 1 shows an example.

<img src="kitty.webp" title="Photo of a young cat">

Listing 1: Example use of the title attribute

The advisory information is usually displayed in the form of a tool-tip—a box that pops up when hovering your mouse pointer over something. The attribute can therefore be used to add tool-tips to things like paragraphs, links, or, yes, images.

There is an immediately obvious problem here: some user agents don't really have tool-tips. Touchscreen devices, for one, usually do not have mouse pointers, and support for displaying title text in browsers for such devices may be rather lacking. Even on an ordinary desktop computer with a mouse-like pointing device, title text often has poor discoverability. The standard itself discourages reliance on the title attribute for these reasons.

Title text does have its uses, such as interfaces where tool-tips are already expected, or buttons with explanatory tool-tips. In general, though, it is not a good way to caption images when used by itself.
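As a rough illustration of the button case, the sketch below pairs a visible label with a title attribute, so the tool-tip only adds a hint and nothing essential depends on it. The label and the tool-tip wording are made up for this example.

```html
<!-- The visible text carries the meaning; the title only adds an optional hint. -->
<button type="button" title="Saves the current draft without publishing it">
	Save draft
</button>
```

A user agent that never shows tool-tips still presents a usable button here, which is the point of keeping the title text optional.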

The alt attribute

<img> elements (which are the usual way of embedding images in an HTML document) can have an alt attribute, which defines the element's fallback content. Listing 2 provides an example.

<img src="kitty.jpg" alt="a cat playing with some yarn">

Listing 2: Example use of the alt attribute

Fallback content, as the name indicates, is the content presented to the user if the main content cannot be presented. An obvious case of this is screen readers—they cannot display images, so instead they read out the alt text. A less obvious case is text mode browsers like Lynx, or situations where network problems or explicit settings mean a more ordinary browser cannot display a particular image.

Note that the HTML5 standard defines different semantics for missing and empty alt attributes. An alt attribute set to an empty string (alt="") means that the image is decorative or supplemental, so user agents incapable of rendering images can skip it entirely, as if it did not exist. On the other hand, <img> elements with no alt attribute at all are considered part of the content, but with no alternate textual representation. A text mode browser or a screen reader can therefore skip images with empty alt text, while for images with a missing alt attribute it should indicate that the image is there, but does not have fallback content. Also note that instead of <img> tags with empty string alt attributes, the general recommendation is to include images via CSS if they are non-essential parts of the UI, or otherwise purely decorative.
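The different cases might look like the sketch below. The file names and the inline style are placeholders invented for this example, and the comments only paraphrase the semantics described above.

```html
<!-- Content image: the alt text stands in for the image when it cannot be shown. -->
<img src="kitty.webp" alt="a cat playing with some yarn">

<!-- Decorative image: empty alt text, so user agents may skip it entirely. -->
<img src="divider.png" alt="">

<!-- No alt attribute: the image counts as content, but has no text alternative.
     A user agent should indicate that an image is here. Best avoided when a
     sensible alt text can be written. -->
<img src="mystery.png">

<!-- Purely decorative image moved out of the markup and into CSS. -->
<h2 style="background: url('paw-prints.png') no-repeat; padding-left: 2em">
	About my cat
</h2>
```

In a real page the CSS would more likely live in a stylesheet than in a style attribute; it is inlined here only to keep the fragment short.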

alt text is generally not expected to be displayed in a tool-tip. It is not meant to provide supplemental information, but rather an alternate representation of the content. People, however, often conflate the two, likely due to a historical quirk: old versions of Internet Explorer used to display alt text in tool-tips, as if it were title text. Microsoft's more modern browsers no longer do this, but some people still hold the expectation of that behavior.

Figures and their captions

A good way to include a caption with an image in HTML5 is to use the <figure> element. A <figure> element can include a <figcaption> element, which, as the name hints, contains the caption for the figure.

<figure>s can include things besides images—in fact, they can include pretty much anything. This means that your figures can be, for example, math formulas or code samples (like, say, a sample of how to use figure elements). They provide a way to mark up some content that is relevant to, but separate from the main text on the page—the standard describes it as content that is "self-contained (like a complete sentence)".

<figure>
	<img src="kitty.webp" alt="a sleeping black cat">
	<figcaption>Photo of a particularly adorable cat</figcaption>
</figure>

Listing 3: Example use of a figure and its caption. Coincidentally, this listing is also a figure.
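Since figures are not limited to images, a code sample can be marked up the same way. The following is a minimal sketch; the caption wording is invented, and the angle brackets of the sample are escaped so that it is displayed as text rather than parsed as markup.

```html
<figure>
	<!-- The code sample is the figure's content; pre and code preserve its layout. -->
	<pre><code>&lt;img src="kitty.webp" alt="a sleeping black cat"&gt;</code></pre>
	<figcaption>Markup for embedding an image with fallback text</figcaption>
</figure>
```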

The advantage of using <figure>s is that they are semantic HTML: elements within the figure are distinguished as separate from the rest of the content, and the use of <figcaption> makes it clear what is being captioned. Unlike title attributes, <figcaption> elements can contain any flow content, are generally displayed like any other text, and can be styled with CSS. All of this makes captions much easier to discover than title text.
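To make this more concrete, here is a small sketch of a complete page with a caption that contains emphasis and a link, plus a few CSS rules that draw the figure and its caption as one unit. The link target cats.html and the specific style values are assumptions picked for illustration, not recommendations.

```html
<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="utf-8">
	<title>Figure caption sketch</title>
	<style>
		/* Draw the figure and its caption as one visual unit. */
		figure {
			border: 1px solid #ccc;
			padding: 0.5em;
		}
		/* Set the caption apart from the body text. */
		figcaption {
			font-style: italic;
			text-align: center;
		}
	</style>
</head>
<body>
	<figure>
		<img src="kitty.webp" alt="a sleeping black cat">
		<!-- Flow content is allowed here, so the caption can hold emphasis and links. -->
		<figcaption>
			Photo of a <em>particularly</em> adorable cat,
			from <a href="cats.html">the cat gallery</a>
		</figcaption>
	</figure>
</body>
</html>
```

None of this is possible with a title attribute, which only holds plain text and leaves its presentation up to the user agent.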

Summary

To sum up: generally, all <img> elements should have an alt attribute, even if it is an empty string. The alt text should not provide a supplemental description of the image; rather, it should be the replacement text, to be used when the image cannot be displayed.

In situations where it makes sense to call something a figure, use the <figure> element, together with <figcaption> marking up the caption. Style with CSS as needed, showing the figure–caption relationship visually as well.

Use the title attribute sparingly, ideally in situations where it's already expected. Keep in mind that it is less discoverable than the alternatives, and might actually be impossible to read with some user agents.

Further reading

Standard documents:

  • title attribute: W3C, WHATWG
  • <img> element and its alt attribute: W3C, WHATWG
  • Requirements for alt text (especially interesting if you are not sure what kind of text to write for fallback content): W3C, WHATWG
  • <figure>: W3C, WHATWG
  • <figcaption>: W3C, WHATWG