Improved Excerpts 2.0

I recently got around to redesigning my blog. The whole design uses only the colours available on the Commodore 64 – the first computer I ever made video games for. The theme is based on Underscores, a bare-bones starter theme that supports Sass. You can find the source code on my GitHub page.

One thing I wanted to take advantage of was the improved excerpt code I wrote for the Dragon Burn website. I’ve written about this before, but I’ve since improved the code, so I figured it was time to write an update.

WordPress Excerpts

I find the vanilla code for generating excerpts to be subpar. If you’re lazy like me and don’t want to write an excerpt by hand for every post, there is an easy way to override it. In your theme’s functions.php you can remove the default filter and replace it with your own:

//  Replace default excerpt filter with our custom filter.
remove_filter('get_the_excerpt', 'wp_trim_excerpt');
add_filter('get_the_excerpt', 'improved_trim_excerpt');

Now we can create a function called ‘improved_trim_excerpt’ that will be used whenever WordPress requests an excerpt for a post. I’m going to list the code first and then go through it step by step.

function improved_trim_excerpt($text) {
	if ( '' == $text ) {
		//  Get default content.
		$text = get_the_content();
	}

	//  Filter out any scripts.
	$text = preg_replace('@<script[^>]*?>.*?</script>@si', '', $text);

	//  Filter out any figures (and their captions).
	$text = preg_replace('@<figure[^>]*?>.*?</figure>@si', '', $text);

	//  Strip out HTML but keep paragraphs, subheadings and simple emphasis.
	$text = strip_tags($text, '<p><em><strong><h2>');

	//  Increase the length to 35 words.
	$excerpt_length = 35;
	if (function_exists("pll_current_language") &&
		pll_current_language() == "zh")
	{
		$text = mb_substr($text, 0, $excerpt_length);
	}
	else
	{
		$words = explode(' ', $text, $excerpt_length + 1);
		if (count($words)> $excerpt_length) {
			array_pop($words);
			$text = implode(' ', $words);
		}
	}

	//  Cut off after the last complete sentence.
	$sentences = explode('.', $text);    
	if (sizeof($sentences) > 1)
	{
		array_pop($sentences);
		$text = implode('.', $sentences);
	}
	else
	{
		$text .= "...";
	}

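	//  Restore the full stop removed by explode() (unless the text already
	//  ends in an ellipsis), close any open tags and add a 'Read More...' link.
	if (substr($text, -3) !== '...') {
		$text .= '.';
	}
	$text = $text.'</p>';
	$text = closetags($text);
	$text = $text.'<p><a href="'.esc_url(get_permalink()).'">Read More...</a></p>';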
	return $text;
}

Get the Content

First we make sure we have content to generate an excerpt with. When a post has no hand-written excerpt, the filter is called with an empty string, so in that case we call WordPress’s get_the_content() function to get the full post content to work with:

	if ( '' == $text ) {
		//  Get default content.
		$text = get_the_content();
	}

Clean the Content

Next we use preg_replace to remove <script> and <figure> elements, along with everything inside them, so we don’t need to worry about them. We then use strip_tags to remove every other HTML tag except for a few simple formatting tags: <p>, <em>, <strong> and <h2>.

	//  Filter out any scripts.
	$text = preg_replace('@<script[^>]*?>.*?</script>@si', '', $text);

	//  Filter out any figures (and their captions).
	$text = preg_replace('@<figure[^>]*?>.*?</figure>@si', '', $text);

	//  Strip out HTML but keep paragraphs, subheadings and simple emphasis.
	$text = strip_tags($text, '<p><em><strong><h2>');

Reduce Content Size

I’ve found that 35 words works for excerpts on the Dragon Burn website. Different websites may look better with longer or shorter excerpts – it all depends on the design.

The code I wrote also supports Chinese. To do this, I use function_exists to check whether the pll_current_language function exists. This function is part of the Polylang plugin that I use on the Dragon Burn website. Because I check for it with function_exists, I can use the same code on this blog even though it doesn’t have the plugin installed.

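If the post is in Chinese, I just take the first 35 characters and use those as the excerpt. Chinese doesn’t separate words with spaces, so splitting on spaces won’t work, and I use mb_substr rather than substr because it counts characters instead of bytes.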

	$excerpt_length = 35;
	if (function_exists("pll_current_language") &&
		pll_current_language() == "zh")
	{
		$text = mb_substr($text, 0, $excerpt_length);
	}

For English posts (or any language other than Chinese), I count the words by splitting the content into an array with explode. I set the maximum length of this array to $excerpt_length + 1, so it can hold at most 36 elements, and the last element isn’t split: it just contains the rest of the content. If the content has more than 35 words, $words will therefore have 36 elements, so I pop the last element and replace $text with just the first 35 words.

	else
	{
		$words = explode(' ', $text, $excerpt_length + 1);
		if (count($words)> $excerpt_length) {
			array_pop($words);
			$text = implode(' ', $words);
		}
	}
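The explode limit trick is easiest to see on a short string. This sketch uses a hypothetical helper, trim_to_words(), that mirrors the code above but takes the limit as a parameter.

```php
<?php
// Hypothetical helper mirroring the explode()/array_pop() approach above.
function trim_to_words(string $text, int $limit): string
{
    // With a positive limit, explode() returns at most $limit + 1 pieces,
    // and the last piece holds the rest of the string, unsplit.
    $words = explode(' ', $text, $limit + 1);
    if (count($words) > $limit) {
        array_pop($words); // drop the unsplit remainder
        $text = implode(' ', $words);
    }
    return $text;
}

$pieces = explode(' ', 'one two three four five', 3);
// $pieces is ['one', 'two', 'three four five']

echo trim_to_words('one two three four five', 2), PHP_EOL; // one two
echo trim_to_words('one two', 2), PHP_EOL;                 // one two (already short enough)
```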

Use Whole Sentences

Next I use explode again, this time splitting on full stops, to find out how many sentences are in the remaining text. If the text is only one incomplete sentence, I add an ellipsis (…) to the end as a fallback. Otherwise, I do the same as I did for words: I pop the last, incomplete sentence and replace $text with only the complete sentences.

	//  Cut off after the last complete sentence.
	$sentences = explode('.', $text);    
	if (sizeof($sentences) > 1)
	{
		array_pop($sentences);
		$text = implode('.', $sentences);
	}
	else
	{
		$text .= "...";
	}
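As written, this only treats full stops as sentence endings. One possible variation, sketched here with a hypothetical helper trim_to_last_sentence(), swaps explode for a regular expression that keeps everything up to the last ‘.’, ‘!’ or ‘?’. It returns the text with its final punctuation in place, so it isn’t a drop-in replacement for the block above, which relies on the ‘Read More’ step to add the full stop back.

```php
<?php
// Hypothetical helper: keep everything up to and including the last
// '.', '!' or '?'; if there is none, fall back to an ellipsis.
function trim_to_last_sentence(string $text): string
{
    // The greedy .* backtracks to the LAST terminator; /s lets '.' match newlines.
    if (preg_match('/^.*[.!?]/s', $text, $m)) {
        return $m[0];
    }
    return $text . '...';
}

echo trim_to_last_sentence('Hello there. How are you? I am fi'), PHP_EOL; // Hello there. How are you?
echo trim_to_last_sentence('No terminator here'), PHP_EOL;                // No terminator here...
```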

Read More…

Finally we add a link to encourage visitors to read the entire article. The code is pretty simple:

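	//  Restore the full stop removed by explode() (unless the text already
	//  ends in an ellipsis), close any open tags and add a 'Read More...' link.
	if (substr($text, -3) !== '...') {
		$text .= '.';
	}
	$text = $text.'</p>';
	$text = closetags($text);
	$text = $text.'<p><a href="'.esc_url(get_permalink()).'">Read More...</a></p>';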
	return $text;

The closetags function is a function I wrote to make sure that all HTML tags in the excerpt are closed. This prevents odd formatting errors caused by an unclosed <strong> tag or similar.

function closetags($html) {
    // Find all opened tags
    preg_match_all('#<(?!meta|img|br|hr|input\b)\b([a-z]+)(?: .*)?(?<![/|/ ])>#iU', $html, $result);
    $openedtags = $result[1];

    // Find all closed tags
    preg_match_all('#</([a-z]+)>#iU', $html, $result);
    $closedtags = $result[1];

    // If the lengths match assume all tags have been closed properly.
    $len_opened = count($openedtags);
    if (count($closedtags) == $len_opened) {
        return $html;
    }

    // Otherwise go through and make sure each tag is closed properly.
    $openedtags = array_reverse($openedtags);
    for ($i=0; $i < $len_opened; $i++) {
        if (!in_array($openedtags[$i], $closedtags)) {
            $html .= '</'.$openedtags[$i].'>';
        } else {
            unset($closedtags[array_search($openedtags[$i], $closedtags)]);
        }
    }

    return $html;
} 
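Because closetags only compares the number of opening and closing tags, it can be fooled. For example, ‘Some <strong>bold text.</p>’ has one opening and one closing tag, so closetags returns it unchanged even though the <strong> is never closed. This can happen if the raw post content has no <p> tags of its own, since the excerpt code appends ‘</p>’ regardless. A common alternative is to track open tags on a stack. The sketch below uses a hypothetical close_open_tags() helper and assumes simple tag syntax like the tags strip_tags leaves behind; it does not handle comments or ‘>’ inside attribute values.

```php
<?php
// Hypothetical stack-based alternative to closetags().
function close_open_tags(string $html): string
{
    // Groups: [1] '/' for a closing tag, [2] tag name, [3] '/' if self-closing.
    preg_match_all('#<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>#i', $html, $tags, PREG_SET_ORDER);

    $void = ['br', 'hr', 'img', 'input', 'meta'];
    $open = []; // stack of currently open tag names, innermost last

    foreach ($tags as $tag) {
        $name = strtolower($tag[2]);
        if ($tag[3] === '/' || in_array($name, $void, true)) {
            continue; // void or self-closing: never needs a closing tag
        }
        if ($tag[1] === '') {
            $open[] = $name; // opening tag: push it
            continue;
        }
        // Closing tag: remove the most recent matching open tag, if any.
        // A closing tag with no matching opener is simply ignored.
        for ($i = count($open) - 1; $i >= 0; $i--) {
            if ($open[$i] === $name) {
                array_splice($open, $i, 1);
                break;
            }
        }
    }

    // Close whatever is still open, innermost first.
    while ($open) {
        $html .= '</' . array_pop($open) . '>';
    }
    return $html;
}

echo close_open_tags('<p>Hello <strong>world'), PHP_EOL;     // <p>Hello <strong>world</strong></p>
echo close_open_tags('Some <strong>bold text.</p>'), PHP_EOL; // Some <strong>bold text.</p></strong>
```

In the second call the stray </p> is ignored because nothing opened it, and the <strong> that closetags would have missed is closed at the end.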

Generating Content

To use excerpts, we need to render them on the page. By default, Underscores displays the entire content of every post on the home page, but it’s easy to change this by modifying template-parts/content.php. Underscores generates the main article content with the following code:

the_content( sprintf(
	wp_kses(
		/* translators: %s: Name of current post. Only visible to screen readers */
		__( 'Continue reading<span class="screen-reader-text"> "%s"</span>', 'bokmcdok' ),
		array(
			'span' => array(
				'class' => array(),
			),
		)
	),
	get_the_title()
) );

This code is used both for single posts and for the home page. We can use is_singular() so that the full content is only shown on pages that display a single post:

if (is_singular()):
	the_content( sprintf(
		wp_kses(
			/* translators: %s: Name of current post. Only visible to screen readers */
			__( 'Continue reading<span class="screen-reader-text"> "%s"</span>', 'bokmcdok' ),
			array(
				'span' => array(
					'class' => array(),
				),
			)
		),
		get_the_title()
	) );
else:
	the_excerpt();
endif;

If we are listing multiple posts, is_singular() returns false and we call the_excerpt() instead.

All Done?

So after all that work, we no longer need to worry about excerpts. There are some potential flaws in the code I’ve written, but it works for my design and writing style. Someone who never uses full stops, for example, would find that this code doesn’t work for them. The closetags function also assumes that all tags have been closed properly whenever the number of opening and closing tags is the same.

However, as written, it has been working for me so far. Any piece of code can always be improved, but the tradeoff is that you need to put more time into it. The fact that this article is a sequel shows that even my original code has been improved as I’ve run into problems, so maybe there will be a 3.0 in the future…