 6 min
May 25, 2020

## Introduction

This is part two of the Markdown To HTML series. The article shows you how to extend the markdown to HTML conversion pipeline we built in part one to calculate the estimated reading time per post. The estimated reading time will then be shown in the article's heading.

But before we jump in, here is an overview of the series. It consists of three parts:

• Part 1 presents the implementation of the whole generation pipeline (link).
• Part 2 (current article) extends the implemented pipeline by a module used to compute the estimated reading time for a given article (link).
• Part 3 demonstrates how you can use the pipeline to produce RSS feeds (link).

The code used in all three parts is available on GitHub.

## Calculate the Estimated Reading Time

Let's start by implementing a module that calculates the estimated reading time for a given text. The question arising at the beginning is how reading time can actually be estimated. To break it down, we need to know how many words a person usually reads per minute (WPM = words per minute).

Depending on the source you consult, the average lies between 130 and 370 WPM. My articles contain a fair amount of (normally Python) source code, and source code usually takes longer to read than an ordinary sentence with the same number of words. That is why I go with 200 WPM. You can test it and adjust the number later, but let's go with 200 for now.

Also because of the source code, I do not want to count each word individually. Words in source code are usually shorter than in ordinary text, so we estimate the word count from the number of characters instead, which requires an average word length. I write my articles in English and found a forum post stating that the average English word is around 5 characters long. So far, so good.

To calculate the estimated reading time, we need a module in which the corresponding functions can live. Let's create a new file called `blog.py` in the `services` directory. The first things we define are two constants holding the values we discussed earlier.

``````
# blog.py

WPM = 200
WORD_LENGTH = 5
``````

The next thing we implement is a function that calculates the number of words in a given text by dividing its length by the average word length.

``````
def _count_words_in_text(text: str) -> int:
    return len(text) // WORD_LENGTH
``````

Notice that we added a leading underscore to the function's name to indicate that it is a private function. It is meant to be used only internally and should not be called directly from outside the module.

Sometimes, I need to use HTML tags in my markdown files. These tags are not visible to the reader, so they need to be removed before counting. Therefore, we implement a new function called `_filter_visible_text`, which does exactly that using Python's `re` module.

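``````
import re


def _filter_visible_text(text: str) -> str:
    # Match HTML tags non-greedily, e.g. <p> or </strong>
    clear_html_tags = re.compile("<.*?>")
    text = re.sub(clear_html_tags, "", text)

    # Drop all whitespace (spaces, tabs, newlines)
    return "".join(text.split())
``````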

First, we compile a regular expression pattern matching HTML tags into a regular expression object. Next, we replace all HTML tags with an empty string. Lastly, we split the text on whitespace and join the elements of the resulting list using an empty string. This results in a single string without spaces, newline characters, tabs, or HTML tags.

Note: If you want to learn more about regular expressions in Python, make sure to consult Python's documentation or to visit DataCamp's regular expression tutorial.

Now, we are able to implement the `estimate_reading_time()` function.

``````
def estimate_reading_time(text: str) -> int:
    filtered_text = _filter_visible_text(text)
    total_words = _count_words_in_text(filtered_text)

    # Whole minutes needed at WPM words per minute
    return total_words // WPM
``````

In the end, this function will

1. receive the markdown text of an article,
2. filter out its visible text,
3. calculate the number of words in it, and
4. compute the estimated reading time, which is returned afterwards.
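The four steps can be traced with a small worked example. This is a minimal sketch that repeats the helpers so it runs on its own, and it assumes the last step floor-divides the word count by `WPM`. The sample texts are made up for illustration.

```python
import re

WPM = 200
WORD_LENGTH = 5


def _count_words_in_text(text: str) -> int:
    return len(text) // WORD_LENGTH


def _filter_visible_text(text: str) -> str:
    text = re.sub(r"<.*?>", "", text)
    return "".join(text.split())


def estimate_reading_time(text: str) -> int:
    filtered_text = _filter_visible_text(text)
    total_words = _count_words_in_text(filtered_text)
    # Assumption: whole minutes, rounded down
    return total_words // WPM


# 1000 four-letter words: 4000 visible characters -> 800 words -> 4 minutes
long_text = "word " * 1000
print(estimate_reading_time(long_text))  # prints: 4

# Very short texts round down to zero minutes
short_text = "Hello world"
print(estimate_reading_time(short_text))  # prints: 0
```

With floor division, any post shorter than 1000 visible characters is reported as 0 minutes. If that matters for your blog, rounding up or enforcing a minimum of one minute are possible adjustments.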

Here you can see the whole module again:

``````
# blog.py

import re

WPM = 200
WORD_LENGTH = 5


def _count_words_in_text(text: str) -> int:
    return len(text) // WORD_LENGTH


def _filter_visible_text(text: str) -> str:
    clear_html_tags = re.compile("<.*?>")
    text = re.sub(clear_html_tags, "", text)
    return "".join(text.split())


def estimate_reading_time(text: str) -> int:
    filtered_text = _filter_visible_text(text)
    total_words = _count_words_in_text(filtered_text)
    return total_words // WPM
``````

Awesome, we implemented a module which calculates the estimated reading time for us! Let's move on and integrate it into the existing conversion pipeline.

## Integrate the Module Into the Pipeline

First, we need to import the module. To do so, add the corresponding `import` statement to the `convert.py` script.

``````
# convert.py import statements
import blog
# the rest of the code
``````

Next, we need to identify which text has to be supplied to the `estimate_reading_time()` function. You may remember that we have two `with`-blocks in the `for`-loop (if you don't, have a look at it here). The first `with`-block is responsible for reading the markdown text from a file and converting it into HTML. The content of the markdown file is stored in a local variable called `content`. We invoke the `blog.estimate_reading_time()` function with the `content` variable as argument to compute the estimated reading time for the currently processed post.

``````
estimated_reading_time = blog.estimate_reading_time(content)
``````

Later, we will display the estimated reading time in the heading of the corresponding post. Therefore, we need to pass it to the document's environment. We can do so by adding it as a keyword argument to the `render()` method two lines later.

``````
doc = env.get_template(str(BLOG_TEMPLATE_FILE)).render(
    content=html,
    baseurl=BASE_URL,
    url=url,
    estimated_reading_time=estimated_reading_time,
    **_md.Meta,
)
``````

Great! There is only one thing left to do: Display the estimated reading time for each article in the corresponding article heading.

To display the estimated reading time, we need to modify the `layout.html` template. The body of the document contains only one `h1` tag. We change its content as follows.

``````
<h1>{{ title }} <em>({{ estimated_reading_time }} min)</em></h1>
``````

The estimated reading time will now be displayed in parentheses after the article's title, rendered with emphasis.

Note: Make sure to run the `convert.py` script from the command-line to update the articles' HTML files.

## Summary

Congratulations, you have made it through the whole article! While reading it, you have learned how to calculate the estimated reading time for a given text and how to integrate that calculation into the conversion pipeline you implemented in part one.

I hope you enjoyed reading the article. Make sure to share it with your friends and colleagues. If you have not already, consider following me on Twitter, where I am @DahlitzF, or subscribing to my newsletter so you will not miss any upcoming article. Stay curious and keep coding!