Summarytools in R Markdown Documents

1. Introduction

This document mainly contains examples showing how best to use summarytools in R Markdown documents. For a more in-depth view of the package’s features, please see vignette("introduction", "summarytools"), which is also available online.

1.1 Methods vs Styles

Every time we display summarytools objects with print(), view(), or stview(), we pick – explicitly or not – one of several display methods. Possible display methods are: pander, render, viewer, and browser.

Disambiguation

To avoid any confusion, here is a small digression on the word method. It is a broad term, also used in the OOP (object-oriented programming) lexicon to describe a special kind of function that is linked to a specific class of objects. In R, the print() function is called a generic function. It is generic because it accepts many types of objects as input. Depending on the class (or classes) of the object, it dispatches the object to the print method dedicated to that class. In that sense, the print.summarytools() function is itself a method: objects of class “summarytools” are dispatched to it by the generic print() function.
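The dispatch mechanism can be seen with a toy S3 class. The class name demo and the function print.demo() below are hypothetical and exist only for illustration; print.summarytools() plays the same role for summarytools objects.

```r
# Hypothetical class "demo", used only to illustrate S3 dispatch
print.demo <- function(x, ...) {
  cat("demo object with", length(unclass(x)), "elements\n")
  invisible(x)
}

obj <- structure(1:3, class = "demo")

# The generic print() looks at class(obj) and dispatches to print.demo()
print(obj)
#> demo object with 3 elements
```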

In this document, the term method refers to the display method – not the OOP concept. It is one of the parameters for print.summarytools(), view(), and stview(). Since methods viewer and browser are mostly meant for interactive work and rely on the same underlying code as render, we will assume for the purpose of this document that there are really only two methods: pander and render.

Only the pander Method Uses Styles

The pander method is used by default when results are automatically printed to the console, or when we use print() without an explicit method argument.

The style parameter is communicated to pander (see ?pander::pander or visit its GitHub page to learn more on this very useful package).

When we use any of the viewer, browser, or render methods, the package uses htmltools instead to generate results; any specified style is therefore ignored.
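In practice, the method is chosen at print time, while the style travels with the object. A minimal sketch, assuming summarytools is installed (the tobacco dataset ships with it):

```r
library(summarytools)

# style and plain.ascii are stored with the object; they only matter for pander
x <- freq(tobacco$gender, style = "rmarkdown", plain.ascii = FALSE)

print(x)                     # pander method (the default): markdown table
print(x, method = "render")  # htmltools-based HTML: the style is ignored
```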

summarytools styles are pander styles

Available styles are the ones supported by pander:

  • simple (default, used mainly in R console)
  • rmarkdown (used by all core functions except dfSummary())
  • grid (mainly used with dfSummary())
  • multiline (can be used with dfSummary() if you want ASCII graphs only)
  • jira (recent addition, not thoroughly tested)
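Since the style parameter is simply handed over to pander, its effect can be previewed by calling pander::pander() directly on an ordinary data frame. This sketch assumes pander is installed; the two-row iris subset is arbitrary.

```r
library(pander)

small <- head(iris[, 1:2], 2)  # a tiny data frame, chosen arbitrarily

# The same data printed with three of the styles listed above
pander(small, style = "simple")
pander(small, style = "rmarkdown")  # pipe table
pander(small, style = "grid")       # grid table, with "+---+" borders
```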

1.2 General Guidelines

Always set results = 'asis', either explicitly on a chunk-by-chunk basis or by including opts_chunk$set(results = 'asis') in your setup chunk.

Also, don’t forget to specify plain.ascii = FALSE in all function calls using the pander method. It is best to set this option, along with the style option, once in the setup chunk:

st_options(plain.ascii = FALSE, style = "rmarkdown")

At the very least, include st_options(plain.ascii = FALSE) in your setup chunk.

If you get repeated, unhelpful warnings, use chunk options message = FALSE and/or warning = FALSE.
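Put together, these guidelines fit in a few lines of a setup chunk. A minimal sketch, assuming knitr and summarytools are installed (message and warning are left at their defaults here):

```r
library(knitr)
library(summarytools)

# Chunk output is passed through as raw markdown/HTML
opts_chunk$set(results = "asis")

# Markdown-friendly defaults for all summarytools calls
st_options(plain.ascii = FALSE, style = "rmarkdown")
```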

The following table indicates which method / style is better suited for each summarytools function in the context of R Markdown documents:

Function      render method   pander method   pander style
freq()        ✓               ✓               rmarkdown
ctable()      ✓               Sub-optimal     rmarkdown
descr()       ✓               ✓               rmarkdown
dfSummary()   ✓               ✓               grid

Recommended Style When Using pander method

For freq(), descr(), and ctable(), rmarkdown style is recommended. For dfSummary(), grid is recommended. Note that ‘multiline’ can also be used, but only ASCII graphs will be displayed.
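Because dfSummary() calls for a different style than the other functions, both preferences can be declared once. This sketch relies on the dfSummary.style option of st_options(), which applies to dfSummary() only (the PDF setup chunk later in this document uses it as well):

```r
library(summarytools)

st_options(
  plain.ascii     = FALSE,
  style           = "rmarkdown",  # freq(), descr(), ctable()
  dfSummary.style = "grid"        # dfSummary() only
)
```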

Starting with freq(), we’ll now review the recommended methods and styles to get satisfying results in R Markdown documents.


2. Using freq() in R Markdown

freq() is best used with style = 'rmarkdown'; HTML rendering is also possible.

2.1 Pander Style for freq()

With method="pander", “rmarkdown” is the easy winner.

freq(tobacco$gender, plain.ascii = FALSE, style = 'rmarkdown')
explicit NA's detected - temporarily setting 'report.nas' to FALSE

Types and Counts, Iris Flowers

tobacco$gender
Type: Factor

             N     %        % Cum.
F            489   48.90    48.90
M            489   48.90    97.80
(Missing)    22    2.20     100.00
Total        1000  100.00   100.00

2.2 HTML Rendering for freq()

There are rarely any problems when using the render method to display freq() results.

print(freq(tobacco$gender), method = 'render')
explicit NA's detected - temporarily setting 'report.nas' to FALSE

Types and Counts, Iris Flowers

tobacco$gender
Type: Factor
gender       N     %        % Cum.
F            489   48.90    48.90
M            489   48.90    97.80
(Missing)    22    2.20     100.00
Total        1000  100.00   100.00

If you find the table too large, you can use table.classes = 'st-small'; section 4.2 (HTML Rendering for descr()) shows how this class affects a descr() table.
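For the freq() table shown earlier in this section, the same option is passed to print(). A minimal sketch (the st-small class comes from the summarytools CSS, so it assumes that CSS is included in the document):

```r
library(summarytools)

# Same freq() table as above, rendered with the smaller-table CSS class
print(freq(tobacco$gender), method = "render", table.classes = "st-small")
```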


3. Using ctable() in R Markdown

3.1 Rmarkdown Style for ctable()

Tables with headings spanning two rows are not fully supported in markdown (yet), but the result is getting close to acceptable. This, however, is not true for all themes, which is why the render method is preferred.

ctable(tobacco$gender, 
       tobacco$smoker,
       plain.ascii = FALSE, 
       style = 'rmarkdown')

Cross-Tabulation, Row Proportions

gender * smoker
Data frame: tobacco

           smoker   Yes           No            Total
gender
F                   147 (30.1%)   342 (69.9%)   489 (100.0%)
M                   143 (29.2%)   346 (70.8%)   489 (100.0%)
(Missing)           8 (36.4%)     14 (63.6%)    22 (100.0%)
Total               298 (29.8%)   702 (70.2%)   1000 (100.0%)

3.2 HTML Rendering for ctable()

For best results, use this method.

print(ctable(tobacco$gender, tobacco$smoker), method = 'render')

Cross-Tabulation, Row Proportions

gender * smoker
Data frame: tobacco
             smoker
gender       Yes              No               Total
F            147 ( 30.1% )    342 ( 69.9% )    489 ( 100.0% )
M            143 ( 29.2% )    346 ( 70.8% )    489 ( 100.0% )
(Missing)    8 ( 36.4% )      14 ( 63.6% )     22 ( 100.0% )
Total        298 ( 29.8% )    702 ( 70.2% )    1000 ( 100.0% )


4. Using descr() in R Markdown

descr() gives good results with both style = 'rmarkdown' and HTML rendering.

4.1 Rmarkdown Style for descr()

descr(tobacco, plain.ascii = FALSE, style = 'rmarkdown')

Descriptive Statistics

tobacco
N: 1000

               BMI      age      cigs.per.day   samp.wgts
Mean           25.73    49.60    6.78           1.00
Std.Dev        4.49     18.29    11.88          0.08
Min            8.83     18.00    0.00           0.86
Q1             22.93    34.00    0.00           0.86
Median         25.62    50.00    0.00           1.04
Q3             28.65    66.00    11.00          1.05
Max            39.44    80.00    40.00          1.06
MAD            4.18     23.72    0.00           0.01
IQR            5.72     32.00    11.00          0.19
CV             0.17     0.37     1.75           0.08
Skewness       0.02     -0.04    1.54           -1.04
SE.Skewness    0.08     0.08     0.08           0.08
Kurtosis       0.26     -1.26    0.90           -0.90
N.Valid        974.00   975.00   965.00         1000.00
Pct.Valid      97.40    97.50    96.50          100.00
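When the full table is too long for the page, descr() can report fewer statistics or use a transposed layout. A sketch using the stats and transpose arguments of descr():

```r
library(summarytools)

# "common" selects a predefined, shorter set of statistics (see ?descr)
descr(tobacco, stats = "common", plain.ascii = FALSE, style = "rmarkdown")

# One row per variable instead of one column per variable
descr(tobacco, transpose = TRUE, plain.ascii = FALSE, style = "rmarkdown")
```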

4.2 HTML Rendering for descr()

We’ll use table.classes = ‘st-small’ to show how it affects the table’s size, compared to the freq() table rendered earlier.

We’ll also use message = FALSE as a chunk option to suppress the message stating that non-numerical variables have been ignored.

print(descr(tobacco), method = 'render', table.classes = 'st-small')

Descriptive Statistics

tobacco
N: 1000
               BMI      age      cigs.per.day   samp.wgts
Mean           25.73    49.60    6.78           1.00
Std.Dev        4.49     18.29    11.88          0.08
Min            8.83     18.00    0.00           0.86
Q1             22.93    34.00    0.00           0.86
Median         25.62    50.00    0.00           1.04
Q3             28.65    66.00    11.00          1.05
Max            39.44    80.00    40.00          1.06
MAD            4.18     23.72    0.00           0.01
IQR            5.72     32.00    11.00          0.19
CV             0.17     0.37     1.75           0.08
Skewness       0.02     -0.04    1.54           -1.04
SE.Skewness    0.08     0.08     0.08           0.08
Kurtosis       0.26     -1.26    0.90           -0.90
N.Valid        974      975      965            1000
Pct.Valid      97.40    97.50    96.50          100.00


5. Using dfSummary() in R Markdown

To get optimal results, whichever method you choose, it is best to omit at least one, and if possible two, columns from the output. Also, choose the value of the graph.magnif parameter carefully.
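Column choices and graph size can also be set once for the whole document through st_options(), using the dfSummary.* options that appear in the PDF setup chunk later on. The 0.75 value is an arbitrary starting point, not a recommendation:

```r
library(summarytools)

st_options(
  dfSummary.valid.col    = FALSE,  # drop the "Valid" column
  dfSummary.graph.magnif = 0.75    # values below 1 shrink the graphs
)

# Later calls inherit these settings unless overridden in the call itself
print(dfSummary(tobacco, varnumbers = FALSE), method = "render")
```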

5.1 Grid Style for dfSummary()

Don’t forget to specify plain.ascii = FALSE (or set it as a global option with st_options(plain.ascii = FALSE)), or you won’t get good results. (Note that to avoid problems when submitting the package to CRAN, the following is an image, not the actual rendering of this piece of code. This is because CRAN, for good reasons, doesn’t allow writing to /tmp or to any directory other than R’s temporary directory.)

dfSummary(tobacco, 
          plain.ascii  = FALSE,
          style        = 'grid',
          graph.magnif = 0.85,
          varnumbers   = FALSE,
          valid.col    = FALSE,
          tmp.img.dir  = "/tmp")

5.2 HTML Rendering for dfSummary()

This method works really well, and not having to specify the tmp.img.dir parameter is a plus.

print(dfSummary(tobacco, 
                varnumbers   = FALSE, 
                valid.col    = FALSE, 
                graph.magnif = 0.76),
      method = 'render')

Data Frame Summary

tobacco
Dimensions: 1000 x 9
Duplicates: 2

Variable                Stats / Values            Freqs (% of Valid)    Missing
gender [factor]         1. F                      489 (48.9%)           0 (0.0%)
                        2. M                      489 (48.9%)
                        3. (Missing)              22 (2.2%)
age [numeric]           Mean (sd) : 49.6 (18.3)   63 distinct values    25 (2.5%)
                        min ≤ med ≤ max:
                        18 ≤ 50 ≤ 80
                        IQR (CV) : 32 (0.4)
age.gr [factor]         1. 18-34                  258 (26.5%)           25 (2.5%)
                        2. 35-50                  241 (24.7%)
                        3. 51-70                  317 (32.5%)
                        4. 71 +                   159 (16.3%)
BMI [numeric]           Mean (sd) : 25.7 (4.5)    974 distinct values   26 (2.6%)
                        min ≤ med ≤ max:
                        8.8 ≤ 25.6 ≤ 39.4
                        IQR (CV) : 5.7 (0.2)
smoker [factor]         1. Yes                    298 (29.8%)           0 (0.0%)
                        2. No                     702 (70.2%)
cigs.per.day [numeric]  Mean (sd) : 6.8 (11.9)    37 distinct values    35 (3.5%)
                        min ≤ med ≤ max:
                        0 ≤ 0 ≤ 40
                        IQR (CV) : 11 (1.8)
diseased [factor]       1. Yes                    224 (22.4%)           0 (0.0%)
                        2. No                     776 (77.6%)
disease [character]     1. Hypertension           36 (16.2%)            778 (77.8%)
                        2. Cancer                 34 (15.3%)
                        3. Cholesterol            21 (9.5%)
                        4. Heart                  20 (9.0%)
                        5. Pulmonary              20 (9.0%)
                        6. Musculoskeletal        19 (8.6%)
                        7. Diabetes               14 (6.3%)
                        8. Hearing                14 (6.3%)
                        9. Digestive              12 (5.4%)
                        10. Hypotension           11 (5.0%)
                        [ 3 others ]              21 (9.5%)
samp.wgts [numeric]     Mean (sd) : 1 (0.1)       0.86 ! : 267 (26.7%)  0 (0.0%)
                        min ≤ med ≤ max:          1.04 ! : 249 (24.9%)
                        0.9 ≤ 1 ≤ 1.1             1.05 ! : 324 (32.4%)
                        IQR (CV) : 0.2 (0.1)      1.06 ! : 160 (16.0%)
                                                  ! rounded

5.3 Managing Lengthy dfSummary() Outputs in R Markdown Documents

For data frames containing numerous variables, we can use the max.tbl.height argument to wrap the results in a scrollable window having the specified height, in pixels.

print(dfSummary(tobacco, 
                varnumbers   = FALSE,
                valid.col    = FALSE,
                graph.magnif = 0.76), 
      max.tbl.height = 300,
      method = "render")

Data Frame Summary

tobacco
Dimensions: 1000 x 9
Duplicates: 2

Variable                Stats / Values            Freqs (% of Valid)    Missing
gender [factor]         1. F                      489 (48.9%)           0 (0.0%)
                        2. M                      489 (48.9%)
                        3. (Missing)              22 (2.2%)
age [numeric]           Mean (sd) : 49.6 (18.3)   63 distinct values    25 (2.5%)
                        min ≤ med ≤ max:
                        18 ≤ 50 ≤ 80
                        IQR (CV) : 32 (0.4)
age.gr [factor]         1. 18-34                  258 (26.5%)           25 (2.5%)
                        2. 35-50                  241 (24.7%)
                        3. 51-70                  317 (32.5%)
                        4. 71 +                   159 (16.3%)
BMI [numeric]           Mean (sd) : 25.7 (4.5)    974 distinct values   26 (2.6%)
                        min ≤ med ≤ max:
                        8.8 ≤ 25.6 ≤ 39.4
                        IQR (CV) : 5.7 (0.2)
smoker [factor]         1. Yes                    298 (29.8%)           0 (0.0%)
                        2. No                     702 (70.2%)
cigs.per.day [numeric]  Mean (sd) : 6.8 (11.9)    37 distinct values    35 (3.5%)
                        min ≤ med ≤ max:
                        0 ≤ 0 ≤ 40
                        IQR (CV) : 11 (1.8)
diseased [factor]       1. Yes                    224 (22.4%)           0 (0.0%)
                        2. No                     776 (77.6%)
disease [character]     1. Hypertension           36 (16.2%)            778 (77.8%)
                        2. Cancer                 34 (15.3%)
                        3. Cholesterol            21 (9.5%)
                        4. Heart                  20 (9.0%)
                        5. Pulmonary              20 (9.0%)
                        6. Musculoskeletal        19 (8.6%)
                        7. Diabetes               14 (6.3%)
                        8. Hearing                14 (6.3%)
                        9. Digestive              12 (5.4%)
                        10. Hypotension           11 (5.0%)
                        [ 3 others ]              21 (9.5%)
samp.wgts [numeric]     Mean (sd) : 1 (0.1)       0.86 ! : 267 (26.7%)  0 (0.0%)
                        min ≤ med ≤ max:          1.04 ! : 249 (24.9%)
                        0.9 ≤ 1 ≤ 1.1             1.05 ! : 324 (32.4%)
                        IQR (CV) : 0.2 (0.1)      1.06 ! : 160 (16.0%)
                                                  ! rounded

Some users reported getting repeated X11 warnings; those can easily be avoided by using the following chunk expression: {r, results="asis", warning=FALSE}.



6. Using Other Formatting Packages

As explained in the introductory vignette, tb() can be used to convert summarytools objects created with freq() and descr() to simple tibbles that packages specialized in table formatting will be able to process. This is particularly helpful with stby objects:

library(kableExtra)
library(magrittr)
stby(iris, iris$Species, descr, stats = "fivenum") %>%
  tb() %>%
  kable(format = "html", digits = 2) %>%
  collapse_rows(columns = 1, valign = "top")
Species      variable       min   q1    med    q3    max
setosa       Petal.Length   1.0   1.4   1.50   1.6   1.9
             Petal.Width    0.1   0.2   0.20   0.3   0.6
             Sepal.Length   4.3   4.8   5.00   5.2   5.8
             Sepal.Width    2.3   3.2   3.40   3.7   4.4
versicolor   Petal.Length   3.0   4.0   4.35   4.6   5.1
             Petal.Width    1.0   1.2   1.30   1.5   1.8
             Sepal.Length   4.9   5.6   5.90   6.3   7.0
             Sepal.Width    2.0   2.5   2.80   3.0   3.4
virginica    Petal.Length   4.5   5.1   5.55   5.9   6.9
             Petal.Width    1.4   1.8   2.00   2.3   2.5
             Sepal.Length   4.9   6.2   6.50   6.9   7.9
             Sepal.Width    2.2   2.8   3.00   3.2   3.8

Using tb(order = 3) flips the order of the grouping variable(s) and the reported variable(s):

stby(iris, iris$Species, descr, stats = "fivenum") %>%
  tb(order = 3) %>%
  kable(format = "html", digits = 2) %>%
  collapse_rows(columns = 1, valign = "top")
variable       Species      min   q1    med    q3    max
Petal.Length   setosa       1.0   1.4   1.50   1.6   1.9
               versicolor   3.0   4.0   4.35   4.6   5.1
               virginica    4.5   5.1   5.55   5.9   6.9
Petal.Width    setosa       0.1   0.2   0.20   0.3   0.6
               versicolor   1.0   1.2   1.30   1.5   1.8
               virginica    1.4   1.8   2.00   2.3   2.5
Sepal.Length   setosa       4.3   4.8   5.00   5.2   5.8
               versicolor   4.9   5.6   5.90   6.3   7.0
               virginica    4.9   6.2   6.50   6.9   7.9
Sepal.Width    setosa       2.3   3.2   3.40   3.7   4.4
               versicolor   2.0   2.5   2.80   3.0   3.4
               virginica    2.2   2.8   3.00   3.2   3.8
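On R 4.1 or later, the same table can be built with the native pipe |> instead of magrittr's %>%; everything else is unchanged. A sketch, assuming kableExtra is installed:

```r
library(summarytools)
library(kableExtra)

# Five-number summaries by species, flattened by tb(), then formatted
stby(iris, iris$Species, descr, stats = "fivenum") |>
  tb() |>
  kable(format = "html", digits = 2) |>
  collapse_rows(columns = 1, valign = "top")
```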



7. Including dfSummaries in PDF Documents

Here is a recipe for including fully formatted data frame summaries in pdf documents. There is some work involved, but following the instructions given here should give the expected results.

There are basically two parts to this: first, you must create a preamble tex file. Second, you must indicate in the YAML section of your document where to find this file.

Included Preamble Tex File

This is the content that needs to be included as preamble. You can either copy this into your own tex file, or use the file that is now included in summarytools (as of version 1.0), following the instructions provided below.

\usepackage{graphicx}
\usepackage[export]{adjustbox}
\usepackage{letltxmacro}
\LetLtxMacro{\OldIncludegraphics}{\includegraphics}
\renewcommand{\includegraphics}[2][]{\raisebox{0.5\height}%
  {\OldIncludegraphics[valign=t,#1]{#2}}}

If you choose to create a tex file from the above content, the name of the file is arbitrary – you can use whatever name you want. Its location is also up to you. I suggest you put it in the same location as your Rmd file.

Along with the graph.magnif parameter for dfSummary(), you might need to adjust the 0.5 value used as the \raisebox factor in the preamble.
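One way to obtain an adjustable copy is to start from the file bundled with summarytools and change the \raisebox factor programmatically. The output name fig-valign-modified.tex matches the customized YAML example below, and 0.4 is an arbitrary value to experiment with:

```r
# Path of the preamble file shipped with summarytools (version 1.0 or later)
src <- system.file("includes/fig-valign.tex", package = "summarytools")
stopifnot(nzchar(src))  # "" means the file (or the package) was not found

tex <- readLines(src)

# Change the vertical offset given to \raisebox (0.5 in the bundled file)
new_tex <- sub("raisebox{0.5", "raisebox{0.4", tex, fixed = TRUE)
if (identical(new_tex, tex)) warning("raisebox value not found; edit the file by hand")

# Written next to the Rmd file, as suggested above
writeLines(new_tex, "fig-valign-modified.tex")
```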

The YAML Section

If you use the preamble file included with summarytools, your document should start with a YAML header like this one:

---
title: "My PDF With Data Frame Summaries"
output: 
  pdf_document: 
    latex_engine: xelatex
    includes:
      in_header: 
      - !expr system.file("includes/fig-valign.tex", package = "summarytools")
---

If you need to customize the content of the preamble, then your header will look something like this:

---
title: "My PDF With Data Frame Summaries"
output: 
  pdf_document: 
    latex_engine: xelatex
    includes:
      in_header: fig-valign-modified.tex
---

The xelatex engine option is not mandatory, but it has advantages, notably native support for Unicode characters such as the ≤ signs found in dfSummary() output. I use it systematically and recommend you do the same.

R Code

Here is an example setup chunk:

```{r, message=FALSE}  
library(summarytools)
st_options(
  plain.ascii = FALSE, 
  style = "rmarkdown",
  dfSummary.style = "grid",
  dfSummary.valid.col = FALSE,
  dfSummary.graph.magnif = .52,
  subtitle.emphasis = FALSE,
  tmp.img.dir = "/tmp"
)
```

And here is a chunk actually creating the summary:

```{r, results='asis', message=FALSE}  
define_keywords(title.dfSummary = "Data Frame Summary in PDF Format")
dfSummary(tobacco)
```

Remarks

Since we redefined the $\LaTeX$ command includegraphics, all images included using ![](some-image.png) will be affected. In some cases this is likely to be problematic. Eventually we will find a more robust solution without such undesired side effects. If you are well versed in $\LaTeX$ and think you can solve this problem, please get in touch.


8. This Vignette’s Setup

This vignette uses the html_document format together with the CSS of the rmarkdown::html_vignette theme. Its YAML section looks like this:

---
title: "Summarytools in R Markdown Documents"
author: "Dominic Comtois"
date: "2024-11-04"
output: 
  html_document:
    fig_caption: false
    toc: true
    toc_depth: 1
    css: assets/vignette.css
vignette: >
  %\VignetteIndexEntry{Summarytools in R Markdown Documents}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignetteDepends{magrittr}
  %\VignetteDepends{kableExtra}
---


The vignette.css file is copied from the installed rmarkdown package’s ‘templates/html_vignette/resources’ directory.

Global Options

The following global options for knitr and summarytools have been set. Other options might also be useful to optimize content, but this is a good place to start from.

```{r setup, include=FALSE}
library(knitr)
library(summarytools)
opts_chunk$set(comment=NA, 
               prompt=FALSE,
               cache=FALSE,
               echo=TRUE,
               results='asis')

st_options(bootstrap.css     = FALSE,       # Already part of the theme 
           plain.ascii       = FALSE,       # Essential setting for Rmd
           style             = "rmarkdown", # Essential setting for Rmd
           dfSummary.silent  = TRUE,        # Hides redundant messages 
           footnote          = NA,          # Keeping the results minimal
           subtitle.emphasis = FALSE)       # For the vignette theme,
                                            # this gives better results. 
                                            # For other themes, using
                                            # TRUE might be preferable.
```

Finally, summarytools CSS has been included in the following manner:

```{r, echo=FALSE}
st_css(main = TRUE, global = TRUE)
```

9. Final Notes

This is by no means a definitive guide; depending on the themes you use, you could find that other settings yield better results. If you are looking to create a Word or a PDF document, you might want to try different combinations of options. If you find problems with the recommended settings or if you find better combinations, you are welcome to open an issue on GitHub to suggest modifications or make a pull request with your own improvements to this vignette.
