Title: | Tools to Quickly and Neatly Summarize Data |
---|---|
Description: | Data frame summaries, cross-tabulations, weight-enabled frequency tables and common descriptive (univariate) statistics in concise tables available in a variety of formats (plain ASCII, Markdown and HTML). A good point-of-entry for exploring data, both for experienced and new R users. |
Authors: | Dominic Comtois [aut, cre] |
Maintainer: | Dominic Comtois <[email protected]> |
License: | GPL-2 |
Version: | 1.1.0 |
Built: | 2025-02-20 17:22:54 UTC |
Source: | https://github.com/dcomtois/summarytools |
summarytools is a collection of functions which neatly and quickly summarize numerical and categorical data. Data frame summaries, frequency tables and cross-tabulations, as well as common descriptive (univariate) statistics can be produced in a straightforward manner. Users with little to no prior R programming experience but who are familiar with popular commercial statistical software such as SAS, SPSS and Stata will feel right at home.
These are the four core functions:
dfSummary(): Extensive yet legible data frame summaries.
freq(): Frequency tables supporting weights and displaying proportions of valid and of total data, including cumulative proportions.
descr(): All common univariate descriptive stats applied to a single vector or to all numerical vectors contained in a data frame.
ctable(): Cross-tabulations for pairs of categorical variables, accepting both numerical and character vectors, as well as factors. Choose between Total, Columns or Rows proportions, and optionally display chi-square statistic (with corresponding p-value), odds ratio, as well as risk ratio with flexible confidence intervals.
Choice of output formats:
Plain ASCII: Ideal when showing results in the R console.
Markdown: Perfect for writing short papers or presentations.
HTML: A format very well integrated in RStudio, but it will work with any Web browser. Use the view function to display results directly in RStudio's viewer, or in your preferred Web browser.
Maintainer: Dominic Comtois [email protected]
Useful links:
Report bugs at https://github.com/dcomtois/summarytools/issues
Delete temporary files created when using the generic print method with method='browser' or method='viewer', or when calling the view() function.
cleartmp(all = TRUE, silent = FALSE, verbose = FALSE)
all |
Logical. When |
silent |
Logical. Hide confirmation messages ( |
verbose |
Logical. Display a message for every file that is deleted.
|
Given that all temporary files are deleted automatically when an R session ends, this function is overkill in most circumstances. It could however be useful in server-type setups.
Dominic Comtois, [email protected]
Cross-tabulation for a pair of categorical variables with either row, column, or total proportions, as well as marginal sums. Works with numeric, character, as well as factor variables.
ctable( x, y, prop = st_options("ctable.prop"), useNA = "ifany", totals = st_options("ctable.totals"), style = st_options("style"), round.digits = st_options("ctable.round.digits"), justify = "right", plain.ascii = st_options("plain.ascii"), headings = st_options("headings"), display.labels = st_options("display.labels"), split.tables = Inf, na.val = st_options("na.val"), dnn = c(substitute(x), substitute(y)), chisq = FALSE, OR = FALSE, RR = FALSE, weights = NA, rescale.weights = FALSE, ... )
x |
First categorical variable - values will appear as row names. |
y |
Second categorical variable - values will appear as column names. |
prop |
Character. Indicates which proportions to show: “r”
(rows, default), “c” (columns), “t” (total), or “n”
(none). Default value can be changed using |
useNA |
Character. One of “ifany” (default), “no”, or
“always”. This argument is passed on ‘as is’ to
|
totals |
Logical. Show row and column totals. Defaults to
|
style |
Character. Style to be used by |
round.digits |
Numeric. Number of significant digits to keep. Defaults
to |
justify |
Character. Horizontal alignment; one of “l” (left), “c” (center), or “r” (right, default). |
plain.ascii |
Logical. Used by |
headings |
Logical. Show heading section. |
display.labels |
Logical. Display data frame label in the heading
section. |
split.tables |
Numeric. |
na.val |
Character. For factors and character vectors, consider this
value as |
dnn |
Character vector. Variable names to be used in output table. In most cases, setting this parameter is not required as the names are automatically generated. |
chisq |
Logical. Display chi-square statistic along with p-value. |
OR |
Logical or numeric. Set to |
RR |
Logical or numeric. Set to |
weights |
Numeric. Vector of weights; must have the same length as
|
rescale.weights |
Logical. When |
... |
A list containing two matrices, cross_table and proportions. The print method takes care of assembling figures from those matrices into a single table. The returned object is of classes “summarytools” and “list”, unless stby is used, in which case we have an object of class “stby”.
Markdown does not fully support multi-header tables; until such support is available, the recommended way to display cross-tables in .Rmd documents is to use 'method=render'. See package vignettes for examples.
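As a rough sketch of the returned object and of the rendering advice above, the following assumes the summarytools package is installed and uses its bundled tobacco data. It inspects the two matrices held in the list and then prints the table with method = "render". The chunk option results='asis' mentioned in the comment is an assumption about a typical R Markdown setup rather than something stated here.

```r
library(summarytools)
data("tobacco")

# Build the cross-tabulation; row proportions are requested explicitly
ct <- ctable(tobacco$gender, tobacco$smoker, prop = "r")

# The returned list holds the counts and the proportions separately
counts <- ct$cross_table
props  <- ct$proportions
dim(counts)

# In an .Rmd chunk (assumed here to use results='asis'), render as html
print(ct, method = "render")
```

Keeping the object in a variable, as done here, lets the same table be printed again later with a different method or written to a file.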
Dominic Comtois, [email protected]
data("tobacco")
ctable(tobacco$gender, tobacco$smoker)
# Use with() to simplify syntax
with(tobacco, ctable(smoker, diseased))
# Show column proportions, without totals
with(tobacco, ctable(smoker, diseased, prop = "c", totals = FALSE))
# Simple 2 x 2 table with odds ratio and risk ratio
with(tobacco, ctable(gender, smoker, totals = FALSE, headings = FALSE, prop = "n", OR = TRUE, RR = TRUE))
# Grouped cross-tabulations
with(tobacco, stby(data = list(x = smoker, y = diseased), INDICES = gender, FUN = ctable))
## Not run:
ct <- ctable(tobacco$gender, tobacco$smoker)
# Show html results in browser
print(ct, method = "browser")
# Save results to html file
print(ct, file = "ct_gender_smoker.html")
# Save results to text file
print(ct, file = "ct_gender_smoker.txt")
## End(Not run)
As an alternative to use_custom_lang, this function allows temporarily modifying the pre-defined terms in the outputs.
define_keywords(..., ask = TRUE, file = NA)
... |
One or more pairs of keywords and their new values; see Details for the complete list of existing keywords. |
ask |
Logical. When 'TRUE' (default), a dialog box comes up to ask whether to save the edited values in a csv file for later use. |
file |
Character. Path and name of custom language file to be saved.
This comma delimited file can be reused by calling
|
On systems with GUI capabilities, a window will pop up when calling define_keywords() without any parameters, allowing the modification of the custom column. The changes will be active as long as the package is loaded. When the edit window is closed, a dialog will pop up, prompting the user to save the modified set of keywords in a custom csv language file that can later be used with use_custom_lang.
Here is the full list of modifiable keywords.
main heading for freq()
main heading for freq() (weighted)
main heading for ctable()
main heading for ctable() (weighted)
indicates what proportions are displayed
indicates what proportions are displayed
indicates what proportions are displayed
main heading for descr()
main heading for descr() (weighted)
main heading for dfSummary()
heading item used in descr()
heading item used in dfSummary()
heading item used in dfSummary()
heading item (all functions)
heading item (all functions) & column name in dfSummary()
heading item (all functions) & column name in dfSummary()
heading item (all functions) when used with stby()
heading item for descr() when used with stby()
heading item - descr() & freq()
heading item for freq()
heading item - type in freq()
heading item - type in freq()
heading item - type in freq()
heading item - type in freq()
heading item - type in freq()
heading item - type in freq()
heading item - type in freq()
column name in freq()
column name in freq() when report.nas=FALSE
column name in freq()
column name in freq()
column name in freq()
column name in freq()
column name in freq()
column name in freq() and dfSummary() & column content in dfSummary()
column content in dfSummary() (emails)
column grouping in freq(), html version
row name in descr()
row name in descr()
cell content (dfSummary)
row name in descr()
row name in descr() - 1st quartile
row name in descr()
row name in descr() - 3rd quartile
row name in descr()
row name in descr() - Median Absolute Deviation
row name in descr() - Inter-Quartile Range
row name in descr() - Coefficient of Variation
row name in descr()
row name in descr() - Std. Error for Skewness
row name in descr()
row name in descr() - Count of non-missing values
row name in descr() - pct. of non-missing values
column name in dfSummary() - position of column in the data frame
column name in dfSummary()
column name in dfSummary()
column name in dfSummary()
column name in dfSummary()
cell content in dfSummary() - singular form
cell content in dfSummary() - plural form
cell content in dfSummary() - column has only NAs
cell content in dfSummary() - column has only empty strings
cell content in dfSummary() - col. has only NAs and empty strings
cell content in dfSummary() - factor has no levels defined
cell content in dfSummary()
cell content in dfSummary() - note appearing in Stats/Values
cell content in dfSummary() - nbr of values not displayed
cell content in dfSummary() - When UPC codes are detected
cell content in dfSummary() - mode = most frequent value
cell content in dfSummary() - median (shortened term)
cell content in dfSummary() - earliest date for date-type cols
cell content in dfSummary() - latest date for date-type cols
cell content in dfSummary()
footnote content
footnote content
footnote - date format (see strptime)
Setting a keyword starting with “title.” to NA or to empty string causes the main title to disappear altogether, which might be desired in some circumstances (when generating a table of contents, for instance).
## Not run:
define_keywords(n = "Nb. Obs.")
## End(Not run)
Calculates mean, sd, min, Q1\*, median, Q3\*, max, MAD, IQR\*, CV, skewness\*, SE.skewness\*, and kurtosis\* on numerical vectors. (\*) Not available when using sampling weights.
descr( x, var = NULL, stats = st_options("descr.stats"), na.rm = TRUE, round.digits = st_options("round.digits"), transpose = st_options("descr.transpose"), order = "sort", style = st_options("style"), plain.ascii = st_options("plain.ascii"), justify = "r", headings = st_options("headings"), display.labels = st_options("display.labels"), split.tables = 100, weights = NULL, rescale.weights = FALSE, ... )
x |
A numerical vector or a data frame. |
var |
Unquoted expression referring to a specific column in |
stats |
Character. Which stats to produce. Either “all” (default),
“fivenum”, “common” (see Details), or a selection of :
“mean”, “sd”, “min”, “q1”, “med”,
“q3”, “max”, “mad”, “iqr”, “cv”,
“skewness”, “se.skewness”, “kurtosis”,
“n.valid”, “n”, and “pct.valid”. Can be set globally
via |
na.rm |
Logical. Argument to be passed to statistical functions.
Defaults to |
round.digits |
Numeric. Number of significant digits to display.
Defaults to |
transpose |
Logical. Make variables appear as columns, and stats as
rows. Defaults to |
order |
Character. When analyzing more than one variable, this parameter determines how to order variables. Valid values are “sort” (or simply “s”), “preserve” (or “p”), or a vector containing all variable names in the desired order. Defaults to “sort”. |
style |
Character. Style to be used by |
plain.ascii |
Logical. |
justify |
Character. Alignment of numbers in cells; “l” for left, “c” for center, or “r” for right (default). Has no effect on html tables. |
headings |
Logical. Set to |
display.labels |
Logical. Show variable / data frame labels in heading
section. Defaults to |
split.tables |
Character. |
weights |
Numeric. Vector of weights having same length as x.
|
rescale.weights |
Logical. When set to |
... |
Since version 1.1, the stats argument can be set in a more flexible way; keywords (all, common, fivenum) can be combined with single statistics, or their “negation”. For instance, using stats = c("all", "-q1", "-q3") would show all except q1 and q3.
For further customization, you could redefine any preset in the following manner: .st_env$descr.stats$common <- c("mean", "sd", "n"). Use caution when modifying .st_env, and reload the package if errors ensue. Changes are temporary and will not persist across R sessions.
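The flexible form of the stats argument can be illustrated with a short sketch. It assumes summarytools version 1.1 or later is installed and uses the bundled exams data; the variable name d is only there for illustration.

```r
library(summarytools)
data("exams")

# Start from the "all" preset and drop the two quartiles
d <- descr(exams, stats = c("all", "-q1", "-q3"))
d

# Combine a preset with one extra statistic
descr(exams, stats = c("fivenum", "mean"))
```

Negating a couple of statistics is usually shorter than spelling out every remaining one, which is presumably why the presets and single statistics can be mixed.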
An object having classes “matrix” and “summarytools” containing the statistics, with extra attributes useful to other functions/methods.
Dominic Comtois, [email protected]
data("exams")
# All stats (default behavior) for all numerical variables
descr(exams)
# Show only "common" statistics, plus "n"
descr(exams, stats = c("common", "n"))
# Selection of statistics, transposing the results
descr(exams, stats = c("mean", "sd", "min", "max"), transpose = TRUE)
# Rmarkdown-ready
descr(exams, plain.ascii = FALSE, style = "rmarkdown")
# Grouped statistics
data("tobacco")
with(tobacco, stby(BMI, gender, descr, check.nas = FALSE))
# Grouped statistics in tidy table:
with(tobacco, stby(BMI, age.gr, descr, stats = "common")) |> tb()
## Not run:
# Show in Viewer (or browser if not in RStudio)
view(descr(exams))
# Save to html file with title
print(descr(exams), file = "descr_exams.html", report.title = "BMI by Age Group", footnote = "<b>Schoolyear:</b> 2018-2019<br/><b>Semester:</b> Fall")
## End(Not run)
Summary of a data frame consisting of: variable names and types, labels if any, factor levels, frequencies and/or numerical summary statistics, barplots/histograms, and valid/missing observation counts and proportions.
dfSummary( x, round.digits = 1, varnumbers = st_options("dfSummary.varnumbers"), class = st_options("dfSummary.class"), labels.col = st_options("dfSummary.labels.col"), valid.col = st_options("dfSummary.valid.col"), na.col = st_options("dfSummary.na.col"), graph.col = st_options("dfSummary.graph.col"), graph.magnif = st_options("dfSummary.graph.magnif"), style = st_options("dfSummary.style"), plain.ascii = st_options("plain.ascii"), justify = "l", na.val = st_options("na.val"), col.widths = NA, headings = st_options("headings"), display.labels = st_options("display.labels"), max.distinct.values = 10, trim.strings = FALSE, max.string.width = 25, split.cells = 40, split.tables = Inf, tmp.img.dir = st_options("tmp.img.dir"), keep.grp.vars = FALSE, silent = st_options("dfSummary.silent"), ... )
x |
A data frame. |
round.digits |
Number of significant digits to display. Defaults to
|
varnumbers |
Logical. Show variable numbers in the first column.
Defaults to |
class |
Logical. Show data classes in Variable column.
|
labels.col |
Logical. If |
valid.col |
Logical. Include column indicating count and proportion of
valid (non-missing) values. |
na.col |
Logical. Include column indicating count and proportion of
missing ( |
graph.col |
Logical. Display barplots/histograms column. |
graph.magnif |
Numeric. Magnification factor for graphs column. Useful
if the graphs show up too large (then use a value such as .75) or too small
(use a value such as |
style |
Character. Argument used by |
plain.ascii |
Logical. |
justify |
String indicating alignment of columns; one of “l” (left), “c” (center), or “r” (right). Defaults to “l”. |
na.val |
Character. For factors and character vectors, consider this
value as |
col.widths |
Numeric or character. Vector of column widths. If numeric,
values are assumed to be numbers of pixels. Otherwise, any CSS-supported
units can be used. |
headings |
Logical. Set to |
display.labels |
Logical. Should data frame label be displayed in the
title section? Default is |
max.distinct.values |
The maximum number of values to display frequencies for. If variable has more distinct values than this number, the remaining frequencies will be reported as a whole, along with the number of additional distinct values. Defaults to 10. |
trim.strings |
Logical; for character variables, should leading and
trailing white space be removed? Defaults to |
max.string.width |
Limits the number of characters to display in the
frequency tables. Defaults to |
split.cells |
A numeric argument passed to |
split.tables |
pander argument which determines the maximum width
of a table. Keeping the default value ( |
tmp.img.dir |
Character. Directory used to store temporary images when rendering dfSummary() with 'method = "pander"', 'plain.ascii = TRUE' and 'style = "grid"'. See Details. |
keep.grp.vars |
Logical. When using |
silent |
Logical. Hide console messages. |
... |
Additional arguments passed to |
The default value plain.ascii = TRUE is intended to facilitate interactive data exploration. When using the package for reporting with rmarkdown, make sure to set this option to FALSE.
When trim.strings is set to TRUE, trimming is done before calculating frequencies, so be aware that those will be impacted accordingly.
Specifying tmp.img.dir allows producing results consistent with pandoc styling while also showing png graphs. Because Pandoc determines column widths from the length of cell contents, even if said content is merely a link to an image, using the standard R temporary directory to store the images would cause columns to be exceedingly wide. A shorter path is needed. On Mac OS and Linux, using “/tmp” is a sensible choice, since this directory is cleaned up automatically on a regular basis. On Windows however, there is no such convenient directory, so the user has to choose a directory and clean up the temporary images manually after the document has been rendered. Providing a relative path such as “img”, omitting “./”, is recommended. The maximum length for this parameter is set to 5 characters. It can be set globally with st_options (e.g.: st_options(tmp.img.dir = ".")).
It is possible to control which statistics are shown in the Stats / Values column. For this, see the Details and Examples sections of st_options.
A data frame with additional class summarytools containing as many rows as there are columns in x, with attributes to inform the print method. Columns in the output data frame are:
Number indicating the order in which column appears in the data frame.
Name of the variable, along with its class(es).
Label of the variable (if applicable).
For factors, a list of their values, limited by the max.distinct.values parameter. For character variables, the most common values (in descending frequency order), also limited by max.distinct.values. For numerical variables, common univariate statistics (mean, std. deviation, min, med, max, IQR and CV).
For factors and character variables, the frequencies and proportions of the values listed in the previous column. For numerical vectors, number of distinct values, or frequency of distinct values if their number is not greater than max.distinct.values.
An ASCII histogram for numerical variables, and ASCII barplot for factors and character variables.
An html encoded graph, either barplot or histogram.
Number and proportion of valid values.
Number and proportion of missing (NA and NaN) values.
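The shape of the returned object can be checked with a minimal sketch, assuming summarytools is installed and using the bundled tobacco data. The use.x11 handling mirrors the package's own examples; the variable name dfs is only illustrative.

```r
library(summarytools)
data("tobacco")

# Mirror the package examples: avoid X11 when generating graphs
saved_x11_option <- st_options("use.x11")
st_options(use.x11 = FALSE)

dfs <- dfSummary(tobacco)

# One row of the summary per column of the input data frame
nrow(dfs) == ncol(tobacco)

# Column names of the summary itself
names(dfs)

# Restore the option to its previous value
st_options(use.x11 = saved_x11_option)
```

Since the result is an ordinary data frame with an extra class, it can be subset or inspected with the usual tools before being printed.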
Several packages provide functions for defining variable labels, summarytools being one of them. Some packages (Hmisc in particular) employ special classes for labelled objects, but summarytools doesn't use nor look for any such classes.
Dominic Comtois, [email protected]
data("tobacco")
saved_x11_option <- st_options("use.x11")
st_options(use.x11 = FALSE)
dfSummary(tobacco)
# Exclude some of the columns to reduce table width
dfSummary(tobacco, varnumbers = FALSE, valid.col = FALSE)
# Limit number of categories to be displayed for categorical data
dfSummary(tobacco, max.distinct.values = 5, style = "grid")
# Using stby()
stby(tobacco, tobacco$gender, dfSummary)
st_options(use.x11 = saved_x11_option)
## Not run:
# Show in Viewer or browser - no capital V in view(); stview() is also
# available in case of conflicts with other packages
view(dfSummary(iris))
# Rmarkdown-ready
dfSummary(tobacco, style = "grid", plain.ascii = FALSE, varnumbers = FALSE, valid.col = FALSE, tmp.img.dir = "./img")
# Using group_by() (requires dplyr for %>% and group_by())
library(dplyr)
tobacco %>% group_by(gender) %>% dfSummary()
## End(Not run)
A simulated dataset containing the grades of 30 students, with the following columns:
etudiant Student's name.
sexe Categorical variable (factor). Two levels: “Fille”, “Garcon”.
francais Grade in French (numerical).
math Grade in math (numerical).
geographie Grade in geography (numerical).
histoire Grade in history (numerical).
economie Grade in economics (numerical).
anglais Grade in English (numerical).
data(examens)
A data frame with 30 rows and 8 columns
Simulated data. Grades for each student are centered around a personal randomized average and standard deviation.
A copy of this dataset is available in English under the name “exams”.
A simulated dataset with grades for 30 hypothetical students, with the following variables:
student Student's name.
gender Factor with 2 levels: “Girl”, “Boy”.
french French Grade (numerical).
math Math Grade (numerical).
geography Geography Grade (numerical).
history History Grade (numerical).
economics Economics Grade (numerical).
english English Grade (numerical).
data(exams)
A data frame with 30 rows and 8 variables
All names and grades are simulated. Grades for each student are centered around a personal randomized average and standard deviation.
A copy of this dataset is also available in French under the name “examens”.
Used internally (not exported) to apply all relevant formatting. It is documented here only because it can be used when setting the dfSummary.custom.1 and dfSummary.custom.2 options.
format_number(x, round.digits, ...)
x |
A numerical value to be formatted. |
round.digits |
Numerical. Number of decimals to show. Used to define
both |
... |
Any other formatting instruction that is compatible with
|
## Not run:
format_number(IQR(column_data, na.rm = TRUE), round.digits)
format_number(IQR(column_data, na.rm = TRUE), decimal.mark = ",")
## End(Not run)
Displays weighted or unweighted frequencies, including <NA> counts and proportions.
freq( x, var = NULL, round.digits = st_options("round.digits"), order = "default", style = st_options("style"), plain.ascii = st_options("plain.ascii"), justify = "default", cumul = st_options("freq.cumul"), totals = st_options("freq.totals"), report.nas = st_options("freq.report.nas"), rows = numeric(), missing = "", na.val = st_options("na.val"), display.type = TRUE, display.labels = st_options("display.labels"), headings = st_options("headings"), weights = NA, rescale.weights = FALSE, ... )
x |
Factor, vector, or data frame. |
var |
Optional unquoted variable name. Provides support for piped
function calls (e.g. |
round.digits |
Numeric. Number of significant digits to display.
Defaults to |
order |
Character. Ordering of rows in frequency table; “name” (default for non-factors), “level” (default for factors), or “freq” (from most frequent to least frequent). To invert the order, place a minus sign before or after the word. “-freq” will thus display the items starting from the lowest in frequency to the highest, and so forth. |
style |
Character. Style to be used by |
plain.ascii |
Logical. |
justify |
String indicating alignment of columns. By default (“default”), “right” is used for text tables and “center” is used for html tables. You can force it to one of “left”, “center”, or “right”. |
cumul |
Logical. Set to |
totals |
Logical. Set to |
report.nas |
Logical. Set to |
rows |
Character or numeric vector allowing subsetting of the results. The order given here will be reflected in the resulting table. If a single string is used, it will be used as a regular expression to filter row names. |
missing |
Text to display in NA cells. Defaults to “”. |
na.val |
Character. For factors and character vectors, consider this
value as |
display.type |
Logical. Should variable type be displayed? Default is
|
display.labels |
Logical. Should variable / data frame labels be
displayed? Default is |
headings |
Logical. Set to |
weights |
Vector of weights; must be of the same length as |
rescale.weights |
Logical parameter. When set to |
... |
Additional arguments passed to |
The default plain.ascii = TRUE option is there to make results appear cleaner in the console. To avoid rmarkdown rendering problems, this option is automatically set to FALSE whenever style = "rmarkdown" (unless plain.ascii = TRUE is made explicit in the function call).
A frequency table of class matrix and summarytools with added attributes used by print method.
The data type represents the class in most cases.
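The var argument described above can be tried with a brief sketch, assuming summarytools is installed and using the bundled tobacco data. The native pipe in the second call requires R 4.1 or later; the variable name f is only illustrative.

```r
library(summarytools)
data("tobacco")

# Data frame first, then an unquoted column name (the var argument)
f <- freq(tobacco, gender, order = "freq")
f

# Same idea written with the native pipe, in inverted frequency order
tobacco |> freq(gender, order = "-freq")
```

Passing the data frame first is what makes the function usable in a pipeline, since the piped object lands in the x argument.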
Dominic Comtois, [email protected]
data(tobacco)
freq(tobacco$gender)
freq(tobacco$gender, totals = FALSE)
# Ignore NA's, don't show totals, omit headings
freq(tobacco$gender, report.nas = FALSE, totals = FALSE, headings = FALSE)
# In .Rmd documents, use the two following arguments, minimally
freq(tobacco$gender, style="rmarkdown", plain.ascii = FALSE)
# Grouped Frequencies
with(tobacco, stby(diseased, smoker, freq))
(fr_smoker_by_gender <- with(tobacco, stby(smoker, gender, freq)))
# Print html Source
print(fr_smoker_by_gender, method = "render", footnote = NA)
# Order by frequency (+ to -)
freq(tobacco$age.gr, order = "freq")
# Order by frequency (- to +)
freq(tobacco$age.gr, order = "-freq")
# Use the 'rows' argument to display only the 10 most common items
freq(tobacco$age.gr, order = "freq", rows = 1:10)
## Not run:
# Display rendered html results in RStudio's Viewer
# notice 'view()' is NOT written with capital V
# If working outside RStudio, Web browser is used instead
# A temporary file is stored in temp dir
view(fr_smoker_by_gender)
# Display rendered html results in default Web browser
# A temporary file is stored in temp dir here too
print(fr_smoker_by_gender, method = "browser")
# Write results to text file (.txt, .md, .Rmd) or html file (.html)
print(fr_smoker_by_gender, method = "render", file = "fr_smoker_by_gender.md")
print(fr_smoker_by_gender, method = "render", file = "fr_smoker_by_gender.html")
## End(Not run)
Assigns a label to a vector or data frame, or returns the value stored in the object's label attribute (or NA if none exists).
label(x, all = FALSE, fallback = FALSE, simplify = FALSE)
label(x) <- value
llabel(x, all = TRUE, fallback = FALSE, simplify = FALSE)
x |
An R object to extract labels from. |
all |
Logical. When x is a data frame, setting this argument to
|
fallback |
Logical. Indicates whether labels (returned values)
should fall back to object name(s). Defaults to |
simplify |
When x is a data frame and |
value |
String to be used as label. To clear existing labels, use
|
The wrapper function llabel was named that way to avoid conflicting with base function labels.
A single character vector if all = FALSE (default), or a named list if all = TRUE (named vector when using simplify = TRUE).
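As a minimal sketch of the getter and setter forms described above, assuming the summarytools package is installed and using its bundled tobacco dataset (the label strings are arbitrary, chosen for illustration only):

```r
library(summarytools)

data(tobacco)

# Assign a label to the data frame itself, then to one of its columns
label(tobacco) <- "Simulated tobacco study"
label(tobacco$BMI) <- "Body Mass Index"

# Retrieve them
label(tobacco)        # the data frame's label
label(tobacco$BMI)    # a single column's label

# All column labels at once, simplified to a named vector
label(tobacco, all = TRUE, simplify = TRUE)
```

Columns that carry no label are where the fallback argument becomes relevant.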
Loosely based on Gergely Daróczi's label function.
Dominic Comtois, [email protected]
Displays a list comprised of summarytools objects created with lapply.
## S3 method for class 'list'
print(x, method = "pander", file = "", append = FALSE, report.title = NA, table.classes = NA, bootstrap.css = st_options('bootstrap.css'), custom.css = st_options('custom.css'), silent = FALSE, footnote = st_options('footnote'), collapse = 0, escape.pipe = st_options('escape.pipe'), ...)
x |
A summarytools object, created by one of the four core
functions ( |
method |
Character. One of “pander”, “viewer”,
“browser”, or “render”. Default value for the |
file |
Character. File name to write output to. Defaults to “”. |
append |
Logical. Append output to existing file (specified using the
file argument). |
report.title |
Character. For html reports, this goes into the
|
table.classes |
Character. Additional html classes to assign to
output tables. Bootstrap css classes can be used. User-defined
classes (see the custom.css argument) are also specified here. See
details section. |
bootstrap.css |
Logical. When generating an html document,
include the “includes/stylesheets/bootstrap.min.css” file
content inside a |
custom.css |
Character. Path to a custom .css file. Classes
defined in this must also appear in the |
silent |
Logical. Set to |
footnote |
Character. Text to display just after html output
tables. The default value (“default”) produces a two-line
footnote indicating the package's name and version, the R version, and
the current date. Has no effect on ascii or markdown
content. Can contain standard html tags. Set to |
collapse |
Numeric. |
escape.pipe |
Logical. Set to |
... |
Additional arguments used to override attributes stored in the
object, or to change formatting via |
This function is there only for cases where the object to be printed was created with lapply, as opposed to the recommended functions for creating grouped results (stby and group_by).
Displays a list comprised of summarytools objects created with stby.
## S3 method for class 'stby'
print(x, method = "pander", file = "", append = FALSE, report.title = NA, table.classes = NA, bootstrap.css = st_options('bootstrap.css'), custom.css = st_options('custom.css'), silent = FALSE, footnote = st_options('footnote'), escape.pipe = st_options('escape.pipe'), ...)
x |
A summarytools object, created by one of the four core
functions ( |
method |
Character. One of “pander”, “viewer”,
“browser”, or “render”. Default value for the |
file |
Character. File name to write output to. Defaults to “”. |
append |
Logical. Append output to existing file (specified using the
file argument). |
report.title |
Character. For html reports, this goes into the
|
table.classes |
Character. Additional html classes to assign to
output tables. Bootstrap css classes can be used. User-defined
classes (see the custom.css argument) are also specified here. See
details section. |
bootstrap.css |
Logical. When generating an html document,
include the “includes/stylesheets/bootstrap.min.css” file
content inside a |
custom.css |
Character. Path to a custom .css file. Classes
defined in this must also appear in the |
silent |
Logical. Set to |
footnote |
Character. Text to display just after html output
tables. The default value (“default”) produces a two-line
footnote indicating the package's name and version, the R version, and
the current date. Has no effect on ascii or markdown
content. Can contain standard html tags. Set to |
escape.pipe |
Logical. Set to |
... |
Additional arguments used to override attributes stored in the
object, or to change formatting via |
Display summarytools objects in the console, in Web Browser or in RStudio's Viewer, or write content to file.
## S3 method for class 'summarytools'
print(x, method = "pander", file = "", append = FALSE, report.title = NA, table.classes = NA, bootstrap.css = st_options('bootstrap.css'), custom.css = st_options('custom.css'), silent = FALSE, footnote = st_options('footnote'), max.tbl.height = Inf, collapse = 0, escape.pipe = st_options("escape.pipe"), ...)
x |
A summarytools object, created by one of the four core
functions ( |
method |
Character. One of “pander”, “viewer”,
“browser”, or “render”. Default value for the |
file |
Character. File name to write output to. Defaults to “”. |
append |
Logical. Append output to existing file (specified using the
file argument). |
report.title |
Character. For html reports, this goes into the
|
table.classes |
Character. Additional html classes to assign to
output tables. Bootstrap css classes can be used. User-defined
classes (see the custom.css argument) are also specified here. See
details section. |
bootstrap.css |
Logical. When generating an html document,
include the “includes/stylesheets/bootstrap.min.css” file
content inside a |
custom.css |
Character. Path to a custom .css file. Classes
defined in this must also appear in the |
silent |
Logical. Set to |
footnote |
Character. Text to display just after html output
tables. The default value (“default”) produces a two-line
footnote indicating the package's name and version, the R version, and
the current date. Has no effect on ascii or markdown
content. Can contain standard html tags. Set to |
max.tbl.height |
Numeric. Maximum table height in pixels allowed
in rendered |
collapse |
Numeric. |
escape.pipe |
Logical. Set to |
... |
Additional arguments used to override attributes stored in the
object, or to change formatting via |
Ascii and markdown tables are generated using pander.
The following arguments can be used to override formatting attributes stored in the object:
style
round.digits (except for dfSummary objects)
plain.ascii
justify
split.tables
headings
display.labels
varnumbers (dfSummary objects only)
labels.col (dfSummary objects only)
graph.col (dfSummary objects only)
valid.col (dfSummary objects only)
na.col (dfSummary objects only)
col.widths (dfSummary objects only)
keep.grp.vars (dfSummary objects only)
report.nas (freq objects only)
display.type (freq objects only)
missing (freq objects only)
The following arguments can be used to override heading elements:
Data.frame
Data.frame.label
Variable
Variable.label
Group
date
Data.type (freq objects only)
Row.variable (ctable objects only)
Col.variable (ctable objects only)
NULL when method="pander"; a file path returned invisibly when method="viewer" or "browser". In the latter case, the file path is also passed to shell.exec (Windows) or system (*nix), causing the document to be opened in the default Web browser.
Dominic Comtois, [email protected]
Summarytools on GitHub, List of pander options, Bootstrap Cascading Stylesheets
## Not run:
data(tobacco)
view(dfSummary(tobacco), footnote = NA)
## End(Not run)

data(exams)
print(freq(exams$gender), style = 'rmarkdown')
print(descr(exams), headings = FALSE)
Generate the css needed by summarytools in html documents.
st_css(main = TRUE, global = FALSE, bootstrap = FALSE, style.tag = TRUE, ...)
main |
Logical. Include summarytools.css file. |
global |
Logical. Include the additional summarytools-global.css
file, which affects all content in the document. Provides control over
objects that were not html-rendered; in particular, table widths
and vertical alignment are modified to improve layout. |
bootstrap |
Logical. Include bootstrap.min.css. |
style.tag |
Logical. Include the opening and closing |
... |
Character. Path to additional css file(s) to include. |
Typically the function is called right after the initial setup chunk of an R markdown document, in a chunk having options echo=FALSE and results="asis".
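The placement described above might look like the following. This is only a sketch, assuming summarytools is installed; in an actual .Rmd file the code would sit in its own chunk, with the chunk header written as shown in the comment.

```r
library(summarytools)

# In an R Markdown document this call goes in a dedicated chunk declared as
#   {r, echo=FALSE, results="asis"}
# so that the generated css is written verbatim into the html output.
st_css()
```

Setting global = TRUE or bootstrap = TRUE in the call adds the corresponding stylesheets listed in the arguments.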
Returns the css file(s) content invisibly as a character vector, and prints the content (using cat()).
Dominic Comtois, [email protected]
To list all summarytools global options, call without arguments. To display the value of one or several options, enter the name(s) of the option(s) in a character vector as sole argument. To reset all options, use single unnamed argument ‘reset’ or 0.
st_options( option = NULL, value = NULL, style = "simple", plain.ascii = TRUE, round.digits = 2, headings = TRUE, footnote = "default", display.labels = TRUE, na.val = NULL, bootstrap.css = TRUE, custom.css = NA_character_, escape.pipe = FALSE, char.split = 12, freq.cumul = TRUE, freq.totals = TRUE, freq.report.nas = TRUE, freq.ignore.threshold = 25, freq.silent = FALSE, ctable.prop = "r", ctable.totals = TRUE, ctable.round.digits = 1, ctable.silent = FALSE, descr.stats = "all", descr.transpose = FALSE, descr.silent = FALSE, dfSummary.style = "multiline", dfSummary.varnumbers = TRUE, dfSummary.class = TRUE, dfSummary.labels.col = TRUE, dfSummary.valid.col = TRUE, dfSummary.na.col = TRUE, dfSummary.graph.col = TRUE, dfSummary.graph.magnif = 1, dfSummary.silent = FALSE, dfSummary.custom.1 = expression(paste(paste0(trs("iqr"), " (", trs("cv"), ") : "), format_number(IQR(column_data, na.rm = TRUE), round.digits), " (", format_number(sd(column_data, na.rm = TRUE)/mean(column_data, na.rm = TRUE), round.digits), ")", collapse = "", sep = "")), dfSummary.custom.2 = NA, tmp.img.dir = NA_character_, subtitle.emphasis = TRUE, lang = "en", use.x11 = TRUE )
option |
option(s) name(s) to query (optional). Can be a single string or a vector of strings to query multiple values. |
value |
The value you wish to assign to the option specified in the
first argument. This is for backward-compatibility, as all options can now
be set via their own parameter. That is, instead of
|
style |
Character. One of “simple” (default), “rmarkdown”,
or “grid”. Does not apply to |
plain.ascii |
Logical. |
round.digits |
Numeric. Defaults to |
headings |
Logical. Set to |
footnote |
Character. When the default value “default” is used,
the package name & version, as well as the R version number are displayed
below html outputs. Set to |
display.labels |
Logical. |
na.val |
Character. For factors and character vectors, consider this
value as |
bootstrap.css |
Logical. Specifies whether to include
Bootstrap css in html reports' head section.
Defaults to |
custom.css |
Character. Path to an additional, user-provided, CSS file.
|
escape.pipe |
Logical. Set to |
char.split |
Numeric. Maximum number of characters allowed in a column
heading for |
freq.cumul |
Logical. Corresponds to the |
freq.totals |
Logical. Corresponds to the |
freq.report.nas |
Logical. Corresponds to the |
freq.ignore.threshold |
Numeric. Number of distinct values above which
numerical variables are ignored when calling |
freq.silent |
Logical. Hide console messages. |
ctable.prop |
Character. Corresponds to the |
ctable.totals |
Logical. Corresponds to the |
ctable.round.digits |
Numeric. Defaults to |
ctable.silent |
Logical. Hide console messages. |
descr.stats |
Character. Corresponds to the |
descr.transpose |
Logical. Corresponds to the |
descr.silent |
Logical. Hide console messages. |
dfSummary.style |
Character. “multiline” by default. Set to “grid” for R Markdown documents. |
dfSummary.varnumbers |
Logical. In |
dfSummary.class |
Logical. Show data classes in Name column.
|
dfSummary.labels.col |
Logical. In |
dfSummary.valid.col |
Logical. In |
dfSummary.na.col |
Logical. In |
dfSummary.graph.col |
Logical. Display barplots / histograms column in
|
dfSummary.graph.magnif |
Numeric. Magnification factor, useful if
|
dfSummary.silent |
Logical. Hide console messages. |
dfSummary.custom.1 |
Expression. First of two optional expressions
which once evaluated will populate lines 3+ of the 'Stats / Values'
cell when column data is numerical and has more distinct values than
allowed by the |
dfSummary.custom.2 |
Expression. Second of the two optional expressions
which once evaluated will populate lines 3+ of the 'Stats / Values'
cell when the column data is numerical and has more distinct values than
allowed by the 'max.distinct.values' parameter. |
tmp.img.dir |
Character. Directory used to store temporary images. See
Details section of |
subtitle.emphasis |
Logical. Controls the formatting of the
“subtitle” (the data frame or variable name, depending
on context). When |
lang |
Character. A 2-letter code for the language to use in the produced outputs. Currently available languages are: ‘en’, ‘es’, ‘fr’, ‘pt’, ‘ru’, and ‘tr’. |
use.x11 |
Logical. TRUE by default. In console-only environments,
setting this to |
The dfSummary.custom.1 and dfSummary.custom.2 options must be defined as expressions. In the expression, use the column_data variable name to refer to data. Assume the type to be numerical (real or integer). The expression must paste together both the labels (short name for the statistic(s) being displayed) and the statistics themselves. Although round can be used, a better alternative is to call the internal format_number, which uses format to apply all relevant formatting that is active within the call to dfSummary. For keywords having a translated term, the trs() internal function can be used (see Examples).
To learn more about summarytools options, see vignette("introduction", "summarytools").
# show all summarytools global options
st_options()

# show a specific option
st_options("round.digits")

# show two (or more) options
st_options(c("plain.ascii", "style", "footnote"))

## Not run:
# set one option
st_options(plain.ascii = FALSE)

# set one option, legacy way
st_options("plain.ascii", FALSE)

# set several options
st_options(plain.ascii = FALSE, style = "rmarkdown", footnote = NA)

# reset all
st_options('reset')
# ... or
st_options(0)

# Define custom dfSummary stats
st_options(dfSummary.custom.1 = expression(
  paste(
    "Q1 - Q3 :",
    format_number(
      quantile(column_data, probs = .25, type = 2, names = FALSE, na.rm = TRUE),
      round.digits
    ),
    "-",
    format_number(
      quantile(column_data, probs = .75, type = 2, names = FALSE, na.rm = TRUE),
      round.digits
    ),
    collapse = ""
  )
))

dfSummary(iris)

# Set back to default value
st_options(dfSummary.custom.1 = "default")
## End(Not run)
An adaptation of base R's by function, designed to optimize the results' display.
stby(data, INDICES, FUN, ..., useNA = FALSE)
data |
an R object, normally a data frame, possibly a matrix. |
INDICES |
a grouping variable or a list of grouping variables,
each of length |
FUN |
a function to be applied to (usually data-frame) subsets of data. |
... |
Further arguments to FUN. |
useNA |
Make NA a valid grouping value in INDICES variable(s).
Set to |
When the grouping variable(s) contain NA values, the base::by function (as well as summarytools versions prior to 1.1.0) ignores corresponding groups. Version 1.1.0 allows setting useNA = TRUE to make new groups using NA values on the grouping variable(s), just as dplyr::group_by does.
When NA values are detected and useNA = FALSE, a message is displayed; to disable this message, set check.nas = FALSE.
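The base::by behaviour described above can be seen with base R alone. The sketch below uses a small made-up data frame (df, g and v are illustrative names) and shows one way to keep the NA group in base R by turning NA into an explicit factor level with addNA(). It illustrates the idea only; it is not how stby() is implemented.

```r
# Toy data with a missing value in the grouping variable
df <- data.frame(g = c("a", "b", NA, "a"), v = c(1, 2, 3, 4))

# base::by() drops the row whose group is NA: only "a" and "b" remain
res <- by(df, df$g, function(d) sum(d$v))
dimnames(res)[[1]]

# Making NA an explicit factor level keeps that group, which is the
# kind of grouping stby(..., useNA = TRUE) is documented to provide
res_na <- by(df, addNA(df$g), function(d) sum(d$v))
length(res_na)   # three groups: "a", "b" and NA
```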
An object of classes “list” and “summarytools”, giving results for each subset.
data("tobacco")
with(tobacco, stby(data = BMI, INDICES = gender, FUN = descr, check.nas = FALSE))
with(tobacco, stby(data = smoker, INDICES = gender, freq, useNA = TRUE))
with(tobacco, stby(data = list(x = smoker, y = diseased), INDICES = gender, FUN = ctable, useNA = TRUE))
A simulated dataset of 1000 subjects, with the following columns:
sexe Categorical variable (factor), 2 levels: “F” and “M”. Roughly 500 of each.
age Numerical.
age.gr Age group - categorical variable, 4 levels.
IMC Body mass index (numerical).
fumeur Categorical variable, 2 levels (“Oui” / “Non”).
cigs.par.jour Number of cigarettes smoked per day (numerical).
malade Categorical variable, 2 levels (“Oui” / “Non”).
maladie Text field.
ponderation Sampling weight (numerical).
data(tabagisme)
A data frame with 1000 rows and 9 columns
A note on simulation: the probability for a subject to fall into the “malade” category is based on an arbitrary function involving age, BMI and the number of cigarettes smoked per day.
A copy of this dataset is available in English under the name “tobacco”.
Make a tidy dataset out of freq() or descr() outputs
tb( x, order = 1, drop.var.col = FALSE, recalculate = TRUE, fct.to.chr = FALSE, ... )
x |
a |
order |
Integer. Useful for grouped results produced with
|
drop.var.col |
Logical. For |
recalculate |
Logical. TRUE by default. For grouped
|
fct.to.chr |
Logical. When grouped objects
are created with |
... |
For internal use only. |
stby, which is based on by, initially makes the first grouping variable vary, keeping the other(s) constant. On the other hand, group_by initially keeps the first grouping variable(s) constant, making the last one vary. This will impact the ordering of the rows (and as a result, the cumulative percent columns, if present).
Also, keep in mind that while group_by shows NA groups by default, useNA = TRUE must be used to achieve the same results with stby.
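The two orderings can be illustrated with base R only. In this sketch the toy data frame is made up, base by() stands in for stby(), and base order() is used to emulate the ordering attributed to group_by; neither the stby() nor the dplyr code path is actually exercised.

```r
# Toy data: one row per combination of two grouping variables
df <- expand.grid(gender = c("F", "M"), smoker = c("No", "Yes"))

# by() fills its result array column-major, so the FIRST grouping
# variable varies fastest
res <- by(df, list(df$gender, df$smoker),
          function(d) paste(d$gender, d$smoker))
by_order <- as.vector(res)
by_order        # "F No" "M No" "F Yes" "M Yes"

# Emulated group_by()-style ordering: the first variable is held
# constant while the LAST one varies
sorted <- df[order(df$gender, df$smoker), ]
grouped_order <- paste(sorted$gender, sorted$smoker)
grouped_order   # "F No" "F Yes" "M No" "M Yes"
```

Because cumulative percentages are accumulated down the rows, the same data can show different cumulative columns under the two orderings.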
A tibble which is constructed following the tidy principles.
tb(freq(iris$Species))
tb(descr(iris, stats = "common"))

data("tobacco")
tb(stby(tobacco, tobacco$gender, descr, stats = "fivenum", check.nas = FALSE), order = 3)
tb(stby(tobacco, tobacco$gender, descr, stats = "common", useNA = TRUE))

# Compare stby() and group_by() groups' ordering
tb(with(tobacco, stby(diseased, list(gender, smoker), freq, useNA = TRUE)))

tobacco |>
  dplyr::group_by(gender, smoker) |>
  freq(diseased) |>
  tb()
A simulated dataset of 1,000 subjects, with the following variables:
data(tobacco)
A data frame with 1000 rows and 9 variables
gender Factor with 2 levels: “F” and “M”, having roughly 500 of each.
age Numerical.
age.gr Factor with 4 age categories.
BMI Body Mass Index (numerical).
smoker Factor (“Yes” / “No”).
cigs.per.day Number of cigarettes smoked per day (numerical).
diseased Factor (“Yes” / “No”).
disease Character.
samp.wgts Sampling weights (numerical).
A note on simulation: probability for an individual to fall into category “diseased” is based on an arbitrary function involving age, BMI and number of cigarettes per day.
A copy of this dataset is also available in French under the name “tabagisme”.
Returns the object with all labels removed. The “label” attribute as well as the “labelled” class (used by Hmisc and labelled) are cleared.
unlabel(x)
x |
An R object to remove labels from. |
Dominic Comtois, [email protected]
If your language is not available or if you wish to customize the outputs' language to suit your preference, you can set up a translations file (see details) and import it with this function.
use_custom_lang(file)
file |
Character. The path to the translations file. |
To build the translations file, copy the language_template.csv file located in the installed package's includes directory and fill out the ‘custom’ column using a text editor, leaving column titles unchanged. The file must also retain its UTF-8 encoding.
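The copy-then-edit workflow above could be sketched as follows, assuming summarytools is installed and that the template sits at includes/language_template.csv inside the installed package, as the paragraph states. The destination file name my_language.csv is hypothetical.

```r
library(summarytools)

# Locate the template shipped with the installed package
template <- system.file("includes", "language_template.csv",
                        package = "summarytools")

# Work on a copy; "my_language.csv" is a hypothetical file name
custom_file <- file.path(tempdir(), "my_language.csv")
file.copy(template, custom_file, overwrite = TRUE)

# Next, fill out the 'custom' column of custom_file in a text editor,
# leaving the column titles and the UTF-8 encoding unchanged.

# Once edited, import the translations:
# use_custom_lang(custom_file)
```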
Visualize results in RStudio's Viewer or in Web Browser
view(x, method = "viewer", file = "", append = FALSE, report.title = NA, table.classes = NA, bootstrap.css = st_options("bootstrap.css"), custom.css = st_options("custom.css"), silent = FALSE, footnote = st_options("footnote"), max.tbl.height = Inf, collapse = 0, escape.pipe = st_options("escape.pipe"), ...)
x |
A summarytools object, created by one of the four core
functions ( |
method |
Character. One of “pander”, “viewer”,
“browser”, or “render”. Default value for the |
file |
Character. File name to write output to. Defaults to “”. |
append |
Logical. Append output to existing file (specified using the
file argument). |
report.title |
Character. For html reports, this goes into the
|
table.classes |
Character. Additional html classes to assign to
output tables. Bootstrap css classes can be used. User-defined
classes (see the custom.css argument) are also specified here. See
details section. |
bootstrap.css |
Logical. When generating an html document,
include the “includes/stylesheets/bootstrap.min.css” file
content inside a |
custom.css |
Character. Path to a custom .css file. Classes
defined in this must also appear in the |
silent |
Logical. Set to |
footnote |
Character. Text to display just after html output
tables. The default value (“default”) produces a two-line
footnote indicating the package's name and version, the R version, and
the current date. Has no effect on ascii or markdown
content. Can contain standard html tags. Set to |
max.tbl.height |
Numeric. Maximum table height in pixels allowed
in rendered |
collapse |
Numeric. |
escape.pipe |
Logical. Set to |
... |
Additional arguments used to override attributes stored in the
object, or to change formatting via |
Creates html outputs and displays them in RStudio's viewer, in a browser, or renders the html code in R markdown documents.
For objects of class “summarytools”, this function is simply a wrapper around print.summarytools with method = "viewer".
Objects of class “by”, “stby”, or “list” are dispatched to the present function, as it can manage multiple objects, whereas print.summarytools can only manage one object at a time.
Combination of most common “macro-level” functions that describe an object.
what.is(x, ...)
x |
Any object. |
... |
Included for backward-compatibility only. Has no real use. |
An alternative to calling in turn class, typeof, dim, and so on. A call to this function will readily give all this information at once.
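For comparison, here is what "calling in turn" looks like with base R only, collected into a list. This is a sketch of the manual approach that what.is is meant to replace, not a reproduction of its output format; the list name info is arbitrary.

```r
x <- iris3  # a 50 x 4 x 3 numeric array shipped with base R

# Calling the usual "macro-level" functions one at a time
info <- list(
  class        = class(x),
  typeof       = typeof(x),
  mode         = mode(x),
  storage.mode = storage.mode(x),
  dim          = dim(x),
  length       = length(x),
  object.size  = format(object.size(x), units = "Kb")
)
str(info)

# The single-call alternative (requires summarytools):
# summarytools::what.is(x)
```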
A list with following elements:
A data frame with the class(es), type, mode and storage mode of the object as well as the dim, length and object.size.
A named character vector giving all attributes (c.f. “names”, “row.names”, “class”, “dim”, and so forth) along with their length.
A character vector of all the identifier functions (starting with “is.”) that yield TRUE when used with x as argument.
When x is a function, results of ftype are added.
Dominic Comtois, [email protected]
class, typeof, mode, storage.mode, dim, length, is.object, otype, object.size, ftype
what.is(1)
what.is(NaN)
what.is(iris3)
what.is(print)
what.is(what.is)
Get rid of summarytools-specific attributes to get a simple data structure (matrix, array, ...), which can be easily manipulated.
zap_attr(x, except = c("dim", "dimnames"))
x |
An object with attributes |
except |
Character. A vector of attribute names to preserve. By default, “dim” and “dimnames” are preserved. |
If the object contains grouped results:
The inner objects will lose their attributes
The “stby” class will be replaced with “by”
The “dim” and “dimnames” attributes will be set to available relevant values, but expect slight differences between objects created with stby() vs group_by().
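For a single, non-grouped object, the general idea of dropping every attribute except a chosen few can be sketched in base R. The helper strip_attr and the attribute st_type below are hypothetical, introduced only for illustration; this is not the package's implementation and it does not handle grouped results.

```r
# Hypothetical helper (NOT the package's implementation): drop every
# attribute of x except those named in `except`
strip_attr <- function(x, except = c("dim", "dimnames")) {
  keep <- attributes(x)
  attributes(x) <- keep[names(keep) %in% except]
  x
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
attr(m, "st_type") <- "freq"   # stand-in for a package-specific attribute
class(m) <- "summarytools"     # stand-in for the package's class

plain <- strip_attr(m)
names(attributes(plain))       # only dim and dimnames remain
```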
data(tobacco)
descr(tobacco) |> zap_attr()
freq(tobacco$gender) |> zap_attr()