---
title: "Customizable Table Building with Tangram Pipe"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Customizable Table Building with Tangram Pipe}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}

---

# Overview
`tangram.pipe` builds a customizable summary table iteratively, one row at a
time. The user supplies a data object, specifies the row and column variables
for the table, and selects the type of data (numeric, categorical, or binary)
that each row summarizes. The package can also report missing data, overall
statistics, and comparison tests.

# Installation
```{r, eval=FALSE}
install.packages("tangram.pipe")
```

# Getting Started

## Loading supplementary packages
```{r, warning=FALSE}
# library() stops with an error if a package is missing; require() only returns FALSE
suppressPackageStartupMessages(library(tangram.pipe))
suppressPackageStartupMessages(library(knitr))
suppressPackageStartupMessages(library(kableExtra))
```

# Initializing the table
The first step in using this package is to initialize the table. Here, the user
names the dataset to be analyzed and specifies the variable to use for the
columns. In addition, the user decides whether to account for missing data,
calculate overall statistics across all columns, or conduct comparison tests
across the columns for each row. The values given for `missing`, `overall`, and
`comparison` become the defaults for each subsequent row added to the table;
however, any of them can be overridden for an individual row if desired.
Finally, the user can choose the default summary function for each type of row.

This vignette uses the built-in `iris` dataset, a well-known dataset containing
flower measurements for three species of iris. Since most of the data in `iris`
is numeric, we add two made-up variables (flower color and stem size) purely to
demonstrate the table-building functions for non-numeric data. We also set a
few values to `NA` so that the missing-data options have something to report.

```{r, echo = FALSE}
set.seed(04082022)
```
```{r}
iris$color <- sample(c("Blue", "Purple"), size=150, replace=TRUE)
iris$Stem.Size <- sample(c("Small", "Medium", "Medium", "Large"), size=150, replace=TRUE)
iris[149,5] <- NA
iris[150,c(1:4, 6:7)] <- NA
head(iris) %>% 
  kable(escape=F, align="c") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

For this example, the variable `Species` is chosen as the column variable;
`missing` and `comparison` are set to `FALSE` to keep the example simple.
We also set each type of summary function to the default used by the package.
```{r}
tbl1 <- tbl_start(data=iris, 
                  col_var="Species", 
                  missing=FALSE, 
                  overall=TRUE, 
                  comparison=FALSE,
                  digits = 2,
                  default_num_summary = num_default,
                  default_cat_summary = cat_default,
                  default_binary_summary = binary_default)
```

Using this function creates a list object that stores the user preferences for 
building the table going forward; in addition to the nine elements listed here, 
the number of rows is also saved to the list. Subsequent entries to the list
will store information for the rows, which will ultimately be compiled to create
the final table after all row information has been added.

`tbl_start` arguments are set to the following defaults. Aside from `data` and `col_var`,
the remaining arguments do not need to be specified if they match the following 
default values:

`missing` : FALSE

`overall` : TRUE

`comparison` : FALSE

`digits` : 2

`default_num_summary` : num_default

`default_cat_summary` : cat_default

`default_binary_summary` : binary_default
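
The defaults listed above act as table-level settings that an individual row may
override. The pattern can be pictured with a small base-R sketch. The names
`start_table` and `resolve_option` are hypothetical and exist only for this
illustration; they are not `tangram.pipe` functions and are not meant to
describe the package's internals.

```{r, eval=FALSE}
# Hypothetical stand-in for a table initializer: it only stores preferences.
start_table <- function(missing = FALSE, overall = TRUE, digits = 2) {
  list(missing = missing, overall = overall, digits = digits)
}

# Hypothetical resolver: a row-level value wins; NULL falls back to the table default.
resolve_option <- function(tbl, name, value = NULL) {
  if (is.null(value)) tbl[[name]] else value
}

tbl <- start_table(digits = 2)
resolve_option(tbl, "digits")         # no row-level value: table default is used
resolve_option(tbl, "digits", 4)      # row-level override
resolve_option(tbl, "missing", TRUE)  # row-level override of a logical option
```

A `NULL` row-level default makes "not specified" distinguishable from an explicit
`FALSE`, which is why the sketch tests with `is.null()`.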

# Adding Rows
## Numeric Rows
To start off, we will add a numeric row to the table. The function `num_row`
reads in numeric data and, by default, calculates the five-number summary
(minimum, first quartile, median, third quartile, maximum), as well as the mean
and standard deviation of the numeric variable within each column. Since we
specified `overall=TRUE` in the initialization step, an overall summary column
will be included as well. The default summary function is `num_default`, but the
user may write their own function to calculate different summary statistics
from what is shown here.

Currently, there are five summary functions available for use within `num_row`.
The default summary to use for each row can be specified in `tbl_start`, or set
for an individual row with its `summary` argument.

`num_default` : Calculates the five-number summary, mean, and standard deviation

`num_minmax` : Calculates the minimum and maximum values

`num_medianiqr` : Calculates the median and interquartile range

`num_mean_sd` : Calculates the mean and standard deviation

`num_date` : Calculates the five-number summary for a date object

More information on writing your own summary functions can be found in the accompanying
package vignette "Writing User-Defined Summary Functions".
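
As a rough guide to the quantities a default-style numeric summary reports, the
base-R sketch below computes the five-number summary, mean, and standard
deviation per species plus an overall column. `summarize_numeric` is a
hypothetical helper written for this illustration, and the quantile method
(R's default) is an assumption; the package's own `num_default` may differ in
both calculation details and output layout.

```{r, eval=FALSE}
# Work on an unmodified copy so the vignette's edited `iris` is left untouched.
dat <- datasets::iris

# Hypothetical helper: five-number summary plus mean and SD for one vector.
summarize_numeric <- function(x, digits = 2) {
  x <- x[!is.na(x)]                                   # complete cases only
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  round(c(Min = q[1], Q1 = q[2], Median = q[3], Q3 = q[4], Max = q[5],
          Mean = mean(x), SD = sd(x)), digits)
}

# One column of statistics per species, then an overall column.
by_species  <- sapply(split(dat$Sepal.Length, dat$Species), summarize_numeric)
num_summary <- cbind(by_species, Overall = summarize_numeric(dat$Sepal.Length))
num_summary
```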

Let's start by calculating summary statistics for the Sepal Length in the `iris`
dataset. Since it makes more sense to display the variable name as "Sepal Length"
rather than the R-generated "Sepal.Length", we will use the `rowlabel` argument 
to make this change for the table. Note that if you have a dataframe with 
labelled variables as columns, leaving `rowlabel` blank will automatically
input the variable's label as the rowlabel. To output the final object, we
use the function `tbl_out` to display the table.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabel="Sepal Length") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

By default, each row function reports statistics to two decimal places.
We can use the `digits` argument to request more or fewer decimal places in
the reported table.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabel="Sepal Length", digits=4) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
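
Decimal places and significant digits are easy to confuse, so the base-R sketch
below shows the difference on the overall mean sepal length of the unmodified
`iris` data. It only illustrates the two notions of precision; how
`tangram.pipe` formats numbers internally is not shown here and should not be
inferred from this example.

```{r, eval=FALSE}
x <- mean(datasets::iris$Sepal.Length)             # 5.843333...

two_dp   <- formatC(x, format = "f", digits = 2)   # fixed notation, 2 decimal places
four_dp  <- formatC(x, format = "f", digits = 4)   # fixed notation, 4 decimal places
three_sf <- signif(x, 3)                           # 3 significant digits

c(two_dp = two_dp, four_dp = four_dp, three_sf = as.character(three_sf))
```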

There is a small amount of missing data within the `iris` dataset. Currently, 
`num_row` filters out the missing data and only considers data with complete cases
of the row and column variables. To see how much missing data there is in the
sepal length, we specify `missing=TRUE`.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabel="Sepal Length", missing=TRUE) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

The function above tells us that the dataset is missing a sepal length measurement
for one of the virginica flowers. Note that the function cannot locate instances 
of missingness in the column variable.
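
The same count can be cross-checked in base R. The sketch below recreates one
missing sepal length on an unmodified copy of `iris` and tallies missing values
within each species. Note that `tapply` drops rows whose grouping value is `NA`,
which mirrors the limitation described above; this is an analogy, not a
description of the package's code.

```{r, eval=FALSE}
dat <- datasets::iris
dat$Sepal.Length[150] <- NA   # row 150 is a virginica flower

# Number of missing row-variable values within each column category.
n_missing <- tapply(is.na(dat$Sepal.Length), dat$Species, sum)
n_missing
```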

Finally, suppose we want to look at the differences in means across all species.
Supplying the function `num_diff` to the `comparison` argument will calculate the
mean difference in sepal length between each column and a reference category,
which is the first category of the column variable in the table. Here, versicolor
and virginica will be compared to setosa. The function also provides a 95%
confidence interval to accompany the mean difference. Currently, `num_diff` is
the only built-in comparison function for `num_row`.
```{r, eval = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabel="Sepal Length", comparison=num_diff) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  num_row(row_var="Sepal.Length", rowlabel="Sepal Length", comparison=num_diff) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(7:9), width_min = "1.5in")
```
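
For readers who want to sanity-check a mean difference by hand, the base-R
sketch below compares versicolor with the setosa reference group on the
unmodified `iris` data. It uses a Welch two-sample t interval from `t.test`.
Both the interval method and the direction of subtraction (column minus
reference) are assumptions made for this illustration, so the numbers need not
match `num_diff` exactly.

```{r, eval=FALSE}
dat   <- datasets::iris
ref   <- dat$Sepal.Length[dat$Species == "setosa"]      # reference column
other <- dat$Sepal.Length[dat$Species == "versicolor"]  # column being compared

# t.test(x, y) reports a confidence interval for mean(x) - mean(y).
tt        <- t.test(other, ref, conf.level = 0.95)
mean_diff <- mean(other) - mean(ref)
ci        <- as.numeric(tt$conf.int)

round(c(difference = mean_diff, lower = ci[1], upper = ci[2]), 2)
```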

## Categorical Rows
Now, we will look at adding categorical variables. The function `cat_row`
reads in data that is categorical in form, and by default calculates the number
of instances for each row category within each column category, as well as the 
column-wise proportions. The default summary function is `cat_default`, 
but the user may write their own function to calculate different summary statistics
from what is shown here.

Currently, there are four summary functions available for use within `cat_row`.
The default summary to use for each row can be specified in `tbl_start`, or determined
using the `summary` argument of each row.

`cat_default` : Calculates the cell counts and column-wise proportions

`cat_pct` : Calculates the cell counts and column-wise percentages

`cat_count` : Calculates the cell counts

`cat_jama` : Calculates the column-wise percentages and cell counts divided by column
totals. This is the style used by the Journal of the American Medical Association.

We will demonstrate this function by looking at `Stem.Size` in the `iris` dataset.
Note that `cat_row` and `num_row` have nearly identical arguments, but `cat_row` 
allows you to choose the number of spaces to indent category names using the
`indent` argument. The default setting is 5 spaces.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabel="Stem Size") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
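
Cell counts and column-wise proportions are the two quantities the default
categorical summary is described as reporting. A minimal base-R sketch of both,
on a small made-up data frame (`toy`, invented for this illustration), looks
like this:

```{r, eval=FALSE}
# Made-up data: a grouping variable and a three-level categorical variable.
toy <- data.frame(
  group = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  size  = c("Small", "Medium", "Medium", "Large",
            "Small", "Small", "Medium", "Large", "Large", "Large")
)

counts    <- table(toy$size, toy$group)        # rows: categories, columns: groups
col_props <- prop.table(counts, margin = 2)    # proportions within each column

counts
round(col_props, 2)
```

Using `margin = 2` makes each column sum to 1, which is what "column-wise"
means here.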

Setting `missing=TRUE` will reveal the proportion of each species that does not
have a corresponding entry for stem size. When missing data is accounted for, 
the missingness will be recorded as the percentage of each column that is
designated as missing data.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabel="Stem Size", missing=TRUE) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

We can also sort a categorical row in ascending or descending order by category
counts for a specified column. The `ordering` argument sorts the row variable,
and `sortcol` specifies which column we would like to sort the row by. Permissible
values for `ordering` are `"ascending"` and `"descending"`. By default, the row
function sorts by the overall cell counts unless a valid column category name
is supplied to `sortcol`. If an invalid category name is used, the row function
sorts by the overall cell counts instead.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabel="Stem Size (Ascending by versicolor)", missing=TRUE, 
          ordering = "ascending", sortcol = "versicolor") %>%
  cat_row("Stem.Size", rowlabel="Stem Size (Descending by overall count)", missing=TRUE, 
          ordering = "descending") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
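
The sorting rule, including the fallback to overall counts, can be sketched in
base R as follows. `sort_counts` is a hypothetical helper written for this
illustration and the count matrix is made up; neither comes from
`tangram.pipe`.

```{r, eval=FALSE}
# Made-up cell counts: rows are categories, columns are column-variable groups.
counts <- matrix(c(2, 4, 1, 5, 1, 3), nrow = 3,
                 dimnames = list(c("Large", "Medium", "Small"), c("A", "B")))

# Hypothetical helper: sort rows by one column, or by overall counts when
# `sortcol` is missing or not a valid column name.
sort_counts <- function(counts, ordering = c("ascending", "descending"),
                        sortcol = NULL) {
  ordering <- match.arg(ordering)
  key <- if (!is.null(sortcol) && sortcol %in% colnames(counts)) {
    counts[, sortcol]
  } else {
    rowSums(counts)                      # fallback: overall cell counts
  }
  counts[order(key, decreasing = ordering == "descending"), , drop = FALSE]
}

sort_counts(counts, "ascending", sortcol = "B")
sort_counts(counts, "descending")
sort_counts(counts, "descending", sortcol = "not_a_column")
```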

Finally, let's look at a comparison test for a categorical row. The default
comparison function is `cat_comp_default`, which calculates the relative
entropy between each column and the reference category and conducts
a Chi-Square Goodness of Fit test on the data present. Currently,
`cat_comp_default` is the only built-in comparison function for categorical
data, but a user may write their own function to use instead.
```{r, eval = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabel="Stem Size", comparison=cat_comp_default) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  cat_row("Stem.Size", rowlabel="Stem Size", comparison=cat_comp_default) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(c(2:6), width_min = "1.1in") %>%
  column_spec(7, width_min = "1.5in")
```
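
To give a feel for the two quantities involved, the base-R sketch below computes
a relative entropy (Kullback-Leibler divergence) between one column and a
reference column, and runs Pearson's chi-square test on the contingency table
with `chisq.test`. The counts are made up, and the direction of the divergence,
the natural-log base, and the exact form of the chi-square test are all
assumptions; `cat_comp_default` may define them differently.

```{r, eval=FALSE}
# Made-up cell counts for a three-level category in two columns.
counts <- matrix(c(10, 20, 20, 25, 15, 10), nrow = 3,
                 dimnames = list(size  = c("Large", "Medium", "Small"),
                                 group = c("ref", "other")))

# Category distributions within each column.
p <- counts[, "other"] / sum(counts[, "other"])
q <- counts[, "ref"]   / sum(counts[, "ref"])

# Relative entropy of `other` with respect to `ref` (assumed direction).
rel_entropy <- sum(p * log(p / q))

# Pearson's chi-square test on the 3 x 2 table.
chi <- chisq.test(counts)

round(c(rel_entropy = rel_entropy, p_value = chi$p.value), 4)
```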

## Binary Rows
The final type of data we will examine here is binary data; this is when a variable
can only take on two possible values. In a table, it can be helpful to only include
one of the options if the second entry can be deduced from looking at the first.
This is done using the `binary_row` function. The default summary function is `binary_default`, 
but the user may write their own function to calculate different summary statistics
from what is shown here.

Currently, there are four summary functions available for use within `binary_row`.
The default summary to use for each row can be specified in `tbl_start`, or determined
using the `summary` argument of each row.

`binary_default` : Calculates the cell counts and column-wise proportions

`binary_pct` : Calculates the cell counts and column-wise percentages

`binary_count` : Calculates the cell counts

`binary_jama` : Calculates the column-wise percentages and cell counts divided by column
totals. This is the style used by the Journal of the American Medical Association.

Note that a user may use `cat_row` to process binary data if they wish
to see both row entries included in the table.

We will now demonstrate the use of `binary_row` on the color variable in `iris`.
In the dataset, the available colors are blue and purple, so we do not wish
to include both entries here.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabel = "Color") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

The `binary_row` function includes all of the same arguments as the previous row
functions, but additionally includes three new arguments.  `reference` allows 
a user to choose which group will appear on the table. By default, the 
alphabetically first row group will appear on the table, which is why 'Blue' 
appeared above. If we want to see the statistics for purple flowers, we can
run the following code.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabel = "Color", reference="Purple") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

Notice in the previous examples that the binary row is contained entirely within
one row. This is because many tables in professional journals will often abbreviate
binary data to fit within a single row of data. If you do not wish to do this within 
your table, you can set the additional argument `compact` to be FALSE and display 
the row information in more than one row.
```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", rowlabel = "Color", compact = FALSE) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

As of package version 1.1.2, the user can now choose to remove the reference group
label from the table if they do not want it to be present. The argument `ref.label`
allows a user to toggle the name of the reference group in the table; by default,
this is set to `on`, but a user can input `off` to remove it.

Finally, let's look at some comparison functions used for binary data. By default,
this row function calculates the difference in proportions using `binary_diff`
if `comparison=TRUE` was set during initialization. The differences in
proportions are calculated across columns and are reported with 95% confidence
intervals.
```{r, eval = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_diff) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_diff) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(1, width_min = "1in") %>%
  column_spec(c(2:6), width_min = "1.1in") %>%
  column_spec(7, width_min = "1.75in") %>%
  column_spec(c(8:9), width_min = "1.5in")
```
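
A difference in proportions with a 95% interval can be worked out by hand. The
base-R sketch below uses made-up counts and a simple Wald interval; the interval
method and the direction of subtraction (column minus reference) are assumptions
for this illustration and may not match what `binary_diff` reports.

```{r, eval=FALSE}
# Made-up data: number of "Blue" flowers out of n in each column.
x <- c(other = 30, ref = 20)
n <- c(other = 50, ref = 50)

p_hat     <- x / n
prop_diff <- p_hat[["other"]] - p_hat[["ref"]]

# Wald standard error and 95% interval for the difference.
se <- sqrt(sum(p_hat * (1 - p_hat) / n))
ci <- prop_diff + c(-1, 1) * qnorm(0.975) * se

round(c(difference = prop_diff, lower = ci[1], upper = ci[2]), 3)
```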

The package has two additional options for comparison tests using binary data. 
Odds ratios can be calculated using `binary_or`, and risk ratios can be calculated
with `binary_rr`. Note that if `comparison=TRUE` is initialized in `tbl_start` and 
a user wants to use an odds ratio or risk ratio here, `comparison` must be set to 
either of those two options in this row addition, since excluding the argument
will lead to `binary_diff` being called by default.
```{r, eval = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_or) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_or) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(1, width_min = "1in") %>%
  column_spec(c(2:7), width_min = "1.1in") %>%
  column_spec(c(8:9), width_min = "1.25in")
```

```{r, eval = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_rr) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo = FALSE}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  binary_row("color", comparison=binary_rr) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(1, width_min = "1in") %>%
  column_spec(c(2:7), width_min = "1.1in") %>%
  column_spec(c(8:9), width_min = "1.25in")
```
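
Odds ratios and risk ratios answer different questions, which a small worked
example makes concrete. The base-R sketch below uses a made-up 2 x 2 table and
textbook log-scale Wald intervals. These formulas are standard, but whether
`binary_or` and `binary_rr` use the same interval method or the same orientation
of the ratio is an assumption.

```{r, eval=FALSE}
# Made-up 2 x 2 table: rows are outcomes, columns are groups.
tab <- matrix(c(30, 20, 20, 30), nrow = 2,
              dimnames = list(color = c("Blue", "Purple"),
                              group = c("other", "ref")))

risk <- tab["Blue", ] / colSums(tab)          # P(Blue) within each group
odds <- tab["Blue", ] / tab["Purple", ]       # odds of Blue within each group

rr <- risk[["other"]] / risk[["ref"]]         # risk ratio
or <- odds[["other"]] / odds[["ref"]]         # odds ratio

# Log-scale Wald standard errors and 95% intervals.
se_log_or <- sqrt(sum(1 / tab))
se_log_rr <- sqrt(sum(1 / tab["Blue", ]) - sum(1 / colSums(tab)))
or_ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)
rr_ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)

round(rbind(odds_ratio = c(or, or_ci), risk_ratio = c(rr, rr_ci)), 2)
```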

## N Row

The `n_row` function counts the number of rows in your dataset, both overall and
within each category of the column variable. Note that you can decide whether or
not this function includes rows with missing data in the count. For the example
below, we will not include rows with missing data.

```{r}
tbl1 <- tbl_start(data=iris, col_var="Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>% 
  n_row() %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
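
The effect of including or excluding missing data in a row count can be seen
with `table` in base R. The sketch below sets one species label to `NA` on a
copy of the `iris` species column and counts both ways. It illustrates the idea
only; it does not use or describe the arguments of `n_row`.

```{r, eval=FALSE}
species <- datasets::iris$Species
species[149] <- NA                     # one flower with an unknown species

n_by_col  <- table(species)                    # NA rows are excluded
n_with_na <- table(species, useNA = "ifany")   # NA rows get their own cell

n_by_col
n_with_na
```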

## Adding an empty row
The `empty_row` function adds a blank row to the final table. This is useful
if a user wants to include blank space between some of the table's rows. The user
only needs to supply the list object in order to create the blank row.
An optional `header` argument adds a label for the rows that follow in the table.
```{r eval=FALSE}
tbl1 <- tbl1 %>% empty_row()
```

# Creating a Finished Product
The following code generates a finalized table for the `iris` dataset, as an
example of a customized table report that can be produced using `tangram.pipe`.
It includes all four numeric variables (sepal length, sepal width, petal length,
petal width), as well as stem size and color. The final table itself is generated
using `tbl_out`. Annotations for the unique elements of the rows are created
by passing comments to the header argument of `empty_row()`.
```{r, eval=FALSE}
tbl1 <- tbl_start( 
                  data = iris, 
                  col_var = "Species", 
                  missing=FALSE, 
                  overall=TRUE, 
                  comparison=TRUE,
                  default_num_summary = num_default,
                  default_cat_summary = cat_pct,
                  default_binary_summary = binary_default) %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  empty_row('<i>No rowlabel, 3 decimal places</i>') %>%
  num_row("Sepal.Width", digits=3) %>%
  empty_row("<i>No comparison test used, Min-Max summary</i>") %>%
  num_row("Petal.Length", rowlabel="Petal Length", summary = num_minmax, comparison=FALSE) %>%
  empty_row("<i>Missing data considered, mean/Std. Dev summary</i>") %>%
  num_row("Petal.Width", rowlabel="Petal Width", summary = num_mean_sd, missing=TRUE) %>%
  cat_row("Stem.Size", rowlabel="Stem Size", missing=TRUE) %>%
  empty_row("<i>No rowlabels, indent 3 spaces, odds ratio as test</i>") %>%
  binary_row("color", missing = TRUE, comparison=binary_or, indent=3) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
```{r, echo=FALSE}
tbl1 <- tbl_start( 
                  data = iris, 
                  col_var = "Species", 
                  missing=FALSE, 
                  overall=TRUE, 
                  comparison=TRUE,
                  default_num_summary = num_default,
                  default_cat_summary = cat_pct,
                  default_binary_summary = binary_default) %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  empty_row('<i>No rowlabel, 3 decimal places</i>') %>%
  num_row("Sepal.Width", digits=3) %>%
  empty_row("<i>No comparison test used, Min-Max summary</i>") %>%
  num_row("Petal.Length", rowlabel="Petal Length", summary = num_minmax, comparison=FALSE) %>%
  empty_row("<i>Missing data considered, mean/Std. Dev summary</i>") %>%
  num_row("Petal.Width", rowlabel="Petal Width", summary = num_mean_sd, missing=TRUE) %>%
  cat_row("Stem.Size", rowlabel="Stem Size", missing=TRUE) %>%
  empty_row("<i>No rowlabels, indent 3 spaces, odds ratio as test</i>") %>%
  binary_row("color", missing = TRUE, comparison=binary_or, indent=3) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered")) %>%
  column_spec(1, width_min = "1.5in") %>%
  column_spec(c(2:6), width_min = "1.3in") %>%
  column_spec(c(7:9), width_min = "1.5in")
```

# Additional Features
## Single Column of Data
The package can handle cases where a user only wants a single summary column of
data. In the `iris` dataset, if we set the column variable to `NULL` in `tbl_start`,
we obtain just one summary column for the dataset without breaking the table
up by columns. Note that comparison functions will not run here, even if the
`comparison` argument is set to `TRUE`.
```{r}
tbl1 <- tbl_start(iris, NULL, missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>%
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```

## Changing datasets within a table
This package allows for an individual row to use a different dataset from the one
initialized in `tbl_start`. Use the `newdata` argument to specify the new dataset
to use, then define the rows and columns for the new data. Note that if a new row
is added after the row with the differing dataset, the new row will automatically
return to using the initialized dataset from `tbl_start` unless the user specifies
otherwise in `newdata`.

For this example, we will split the `iris` dataset so that the sepal and petal
variables are in separate datasets, and show that the `newdata` argument can
allow the information from both datasets to be combined in one table.

```{r}
# select() comes from dplyr; the namespace prefix makes the dependency explicit
sepaldat <- iris %>% dplyr::select(-c(Petal.Length, Petal.Width))
petaldat <- iris %>% dplyr::select(-c(Sepal.Length, Sepal.Width))
```

```{r}
tbl1 <- tbl_start(sepaldat, "Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  num_row("Sepal.Width", rowlabel="Sepal Width") %>%
  empty_row(header="Switch to Petal Dataset") %>% 
  num_row(row_var="Petal.Length", col_var="Species", newdata=petaldat, rowlabel="Petal Length") %>%
  num_row(row_var="Petal.Width", col_var="Species", newdata=petaldat, rowlabel="Petal Width") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>%
  trimws %>%
  kable_styling(c("striped","bordered"))
```

Notice that in this example, the column variable for `sepaldat` was the same as 
that for `petaldat`. If the columns used had differed between the datasets, 
all columns would be included in the table, but only columns corresponding to 
the data used in the rows would have values filled in.

A common usage for the `newdata` argument is making a table that combines
summary statistics for subsets of data. Suppose we want to display
the sepal measures for the entire dataset, then show these same measurements
for two subsets of data determined by the petal length. Here, we divide the
dataset into two subsets: petal length <= 4.3 and petal length > 4.3.

```{r}
# dplyr::filter() is named explicitly so that stats::filter() is never picked up
petal.small <- iris %>% dplyr::filter(Petal.Length <= 4.3)
petal.large <- iris %>% dplyr::filter(Petal.Length > 4.3)
```

```{r}
tbl1 <- tbl_start(iris, "Species", missing=FALSE, overall=TRUE, comparison=FALSE) %>%
  empty_row(header="All Data") %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="     Sepal Length") %>%
  num_row("Sepal.Width", rowlabel="     Sepal Width") %>%
  empty_row(header="Petal Length less than or equal to 4.3") %>%
  n_row(newdata=petal.small) %>%
  num_row("Sepal.Length", rowlabel="     Sepal Length", col_var="Species", newdata=petal.small) %>%
  num_row("Sepal.Width", rowlabel="     Sepal Width", col_var="Species", newdata=petal.small) %>%
  empty_row(header="Petal Length greater than 4.3") %>%
  n_row(newdata=petal.large) %>%
  num_row("Sepal.Length", rowlabel="     Sepal Length", col_var="Species", newdata=petal.large) %>%
  num_row("Sepal.Width", rowlabel="     Sepal Width", col_var="Species", newdata=petal.large) %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"))
```
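
When subsets are built by thresholding a variable, it is worth checking that
they partition the data as intended and seeing which column categories end up
empty. The base-R sketch below does this on the unmodified `iris` data with
`subset`; it is a cross-check of the split itself and says nothing about how the
package renders empty columns.

```{r, eval=FALSE}
dat <- datasets::iris

small <- subset(dat, Petal.Length <= 4.3)
large <- subset(dat, Petal.Length > 4.3)

# With no missing petal lengths, the two subsets cover every row exactly once.
nrow(small) + nrow(large) == nrow(dat)

# Species counts in each subset; some species do not appear in a subset at all.
species_by_subset <- rbind(small = table(small$Species),
                           large = table(large$Species))
species_by_subset
```

In the unmodified data, no setosa flower has a petal length above 4.3 and no
virginica flower has one at or below 4.3, so each subset leaves one species
without observations.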

# Additional Styling with knitr & kableExtra

The `knitr` and `kableExtra` packages can be used to add styling features to the
finished tables. Captions can be added to the tables using the `caption` argument,
and tables can also be rendered in LaTeX format using the `format` argument;
both are arguments of the `kable` function. `kable_styling` allows you to use the
`font_size` argument to specify how large the table text should be.

```{r, eval = FALSE}
tbl1 <- tbl_start(iris, "Species", missing=TRUE, overall=TRUE, comparison=TRUE, 
                  default_num_summary = num_minmax,
                  default_cat_summary = cat_pct,
                  default_binary_summary = binary_jama) %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  cat_row("Stem.Size", rowlabel="Stem Size") %>%
  binary_row("color", rowlabel="Color") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l", caption = "Example Summary table", format = "html") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"), font_size = 12)
```
```{r, echo = FALSE}
tbl1 <- tbl_start(iris, "Species", missing=TRUE, overall=TRUE, comparison=TRUE, 
                  default_num_summary = num_minmax,
                  default_cat_summary = cat_pct,
                  default_binary_summary = binary_jama) %>%
  n_row() %>%
  num_row("Sepal.Length", rowlabel="Sepal Length") %>%
  cat_row("Stem.Size", rowlabel="Stem Size") %>%
  binary_row("color", rowlabel="Color") %>%
  tbl_out()
tbl1 %>% 
  tangram_styling() %>% 
  kable(escape=F, align="l", caption = "Example Summary table", format = "html") %>% 
  trimws %>% 
  kable_styling(c("striped","bordered"), font_size = 12) %>%
  column_spec(1, width_min = "1.5in") %>%
  column_spec(c(2:6), width_min = "1.3in") %>%
  column_spec(c(7:9), width_min = "1.5in")
```

# Supplementary Material

One of the key features of this package is the flexibility to supply custom
summary and comparison functions, which lets the user create tables in formats
not built into `tangram.pipe`. The accompanying vignette "Writing User-Defined
Summary Functions" outlines how to write functions that work well with
`tangram.pipe`.

# Version History

## 1.1.2 (released August 2022)

1. The `digits` parameter is now available in `tbl_start` for specifying default
digits to use throughout the table.

2. Added `ref.label` argument in binary summary functions to allow user to toggle
reference group labels in binary rows.

3. Deprecated the `print.tangram.pipe` function, as the update to `tbl_out` in
version 1.1.1 rendered this function obsolete.

## 1.1.1 (released April 2022)

1. Fixed a bug in `num_row` where column category names with spaces would not
format correctly.

2. Changed `binary_row` output to include the rowlabel along with the displayed
category when `compact = TRUE`.

3. Fixed a bug in `binary_row` where numeric row category labels would not format
correctly when `compact = TRUE`.

4. Added `ordering` and `sortcol` arguments to `cat_row`.

5. Edited categorical summary functions to utilize sorting arguments.

6. Added prewritten summary functions `num_date`, `cat_count`, and `binary_count`.

7. Edited `tbl_out` to output the finalized dataframe object (previous version
only appended the final table to the table information list).

## 1.1.0 (released October 2021)

1. Changed the `rowlabels` argument to `rowlabel`.

2. Options `overall`, `missing`, and `comparison` now have default values in `tbl_start()`.

3. Only leading white spaces are formatted to HTML form in `tangram_styling`.

4. Added `n_row()` as a row function to the table.

5. Added prewritten summary functions `num_minmax`, `num_medianiqr`, `num_mean_sd`,
`cat_pct`, `cat_jama`, `binary_pct`, `binary_jama`.

6. Added options `default_num_summary`, `default_cat_summary`, `default_binary_summary`
to `tbl_start()`. Default values are set to the default summary functions for each row.

7. Changed the `summary` argument within the row functions to automatically use
the default specified in `tbl_start()`, unless another function is supplied by the user.
The default in the function argument has changed from a function to NULL.

8. Summary functions now take on generic arguments specified by an ellipsis (...),
but still work the same as before within the row functions.

9. `binary_row()` now has the option to condense to one row (`compact`). Default is
TRUE.