0. Installing the safestats package

Alexander Ly, Rosanne Turner, Udo Boehm, Rianne de Heide, Peter Grünwald

In the practical sessions we will use a development version of the safestats package. Here we provide:

  1. an installation guide, and
  2. a first example on the design of experiments with safe tests/e-variables.

1. Installing the development version of the safestats package

Make sure that you have the most recent version of R (>= 4.3). On Windows you also need to install Rtools. Both can be found on the CRAN website. A nice visual editor is RStudio.

There are two ways to install the development version of the safestats package:

  1. By installing the GitHub version using the remotes package, or
  2. By installing a downloaded version of the package manually.

a. Install using the remotes package

The remotes package can be installed as follows:

install.packages("remotes")
library("remotes")

This then allows you to install and load the development version of safestats as follows:

remotes::install_github("AlexanderLyNL/safestats", ref = "088")
library(safestats)

If this is successful, please skip the next subsection and continue with the first example.

b. Install using a downloaded tar.gz file

Download the tar.gz file from this Dropbox link and save it to a folder that you can easily find.

# path_to_file should hold the location of the downloaded file, for instance
# path_to_file <- "~/Downloads/safestats.tar.gz" (a hypothetical path)
install.packages(path_to_file, repos = NULL, type="source")
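
Whichever installation route you took, you can verify that the package is available by loading it and printing its version:

library(safestats)
packageVersion("safestats")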

2. First example: Designing a safe z-test experiment

The safestats package workflow is as follows:

# # PSEUDO CODE
# #   This won't work; it is just a way to explain the ideas
# designObj <- designSafeAnalysis(alternative="twoSided")
# result <- safeAnalysis(x=dat$x, designObj)

The design object “designObj” summarises:

  - which analysis is going to be performed, e.g. “designSafeZ” (z-test), “designSafeT” (t-test), or “designSafeTwoProportions” (test for two proportions),
  - what type I error rate, i.e. alpha, is tolerated,
  - the direction of the alternative, e.g. “twoSided”, “greater”, or “less”,
  - which type of e-variable is going to be used (“eType”), and
  - which test-defining parameter value is set.

The design function can also be provided with a minimal clinically relevant effect size and a targeted power, 1 - beta, or equivalently a tolerable type II error rate, beta. In that case, the design object describes how many samples the experimenter should plan for. Due to optional stopping, the actual realised sample size will typically be smaller than the planned one, provided that there is a true effect equal to or larger than the minimal clinically relevant effect size.
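
As a sketch, a design call that sets these options explicitly might look as follows. The argument names alpha and alternative are assumptions based on the package’s documented interface; compare the concrete, verified call in the first example below.

# Hedged sketch: a one-sided safe z-test design with a stricter
# type I error rate; alpha and alternative are assumed argument names
library(safestats)
strictDesignObj <- designSafeZ(meanDiffMin=10, alpha=0.01, beta=0.2,
                               sigma=12, alternative="greater",
                               testType="twoSample", seed=1)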

The safe analysis functions, e.g. “safeZTest”, “safeTTest”, or “safeTwoProportionsTest”, scrutinise the data in view of the design object at hand. In fact, the e-variables/safe tests discussed here can be computed after each observation and acted upon without inflating the chance of falsely rejecting the null hypothesis; a sketch of such sequential monitoring is given at the end of the first example below.

A first example

To check whether the package is installed correctly, we run a safe two-sample z-test.

Suppose a new educational programme was developed that claims to increase secondary school students’ IQ scores by ten points. Assume further that the IQ scores are normally distributed with a population standard deviation of 12. The alternative hypothesis, that is, the claim that this educational programme causes an increase of ten IQ score points, can be visualised as follows.

xDomain <- seq(80, 150)
treatmentPopulation <- dnorm(xDomain, mean=122, sd=12)
controlPopulation <- dnorm(xDomain, mean=112, sd=12)

oldPar <- safestats::setSafeStatsPlotOptionsAndReturnOldOnes()
# treatment population (red) and control population (blue), with dotted
# vertical lines at their respective means of 122 and 112;
# restore the old plotting options afterwards with par(oldPar)
plot(xDomain, treatmentPopulation, type="l", lwd=2, col="red",
     ylab="Density", xlab="IQ Scores")
lines(xDomain, controlPopulation, lwd=2, col="blue")
lines(c(122, 122), c(0, 1), lty=3)
lines(c(112, 112), c(0, 1), lty=3)

We can visualise the difference score, which according to the alternative will be centred at ten.

xDomain <- seq(-20, 40)
# variance of the sum X+(-Y) is the sum of the variances
differencePopulationAlt <- dnorm(xDomain, mean=10, sd=sqrt(2)*12)
differencePopulationNull <- dnorm(xDomain, mean=0, sd=sqrt(2)*12)

oldPar <- safestats::setSafeStatsPlotOptionsAndReturnOldOnes()
# difference score under the alternative (solid, centred at 10) and
# under the null (dashed, centred at 0)
plot(xDomain, differencePopulationAlt, type="l", lwd=2,
     col="black", ylab="Density", xlab="IQ Scores difference")
lines(xDomain, differencePopulationNull, lwd=2, lty=2)
lines(c(0, 0), c(0, 1), lwd=1, lty=3)
lines(c(10, 10), c(0, 1), lwd=1, lty=3)

The null hypothesis of no effect states that this difference is centred at zero instead, which is depicted as the broken curve.

For the test we are going to perform, we use the following design object:

library(safestats)
sigmaTrue <- 12
# alpha is left at its default of 0.05, so the evidence threshold is 1/alpha = 20
designObj <- designSafeZ(meanDiffMin=10,
                         beta=0.2, sigma=sigmaTrue,
                         testType="twoSample", seed=1)
designObj

Now look at the output of this procedure (if necessary, make the console window bigger). The output shows that if we want to test the null hypothesis of no effect sequentially, then we need to plan for about n1=n2=36 participants in the treatment and control groups to detect a minimal clinically relevant mean difference of 10 with 80% power. We provide further information on the planned sample size in the “Design” R Markdown document.
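
The planned sample sizes can also be extracted from the design object directly; the component name nPlan below is an assumption based on the package’s print output and may differ between versions:

# Hedged: nPlan is assumed to hold the planned sample size(s)
designObj$nPlan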

One advantage of e-variables is that you can analyse the data as they come in. For simplicity, let us assume for the moment that we can only analyse the data once, namely, after n1=n2=36 students per group, as planned above. The following code can then be run:

set.seed(2)
# simulate data under the alternative: a true mean difference of ten IQ points
treatmentGroup <- rnorm(36, mean=122, sd=sigmaTrue)
controlGroup <- rnorm(36, mean=112, sd=sigmaTrue)

resultObj <- safeZTest(x=treatmentGroup, y=controlGroup,
                       designObj=designObj)
resultObj

The resultObj shows that we can reject the null, as the e-value is 93232, which is much larger than 1/alpha = 20; see the next R Markdown document on “Testing” for more information regarding this evidence threshold. The resultObj also shows an (anytime-valid) confidence interval for the mean difference, from 6.04 to 22.87. This interval is relatively wide, but recall that the population standard deviation is 12; note that it covers the true mean difference of 10. See the R Markdown document on “Anytime-valid confidence sequences” for further details.
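
Because e-values remain valid under optional stopping, we could, as mentioned above, also have monitored the evidence as the data came in. The following sketch analyses the data after each new pair of observations and stops as soon as the e-value exceeds 1/alpha; it assumes that the result object stores the e-value in a component named eValue.

alpha <- 0.05
for (i in seq_along(treatmentGroup)) {
  # analyse only the first i observations of each group
  interimResult <- safeZTest(x=treatmentGroup[1:i], y=controlGroup[1:i],
                             designObj=designObj)
  # stop as soon as the evidence threshold 1/alpha is crossed
  if (interimResult$eValue >= 1/alpha) {
    cat("Evidence threshold crossed at n1 = n2 =", i, "\n")
    break
  }
}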