8  Z-Test for One Proportion

Author: Gabriel J. Odom

Affiliations: Florida International University; Robert Stempel College of Public Health and Social Work

8.1 Introduction to One-Sample \(Z\)-Tests

The one-sample \(Z\)-test for a proportion tests whether the proportion observed in a single sample differs from a hypothesized (expected) population proportion.

8.2 Mathematical definition of the One-Sample \(Z\)-Test

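Consider a sample of size \(n\) with binary values (such as “true” or “false”). Let \(p_{s}\) and \(p_{E}\) be the observed sample proportion and the expected (population) proportion, respectively. Because the test is carried out under the null hypothesis that the true proportion equals \(p_E\), the standard error in the denominator is evaluated at \(p_E\). The formula to calculate the \(z\) statistic is

\[ z \equiv \frac{ p_s - p_E }{ \sqrt{ \frac{1}{n}p_E(1 - p_E) } }. \]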
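The calculation can be sketched in a few lines of R. This is a minimal illustration with made-up counts (45 successes in 100 observations, expected proportion 0.5), not values from the CTN-0094 data, and the helper name z_one_prop is hypothetical. By default the standard error is evaluated at the expected proportion \(p_E\), the convention behind R's prop.test(); setting se = "sample" substitutes the observed proportion \(p_s\) instead.

```r
# Hypothetical helper (not part of any package): one-sample z statistic.
# se = "null" evaluates the standard error at p_E; se = "sample" plugs in p_s.
z_one_prop <- function(x, n, p_E, se = c("null", "sample")) {
  se <- match.arg(se)
  p_s <- x / n
  p_var <- if (se == "null") p_E else p_s
  (p_s - p_E) / sqrt(p_var * (1 - p_var) / n)
}

# Made-up example: 45 successes in 100 observations, expected proportion 0.5
z_null <- z_one_prop(x = 45, n = 100, p_E = 0.5)
z_sample <- z_one_prop(x = 45, n = 100, p_E = 0.5, se = "sample")
z_null
z_sample
```

With large samples and proportions away from 0 and 1, the two versions are usually close; they differ only in which proportion is used to estimate the variance.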

8.3 Data source and description

We will use the CTN-0094 data set, a data set of harmonized clinical trials for opioid use disorder. The full database is in public.ctn0094data::, engineered features are in public.ctn0094extra::, and clinical trial outcomes (wrangled dependent variables) are in CTNote::. We will install all three packages, but only use CTNote:: for now.

# install.packages("public.ctn0094data")
# install.packages("public.ctn0094extra")
# install.packages("CTNote")

library(CTNote)
library(tidyverse)

8.4 Cleaning the data to create a model data frame

Because our method requires only one sample, we have very little work to do. We will use the Kosten et al. (1993) definition of opioid abstinence, provided in the data set outcomesCTN0094 as the column kosten1993_isAbs.

# What do the values look like?
summary(outcomesCTN0094$kosten1993_isAbs)
   Mode   FALSE    TRUE 
logical    2158    1402 
# How many samples are there?
nrow(outcomesCTN0094)
[1] 3560

There are 3560 logical values, and TRUE indicates that the trial participant achieved abstinence according to the definition used in Kosten et al. (1993).
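Because R treats TRUE as 1 and FALSE as 0, a logical column can be counted with sum() and turned into a proportion with mean(). The sketch below assumes nothing beyond the counts printed above: it rebuilds a stand-in vector (isAbs, an illustrative name) from those counts so that it runs without the CTNote package. With the package loaded, the same calls apply to outcomesCTN0094$kosten1993_isAbs.

```r
# Stand-in for outcomesCTN0094$kosten1993_isAbs, rebuilt from the counts
# reported by summary(): 2158 FALSE and 1402 TRUE
isAbs <- rep(c(FALSE, TRUE), times = c(2158, 1402))

# TRUE counts as 1 and FALSE as 0, so sum() counts and mean() gives a proportion
n_total <- length(isAbs)
n_abstinent <- sum(isAbs)
p_sample <- mean(isAbs)
c(n = n_total, abstinent = n_abstinent, proportion = round(p_sample, 4))
```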

8.5 Assumptions of the One-Sample \(Z\)-Test

To use a one-sample \(Z\)-test, we make the following assumptions:

  1. The data are from a random sample.
  2. The observations are independent of one another.
  3. Neither the sample proportion nor the population proportion is “extreme”; as a rule of thumb, we apply this method only when both proportions are between 5% and 95%.
  4. The data can be described as “successes” and “failures”, and there are at least 10 observations in each category.

If these assumptions hold and the null hypothesis is true, then \(z\) approximately follows a standard normal distribution: \[ z \sim N(0, 1). \]
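The standard normal reference distribution is what turns a \(z\) statistic into a p-value. As a rough sketch, assuming a two-sided alternative, the p-value is twice the lower-tail probability of \(-|z|\). The helper name two_sided_p and the input values are illustrative only and are not computed from the CTN-0094 data.

```r
# Two-sided p-value for a z statistic under the standard normal reference
two_sided_p <- function(z) 2 * pnorm(-abs(z))

# Illustrative values only (not computed from the CTN-0094 data)
p_small_z <- two_sided_p(0.5)
p_crit <- two_sided_p(qnorm(0.975))  # the familiar cutoff near 1.96
p_large_z <- two_sided_p(3)
c(p_small_z, p_crit, p_large_z)
```

Larger values of \(|z|\) give smaller p-values, and the cutoff near 1.96 corresponds to a two-sided p-value of 0.05.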

8.6 Checking the assumptions with plots

8.6.1 Independence and Randomness

Because the samples were collected at random via an FDA-approved clinical trial protocol, we assume that the participants were randomly selected and are independent of one another.

8.6.2 “Extreme” Proportions

According to Ling et al. (2020), the 12-month abstinence proportion of all 533 participants in their study was 40.5 percent; we use this value as the expected proportion. As the code below shows, 1402 of our 3560 participants were abstinent, a proportion of 39.4 percent. Neither proportion is smaller than 5% or greater than 95%.

# Expected (population) proportion, about 40.5% (Ling et al., 2020)
(pExpected <- 0.508 * (425/533))
[1] 0.4050657
# Count the number of TRUE values
(nAbstinent <- sum(outcomesCTN0094$kosten1993_isAbs))
[1] 1402

8.6.3 Type and Counts of Data

We observe binary data (TRUE or FALSE), with 1402 successes and 2158 failures; both counts are far greater than 10.
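The numeric conditions (assumptions 3 and 4) can also be checked in code rather than by eye. The sketch below is one possible way to do so; it re-enters the counts from the summary() output and the expected proportion defined earlier so that it runs on its own, and the helper not_extreme and the other variable names are hypothetical.

```r
# Counts from the summary() output; expected proportion as defined earlier
n_success <- 1402
n_failure <- 2158
p_expected <- 0.508 * (425 / 533)
p_sample <- n_success / (n_success + n_failure)

# Hypothetical helper: TRUE when a proportion lies between 5% and 95%
not_extreme <- function(p) p >= 0.05 & p <= 0.95

checks <- c(
  sample_not_extreme   = not_extreme(p_sample),
  expected_not_extreme = not_extreme(p_expected),
  enough_successes     = n_success >= 10,
  enough_failures      = n_failure >= 10
)
checks
```

If any element of checks were FALSE, the normal approximation would be questionable, and an exact test such as binom.test() might be a safer choice.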

8.7 Code to run a One-Sample \(Z\)-Test

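Now that we have checked our assumptions, we can perform the one-sample \(Z\)-test for proportions. Note that prop.test() reports the squared statistic (X-squared, a chi-squared statistic with one degree of freedom) rather than \(z\) itself, and applies Yates' continuity correction by default.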

prop.test(
  x = nAbstinent,
  n = nrow(outcomesCTN0094),
  p = pExpected
)

    1-sample proportions test with continuity correction

data:  nAbstinent out of nrow(outcomesCTN0094), null probability pExpected
X-squared = 1.8218, df = 1, p-value = 0.1771
alternative hypothesis: true p is not equal to 0.4050657
95 percent confidence interval:
 0.3777537 0.4101176
sample estimates:
        p 
0.3938202 
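To connect this output to the \(z\) statistic, the sketch below recomputes the reported X-squared value by hand and then repeats the test with correct = FALSE, in which case X-squared is the square of a \(z\) statistic whose standard error is evaluated at the expected proportion. The counts are re-entered so that the block runs without the CTNote package; the variable names are illustrative.

```r
x <- 1402
n <- 3560
p_expected <- 0.508 * (425 / 533)

# Same call as above, with the default continuity correction
fit_corrected <- prop.test(x = x, n = n, p = p_expected)

# Chi-squared statistic with Yates' correction, computed by hand
x2_corrected <- (abs(x - n * p_expected) - 0.5)^2 /
  (n * p_expected * (1 - p_expected))

# Without the correction, the statistic equals z squared
fit_plain <- prop.test(x = x, n = n, p = p_expected, correct = FALSE)
z <- (x / n - p_expected) / sqrt(p_expected * (1 - p_expected) / n)

c(
  reported  = unname(fit_corrected$statistic),
  by_hand   = x2_corrected,
  plain     = unname(fit_plain$statistic),
  z_squared = z^2
)
```

The first two values should agree with each other, as should the last two; the continuity correction makes the corrected statistic slightly smaller than the uncorrected one.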

8.8 Brief interpretation of the output

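The 95% confidence interval (0.378 to 0.410) contains the expected population proportion of 0.405, and the p-value (0.177) is greater than 0.05. We therefore fail to reject the null hypothesis that the abstinence proportion among the patients in these clinical trials equals the expected population proportion; the data do not provide evidence that the two rates differ.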
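The same reading can be made programmatically, because prop.test() returns an object whose p.value and conf.int components can be inspected directly. The following is a minimal sketch that assumes a conventional significance level of 0.05 (the chapter does not state one explicitly) and re-enters the counts so that it runs on its own.

```r
x <- 1402
n <- 3560
p_expected <- 0.508 * (425 / 533)
alpha <- 0.05  # conventional significance level, assumed here

fit <- prop.test(x = x, n = n, p = p_expected)

# Does the 95% confidence interval cover the expected proportion?
ci <- fit$conf.int
ci_contains_expected <- ci[1] <= p_expected && p_expected <= ci[2]

# Do we reject the null hypothesis at level alpha?
reject_null <- fit$p.value < alpha

c(ci_contains_expected = ci_contains_expected, reject_null = reject_null)
```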