├── Pearson_chi2_Wald_z_test_proportions.md
├── README.md
├── Testing hypotheses about proportions using logistic regression.md
├── logistic_regression_AME_Wald_z_test_proportions.md
└── logistic_regression_Rao_Wald_z_test_proportions.md
/Pearson_chi2_Wald_z_test_proportions.md:
--------------------------------------------------------------------------------
1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Pearson’s χ2 (independence) test for a 2×2 contingency table.
2 | #### Adrian Olszewski, 27.03.2025
3 |
4 | The Wald’s z-statistic for the difference between 2 proportions with pooled variance is of the following form:
5 |
6 | ``` math
7 | \begin{equation}
8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}};\ z \sim N(0,\ 1)
9 | \hspace{2cm} (1)
10 | \end{equation}
11 | ```
12 | Where $\widehat{p_A}$ stands for the estimated probability (sample proportion, %) in the 1st group,
13 | $\widehat{p_B}$ is the estimated probability in the 2nd group, $n_A$ and $n_B$ denote respective group sizes, and p is the pooled probability $p=\frac{x_A+x_B}{n_A+n_B}=\frac{p_An_A+p_Bn_B}{n_A+n_B}$.
14 |
15 | Traditionally, the Wald statistic is expressed in its squared form, becoming $z^2=\chi_{df=1}^2$.
16 | Both forms yield the same p-value. For convenience, I will show that this $\chi_{df=1}^2$ statistic is 1:1 equivalent to the $\chi^2$ (independence) test statistic for a 2×2 contingency table.
17 |
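As a quick sanity check, here is a minimal R sketch (with an arbitrary value of the z statistic) showing that the two forms indeed yield the same two-sided p-value:

``` r
# Both parameterizations give identical two-sided p-values
z <- 1.7                                   # an arbitrary value of the z statistic
2 * pnorm(-abs(z))                         # p-value from z ~ N(0, 1), two-sided
pchisq(z^2, df = 1, lower.tail = FALSE)    # p-value from z^2 ~ chi^2(df = 1)
# both give ~ 0.0891
```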
18 | Let’s define the form of the 2×2 contingency table with the observed (O) counts:
19 | | Group \ Status | Success (S) | Failure (F) | Total per group ($n_{i·}$) |
20 | |-----------------------------------------:|:----------------:|:---------------:|:------------------------:|
21 | | **A** | $n_{11} = O_{SA}$ | $n_{12} = O_{FA}$ | $n_A = n_{1·}= n_{11} + n_{12}$ |
22 | | **B** | $n_{21} = O_{SB}$ | $n_{22} = O_{FB}$ | $n_B = n_{2·}= n_{21} + n_{22}$ |
23 | | **Total per status (n·i)** | $n_{·1} = n_{11} + n_{21}$ | $n_{·2} = n_{12} + n_{22}$ | $n (=n_A + n_B)$ |
24 |
25 | Where $O_{SA}$ stands for “observed number of successes in group A”, $O_{FA}$ stands for “observed number of failures in group A”, and so on.
26 | The test statistic is defined as follows:
27 |
28 | ``` math
29 | \begin{equation}
30 | X^2=\sum_{r=1}^{2}\sum_{c=1}^{2}\frac{\left(O_{rc}-E_{rc}\right)^2}{E_{rc}};\ X^2 \sim \chi^2(df=1)
31 | \hspace{2cm} (2)
32 | \end{equation}
33 | ```
34 |
35 | Let’s express the observed and expected number of events as products of totals and probabilities, remembering that the expected number (E) is calculated under $H_0$, i.e. using pooled probability p:
36 |
37 | ``` math
38 | \begin{equation}
39 | O = \begin{cases}
40 | O_{11}=O_{SA}=n_Ap_A \\
41 | O_{12}=O_{FA}=n_A\left(1-p_A\right) \\
42 | O_{21}=O_{SB}=n_Bp_B \\
43 | O_{22}=O_{FB}=n_B\left(1-p_B\right)
44 | \end{cases}
45 | \hspace{2cm} (3)
46 | \end{equation}
47 | ```
48 | and
49 | ``` math
50 | \begin{equation}
51 | E = \begin{cases}
52 | E_{11}=E_{SA}=n_Ap \\
53 | E_{12}=E_{FA}=n_A\left(1-p\right) \\
54 | E_{21}=E_{SB}=n_Bp \\
55 | E_{22}=E_{FB}=n_B\left(1-p\right)
56 | \end{cases}
57 | \hspace{2cm} (4)
58 | \end{equation}
59 | ```
60 |
61 | Let’s substitute the O and E elements in the $X^2$ test statistic:
62 |
63 | ``` math
64 | \begin{equation}
65 | \begin{aligned}
66 | X^2&=\frac{\left(n_Ap_A-n_Ap\right)^2}{n_Ap}+\frac{\left(n_A\left(1-p_A\right)-n_A\left(1-p\right)\right)^2}{n_A\left(1-p\right)}+\frac{\left(n_Bp_B-n_Bp\right)^2}{n_Bp}+\frac{\left(n_B\left(1-p_B\right)-n_B\left(1-p\right)\right)^2}{n_B\left(1-p\right)} \\
67 | &=\frac{\left(n_A\left(p_A-p\right)\right)^2}{n_Ap}+\frac{\left(n_A\left(p-p_A\right)\right)^2}{n_A\left(1-p\right)}+\frac{\left(n_B\left(p_B-p\right)\right)^2}{n_Bp}+\frac{\left(n_B\left(p-p_B\right)\right)^2}{n_B\left(1-p\right)} \\
68 | &=\frac{{n_A\left(p_A-p\right)}^2}{p}+\frac{{n_A\left(p-p_A\right)}^2}{\left(1-p\right)}+\frac{{n_B\left(p_B-p\right)}^2}{p}+\frac{{n_B\left(p-p_B\right)}^2}{\left(1-p\right)} \\
69 | &=\frac{\left(1-p\right)n_A\left(p_A-p\right)^2+pn_A\left(p-p_A\right)^2}{p\left(1-p\right)}+\frac{{{\left(1-p\right)n}_B\left(p_B-p\right)}^2+pn_B\left(p-p_B\right)^2}{p\left(1-p\right)} \\
70 | &=\frac{n_A\left(p_A-p\right)^2\left(1-p+p\right)}{p\left(1-p\right)}+\frac{n_B\left(p_B-p\right)^2\left(1-p+p\right)}{p\left(1-p\right)}=\frac{n_A\left(p_A-p\right)^2}{p\left(1-p\right)}+\frac{n_B\left(p_B-p\right)^2}{p\left(1-p\right)} \\
71 | &=\frac{n_A\left(p_A-\frac{p_An_A+p_Bn_B}{n_A+n_B}\right)^2+n_B\left(p_B-\frac{p_An_A+p_Bn_B}{n_A+n_B}\right)^2}{p\left(1-p\right)} \\
72 | &=\frac{n_A\left(\frac{p_An_B-p_Bn_B}{n_A+n_B}\right)^2+n_B\left(\frac{p_Bn_A-p_An_A}{n_A+n_B}\right)^2}{p\left(1-p\right)}=\frac{n_A\left(\frac{n_B\left(p_A-p_B\right)}{n_A+n_B}\right)^2+n_B\left(\frac{n_A\left(p_B-p_A\right)}{n_A+n_B}\right)^2}{p\left(1-p\right)} \\
73 | &=\frac{\frac{n_An_B^2\left(p_A-p_B\right)^2}{\left(n_A+n_B\right)^2}+\frac{n_Bn_A^2\left(p_A-p_B\right)^2}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)}=\frac{\frac{\left(p_A-p_B\right)^2\left(n_An_B^2+n_Bn_A^2\right)}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)}=\frac{\frac{{{(p}_A-p_B)}^2\left(n_An_B\right)\left(n_B+n_A\right)}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)} \\
74 | \end{aligned}
75 | \hspace{2cm} (5a)
76 | \end{equation}
77 | ```
78 |
79 | ``` math
80 | \begin{equation}
81 | \begin{aligned}
82 | &=\frac{\frac{{{n_An_B(p}_A-p_B)}^2}{n_A+n_B}}{p\left(1-p\right)}=\frac{{{(p}_A-p_B)}^2}{p\left(1-p\right)\frac{n_A+n_B}{n_An_B}}=\frac{{{(p}_A-p_B)}^2}{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}\blacksquare
83 | \end{aligned}
84 | \hspace{2cm} (5b)
85 | \end{equation}
86 | ```
87 | This way I have proven that the $z^2$ statistic is equivalent to the Pearson’s $\chi^2$ statistic for a 2×2 table.
88 |
89 | -----
90 | By the way, it is worth noticing that in the χ2 test the expected frequencies (E) in each column are based on the pooled proportion p, which is just the weighted average proportion across both groups.
91 | So, in other words, this test evaluates how much each group’s proportion ($p_A$ and $p_B$) deviates from this overall average (p), which effectively amounts to comparing the two groups directly - exactly what the z-test does.
92 |
93 | -----
94 |
95 | ``` r
96 | > (m <- matrix(c(16, 2, 12, 11),
97 | + nrow = 2, ncol = 2,
98 | + dimnames=list(c("A", "B"),
99 | + c("Success", "Failure"))))
100 | Success Failure
101 | A 16 12
102 | B 2 11
103 | >
104 | > prop.test(m, correct = FALSE)
105 |
106 | 2-sample test for equality of proportions without continuity correction
107 |
108 | data: m
109 | X-squared = 6.2859, df = 1, p-value = 0.01217
110 | alternative hypothesis: two.sided
111 | 95 percent confidence interval:
112 | 0.1491317 0.6860332
113 | sample estimates:
114 | prop 1 prop 2
115 | 0.5714286 0.1538462
116 |
117 | > chisq.test(m, correct = FALSE)
118 |
119 | Pearson's Chi-squared test
120 |
121 | data: m
122 | X-squared = 6.2859, df = 1, p-value = 0.01217
123 | ```
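
As a quick numerical cross-check, here is a minimal sketch (assuming the matrix `m` from the listing above is in the workspace) computing the pooled-variance z statistic by hand; its square matches the X-squared value reported by both functions:

``` r
# Pooled-variance Wald z statistic computed by hand from the 2x2 table `m` above
x  <- m[, "Success"]                 # successes per group: A = 16, B = 2
n  <- rowSums(m)                     # group sizes: A = 28, B = 13
pA <- x[["A"]] / n[["A"]]
pB <- x[["B"]] / n[["B"]]
p  <- sum(x) / sum(n)                # pooled proportion under H0
z  <- (pA - pB) / sqrt(p * (1 - p) * (1 / n[["A"]] + 1 / n[["B"]]))
z^2                                  # ~ 6.2859, matching the X-squared reported above
```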
124 |
125 | 
126 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Logistic-regression-is-regression
2 | Despite the popular yet wrong claim that logistic regression "is not a regression", it's one of the key regression and hypothesis testing tools in experimental research (like clinical trials). I will share information from my field to counter the bizarre situation in which people from Machine Learning tell me that "what we do every day cannot be done".
3 |
4 | # Logistic regression has been a regression since its birth - and is used this way every day
5 |
6 | Well, it's kinda... embarrassing for me to write about something that is (should be) obvious to anyone working with statistics, but which in the last decade has been distorted by hundreds of thousands of members of the #machinelearning community, so that today the lie has replaced the truth...
7 |
8 | I remember the first time, when, during some discussion, I said that "I've been using logistic regression for long years on daily basis for
9 | regression and testing hypotheses, but I've never used it for classification" and a Data Scientist (with PhD degree) told me, that I
10 | must have been mistaken, because "*despite its name, logistic regression is not a regression algorithm*". I asked him "*then tell me, please, what
11 | do I do every day at work?*😉" He replied "*I have no idea, but this sounds like pure nonsense, because logistic regression predicts only two
12 | binary outcomes, so you understand it cannot be a regression*" 🤦
13 |
15 |
16 |
17 |
18 |
19 |
20 |
In such moments I wish there existed a triple facepalm too...
21 |
22 |
23 | Earlier, people (mostly researchers and statisticians) had already
24 | reported that a similar situation happened to them during interviews
25 | and internet discussions. I did a little research, and its results knocked me
26 | off my feet. I "googled" for terms like "*logistic regression is not
27 | (a) regression*", "*logistic regression is a misnomer*" or
28 | "*logistic regression, despite its name*". The number of findings was
29 | huge - they occurred everywhere: in articles (Medium, Quora), tutorials
30 | (also issued by companies offering paid courses), blogs, courses, books
31 | (including bestsellers in ML written by people holding a PhD), YouTube
32 | videos. I also repeated the search on LinkedIn and found an endless flood
33 | of posts repeating this nonsense just copy-pasted from other posts.
34 |
35 | Not only that! I asked ChatGPT 3 (then 3.5) and got identical results.
36 | No surprise! It was "fed" with garbage, so it learned garbage, and today
37 | it helps spread garbage to learners, who usually don't even suspect
38 | something is wrong, so they trust the AI and repeat the nonsense further and
39 | further...
40 |
41 | ... there is not a single week on LinkedIn without someone repeating it,
42 | earning hundreds of 👍 - proving that hundreds of people liked it (so tens of
43 | thousands saw it) and will likely repeat the same.
44 |
45 |
46 |
47 |
48 |
49 | Finally I decided to write a few words about this "issue". I write
50 | from the perspective of a clinical biostatistician, working in clinical
51 | trials - the part of the pharmaceutical industry responsible for the
52 | evaluation and approval of both existing and new therapies (drugs,
53 | procedures, devices). Here, logistic regression is the key regression algorithm,
54 | used to answer questions about treatment efficacy and safety based on
55 | the data from clinical trials with binary endpoints (success/failure).
56 | And yes, that's true - I have never used it for classification during
57 | the whole time of my professional work.
58 |
59 | # Birth of the logistic regression and the... Nobel Prize
60 |
61 | The origins of the logistic function can be traced back to the 19th
62 | century ([[free
63 | PDF]](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=360300)),
64 | where it was employed in a "*model of population growth*". Early
65 | attempts (1930s) to model binary data in the regression manner resulted
66 | in the ***probit regression*** model, which constituted the standard for the
67 | next few decades. Researchers found its output not very intuitive, so
68 | they searched for a regression model whose coefficients would be easier
69 | to interpret. As early as 1944, **Joseph Berkson** started working on an
70 | alternative to the probit model, and the "logit" (by analogy to
71 | "probit") model was born. Unfortunately, the logit model was rejected
72 | by many as *inferior to the probit model*. It took many years until the
73 | logit model gained similar "trust" (1960-1970), and it was finally refined by
74 | **Sir David Cox** ("Some procedures connected with the logistic
75 | qualitative response curve", 1966 and "[*[The regression analysis of
76 | binary sequences]*](https://www.jstor.org/stable/2983890)",
77 | 1968).
78 |
79 | / BTW, check also other titles of this Great Mind of Statistics!
80 | [[https://www.nuff.ox.ac.uk/Users/Cox/Publications.html]](https://www.nuff.ox.ac.uk/Users/Cox/Publications.html)
81 | /
82 |
83 | Almost in parallel came the *multinomial logit model* (Cox, Theil),
84 | which finally allowed, in 1973, **Daniel McFadden**, a famous
85 | econometrician, to piece together the existing puzzles, including
86 | **Duncan Luce**'s *choice axiom*, into a whole, which resulted in a
87 | theoretical foundation for the ***logistic regression***. At that time,
88 | McFadden was deeply involved in pioneering work developing the
89 | theoretical basis of *discrete choice*, where he applied the logistic
90 | regression for empirical analysis. His work, which made a profound impact on
91 | the analysis of discrete choice problems in economics and other fields,
92 | earned him the Nobel Prize in 2000.
93 |
94 | I think we can fairly say that Daniel McFadden's work on the logistic
95 | (ordinary and multinomial) regression model and the discrete choice
96 | analysis was truly groundbreaking. It played a significant role in
97 | establishing logistic regression as a solid tool in statistical
98 | analysis, not only in econometrics!
99 |
100 | Remember the rejection of the logit model, found inferior to the probit
101 | one? Now the situation has reversed, and logistic regression is today the
102 | default approach.
103 |
104 | The 1970s were truly fruitful for logistic regression! In 1972, **Sir John
105 | Nelder** and **Robert Wedderburn** in their seminal work ([[free
106 | PDF]](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/NelderWedderburn1972.pdf))
107 | introduced the idea of a unified framework: the ***Generalized Linear
108 | Model (GLM)***, enabling regression models to cope with response
109 | variables of any type (counts, categories, continuous), relaxing also
110 | the assumption of normal distribution of errors.
111 |
112 | / Logistic regression is a special case of the GLM. You can spot it
113 | easily when working with the R statistical package: you call glm()
114 | and specify the family of the conditional distribution of the response, here
115 | "binomial", with the appropriate link, here "logit": glm(family =
116 | binomial(link = "logit")) /
117 |
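To make this concrete, here is a minimal R sketch of such a call. The data frame `d`, with a binary 0/1 response `y` and a numeric predictor `x`, is purely hypothetical and simulated just for illustration; only the family/link specification comes from the note above.

``` r
# Hypothetical data, simulated just for illustration
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.2 * d$x))

# Logistic regression = a binomial GLM with the logit link
fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))
summary(fit)   # regression coefficients, reported on the log-odds scale
```
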
118 | Just a decade later, two other big names you know for sure, **Trevor
119 | Hastie** and **Robert Tibshirani** extended the Generalized Linear Model
120 | (logistic regression is a special case of it) to the **Generalized
121 | Additive Model**. In their articles (e.g. "Generalized Additive Models
122 | for Medical Research",
123 | [[https://doi.org/10.1177/096228029500400]](https://doi.org/10.1177/096228029500400)
124 | ) they mention the role of logistic regression in identification and
125 | adjustment for prognostic factors in clinical trials and observational
126 | studies.
127 |
128 | / Did you know that Professor Hastie authored the glm() command in the
129 | S-PLUS statistical suite (the father of R, origin of the R syntax),
130 | estimating the coefficients of the regression models and performing the
131 | inference? /
132 |
133 | Additional extensions for handling repeated observations were made by
134 | **Kung-Yee Liang** and **Scott L. Zeger** in 1986 via *Generalized
135 | Estimating Equations* (GEE) and **Breslow, Clayton and others** around
136 | 1993, when the theory of *Generalized Linear Mixed Models* (GLMM) was
137 | born.
138 |
139 | **I can only imagine McFadden's and others' reaction to the nonsense
140 | "logistic regression is not a regression"\...**
141 |
142 |
143 |
144 |
145 |
146 | # Conditional expectation - the key to understand the GLM
147 |
148 | Every regression describes a relationship between the predictor and some
149 | function of the conditional response. It can be a quantile,
150 | Qi(Y\|X=x), as in the quantile regression. Or some trimmed estimator
151 | of the expected value, like in the robust regression. Or - the expected
152 | value of the conditional response (=*conditional expectation*) itself,
153 | like in the classic linear regression: **E(Y\|X=x)**.
154 |
155 | / so often confused with one of its estimation algorithms --> "OLS
156 | regression" - don't repeat that. /
157 |
158 | Now, it's all about the conditional distribution. If it's Gaussian
159 | (normal distribution), you obtain the linear regression. But the GLM
160 | also allows you to use other distributions: Bernoulli (or binomial),
161 | gamma, Poisson, negative binomial, etc. The problem is that then the
162 | conditional expectations are no longer linearly related to the predictor,
163 | and a linear relationship is something we really want. That's why we have the **link
164 | function**, linking the conditional expectation and the predictor for a
165 | given conditional distribution: **g(E(Y\|X=x)) = Xb** (sometimes you
166 | will see this formula reversed: E(Y\|X=x) = g⁻¹(Xb); it's an equivalent
167 | formulation).
168 |
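Continuing the small sketch above (it reuses the hypothetical `fit` object), this is what the link does in practice:

``` r
# The logit link "linearizes" the conditional means
p_hat   <- predict(fit, type = "response")   # E(Y|X=x): numeric probabilities
eta_hat <- predict(fit, type = "link")       # Xb: the linear predictor
all.equal(qlogis(p_hat), eta_hat)            # g(E(Y|X=x)) = Xb
all.equal(p_hat, plogis(eta_hat))            # E(Y|X=x) = g^-1(Xb)
```
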
169 | Now, the expected values are "linearized" with respect to the
170 | predictor. For the ordinary linear regression you don't need that, so
171 | the g() is just I() (the identity function, which we omit) - the expected
172 | values lie on a straight line, plane, or hyperplane (depending on how
173 | many predictors you have).
174 |
175 | / The meaning of *conditional expectation* is also perfectly visible when
176 | you do ANOVA. That's a perfect 1:1 example: the levels of the
177 | categorical predictor(s) "form" sub-distributions, and a mean is
178 | calculated in each. Now you also understand what it means: "**expected
179 | value CONDITIONAL on the predictor**"! /
180 |
181 | Below we can observe various conditional distributions and their means.
182 | The means lie on a straight line transformed by the g() function, **the
183 | link**.
184 |
185 | / OK, I know, the illustration isn't perfect, simplifications are made,
186 | but let's agree on its imperfection, as long as it shows the main idea,
187 | huh? /
188 |
189 |
190 |
191 |
beta regression isn't strictly a GLM, but an extension of it
192 |
193 |
194 | BTW: This is well explained in a book I recommend you read:
195 |
196 |
197 |
198 |
Peter H. Westfall, Andrea L. Arias, Understanding Regression Analysis A Conditional Distribution Approach
199 |
200 |
201 | Now, let's answer a few questions:
202 |
203 | 1. **Is expected value numerical or categorical?** Of course it's
204 | numerical. It's just an "average". So you instantly see that
205 | logistic regression CANNOT itself predict a categorical (binary) outcome.
206 | Whatever you've been told - it cannot, and it does not.
207 | Period.
208 |
209 | 2. **What kind of conditional distribution does the logistic regression
210 | use?** It uses the Bernoulli distribution of a single-trial
211 | random variable, taking value 1 (for success) with probability p and
212 | the value 0 (for failure) with probability 1−p.
213 |
214 | 3. **What is the expected value for the Bernoulli's distribution?**
215 | It's "p" - the probability of success.
216 |
217 | 4. **So the E(Y\|X=x) is numerical?** Yes, probabilities are numerical.
218 |
219 | 5. **Why "Bernoulli" if statistical manuals say "binomial"?**
220 | Bernoulli is binomial with a single trial (n=1); "binomial" is just the more general term.
221 |
222 | **I hope you can see from this, that logistic regression, as any other
223 | regression, predicts a numerical outcome, NOT categorical.**
224 |
225 | # How is logistic regression turned into a classifier?
226 |
227 | The outcome from the logistic regression, the conditional probability
228 | (therefore logistic regression is also called a "*direct probability
229 | estimator*"), is subjected to a conditional IF-THEN-ELSE rule, which
230 | compares it against some threshold (usually 0.5, but this shouldn't be
231 | taken for granted!) and returns the category:
232 |
233 | **IF (p \< 0.5) THEN A ELSE B**
234 |
235 |
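In code, the extra step is easy to see. A minimal sketch, reusing the hypothetical `fit` and `d` from the earlier block and two made-up labels "A"/"B":

``` r
# Regression part: numeric conditional probabilities E(Y|X=x)
p_hat <- predict(fit, newdata = d, type = "response")

# Classification part: an IF-THEN-ELSE rule added ON TOP of the regression
pred_class <- ifelse(p_hat < 0.5, "A", "B")

head(p_hat)        # numbers between 0 and 1 - the output of the regression
head(pred_class)   # labels - produced by the added thresholding step, not by the model
```
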
236 |
237 |
238 |
239 | - *Wait, but this is NOT a regression! This USES the regression prediction instead!*
240 |
241 | **Glad you spotted it!**
242 |
243 | Too often people do not and just repeat that "logistic regression
244 | predicts a binary outcome". And when I tell them "*but what about the
245 | regression term in it, which means that it should predict a numerical
246 | value?*", they respond "***Oh! It's a misnomer! Despite its name,
247 | logistic regression isn't a regression because it doesn't predict a
248 | numerical outcome!*".**
249 |
250 | In other words, they do something like this:
251 |
252 |
253 |
254 |
255 |
256 | \... making a direct jump from binary input to binary output:
257 |
258 |
259 |
260 |
261 |
262 | But notice: they did NOT CHANGE THE NAME accordingly. Instead of
263 | calling it "Logistic Classifier", the ML community left the name
264 | "Logistic Regression". We could say they "appropriated the logistic
265 | regression".
266 |
267 | Consequently, **they have problems with justifying the existing name**,
268 | so they try:
269 |
270 | - "*Oh! This name is a misnomer*"
271 |
272 | - "*It's called regression because the equation has a similar form to
273 | linear regression*"
274 |
275 | - "*It must be said that, despite its name, it's not a regression*"
276 |
277 | Isn't this just crazy?
278 |
279 | 1. Statisticians invent a regression algorithm (that is, to solve
280 | regression problems) and have used it this way for more than half a century
281 |
282 | 2. ML specialists find it useful for classification
283 |
284 | 3. ML specialists treat it as a classifier
285 |
286 | 4. ML specialists do NOT assign appropriate name to the classifier,
287 | leaving the "regression" in it
288 |
289 | 5. ML specialists deny that logistic regression is a regression
290 |
291 | 6. ML specialists correct statisticians that "LR is a misnomer and
292 | despite its name LR is not a regression"
293 |
294 | Despite the numerous regression-related problems where logistic
295 | regression is used every day, the situation looks like this:
296 |
297 |
298 |
299 |
300 |
301 | So, once and for all, let's recall the difference between
302 | **logistic regression** and a **logistic classifier**:
303 |
304 |
305 |
306 |
307 |
308 | # Maybe... But everyone uses logistic regression for classification!
309 |
310 | Ah, argumentum ad populum ;]
311 |
312 | OK then:
313 |
314 | - First thing: not "**_everyone_**". I understand that ML is a hot topic today (and is here to stay), but it does NOT mean that everything revolves around it and _nothing else matters_. There are other areas of science and data analysis too. Yeah, really.
315 |
316 | - The fact that gazillions of people use logistic regression for
317 | classification purposes doesn't remove its regression nature.
318 | It's just one application. It's like using a hammer for cooling
319 | your forehead - you can call it a "forehead cooler", but it
320 | doesn't change the fact it's still a hammer.
321 |
322 | - You should add "... in Machine Learning". Outside ML, the
323 | applications are much richer.
324 |
325 | - Believing that there's nothing beyond Machine Learning in this
326 | world doesn't change... the world. And the reality is that
327 | **experimental research** (including clinical research, physical and
328 | chemical experiments, sociological and psychological studies,
329 | quality assessments), where the regression tools are applied to
330 | binary (and n-ary) endpoints on daily basis, is still the essential
331 | part of science. So no, it's not true that "everyone uses logistic
332 | regression for classification". You should be more specific and
333 | add: "*Everyone in Machine Learning*".
334 |
335 | So while I can understand someone saying that "*in ML, logistic
336 | regression is a classification algorithm*", I cannot agree that
337 | "*logistic regression is not a regression*". A single specific
338 | application, employing also additional steps, and producing a different
339 | (categorized) output does not invalidate the "core" engine.
340 |
341 | The fact that a tomato can be used to cook a soup (involving many steps)
342 | does not mean that "tomato is not a fruit - it is a misnomer, because
343 | tomato is a soup ingredient". It's that simple.
344 |
345 | # Regression-related applications of the logistic regression (and its friends)
346 |
347 | Multiple times I mentioned that logistic regression is used by me and
348 | other statisticians for non-classification, regression tasks. Believe me,
349 | there is NO difference from any other regression!
350 |
351 | - **Assessment** = direction and magnitude **of the impact of
352 | predictors on the response** expressed as: log-odds, odds ratios, or
353 | probability (via estimated marginal means or marginal effects -->
354 | for non-identity links)
355 |
356 |
357 |
358 |
359 |
360 | - For categorical predictors: **inference about the main effects**
361 | (=ANOVA), optionally adjusted for numerical covariates (=ANCOVA);
362 | exploration of their **interactions**
363 |
364 | - For categorical predictors: **inference about the simple effects of
365 | interest**, analysed via planned or ad hoc contrasts; optionally
366 | adjusted for numerical covariates
367 |
368 |
369 |
370 |
371 |
372 | - For numerical or ordinal categorical predictors: **testing for
373 | trends** (linear, quadratic, cubic, higher) in proportions.
374 | **Comparisons of trends** between groups.
375 |
376 | - **Replicating the classic statistical tests**: of proportions,
377 | odds ratios and stochastic superiority (Wald's and Rao's z tests, chi2,
378 | Cochran-Armitage, Breslow-Day, Cochran-Mantel-Haenszel, McNemar,
379 | Cochran Q, Friedman, Mann-Whitney (-Wilcoxon))
380 |
381 | - **Extending the above tests** for multiple variables and their
382 | interactions, and numerical covariates. Just check the illustration
383 | below and then [[visit my GitHub for several
384 | examples]](https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/Testing%20hypotheses%20about%20proportions%20using%20logistic%20regression.md):
385 |
386 |
387 |
388 |
logistic regression and friends can replicate lots of classic tests!
389 |
390 |
391 | - Bonus: the model-based approach ([[check my GitHub for some thoughts
392 | and
393 | notes]](https://github.com/adrianolszewski/model-based-testing-hypotheses/blob/main/README.md))
394 | allows one to employ advanced parametric adjustment for multiple
395 | comparisons via the multivariate t distribution, adjust for numerical
396 | covariates, employ time-varying covariates, account for repeated and
397 | clustered observations and more!
398 |
399 | - Direct probability estimator used to implement the **inverse
400 | probability weighting** (IPW) and **propensity score matching**
401 | algorithms
402 |
403 | - Logistic regression is very useful in the **assessment of the
404 | Missing-Completely-At-Random (MCAR) pattern** when exploring the
405 | missing observations!
406 |
407 | In my field, clinical trials, I use the logistic regression for:
408 |
409 | - the **assessment of between-arm treatment effect** via comparison of
410 | the log-odds or the % of clinical success at certain timepoints
411 |
412 | - **non-inferiority, equivalence or superiority testing** *(employs
413 | clinical significance)* at selected timepoints via appropriately
414 | defined confidence intervals of difference between %s (average
415 | marginal effect)
416 |
417 | - the **assessment of the impact of predictors on the clinical
418 | success**, providing covariate-adjusted EM-means for their main
419 | effects, interactions and finally their appropriate contrasts
420 |
421 | - the **exploration of interactions** (simple effects), making the
422 | essential part of my daily work
423 |
424 | - **analysing the over-time within-arm trends of % of successes**,
425 | e.g. to assess the treatment or some practice persistence.
426 |
427 |
428 |
429 |
430 |
431 | # Friends of the logistic regression
432 |
433 | Logistic regression has many friends that were invented to address
434 | various problems related to regression. Let us enumerate and
435 | briefly describe them:
436 |
437 | - **Binary Logistic Regression -** that's our binomial regression
438 | with logit link, a special case of the Generalized Linear Model,
439 | modelling the % of successes.
440 |
441 | - **Multinomial Logistic Regression** (MLR) - helpful, when we deal
442 | with a response consisting of multiple unordered classes (e.g.
443 | colours).
444 |
445 | - **Nested MLR** - will help us when the classes are "organized" in
446 | groups, related in a hierarchy - thus nested. Imagine that a person
447 | chooses a means of transport between **air** {plane} and **road**
448 | {car, train, bus}. When the road transportation is chosen, then the
449 | further decision is made only between the three alternatives. It's
450 | similar to multi-level models, where the error terms may present
451 | some correlation within the same nest, while being uncorrelated between
452 | nests. Thank you, **McFadden**, also for this one! [[Read more
453 | here]](https://cran.r-project.org/web/packages/nestedLogit/vignettes/nestedLogit.html)
454 | and [[here (Applied Microeconometrix with
455 | R)]](https://discdown.org/microeconometrics/index.html)
456 | (or just "google" for more).
457 |
458 | - **Ordinal LR** (aka Proportional Odds Model) - allows you to deal
459 | with 2+ ordered classes, {horrible \< poor \< average \< good \<
460 | excellent} or {slow \< average \< fast}, {small \< medium \< big}
461 | and so on. This is the default method of **analysing responses from
462 | polls and questionnaires** (including Likert items). Did you
463 | know that the OLR is related to the *Mann-Whitney (-Wilcoxon)
464 | test*? Use it if you need a flexible non-parametric test that: a)
465 | handles multiple categorical variables, b) adjusts for numerical
466 | covariates (like ANCOVA). Don't hesitate to use it with NUMERICAL
467 | variables! Yes, you can always do this, the same way you employ
468 | rank-based methods (e.g. *Conover's AN\[C\]OVA*). Read also the
469 | articles by **Prof. Harrell**, namely: [[Resources for Ordinal
470 | Regression Models]](https://www.fharrell.com/post/rpo/),
471 | [[Equivalence of Wilcoxon Statistic and Proportional Odds
472 | Model]](https://www.fharrell.com/post/powilcoxon/), [[If
473 | You Like the Wilcoxon Test You Must Like the Proportional Odds
474 | Model]](https://www.fharrell.com/post/wpo/), and more.
475 |
476 | - **Generalized OLR** - aka Partial Proportional Odds Model is used
477 | when the proportionality of odds doesn't hold. (PS: read
478 | [[Violation of Proportional Odds is Not
479 | Fatal]](https://www.fharrell.com/post/po/))
480 |
481 | - **Logistic Quantile Regression** - application similar to the
482 | above - performs logistic quantile regression for bounded responses,
483 | like percentages (0-1), school grades, visual analog scales and so
484 | on. Check [[this
485 | article]](https://www.amarun.org/images/amarun/materiales/articulos/Galaraza_11_2020.pdf)
486 | and manuals for
487 | [[Stata]](https://journals.sagepub.com/doi/pdf/10.1177/1536867X1101100301)
488 | and [[R
489 | (lqr)]](https://cran.r-project.org/web/packages/lqr/).
490 |
491 | - **Conditional Logistic Regression** - helpful when we deal with
492 | stratification and matching groups of data, e.g. in observational
493 | studies without randomization, to match subjects by some
494 | characteristics and create a homogeneous "baseline". It can be used
495 | to reproduce the *Cochran-Mantel-Haenszel test* (via
496 | *clogit(\...strata)* in R)
497 |
498 | - The **binary logistic regression and its multinomial LR and ordinal
499 | LR friends can account for dependent responses** (repeated
500 | measurements, clustered observations) through the *Generalized
501 | Estimating Equations* (GEE) semi-parametric estimation and the
502 | *Generalized Linear Mixed Models* (GLMM). No surprise that logistic
503 | regression is one of the core regression models for longitudinal
504 | clinical trials with binary endpoints. And no, we do NOT classify
505 | anything there ;]
506 |
507 | - **Alternating Logistic Regression** - it's a quite rare (and
508 | forgotten) model, suitable when we deal with correlated observations,
509 | e.g. when we analyse repeated or clustered data. I have already mentioned
510 | two methods: the mixed-effects LR and the GEE LR. The Alternating LR is
511 | the 3rd option, which models the dependency between pairs of
512 | observations by using log odds ratios instead of correlations (which
513 | is done by GEE). It handles ordinal responses too. There were some
514 | past implementations in R, but now they are removed from CRAN. [[SAS
515 | supports it as part of PROC
516 | GEE]](https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_gee_details06.htm).
517 |
518 |
519 |
520 |
Logistic regression has many friends
521 |
522 |
523 | # Literature
524 |
525 | I will populate this chapter with textual references later. For now,
526 | find the "collage" of covers. And believe me, none of these books will
527 | say that "logistic regression is not a regression" :)
528 |
529 |
530 |
531 |
NONE of these great books will give you nonsense like "...is not a regression"
532 |
533 |
534 | \+ recently found an excellent one:
535 |
536 |
537 |
538 |
Norman Matloff, Statistical Regression and Classification From Linear Models to Machine Learning
539 |
540 |
541 | Other authors also prove it can be done properly:
542 |
543 |
544 |
545 |
Brett Lantz, Machine Learning with R: Learn How to Use R to Apply Powerful Machine Learning Methods and Gain Insight into Real-world Applications
546 |
547 |
548 | ## ad hoc comments from my readers
549 |
550 | - *Q: "Adrian, but in their book, Hastie and Tibshirani put the
551 | logistic regression in the »classification« chapter!"*
552 |
553 | A: Of course they did! It's a book about machine learning, so this kind
554 | of *application* is of interest and highly expected. **BUT they've
555 | never said it's not a regression model.** They both also wrote a
556 | series of articles on the application of the proportional hazards models
557 | and the logistic regression in biostatistical applications (they worked in
558 | the division of biostatistics) in the regression manner
559 | (assessment of the prognostic factors, assessment of the treatment
560 | effect) and called it a regression model.
561 |
562 |
563 |
564 |
565 |
566 | Also, in the book you mention, on pages 121-122 and in the following examples,
567 | they say: "*Logistic regression models are used mostly as a data
568 | analysis and inference tool, where the goal is to understand the role of
569 | the input variables in explaining the outcome. Typically many models are
570 | fit in a search for a parsimonious model involving a subset of the
571 | variables, possibly with some interactions terms.*"
572 |
573 | - *Q: You said that Prof. Hastie authored the glm() function in S. Any
574 | source?*
575 |
576 | *A:*
577 | Here (just for instance):
578 |
579 |
580 |
581 |
582 |
583 | - *Q: ChatGPT 3.5 says that logistic regression is not a regression!*
584 |
585 | *A:* **ChatGPT will repeat what it was trained on**. Don't rely on it
586 | strictly when you are learning a new topic, because what you will be
587 | told strongly depends on how you ask. It was trained on a mix of good
588 | and bad resources, so sometimes the *valid one* is "allowed to speak"
589 | but just a few questions later it may be messing things up again. This pertains to
590 | ANY kind of topic, not only in statistics. **DO ALWAYS verify the
591 | responses from any AI-based system if you are going to learn from it,
592 | pass your exams or an interview, or do your job.**
593 |
594 | PS: I was told that the newest version of ChatGPT is much better, so
595 | give it a try.
596 |
597 |
598 |
599 |
ChatGPT 3.5 in action
600 |
601 |
--------------------------------------------------------------------------------
/Testing hypotheses about proportions using logistic regression.md:
--------------------------------------------------------------------------------
1 | Despite the widespread and nonsensical claim that "logistic regression is not a regression", it constitutes one of the key regression and hypothesis-testing tools used in experimental research (like clinical trials).
2 |
3 | Let me show you how the logistic regression (with a few extensions) can be used to test hypotheses about fractions (%) of successes, replacing the classic "test for proportions".
4 | Namely, it can replicate the results of:
5 |
6 | 1. [the Wald's (normal approximation) **z test for 2 proportions with non-pooled standard errors**](#wald_2prop_z) (common in clinical trials). Testing is done via EM-means (estimated-marginal means) on the probability scale or the AME (average marginal effect) over the binary predictor distinguishing the two compared groups.
7 | 2. [the Rao's score (normal appr.) **z test for 2 proportions with pooled standard errors**](#rao_2prop_z) (just what the `prop.test()` does in R)
8 | 3. the **chi2 test for multiple (2+) proportions**
9 | 4. **ANOVA-like** (joint) tests for multiple categorical predictors (n-way ANOVA). Also (n-way) ANCOVA if you employ numerical covariates.
10 | 5. [the **Cochran-Mantel-Haenszel (CMH) for stratified/matched data**](#cmh) via _conditional logistic regression_
11 | 6. [the **Breslow-Day test for odds ratios**](#breslow-day) through Rao's ANOVA --> the interaction term
12 | 7. [the **Cochran-Armitage test for trend in ordered proportions**](#armitage-trend)
13 | 8. [the **McNemar and Cochran Q** tests of paired proportions](#mcnemar) via GEE estimation (Generalized Estimating Equations with compound symmetry)
14 | 9. [the **Friedman test**](#mcnemar) - as above
15 | 10. [the **Mann-Whitney-Wilcoxon and Kruskal-Wallis** tests](#mww) via Ordinal Logistic Regression (and the paired Wilcoxon via GEE)
16 |
17 | Actually, the model-based approach to testing hypotheses is nothing new, and lots of other tests can be replicated with the general linear model via Ordinary Least Squares (OLS) and Generalized Least Squares (GLS) estimation, and with generalized linear models (GLM) via both Maximum Likelihood estimation (MLE) and semi-parametric Generalized Estimating Equations (GEE). Let's add to this also the conditional approach via Mixed-Effect models (both general and generalized). And let's not forget about the Quantile Regression (with mixed effects), robust regression models, survival models (Cox, AFT, Andersen-Gill, frailty models) and dozens of others!
18 |
19 | All those models, followed by the _Likelihood Ratio_ (LR), _Rao (score)_, or _Wald's_ approach to testing model coefficients, especially when combined with the idea of _estimated marginal means_ (EM-means), will give you an incredibly flexible testing framework.
20 |
21 | This time we will look at the Logistic Regression, part of the Generalized Linear Model - the binomial regression with logit link. We will also employ certain extensions and generalizations to achieve concrete effects.
22 |
23 | 
24 |
25 | ---
26 |
27 | We are going to use 3 data sets (defined at the bottom of this file):
28 | * unpaired 2-group data
29 |
30 | ``` r
31 | > head(unpaired_data)
32 | sex response trt
33 | 1 female 0 active
34 | 2 female 0 active
35 | 3 female 0 active
36 | 4 female 0 active
37 | 5 female 0 active
38 | 6 female 0 active
39 | > tail(unpaired_data)
40 | sex response trt
41 | 101 male 1 placebo
42 | 102 male 1 placebo
43 | 103 male 1 placebo
44 | 104 male 1 placebo
45 | 105 male 1 placebo
46 | 106 male 1 placebo
47 | ```
48 |
49 | * paired 2-group data
50 |
51 | ``` r
52 | > head(paired_data)
53 | ID Time Treatment Response
54 | 1 1 Pre placebo 0
55 | 2 1 Post placebo 1
56 | 3 2 Pre placebo 0
57 | 4 2 Post placebo 0
58 | 5 3 Pre placebo 0
59 | 6 3 Post placebo 0
60 | >
61 | > tail(paired_data)
62 | ID Time Treatment Response
63 | 35 18 Pre active 0
64 | 36 18 Post active 1
65 | 37 19 Pre active 0
66 | 38 19 Post active 0
67 | 39 20 Pre active 0
68 | 40 20 Post active 0
69 | ```
70 |
71 | * ordered data
72 | ``` r
73 | > head(ordered_paired_data)
74 | ID Time Response TimeUnord
75 | 1 1 T1 0 T1
76 | 2 2 T1 0 T1
77 | 3 3 T1 0 T1
78 | 4 4 T1 0 T1
79 | 5 5 T1 0 T1
80 | 6 6 T1 0 T1
81 | > tail(ordered_paired_data)
82 | ID Time Response TimeUnord
83 | 25 5 T3 1 T3
84 | 26 6 T3 1 T3
85 | 27 7 T3 1 T3
86 | 28 8 T3 0 T3
87 | 29 9 T3 1 T3
88 | 30 10 T3 1 T3
89 | ```
90 |
91 | * unpaired 2-group ordinal data (the Pain score of the ODI (Oswestry Disability Index) questionnaire; 6-item Likert data):
92 | https://www.lni.wa.gov/forms-publications/F252-130-000.pdf
93 | ``` r
94 | > head(ordinal_data)
95 | ODIPain Arm Age_centered
96 | 1 [2] Moderate pain B -6.15315
97 | 2 [0] No pain B 12.84685
98 | 3 [1] Very mild pain A -9.15315
99 | 4 [2] Moderate pain B 14.84685
100 | 5 [3] Fairly severe pain A 12.84685
101 | 6 [2] Moderate pain B 2.84685
102 | > tail(ordinal_data)
103 | ODIPain Arm Age_centered
104 | 106 [2] Moderate pain A -15.153153
105 | 107 [2] Moderate pain B -11.153153
106 | 108 [2] Moderate pain A -4.153153
107 | 109 [4] Very severe pain B -0.153153
108 | 110 [1] Very mild pain B -4.153153
109 | 111 [1] Very mild pain B -7.153153
110 | ```
111 |
112 | ---
113 | Loading necessary packages
114 | ```{r}
115 | library(emmeans)
116 | library(broom)
117 | library(survival)
118 | library(marginaleffects)
119 | library(geepack)
120 | library(dplyr)    # used later for the pipe, bind_cols(), group_by() and summarize()
121 | ```
121 |
122 | Defining an auxiliary function (to validate the results):
123 | ``` r
124 | wald_z_test <- function(table) {
125 | p1 <- prop.table(table, 1)[1, 1]
126 | p2 <- prop.table(table, 1)[2, 1]
127 | n1 <- rowSums(table)[1]
128 | n2 <- rowSums(table)[2]
129 | se_p1 <- sqrt(p1 * (1 - p1) / n1)
130 | se_p2 <- sqrt(p2 * (1 - p2) / n2)
131 | se_diff <- sqrt(se_p1^2 + se_p2^2)
132 | z <- (p1 - p2) / se_diff
133 | p <- 2 * (1 - pnorm(abs(z)))
134 | return(data.frame(estimate = p1 - p2, z = z, se = se_diff, p.value = p, row.names = NULL))
135 | }
136 | ```
137 |
138 | ---
139 |
140 | # Wald's z test for 2 proportions (non-pooled SE)
141 |
142 | We want to reproduce this result:
143 | ``` r
144 | > wald_z_test(xtabs(~ trt + response,data = unpaired_data))
145 | estimate z se p.value
146 | 1 0.2737968 3.047457 0.08984435 0.002307865
147 | ```
148 |
149 | We will use this logistic regression (LR) model:
150 | ``` r
151 | > summary(lr_model <- glm(response ~ trt , data = unpaired_data, family = binomial(link = "logit")))
152 |
153 | Call:
154 | glm(formula = response ~ trt, family = binomial(link = "logit"),
155 | data = unpaired_data)
156 |
157 | Deviance Residuals:
158 | Min 1Q Median 3Q Max
159 | -1.7011 -1.1620 0.7325 1.0778 1.1929
160 |
161 | Coefficients:
162 | Estimate Std. Error z value Pr(>|z|)
163 | (Intercept) -0.03637 0.26972 -0.135 0.89274
164 | trtplacebo 1.21502 0.42629 2.850 0.00437 **
165 | ---
166 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
167 |
168 | (Dispersion parameter for binomial family taken to be 1)
169 |
170 | Null deviance: 140.50 on 105 degrees of freedom
171 | Residual deviance: 131.88 on 104 degrees of freedom
172 | AIC: 135.88
173 |
174 | Number of Fisher Scoring iterations: 4
175 | ```
176 |
177 | ## Wald's z test via LS-means on the regridded (probability) scale
178 | ``` r
179 | > pairs(emmeans(lr_model, regrid="response", specs = ~ trt))
180 | contrast estimate SE df z.ratio p.value
181 | active - placebo -0.274 0.0898 Inf -3.047 0.0023
182 | ```
183 | Let's look closer at the results:
184 | | Outcome | LS-means | raw z test | comment |
185 | |-----------|----------|------------|---------|
186 | | estimate | -0.2737968 | 0.2737968| 👍; swap factor levels to change the sign or ignore |
187 | | SE | 0.08984432 | 0.08984435 | agreement by 7 dec. digits 👍 |
188 | | statistic | -3.047458 | 3.047457 | sign - as above; agreement by 5 dec. digits 👍 |
189 | | p-value | 0.002307857 | 0.002307865 | agreement by 7 dec. digits 👍 |
190 |
191 | Excellent agreement!
192 |
193 | ## Wald's z test via AME (average marginal effect)
194 | / How does it work? Please read my explanation and a formal proof here: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_AME_Wald_z_test_proportions.md /
195 | ``` r
196 | > marginaleffects::avg_slopes(lr_model)
197 |
198 | Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 %
199 | trt placebo - active 0.274 0.0898 3.05 0.00231 8.8 0.0977 0.45
200 |
201 | Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high
202 | ```
203 | Let's look closer at the results:
204 | | Outcome | AME | raw z test | comment |
205 | |-----------|----------|------------|------|
206 | | estimate | 0.2737968 | 0.2737968 | 👍 |
207 | | SE | 0.08984433 | 0.08984435 | 👍 |
208 | | statistic | 3.047458 | 3.047457 | agreement by 5 dec. digits 👍 |
209 | | p-value | 0.002307859 | 0.002307865 | agreement by 6 dec. digits 👍 |
210 |
211 | Perfect agreement!
212 |
213 | ---
213 |
214 | # Rao score z test for 2 proportions (pooled SE)
215 | / How does it work? Please read my explanation and a formal proof here: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md /
216 |
217 | We want to reproduce this result:
218 | ``` r
219 | > prop.test(xtabs(~ trt + response,data=unpaired_data), correct = FALSE)
220 |
221 | 2-sample test for equality of proportions without continuity correction
222 |
223 | data: xtabs(~trt + response, data = unpaired_data)
224 | X-squared = 8.4429, df = 1, p-value = 0.003665
225 | alternative hypothesis: two.sided
226 | 95 percent confidence interval:
227 | 0.09770511 0.44988848
228 | sample estimates:
229 | prop 1 prop 2
230 | 0.5090909 0.2352941
231 | ```
232 |
233 | We will use the same logistic regression (LR) model as previously
234 |
235 | ## Rao score z test via ANOVA with Rao test
236 | ``` r
237 | > anova(glm(response ~ trt , data = unpaired_data, family = binomial(link = "logit")), test = "Rao")
238 | Analysis of Deviance Table
239 |
240 | Model: binomial, link: logit
241 |
242 | Response: response
243 |
244 | Terms added sequentially (first to last)
245 |
246 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
247 | NULL 105 140.50
248 | trt 1 8.6257 104 131.88 8.4429 0.003665 **
249 | ---
250 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
251 | ```
252 | Let's look closer at the results:
253 | | Outcome | ANOVA + Rao test | prop.test() | comment |
254 | |-----------|----------|------------|---------|
255 | | statistic | 8.442898 | 8.442897 | agreement by 5 dec. digits 👍 |
256 | | p-value | 0.003664718 | 0.003664719 | agreement by 8 dec. digits 👍 |
257 |
258 | Perfect agreement!
259 |
260 | ---
261 |
262 | # Breslow-Day test for odds ratios via ANOVA with Rao test
263 |
264 | We want to reproduce this result for treatment and sex:
265 | ``` r
266 | > BreslowDayTest(xtabs(~ trt +response + sex, data=unpaired_data), correct = TRUE)
267 |
268 | Breslow-Day Test on Homogeneity of Odds Ratios (with Tarone correction)
269 |
270 | data: xtabs(~trt + response + sex, data = unpaired_data)
271 | X-squared = 1.4905, df = 1, p-value = 0.2221
272 | ```
273 | This time we add sex to the model and look at the interaction term:
274 |
275 | ``` r
276 | > as.data.frame(anova(glm(response ~ trt * sex , data = unpaired_data, family = binomial(link = "logit")), test="Rao")[4, ])
277 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
278 | trt:sex 1 1.498573 102 130.0512 1.496552 0.2212027
279 | ```
280 | Let's look closer at the results:
281 | | Outcome | ANOVA + Rao test | Breslow-Day | comment |
282 | |-----------|----------|------------|---------|
283 | | statistic | 1.496552 | 1.490537 | agreement by 2 dec. digits 👍 |
284 | | p-value | 0.2212027 | 0.2221331 | agreement by 2 dec. digits 👍 |
285 |
286 | Good agreement!
287 |
288 | ---
289 |
290 | # (Cochran-) Mantel-Haenszel test via conditional logistic regression
291 |
292 | We want to reproduce this result for sex strata:
293 | ``` r
294 | > mantelhaen.test(unpaired_data$response, unpaired_data$trt, unpaired_data$sex, exact = F, correct = F)
295 |
296 | Mantel-Haenszel chi-squared test without continuity correction
297 |
298 | data: unpaired_data$response and unpaired_data$trt and unpaired_data$sex
299 | Mantel-Haenszel X-squared = 8.3052, df = 1, p-value = 0.003953
300 | alternative hypothesis: true common odds ratio is not equal to 1
301 | 95 percent confidence interval:
302 | 1.445613 7.593375
303 | sample estimates:
304 | common odds ratio
305 | 3.313168
306 | ```
307 | And through the model:
308 | ``` r
309 | > summary(clogit(response~trt + strata(sex),data=unpaired_data))$sctest
310 | test df pvalue
311 | 8.30516934 1.00000000 0.00395324
312 | ```
313 | Let's look closer at the results:
314 | | Outcome | Cond. LR | CMH | comment |
315 | |-----------|----------|------------|---------|
316 | | statistic | 8.30516934 | 8.305169 | 👍 |
317 | | p-value | 0.00395324 | 0.00395324 | 👍 |
318 |
319 | Ideal agreement!
320 |
321 | ---
322 |
323 | # McNemar's, Cochran Q, Friedman tests via GEE estimated LR
324 | We want to reproduce this result:
325 | ``` r
326 | > mcnemar.test(x=paired_data[paired_data$Time == "Pre", "Response"], y=paired_data[paired_data$Time == "Post", "Response"], correct = F)
327 |
328 | McNemar's Chi-squared test
329 |
330 | data: paired_data[paired_data$Time == "Pre", "Response"] and paired_data[paired_data$Time == "Post", "Response"]
331 | McNemar's chi-squared = 10.286, df = 1, p-value = 0.001341
332 |
333 | # or this one
334 |
335 | > paired_data %>% rstatix::friedman_test(Response ~ Time |ID)
336 | # A tibble: 1 × 6
337 | .y. n statistic df p method
338 | *
339 | 1 Response 20 10.3 1 0.00134 Friedman test
340 |
341 | # or this one
342 |
343 | > RVAideMemoire::cochran.qtest(Response ~ Time | ID,data=paired_data)
344 |
345 | Cochran's Q test
346 |
347 | data: Response by Time, block = ID
348 | Q = 10.2857, df = 1, p-value = 0.001341
349 | alternative hypothesis: true difference in probabilities is not equal to 0
350 | sample estimates:
351 | proba in group Post proba in group Pre
352 | 0.7 0.1
353 | ```
354 |
355 | Through the GEE-estimated model:
356 | ``` r
357 | > summary(geepack::geeglm(Response ~ Time, id = ID,data=paired_data, family = binomial(), corstr = "exchangeable"))
358 |
359 | Call:
360 | geepack::geeglm(formula = Response ~ Time, family = binomial(),
361 | data = paired_data, id = ID, corstr = "exchangeable")
362 |
363 | Coefficients:
364 | Estimate Std.err Wald Pr(>|W|)
365 | (Intercept) 0.8473 0.4880 3.015 0.08249 .
366 | TimePre -3.0445 0.9484 10.305 0.00133 **
367 | ---
368 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
369 |
370 | Correlation structure = exchangeable
371 | Estimated Scale Parameters:
372 |
373 | Estimate Std.err
374 | (Intercept) 1 0.7215
375 | Link = identity
376 |
377 | Estimated Correlation Parameters:
378 | Estimate Std.err
379 | alpha -0.1455 0.2819
380 | Number of clusters: 20 Maximum cluster size: 2
381 |
382 | # or in a more compact form:
383 | > coef(summary(geepack::geeglm(Response ~ Time, id = ID,data=paired_data, family = binomial(), corstr = "exchangeable")))[2,]
384 | Estimate Std.err Wald Pr(>|W|)
385 | TimePre -3.045 0.9484 10.31 0.001327
386 | ```
387 |
388 | Let's look closer at the results:
389 | | Outcome | GEE LR | Tests | comment |
390 | |-----------|----------|------------|---------|
391 | | statistic | 10.31 | 10.2857 | agreement by 1 dec. digit 👍 |
392 | | p-value | 0.001327 | 0.001341 | agreement by 4 dec. digits 👍 |
393 |
394 | Acceptable agreement!
395 |
396 | ---
397 |
398 | # Cochran-Armitage test for trend via GLM + ANOVA LRT (Likelihood Ratio Test)
399 | We want to reproduce this result:
400 | ``` r
401 | > DescTools::CochranArmitageTest(xtabs(~Response + Time,data=ordered_paired_data))
402 |
403 | Cochran-Armitage test for trend
404 |
405 | data: xtabs(~Response + Time, data = ordered_paired_data)
406 | Z = -3.6, dim = 3, p-value = 0.0003
407 | alternative hypothesis: two.sided
408 |
409 | # or this one
410 |
411 | > rstatix::prop_trend_test(xtabs(~Response + Time,data=ordered_paired_data))
412 | # A tibble: 1 × 6
413 | n statistic p p.signif df method
414 | *
415 | 1 30 12.9 0.000336 *** 1 Chi-square trend test
416 | ```
417 |
418 | Through the GLM model:
419 | ``` r
420 | > as.data.frame(anova(glm(Response ~ Time,data=ordered_paired_data, family = binomial()), test="LRT"))[2,]
421 | Df Deviance Resid. Df Resid. Dev Pr(>Chi)
422 | Time 2 14.99 27 26.46 0.0005553
423 | ```
424 |
425 | Let's look closer at the results:
426 | | Outcome | GLM + LRT ANOVA | Test | comment |
427 | |-----------|----------|------------|---------|
428 | | statistic | 14.99 | 12.86 | same order of magnitude |
429 | | p-value | 0.0005553 | 0.000336 | agreement by 3 dec. digits 👍 |
430 |
431 | Reasonable agreement. (Maybe I'll find a better one).
432 |
433 | ---
434 |
435 | # Mann-Whitney (-Wilcoxon) test of stochastic equivalence (vs. stochastic superiority / dominance)
436 | **Note:** This test DOES NOT TEST MEDIANS in general, unless strong distributional assumptions hold:
437 | 1) IID samples (same dispersion, variance & same shape - if skewed, then in the same direction)
438 | 2) Symmetric around their medians.
439 | For detailed explanations, read my gist and find a rich list of literature (mostly freely accessible) and examples: https://gist.github.com/adrianolszewski/2cec75678e1183e4703589bfd22fa8b2
440 |
441 | We want to reproduce this result:
442 | ``` r
443 | > (wtest <- wilcox.test(as.numeric(ODIPain) ~ Arm, data = ordinal_data, exact = FALSE, correct = FALSE))
444 |
445 | Wilcoxon rank sum test
446 |
447 | data: as.numeric(ODIPain) by Arm
448 | W = 1472, p-value = 0.68
449 | alternative hypothesis: true location shift is not equal to 0
450 |
451 | > wtest$p.value
452 | [1] 0.679575
453 | ```
454 | By using the proportional-odds model (ordinal logistic regression) we obtain:
455 | ``` r
456 | > coef(summary(m <- MASS::polr(ODIPain ~ Arm , data = ordinal_data, Hess=T)))
457 | Value Std. Error t value
458 | ArmB 0.141709 0.341471 0.414995
459 | [0] No pain|[1] Very mild pain -1.444439 0.299213 -4.827458
460 | [1] Very mild pain|[2] Moderate pain -0.273260 0.259784 -1.051875
461 | [2] Moderate pain|[3] Fairly severe pain 1.361363 0.291704 4.666935
462 | [3] Fairly severe pain|[4] Very severe pain 2.093502 0.345203 6.064551
463 | [4] Very severe pain|[5] Worst imaginable pain 4.072209 0.736078 5.532306
464 |
465 | > pairs(emmeans(m, specs = ~Arm))
466 | contrast estimate SE df z.ratio p.value
467 | A - B -0.142 0.341 Inf -0.415 0.6781
468 |
469 | # or
470 | > (mtest <- joint_tests(m))
471 | model term df1 df2 F.ratio p.value
472 | Arm 1 Inf 0.172 0.6781
473 |
474 | mtest$p.value
475 | [1] 0.678146
476 | ```
477 |
478 | This time, the two outputs (model vs. test) look very different but give a very close p-value!
479 | It's not a coincidence.
480 | You can find detailed explanations and necessary formulas here: [Equivalence of Wilcoxon Statistic and Proportional Odds Model](https://www.fharrell.com/post/powilcoxon/) | [Resources for Ordinal Regression Models](https://www.fharrell.com/post/rpo/) | [If You Like the Wilcoxon Test You Must Like the Proportional Odds Model](https://www.fharrell.com/post/wpo/)
481 |
482 | So, like Prof. Harrell, we will also check the concordance index:
483 | ``` r
484 | # From the Wilcoxon statistic
485 | > (bind_cols(tidy(wilcox.test(as.numeric(ODIPain) ~ Arm, data = ordinal_data, exact = FALSE, correct = FALSE)),
486 | ordinal_data %>%
487 | group_by(Arm) %>%
488 | summarize(n=n()) %>%
489 | summarize("n1*n2" = prod(n))) %>%
490 | mutate(c = statistic / `n1*n2`) -> concor)
491 |
492 | # A tibble: 1 × 6
493 | statistic p.value method alternative `n1*n2` c
494 |
495 | 1 1472. 0.680 Wilcoxon rank sum test two.sided 3080 0.478
496 |
497 | > concor$c
498 | 0.478084
499 |
500 | # From the odds ratio taken from the model:
501 | > (OR <- 1/exp((coef(summary(m)))[1,1]))
502 | [1] 0.867874
503 |
504 | # and finally
505 | > (c_mod <- OR^0.66 / (1 + OR ^ 0.66))
506 | 0.476635
507 |
508 | # So we are off by:
509 | > sprintf("%.2f%%",100*(concor$c - c_mod) / concor$c)
510 | [1] "0.30%"
511 |
512 | # Isn't this IMPRESSIVE?
513 | ```
514 |
515 | Let's take a closer look at the collected results:
516 | | Outcome | OLR | Wilcox | comment |
517 | |-----------|----------|------------|---------|
518 | | concordance | 0.478084 | 0.476635 | agreement by 2 dec. digits 👍 |
519 | | p-value | 0.679575 | 0.678146 | agreement by 2 dec. digits 👍 |
520 |
521 | Very good agreement!
522 |
523 | Later, in a separate gist, I will show you, through simulation, that this equivalence holds very well!
524 |
525 | **Think about the consequences. This way we obtain the Mann-Whitney (-Wilcoxon) test adjusted for covariates.** (See the sketch below.)
526 | By the way, this is another interesting example where the result of a completely non-parametric test can be obtained via a parametric method.
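As a concrete illustration, here is a minimal sketch of such a covariate-adjusted analysis. It assumes the `ordinal_data` frame defined at the end of this document; the choice of `Age_centered` as the adjusting covariate is purely illustrative.

``` r
library(emmeans)

# Hypothetical covariate-adjusted analogue of the Mann-Whitney test:
# a proportional-odds model with the Age_centered covariate added,
# followed by a Wald (joint) test of the Arm effect.
m_adj <- MASS::polr(ODIPain ~ Arm + Age_centered, data = ordinal_data, Hess = TRUE)
joint_tests(m_adj)                    # covariate-adjusted test of Arm
pairs(emmeans(m_adj, specs = ~ Arm))  # covariate-adjusted log-odds contrast A - B
```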
527 |
528 | ---
529 | * _EM-means_ (estimated marginal means) is another name for the _LS-means_ (least-squares means) well known in experimental research.
530 | It is a model-based predicted (estimated) mean. If you remember the definition of regression (NO, not the Machine Learning one...),
531 | then you know that regression gives you some function of the data conditional on the predictors.
532 | For linear regression it is E(Y|X=x), for the GLM it is link(E(Y|X=x)), for quantile regression it is Q_i(Y|X=x).
533 | And since the predictor here consists exclusively of categorical variables, they form sub-groups in which the (conditional)
534 | means are calculated. If we also include numerical covariates in the model, the predictions will account for them, giving us so-called "covariate-adjusted means" (see the sketch below).
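For completeness, a minimal sketch of such covariate-adjusted means on a logistic model, assuming the `unpaired_data` frame defined below; the formula `response ~ trt + sex` is only an illustration.

``` r
library(emmeans)

# EM-means of a logistic model, "re-gridded" to the probability scale;
# because sex is in the model, these are covariate-adjusted estimated probabilities.
m_glm <- glm(response ~ trt + sex, data = unpaired_data, family = binomial())
emmeans(m_glm, specs = ~ trt, regrid = "response")         # adjusted probability per arm
pairs(emmeans(m_glm, specs = ~ trt, regrid = "response"))  # their adjusted difference
```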
535 |
536 | ----
537 | The datasets for your own experiments:
538 |
539 | ``` r
540 | unpaired_data <- structure(list(sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
541 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
542 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
543 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
544 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
545 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
546 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
547 | 2L, 2L, 2L), levels = c("female", "male"), class = "factor"),
548 | response = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
549 | 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
550 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
551 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
552 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
553 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), trt = structure(c(1L,
554 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
555 | 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
556 | 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
557 | 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
558 | 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
559 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
560 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
561 | ), levels = c("active", "placebo"), class = "factor")), row.names = c(NA,
562 | -106L), class = "data.frame")
563 |
564 | paired_data <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
565 | 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L,
566 | 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L,
567 | 19L, 20L, 20L), Time = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L,
568 | 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
569 | 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
570 | 1L), levels = c("Post", "Pre"), class = "factor"), Treatment = structure(c(2L,
571 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
572 | 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
573 | 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("active", "placebo"), class = "factor"),
574 | Response = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L,
575 | 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L,
576 | 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L)), row.names = c(NA,
577 | -40L), class = "data.frame")
578 |
579 | ordered_paired_data <- structure(list(ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
580 | 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L,
581 | 4L, 5L, 6L, 7L, 8L, 9L, 10L), levels = c("1", "2", "3", "4",
582 | "5", "6", "7", "8", "9", "10"), class = "factor"), Time = structure(c(1L,
583 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
584 | 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("T1",
585 | "T2", "T3"), class = c("ordered", "factor")), Response = c(0L,
586 | 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L,
587 | 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L), TimeUnord = structure(c(1L,
588 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
589 | 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("T1",
590 | "T2", "T3"), class = "factor")), row.names = c(NA, -30L), class = "data.frame")
591 |
592 | ordinal_data <- structure(list(ODIPain = structure(c(3L, 1L, 2L, 3L, 4L, 3L,
593 | 4L, 2L, 3L, 5L, 2L, 5L, 5L, 6L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 3L,
594 | 3L, 2L, 2L, 5L, 5L, 2L, 5L, 3L, 5L, 1L, 3L, 3L, 3L, 1L, 5L, 3L,
595 | 5L, 1L, 1L, 2L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 4L, 6L, 4L,
596 | 3L, 3L, 3L, 3L, 1L, 4L, 5L, 4L, 3L, 3L, 1L, 3L, 1L, 4L, 3L, 3L,
597 | 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 4L, 4L, 3L,
598 | 2L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 4L, 4L, 3L, 3L,
599 | 1L, 2L, 3L, 3L, 3L, 3L, 5L, 2L, 2L), levels = c("[0] No pain",
600 | "[1] Very mild pain", "[2] Moderate pain", "[3] Fairly severe pain",
601 | "[4] Very severe pain", "[5] Worst imaginable pain"), class = c("ordered",
602 | "factor")), Arm = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
603 | 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L,
604 | 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
605 | 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
606 | 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
607 | 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
608 | 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
609 | 1L, 1L, 2L, 1L, 2L, 2L, 2L), levels = c("A", "B"), class = "factor"),
610 | Age_centered = c(-6.15315315315316, 12.8468468468468, -9.15315315315316,
611 | 14.8468468468468, 12.8468468468468, 2.84684684684684, -10.1531531531532,
612 | -18.1531531531532, -1.15315315315316, 8.84684684684684, -17.1531531531532,
613 | 13.8468468468468, 9.84684684684684, 17.8468468468468, -19.1531531531532,
614 | -7.15315315315316, -10.1531531531532, -19.1531531531532,
615 | -7.15315315315316, 0.846846846846844, -17.1531531531532,
616 | 5.84684684684684, -25.1531531531532, -1.15315315315316, -15.1531531531532,
617 | 4.84684684684684, 1.84684684684684, 12.8468468468468, -11.1531531531532,
618 | 5.84684684684684, -6.15315315315316, -0.153153153153156,
619 | 20.8468468468468, 5.84684684684684, -0.153153153153156, 12.8468468468468,
620 | -19.1531531531532, -11.1531531531532, 1.84684684684684, 0.846846846846844,
621 | -21.1531531531532, 9.84684684684684, 15.8468468468468, 14.8468468468468,
622 | -12.1531531531532, -11.1531531531532, -9.15315315315316,
623 | 5.84684684684684, -4.15315315315316, 12.8468468468468, 1.84684684684684,
624 | -7.15315315315316, -3.15315315315316, 7.84684684684684, 0.846846846846844,
625 | -4.15315315315316, 5.84684684684684, -0.153153153153156,
626 | 1.84684684684684, -7.15315315315316, 1.84684684684684, -9.15315315315316,
627 | 6.84684684684684, 9.84684684684684, 17.8468468468468, 5.84684684684684,
628 | 9.84684684684684, -10.1531531531532, -5.15315315315316, 18.8468468468468,
629 | 21.8468468468468, -0.153153153153156, 2.84684684684684, -8.15315315315316,
630 | -5.15315315315316, 5.84684684684684, 2.84684684684684, -15.1531531531532,
631 | 2.84684684684684, 25.8468468468468, -11.1531531531532, 27.8468468468468,
632 | 2.84684684684684, 20.8468468468468, -0.153153153153156, -2.15315315315316,
633 | 12.8468468468468, -0.153153153153156, 0.846846846846844,
634 | 11.8468468468468, -8.15315315315316, 3.84684684684684, 22.8468468468468,
635 | 5.84684684684684, 12.8468468468468, 4.84684684684684, 11.8468468468468,
636 | -5.15315315315316, -17.1531531531532, -7.15315315315316,
637 | -16.1531531531532, 0.846846846846844, -13.1531531531532,
638 | -13.1531531531532, -19.1531531531532, -15.1531531531532,
639 | -11.1531531531532, -4.15315315315316, -0.153153153153156,
640 | -4.15315315315316, -7.15315315315316)), row.names = c(NA,
641 | -111L), class = "data.frame")
642 | ```
643 |
--------------------------------------------------------------------------------
/logistic_regression_AME_Wald_z_test_proportions.md:
--------------------------------------------------------------------------------
1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with unpooled variances and the Average Marginal Effect over logistic regression with a single binary predictor
2 | #### Adrian Olszewski, 02.03.2025
3 |
4 | The Wald’s z-statistic for difference in 2 proportions with unpooled variances is of the following form:
5 |
6 | ``` math
7 | \begin{equation}
8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{\frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B}}}
9 | \hspace{2cm} (1)
10 | \end{equation}
11 | ```
12 | Where $\widehat{p_A}$ stands for the estimated probability (sample proportion, %) in the 1st group, $\widehat{p_B}$ is the estimated probability (sample proportion, %) in the 2nd group, and $n_A$ and $n_B$ denote the respective group sizes.
13 | I will show that this statistic is 1:1 equivalent to the average marginal effect (AME) of the logistic regression with a single binary predictor playing the role of an indicator for the compared samples.
14 |
15 | To simplify calculations, I will show the equivalence of the numerator and denominator of the z statistic, i.e. the difference between the two probabilities and its variance, separately.
16 |
17 | **The equation of logistic regression**
18 |
19 | Let’s start from the equation of the logistic regression with a single binary predictor:
20 | ``` math
21 | \begin{equation}
22 | logit\left(E\left(Y\middle| X\right)\right)=logit\left(\hat{p}\left(Y=1\middle| X\right)\right)=\ln{\left(\frac{\hat{p}\left(Y=1\middle| X\right)}{1-\hat{p}\left(Y=1\middle| X\right)}\right)}=\beta_0+X_1\beta_1
23 | \hspace{2cm} (2)
24 | \end{equation}
25 | ```
26 | or equivalently, after applying the inverse-logit, i.e. sigmoid function (let’s also simplify X1 to X)
27 | ``` math
28 | \begin{equation}
29 | \hat{p}\left(Y=1\middle| X\right)=\frac{e^{\beta_0+X\beta_1}}{1+e^{\beta_0+X\beta_1}}
30 | \hspace{2cm} (3)
31 | \end{equation}
32 | ```
33 | where $$Y_i$$ are independent Bernoulli random variables with probabilities $$\hat{p_i}$$.
34 |
35 | **(Average) Marginal Effect**
36 |
37 | Now, let’s introduce the marginal effect (ME) of a regression model. The ME of a given predictor variable X is the slope of the regression surface with respect to X, reflecting the rate at which Y changes at a given point with respect to X, while holding all other predictors constant. In other words, the ME is an instantaneous rate of change, calculated as the partial derivative of the regression equation with respect to X. For example, for the equation Y = β0 + β1X1 + β2X2 + β3X1X2, the corresponding ME(Y, X2) = ∂Y/∂X2 = β2 + β3X1.
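This partial derivative can be double-checked with base R's symbolic differentiation; the coefficient names below are just placeholders.

``` r
# Symbolic check of the ME example above: d/dX2 of b0 + b1*X1 + b2*X2 + b3*X1*X2
D(expression(b0 + b1*X1 + b2*X2 + b3*X1*X2), "X2")
#> b2 + b3 * X1
```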
38 | The average marginal effect (AME) calculates ME at every observed value of X and averages across the resulting effect estimates. In other words, AME is an average effect of an infinitesimal change in X across all observations:
39 |
40 | ``` math
41 | \begin{equation}
42 | {AME}_{Xk}(model)=\frac{1}{N}\sum_{i=1}^{N}\frac{\partial E(Y_i|{Xk}_i,\ covariates)}{\partial Xk}
43 | \hspace{2cm} (4)
44 | \end{equation}
45 | ```
46 |
47 | For a generally defined equation:
48 | ``` math
49 | \begin{equation}
50 | \sigma\left(x\right)=\frac{e^x}{1+e^x}
51 | \hspace{2cm} (5)
52 | \end{equation}
53 | ```
54 |
55 | the partial derivative (using the quotient rule and rewriting back in terms of σ) is of the form:
56 | ``` math
57 | \begin{equation}
58 | \frac{d\sigma(x)}{dx}=\sigma(x)(1-\sigma\left(x\right))
59 | \hspace{2cm} (6)
60 | \end{equation}
61 | ```
62 |
63 | Therefore,
64 | ``` math
65 | \begin{equation}
66 | {AME}_X(model)=\beta\times\frac{1}{N}\sum_{i=1}^{N}{\hat{p}\left(Y=1\middle| X=x_i\right)\times\hat{p}\left(Y=0\middle| X=x_i\right)}
67 | \hspace{2cm} (7)
68 | \end{equation}
69 | ```
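As a quick numerical sanity check of (6)-(7), the derivative of the sigmoid can be compared with a finite-difference approximation; `plogis()` is base R's logistic (sigmoid) function and the point `x` below is arbitrary.

``` r
# Check of (6): the derivative of sigma(x) = e^x / (1 + e^x) equals sigma(x) * (1 - sigma(x))
x <- 0.7
h <- 1e-6
(plogis(x + h) - plogis(x - h)) / (2 * h)  # central-difference derivative
plogis(x) * (1 - plogis(x))                # sigma(x) * (1 - sigma(x)); both agree
```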
70 |
71 | **AME for the binary predictor**
72 |
73 | For a categorical predictor, however, there is no such thing as an “infinitesimal change”; there is just a switch between categories. The AME then becomes a contrast, i.e. the difference between the predicted responses evaluated at the selected category levels, thus:
74 |
75 | ``` math
76 | \begin{equation}
77 | {AME}_X\left(model\right)=\hat{p}\left(Y=1\middle| X=B\right)-\hat{p}\left(Y=1\middle| X=A\right)=\ \widehat{p_B}-\widehat{p_A}
78 | \hspace{2cm} (8)
79 | \end{equation}
80 | ```
81 |
82 | This means that the AME for such a logistic regression corresponds to the difference between the two estimated group probabilities, expressed in percentage points.
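A minimal sketch illustrating this, using the same 6/20 vs. 10/20 counts as in the R example at the end of this document (the data construction below mirrors that example):

``` r
# The fitted per-group probabilities of a logistic model with one binary predictor
# are exactly the sample proportions, so the AME is simply their difference.
data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 14),
                                       rep("Success", 10), rep("Failure", 10))),
                   grp      = factor(rep(c("B", "A"), each = 20)))
m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
p_hat <- tapply(fitted(m), data$grp, mean)  # estimated P(Y = "Success") per group
unname(p_hat["B"] - p_hat["A"])             # AME of grp (B vs A): -0.2
```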
83 |
84 | **Variance of AME for the binary predictor**
85 |
86 | Now, I will show the equivalence between the variance in the Wald’s z statistic and the variance of the AME for the binary predictor.
87 |
88 | ``` math
89 | \begin{equation}
90 | var\left(AME\right)=var(\widehat{p_B}-\widehat{p_A})=\ \frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B}
91 | \hspace{2cm} (9)
92 | \end{equation}
93 | ```
94 | Several replacements will save a lot of typing:
95 | - $$P_A=\ \hat{p}\left(Y=1\middle| X=A\right),\ P_B=\ \hat{p}\left(Y=1\middle| X=B\right)$$
96 | - $$P_i\times\left(1-P_i\right)=P_iQ_i$$
97 |
98 | Let’s introduce the AME function:
99 | ``` math
100 | \begin{equation}
101 | AME=\ g\left(\beta_0,\beta_1\right)=P_B-P_A
102 | \hspace{2cm} (10)
103 | \end{equation}
104 | ```
105 |
106 | Let’s also encode the two levels {A, B} using a single binary predictor X such that A: X=0 and B: X=1, and let’s express $$P_A$$ and $$P_B$$ in terms of the beta coefficients:
107 | ``` math
108 | \begin{equation}
109 | \left\{
110 | \begin{aligned}
111 | P_A &= P(Y = 1 \mid X = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \\
112 | P_B &= P(Y = 1 \mid X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}
113 | \end{aligned}
114 | \right.
115 | \hspace{2cm} (11)
116 | \end{equation}
117 | ```
118 | so the AME function is now expressed as:
119 |
120 | ``` math
121 | \begin{equation}
122 | g\left(\beta_0,\beta_1\right)=\ \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}-\frac{e^{\beta_0}}{1+e^{\beta_0}}
123 | \hspace{2cm} (12)
124 | \end{equation}
125 | ```
126 |
127 | The variance for the AME is typically obtained by the _delta method_:
128 | ``` math
129 | \begin{equation}
130 | var\left(AME\right)=var\left(g\left(\beta_0,\beta_1\right)\right)\approx{\nabla g}^T\left(\beta_0,\beta_1\right)\times\Sigma\times\nabla g\left(\beta_0,\beta_1\right)
131 | \hspace{2cm} (13)
132 | \end{equation}
133 | ```
134 |
135 | Let’s first obtain the derivatives.
136 | Recall that:
137 |
138 | ``` math
139 | \begin{equation}
140 | \begin{aligned}
141 | \text{for } \sigma(x) &= \frac{e^x}{1 + e^x} \\
142 | \frac{d\sigma(x)}{dx} &= \sigma(x)(1 - \sigma(x)) = \frac{e^x}{1 + e^x} \times \frac{1}{1 + e^x} = \frac{e^x}{(1 + e^x)^2}
143 | \end{aligned}
144 | \hspace{2cm} (14)
145 | \end{equation}
146 | ```
147 | Therefore,
148 |
149 | ``` math
150 | \begin{equation}
151 | \nabla g\left(\beta_0,\beta_1\right) =
152 | \left[
153 | \begin{matrix}
154 | \frac{\partial g}{\partial\beta_0} \\
155 | \frac{\partial g}{\partial\beta_1}
156 | \end{matrix}
157 | \right]
158 | =
159 | \left[
160 | \begin{matrix}
161 | P_B\left(1 - P_B\right) - P_A\left(1 - P_A\right) \\
162 | P_B\left(1 - P_B\right)
163 | \end{matrix}
164 | \right]
165 | =
166 | \left[
167 | \begin{matrix}
168 | P_B Q_B - P_A Q_A \\
169 | P_B Q_B
170 | \end{matrix}
171 | \right]
172 | =
173 | \left[
174 | \begin{matrix}
175 | \frac{e^{\beta_0+\beta_1}}{\left(1+e^{\beta_0+\beta_1}\right)^2}-\frac{e^{\beta_0}}{\left(1+e^{\beta_0}\right)^2} \\
176 | \frac{e^{\beta_0+\beta_1}}{\left(1+e^{\beta_0+\beta_1}\right)^2}
177 | \end{matrix}
178 | \right]
179 | \hspace{1.5cm} (15)
180 | \end{equation}
181 | ```
182 | Now, we need the variance-covariance matrix, i.e.
183 | ``` math
184 | \begin{equation}
185 | \Sigma=\left[\begin{matrix}var(\beta_0)&cov(\beta_0,\beta_1)\\cov(\beta_0,\beta_1)&var(\beta_1)\\\end{matrix}\right]
186 | \hspace{2cm} (16)
187 | \end{equation}
188 | ```
189 |
190 | This can be obtained by inverting the Fisher information matrix given by:
191 | ``` math
192 | \begin{equation}
193 | \Sigma=I^{-1}=\left(X^TWX\right)^{-1}
194 | \hspace{2cm} (17)
195 | \end{equation}
196 | ```
197 |
198 | where X is the design matrix with 2 columns (a column of ones for $β_0$ and a 0/1 indicator for $β_1$, equal to 1 when X=1), with $n_A$ and $n_B$ rows corresponding to groups A and B, respectively.
199 |
200 | ``` math
201 | \begin{equation}
202 | X=\left[\begin{matrix}1&A=0\\1&A=0\\\vdots&\vdots\\1&A=0\\1&B=1\\\vdots&\vdots\\1&B=1\\\end{matrix}\right]
203 | \hspace{2cm} (18)
204 | \end{equation}
205 | ```
206 | and W is the diagonal matrix of weights, of the block-diagonal form:
207 | ``` math
208 | \begin{equation}
209 | W=diag\left(P_i\times\left(1-P_i\right)\right)=\left[\begin{matrix}P_AQ_A&0&\ldots&0&0&0&\ldots&0\\0&P_AQ_A&\cdots&0&0&0&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&P_AQ_A&0&0&\ldots&0\\0&0&\ldots&0&P_BQ_B&0&\ldots&0\\0&0&\ldots&0&0&P_BQ_B&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&0&0&0&\ldots&P_BQ_B\\\end{matrix}\right]
210 | \hspace{2cm} (19)
211 | \end{equation}
212 | ```
213 | which can be simplified to:
214 |
215 | ``` math
216 | \begin{equation}
217 | W=diag\left(P_i\times\left(1-P_i\right)\right)=\left[\begin{matrix}P_AQ_AI_{nA}&0\\0&P_BQ_BI_{nB}\\\end{matrix}\right]
218 | \hspace{2cm} (20)
219 | \end{equation}
220 | ```
221 | where $I_{n_A}$ and $I_{n_B}$ are the respective identity matrices.
222 |
223 | The multiplication result can be abbreviated to appropriate sums:
224 | ``` math
225 | \begin{equation}
226 | X^TWX=\left[\begin{matrix}\Sigma1P_iQ_i&\Sigma X P_iQ_i\\\Sigma X P_iQ_i&\Sigma X^2P_iQ_i\\\end{matrix}\right]
227 | \hspace{2cm} (21)
228 | \end{equation}
229 | ```
230 |
231 | where **1** is the result of multiplying the column of ones (the $β_0$ column) by itself, and the **X** terms come from the products involving the $β_1$ column. Since these columns consist only of 0s and 1s, with 0 for group “A” and 1 for group “B”, their products keep only the “B” part.
232 | Therefore, the final result is:
233 | ``` math
234 | \begin{equation}
235 | X^TWX=\left[\begin{matrix}{n_AP}_AQ_A+{n_BP}_BQ_B&{n_BP}_BQ_B\\{n_BP}_BQ_B&{n_BP}_BQ_B\\\end{matrix}\right]
236 | \hspace{2cm} (22)
237 | \end{equation}
238 | ```
239 |
240 | So the inverse can be computed (remembering that both covariance elements are equal):
241 | ``` math
242 | \begin{equation}
243 | \Sigma=\left(X^TWX\right)^{-1}=\frac{1}{\left({n_AP}_AQ_A+{n_BP}_BQ_B\right)\times{n_BP}_BQ_B-\left({n_BP}_BQ_B\right)^2}\left[\begin{matrix}{n_BP}_BQ_B&-{n_BP}_BQ_B\\-{n_BP}_BQ_B&{n_AP}_AQ_A+{n_BP}_BQ_B\\\end{matrix}\right]
244 | \hspace{2cm} (23)
245 | \end{equation}
246 | ```
247 |
248 | After simplifying the denominator term:
249 | ``` math
250 | \begin{equation}
251 | \left({n_AP}_AQ_A+{n_BP}_BQ_B\right)\times{n_BP}_BQ_B-\left({n_BP}_BQ_B\right)^2={n_AP}_AQ_A\times{n_BP}_BQ_B+\left({n_BP}_BQ_B\right)^2-\left({n_BP}_BQ_B\right)^2={n_AP}_AQ_A\times{n_BP}_BQ_B
252 | \hspace{1cm} (24)
253 | \end{equation}
254 | ```
255 | finally:
256 |
257 | ``` math
258 | \begin{equation}
259 | \Sigma=\frac{1}{{n_AP}_AQ_A\times{n_BP}_BQ_B}\left[\begin{matrix}{n_BP}_BQ_B&-{n_BP}_BQ_B\\-{n_BP}_BQ_B&{n_AP}_AQ_A+{n_BP}_BQ_B\\\end{matrix}\right]=\left[\begin{matrix}\frac{1}{{n_AP}_AQ_A}&-\frac{1}{{n_AP}_AQ_A}\\-\frac{1}{{n_AP}_AQ_A}&\frac{{n_AP}_AQ_A+{n_BP}_BQ_B}{{n_AP}_AQ_A\times{n_BP}_BQ_B}\\\end{matrix}\right]
260 | \hspace{2cm} (25)
261 | \end{equation}
262 | ```
263 |
264 | By recalling formula #13 and matrix #15, we can express the variance of AME as:
265 | ``` math
266 | \begin{equation}
267 | var\left(AME\right)=\left(\frac{\partial g}{\partial\beta_0}\right)^2var\left(\beta_0\right)+\left(\frac{\partial g}{\partial\beta_1}\right)^2var\left(\beta_1\right)+2\ast\frac{\partial g}{\partial\beta_0}\frac{\partial g}{\partial\beta_1}covar\left(\beta_0,\ \beta_1\right)
268 | \hspace{2cm} (26)
269 | \end{equation}
270 | ```
271 |
272 | which expands to:
273 |
274 | ``` math
275 | \begin{equation}
276 | var\left(AME\right)=\frac{\left(P_BQ_B-P_AQ_A\right)^2}{{n_AP}_AQ_A}+\frac{\left(P_BQ_B\right)^2\times\left({n_AP}_AQ_A+{n_BP}_BQ_B\right)}{{n_AP}_AQ_A\times{n_BP}_BQ_B}-2\ast\frac{\left(P_BQ_B-P_AQ_A\right)\times P_BQ_B}{{n_AP}_AQ_A}
277 | \hspace{2cm} (27)
278 | \end{equation}
279 | ```
280 |
281 | Let's replace $PQ$ with V to simplify (note: I occasionally use "×" to denote arithmetic multiplication to facilitate reading; A×B is just AB):
282 |
283 | ``` math
284 | \begin{equation}
285 | \begin{aligned}
286 | var(AME) &= \frac{(V_B - V_A)^2}{n_A V_A} + \frac{V_B^2 \times (n_A V_A + n_B V_B)}{n_A V_A \times n_B V_B} - 2 \times \frac{(V_B - V_A) \times V_B}{n_A V_A} \\[10pt]
287 | &= \frac{(V_B - V_A)(V_B - V_A - 2V_B)}{n_A V_A} + \frac{V_B \times (n_A V_A + n_B V_B)}{n_A V_A \times n_B} \\[10pt]
288 | &= \frac{(V_B - V_A)(-V_B - V_A)}{n_A V_A} + \frac{V_B n_A V_A + V_B^2 n_B}{n_A V_A \times n_B} \\[10pt]
289 | &= \frac{V_A^2 - V_B^2}{n_A V_A} + \frac{V_B}{n_B} + \frac{V_B^2}{n_A V_A} \\[10pt]
290 | &= \frac{V_A}{n_A} + \frac{V_B}{n_B}
291 | \hspace{1cm} (28)
292 | \end{aligned}
293 | \end{equation}
294 | ```
295 | Therefore,
296 | ``` math
297 | \begin{equation}
298 | var\left(AME\right)=\frac{V_A}{n_A}+\frac{V_B}{n_B}=\frac{P_AQ_A}{n_A}+\frac{P_BQ_B}{n_B}=\frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B}\blacksquare
299 | \hspace{2cm} (29)
300 | \end{equation}
301 | ```
302 | This way I have shown the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with unpooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples.
303 | Also, since the estimated marginal means (EM-means) of such a logistic regression, “re-gridded” to the probability scale, represent the estimated group probabilities, the Wald contrast comparing them yields exactly the same result.
304 |
305 | Although some minor discrepancies exist due to the maximum likelihood estimation, even for such small samples (N=10 and 20) the agreement is practically perfect.
306 |
307 | **Average Marginal Effect**
308 | ```r
309 | > wald_z_test_non_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
310 | diff z chi2 se p.value p.value_1 LCI HCI
311 | 1 -0.2 -1.318761 1.73913 0.1516575 0.187249 0.09362452 -0.4972433 0.09724326
312 | >
313 | > data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
314 | + rep("Success", 10), rep("Failure", 20-10))),
315 | + grp = factor(rep(c("B", "A"), each=20)))
316 | >
317 | > m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
318 | > data.frame(marginaleffects::avg_slopes(m)) %>% mutate(across(where(is.numeric), ~round(., 6)))
319 | term contrast estimate std.error statistic p.value s.value conf.low conf.high
320 | 1 grp B - A -0.2 0.151657 -1.318762 0.187249 2.416973 -0.497243 0.097243
321 | ```
322 | 
323 |
324 | **EM-means**
325 | ```r
326 | > library(emmeans)
327 | > update(pairs(emmeans(m, specs = ~grp, regrid="response")), infer = c(TRUE, TRUE)) %>%
328 | + data.frame() %>%
329 | + mutate(across(where(is.numeric), ~round(., 6)))
330 | contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
331 | 1 A - B 0.2 0.151657 Inf -0.097243 0.497243 1.318761 0.187249
332 | ```
333 |
334 | 
335 |
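The delta-method variance (13)/(26)-(29) can also be reproduced by hand; the minimal sketch below reuses the fit `m` from the code above (sample proportions 10/20 for group A and 6/20 for group B).

``` r
# Hand computation of the delta-method SE of the AME
pA <- 10/20; pB <- 6/20; nA <- 20; nB <- 20
grad <- c(pB * (1 - pB) - pA * (1 - pA),       # dg/d(beta0), see (15)
          pB * (1 - pB))                       # dg/d(beta1)
sqrt(t(grad) %*% vcov(m) %*% grad)             # delta-method SE, ~ 0.1517 (as reported above)
sqrt(pA * (1 - pA) / nA + pB * (1 - pB) / nB)  # closed form (29); the same value
```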
336 | The implementation of the z statistic:
337 | ```r
338 | wald_z_test_non_pooled <- function(x1, n1, x2, n2, conf.level=0.95) {
339 | p1 <- x1/n1
340 | p2 <- x2/n2
341 |
342 | se_p1 <- sqrt(p1 * (1 - p1) / n1)
343 | se_p2 <- sqrt(p2 * (1 - p2) / n2)
344 |
345 | se_diff <- sqrt(se_p1^2 + se_p2^2)
346 |
347 | z <- (p1 - p2) / se_diff
348 | p <- 2 * (1 - pnorm(abs(z)))
349 | hCI <- abs(qnorm((1 - conf.level)/2)) * se_diff
350 |
351 | return(data.frame(diff=p1-p2,
352 | z = z, chi2 = z^2,
353 | se = se_diff,
354 | p.value = p, p.value_1 =p/2,
355 | LCI = (p1-p2) - hCI,
356 | HCI = (p1-p2) + hCI,
357 | row.names = NULL))
358 | }
359 | ```
360 |
362 |
--------------------------------------------------------------------------------
/logistic_regression_Rao_Wald_z_test_proportions.md:
--------------------------------------------------------------------------------
1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over logistic regression with a single binary predictor
2 | #### Adrian Olszewski, 21.03.2025
3 |
4 | The Wald’s z-statistic for difference in 2 proportions with pooled variances is of the following form:
5 |
6 | ``` math
7 | \begin{equation}
8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}};\ z~N(0,\ 1)
9 | \hspace{2cm} (1)
10 | \end{equation}
11 | ```
12 | Where $\widehat{p_A}$ stands for the estimated probability (sample proportion, %) in the 1st group,
13 | $\widehat{p_B}$ is the estimated probability in the 2nd group, $n_A$ and $n_B$ denote respective group sizes, and p is the pooled probability $p=\frac{x_A+x_B}{n_A+n_B}$.
14 |
15 | Traditionally the Wald’s statistic is expressed in the squared form, becoming: $z^2=\chi_{df=1}^2$.
16 | Both forms yield the same p-value. For convenience I will show that this $\chi_{df=1}^2$ statistic is 1:1 equivalent to the Rao score test over the logistic regression with a single binary predictor playing the role of an indicator for the compared samples.
17 | To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately.
18 |
19 | **The equation of logistic regression**
20 |
21 | Let’s start from the equation of the logistic regression with a single binary predictor:
22 |
23 | ``` math
24 | \begin{equation}
25 | logit\left(E\left(Y\middle| X\right)\right)=logit\left(\hat{p}\left(Y=1\middle| X\right)\right)=\ln{\left(\frac{\hat{p}\left(Y=1\middle| X\right)}{1-\hat{p}\left(Y=1\middle| X\right)}\right)}=\beta_0+X_1\beta_1
26 | \hspace{2cm} (2)
27 | \end{equation}
28 | ```
29 |
30 | or equivalently, after applying the inverse-logit, i.e. sigmoid function (let’s also simplify X1 to just X)
31 |
32 | ``` math
33 | \begin{equation}
34 | \hat{p}\left(Y=1\middle| X\right)=\frac{e^{\beta_0+X\beta_1}}{1+e^{\beta_0+X\beta_1}}=\frac{e^\eta}{1+e^\eta}
35 | \hspace{2cm} (3)
36 | \end{equation}
37 | ```
38 | where $Y_i$ are independent Bernoulli random variables with probabilities $p_i$. The second, simpler form with $\eta=\beta_0+X\beta_1$ will facilitate later calculations.
39 |
40 | Let’s also encode the two levels {A, B} using a single binary predictor X such that A: X=0 and B: X=1, and let’s simplify notation and express $p_A$ and $p_B$ in terms of the beta coefficients:
41 |
42 | ``` math
43 | \begin{equation}
44 | \begin{cases}
45 | \hat{p}_A = P(Y = 1 \mid X = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \\
46 | \hat{p}_B = P(Y = 1 \mid X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}}
47 | \end{cases}
48 | \hspace{2cm} (4)
49 | \end{equation}
50 | ```
51 |
52 | Let’s also skip the hat notation for estimated p and use simply $p_A$ and $p_B$ until stated otherwise.
53 |
54 | **Pooled probability**
55 |
56 | Under $H_0$, i.e. $β_1=0$, $p_A = p_B$, and the best estimator of this common proportion is the pooled proportion p. Let’s find its form.
57 | We need to assume that the data consists of two independent binomial samples:
58 | ``` math
59 | \begin{equation}
60 | \begin{cases}
61 | X_A \sim Binomial\left(n_A,p_A\right) \quad \text{(group A)} \\
62 | X_B \sim Binomial\left(n_B,p_B\right) \quad \text{(group B)}
63 | \end{cases}
64 | \hspace{2cm} (5)
65 | \end{equation}
66 | ```
67 |
68 | So the likelihood function is:
69 |
70 | ``` math
71 | \begin{equation}
72 | L(p_A,p_B)=\left(\begin{matrix}n_A\\x_A\\\end{matrix}\right)p_A^{x_A}\left(1-p_A\right)^{n_A-x_A}\bullet\left(\begin{matrix}n_B\\x_B\\\end{matrix}\right)p_B^{x_B}\left(1-p_B\right)^{n_B-x_B}
73 | \hspace{2cm} (6)
74 | \end{equation}
75 | ```
76 | where $x_A$ and $x_B$ are the observed number of successes in groups A and B, respectively, and $n_A$ and $n_B$ are the total sample sizes in each group.
77 |
78 | Under the null hypothesis, $H_0: p_A=p_B=p_{pooled}$, where the pooled p is of a form yet to be determined.
79 | Knowing that:
80 |
81 | ``` math
82 | \begin{equation}
83 | \begin{aligned}
84 | L(p_{pooled}) &= \binom{n_A}{x_A} \binom{n_B}{x_B} p_{pooled}^{x_A + x_B} (1 - p_{pooled})^{(n_A - x_A) + (n_B - x_B)} \\
85 | &= \binom{n_A}{x_A} \binom{n_B}{x_B} p_{pooled}^{x_A + x_B} (1 - p_{pooled})^{(n_A + n_B - x_A - x_B)}
86 | \end{aligned}
87 | \hspace{2cm} (7)
88 | \end{equation}
89 | ```
90 | we obtain a single binomial likelihood where the total number of successes is $x_A+x_B$ and the total number of trials is $n_A+n_B$,
91 | i.e.
92 |
93 | ``` math
94 | \begin{equation}
95 | X_A+X_B \sim Binomial\left(n_A+n_B,p_{pooled}\right)
96 | \hspace{2cm} (8)
97 | \end{equation}
98 | ```
99 |
100 | **Note**: The binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples; the truth of $H_0$ is an assumption, not a property of the data. Anyway, the choice doesn’t matter, as this term will vanish when taking the derivative.
101 |
102 | Let’s simplify notation and replace $p_{pooled}$ with just p. Now, the log-likelihood, $log(L(p))=\ell(p)$, is defined as:
103 |
104 | ``` math
105 | \begin{equation}
106 | \ell(p)=const+(x_A+x_B)\bullet log(p)+(n_A+n_B-x_A-x_B)\bullet log(1-p)
107 | \hspace{2cm} (9)
108 | \end{equation}
109 | ```
110 |
111 | (I wrote _const_ to highlight that this term will disappear after taking the derivative).
112 | Now by taking ${\frac{d\ell\left(p\right)}{dp}}$ and setting it to 0, we obtain (provided that $p\notin\{0,1\}$):
113 |
114 | ``` math
115 | \begin{equation}
116 | \begin{aligned}
117 | \frac{d\ell\left(p\right)}{dp}=\frac{x_A+x_B}{p}-\frac{n_A+n_B-x_A-x_B}{1-p}=0, \\
118 | \left(x_A+x_B\right)\left(1-p\right)=\left(n_A+n_B-x_A-x_B\right)\bullet p \\
119 | x_A+x_B=(n_A+n_B)\bullet p\\
120 | p\ =\ \frac{x_A+x_B}{n_A+n_B}
121 | \end{aligned}
122 | \hspace{2cm} (10)
123 | \end{equation}
124 | ```
125 |
126 | Or, alternatively, since $p_i={\frac{x_i}{n_i}}$:
127 |
128 | ``` math
129 | \begin{equation}
130 | p=\frac{p_An_A+p_Bn_B}{n_A+n_B}
131 | \hspace{2cm} (11)
132 | \end{equation}
133 | ```
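A trivial numerical check of (10)/(11), using the same counts as in the R example at the end of this document (6/20 and 10/20):

``` r
# Pooled proportion: total successes over total trials vs. the weighted-mean form (11)
xA <- 10; nA <- 20; xB <- 6; nB <- 20
(xA + xB) / (nA + nB)                          # 0.4
((xA / nA) * nA + (xB / nB) * nB) / (nA + nB)  # also 0.4
```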
134 |
135 | **Log-likelihood of the logistic regression with a single binary predictor**

136 | The log-likelihood function is of the form:
137 |
138 | ``` math
139 | \begin{equation}
140 | \ell(\beta)=log(L(\beta))=log\left(\prod_{i=1}^{n}p_i^{y_i}\bullet\left(1-p_i\right)^{1-y_i}\right)=\sum_{i=1}^{n}\left[y_ilog(p_i)+(1-y_i)log(1-p_i)\right]
141 | \hspace{2cm} (12)
142 | \end{equation}
143 | ```
144 | where β is the vector of estimated coefficients, i.e. $(β_0, β_1)$.
145 |
146 | Let’s express p and 1-p in terms of $η=β_0+Xβ_1$:
147 |
148 | ``` math
149 | \begin{equation}
150 | \begin{cases}
151 | p=\frac{e^\eta}{1+e^\eta} \\
152 | 1-p=\frac{1}{1+e^\eta}
153 | \end{cases}
154 | \hspace{2cm} (13)
155 | \end{equation}
156 | ```
157 | Then:
158 | ``` math
159 | \begin{equation}
160 | \begin{aligned}
161 | \ell\left(\beta\right)&=\sum_{i=1}^{n}\left[y_ilog\left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right)+\left(1-y_i\right)log\left(\frac{1}{1+e^{\eta_i}}\right)\right] \\
162 | &=\sum_{i=1}^{n}\left[y_i\left(log\left(e^{\eta_i}\right)-log\left(1+e^{\eta_i}\right)\right)-\left(1-y_i\right)log\left(1+e^{\eta_i}\right)\right] \\
163 | &=\sum_{i=1}^{n}\left[y_i\eta_i\ -y_ilog\left(1+e^{\eta_i}\right)-\left(1-y_i\right)log\left(1+e^{\eta_i}\right)\right] \\
164 | &=\sum_{i=1}^{n}{y_i\eta_i-log\left(1+e^{\eta_i}\right)}
165 | \end{aligned}
166 | \hspace{2cm} (14)
167 | \end{equation}
168 | ```
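Formula (14) can be verified against R's `logLik()`; the minimal sketch below rebuilds the same data as in the example at the end of this document (that construction is an assumption of this sketch).

``` r
# Verify (14): logLik of a logistic fit equals sum(y*eta - log(1 + exp(eta)))
data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 14),
                                       rep("Success", 10), rep("Failure", 10))),
                   grp      = factor(rep(c("B", "A"), each = 20)))
m   <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
eta <- predict(m, type = "link")               # eta_i = beta0 + x_i * beta1
y   <- as.numeric(data$response == "Success")  # 0/1 coding of the response
sum(y * eta - log(1 + exp(eta)))               # formula (14)
logLik(m)                                      # the same value
```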
169 |
170 | **Gradient, Score, Hessian, Fisher Information, Covariance…**
171 |
172 | We will need both the gradient and the Hessian of the log-likelihood. For future use, we will call the gradient the _Rao score function_, denoted by “U”, and the Hessian “H”.
173 | First, let’s find the form of U(β):
174 | ``` math
175 | \begin{equation}
176 | U\left(\beta\right)=\left[\begin{matrix}\frac{\partial\ell\left(\beta\right)}{\partial\beta_0}\\\frac{\partial\ell\left(\beta\right)}{\partial\beta_1}\\\end{matrix}\right]
177 | \hspace{2cm} (15)
178 | \end{equation}
179 | ```
180 |
181 | By noticing that ${\frac{\partial\eta_i}{\partial\beta_0}}=1 , {\frac{\partial\eta_i}{\partial\beta_1}}=x_i$ and remembering that $p=\frac{e^\eta}{1+e^\eta}$ we obtain:
182 |
183 | ``` math
184 | \begin{equation}
185 | \frac{\partial\ell\left(\beta\right)}{\partial\beta_0}=\sum_{i=1}^{n}{\left(y_i\frac{\partial\eta_i}{{\partial\beta}_0}-\frac{1}{1+e^{\eta_i}}e^{\eta_i}\frac{\partial\eta_i}{{\partial\beta}_0}\right)=\sum_{i=1}^{n}\left(y_i\bullet1-\frac{e^{\eta_i}}{1+e^{\eta_i}}\bullet1\right)=\sum_{i=1}^{n}\left(y_i-p_i\right)}
186 | \hspace{2cm} (16)
187 | \end{equation}
188 | ```
189 | and
190 |
191 | ``` math
192 | \begin{equation}
193 | \frac{\partial\ell\left(\beta\right)}{\partial\beta_1}=\sum_{i=1}^{n}{\left(y_i\frac{\partial\eta_i}{{\partial\beta}_1}-\frac{1}{1+e^{\eta_i}}e^{\eta_i}\frac{\partial\eta_i}{{\partial\beta}_1}\right)=\sum_{i=1}^{n}\left(y_ix_i-\frac{e^{\eta_i}}{1+e^{\eta_i}}x_i\right)=\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}}
194 | \hspace{2cm} (17)
195 | \end{equation}
196 | ```
197 | So finally:
198 | ``` math
199 | \begin{equation}
200 | U\left(\beta\right)=\left[\begin{matrix}\sum_{i=1}^{n}\left(y_i-p_i\right)\\\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}\\\end{matrix}\right]
201 | \hspace{2cm} (18)
202 | \end{equation}
203 | ```
204 |
205 | Now, the Hessian:
206 | ``` math
207 | \begin{equation}
208 | H\left(\beta\right)=\left[\begin{matrix}\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0^2}&\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0\beta_1}\\\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1\beta_0}&\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1^2}\\\end{matrix}\right]
209 | \hspace{2cm} (19)
210 | \end{equation}
211 | ```
212 | The partial derivatives are as follows:
213 | ``` math
214 | \begin{equation}
215 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0^2}=\frac{\partial}{\partial\beta_0}\sum_{i=1}^{n}\left(y_i-p_i\right)=-\sum_{i=1}^{n}{\frac{\partial p_i}{\partial\beta_0}}=-\sum_{i=1}^{n}{\frac{\partial p_i}{\partial\eta_i}\frac{\partial\eta_i}{\partial\beta_0}}=-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}
216 | \hspace{1cm} (20)
217 | \end{equation}
218 | ```
219 | ``` math
220 | \begin{equation}
221 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1^2}=\frac{\partial}{\partial\beta_1}\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}=\sum_{i=1}^{n}{x_i\left(-\frac{\partial p_i}{\partial\beta_1}\right)}=-\sum_{i=1}^{n}{x_ip_i\left(1-p_i\right)}x_i=-\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)}
222 | \hspace{1cm} (21)
223 | \end{equation}
224 | ```
225 | ``` math
226 | \begin{equation}
227 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0\beta_1}=\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1\beta_0}=\frac{\partial}{\partial\beta_1}\sum_{i=1}^{n}\left(y_i-p_i\right)=-\sum_{i=1}^{n}\frac{\partial p_i}{\partial\beta_1}=-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i
228 | \hspace{1cm} (22)
229 | \end{equation}
230 | ```
231 | Therefore:
232 | ``` math
233 | \begin{equation}
234 | H\left(\beta\right)=\left[\begin{matrix}-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}&-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i\\-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i&-\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)}\\\end{matrix}\right]
235 | \hspace{1cm} (23)
236 | \end{equation}
237 | ```
238 | Let’s also determine the Fisher Information matrix:
239 | ``` math
240 | \begin{equation}
241 | I\left(\beta\right)=-E(H(\beta))=\left[\begin{matrix}\sum_{i=1}^{n}{p_i\left(1-p_i\right)}&\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i\\\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i&\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)}\\\end{matrix}\right]=\left[\begin{matrix}\sum_{i=1}^{n}{p_iq_i}&\sum_{i=1}^{n}{p_iq_i}x_i\\\sum_{i=1}^{n}{p_iq_i}x_i&\sum_{i=1}^{n}{p_iq_i}x_i^2\\\end{matrix}\right]
242 | \hspace{1cm} (24)
243 | \end{equation}
244 | ```
245 | where $q_i=1-p_i$.
246 |
247 | This can be further expanded by splitting the sums into per-group (A and B) parts,
248 | remembering that $n=n_A+n_B$, A: X=0, B: X=1, $p_A=p(Y=1|X=A)$ and $p_B=p(Y=1|X=B)$, and:
249 |
250 | ``` math
251 | \begin{equation}
252 | \sum_{i=1}^{n}{p_i}=\sum_{i:\ X_i=0} p_i+\sum_{i:\ X_i=1} p_i=\sum_{i=1}^{n_A}p_i+\sum_{i=n_A+1}^{n_A+n_B}p_i=n_Ap_A\ +\ n_Bp_B
253 | \hspace{2cm} (25)
254 | \end{equation}
255 | ```
256 | So the final, useful form is:
257 | ``` math
258 | \begin{equation}
259 | I\left(\beta\right)=\left[\begin{matrix}\sum_{i:\ X_i=0}{p_iq_i}+\sum_{i:\ X_i=1}{p_iq_i}&\sum_{i:\ X_i=1}{1p_iq_i}\\\sum_{i:\ X_i=1}{1p_iq_i}&\sum_{i:\ X_i=1}{1p_iq_i}\\\end{matrix}\right]=\left[\begin{matrix}n_Ap_Aq_A+n_Bp_Bq_B&n_Bp_Bq_B\\n_Bp_Bq_B&n_Bp_Bq_B\\\end{matrix}\right]
260 | \hspace{1cm} (26)
261 | \end{equation}
262 | ```
263 | This matrix will be used to find the covariance matrix:
264 | ``` math
265 | \begin{equation}
266 | {I\left(\beta\right)}^{-1}\ =\ \Sigma\left(\beta\right)=\left[\begin{matrix}var(\beta_0)&cov(\beta_0,\beta_1)\\cov(\beta_0,\beta_1)&var(\beta_1)\\\end{matrix}\right]
267 | \hspace{2cm} (27)
268 | \end{equation}
269 | ```
270 |
271 | -----
272 | **Another way to obtain I(β)**
273 | The information matrix (and hence, after inversion, the covariance matrix) can also be obtained from:
274 | ``` math
275 | \begin{equation}
276 | I\left(\beta\right)=X^TWX
277 | \hspace{2cm} (28)
278 | \end{equation}
279 | ```
280 | where X is the design matrix with 2 columns (a column of 1s for $β_0$ and a 0/1 indicator for $β_1$, equal to 1 when X=1), with $n_A$ and $n_B$ rows corresponding to groups A and B, respectively.
281 |
282 | ``` math
283 | \begin{equation}
284 | X=\left[\begin{matrix}1&A=0\\1&A=0\\\vdots&\vdots\\1&A=0\\1&B=1\\\vdots&\vdots\\1&B=1\\\end{matrix}\right]\
285 | \hspace{2cm} (29)
286 | \end{equation}
287 | ```
288 | Now, W is the diagonal matrix of weights, of the block-diagonal form:
289 | ``` math
290 | \begin{equation}
291 | W=diag\left(p_i\times\left(1-p_i\right)\right)=\left[\begin{matrix}p_Aq_A&0&\ldots&0&0&0&\ldots&0\\0&p_Aq_A&\cdots&0&0&0&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&p_Aq_A&0&0&\ldots&0\\0&0&\ldots&0&p_Bq_B&0&\ldots&0\\0&0&\ldots&0&0&p_Bq_B&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&0&0&0&\ldots&p_Bq_B\\\end{matrix}\right]
292 | \hspace{2cm} (30)
293 | \end{equation}
294 | ```
295 | which can be expressed in a simpler form as:
296 |
297 | ``` math
298 | \begin{equation}
299 | W=diag\left(p_i\times\left(1-p_i\right)\right)=\left[\begin{matrix}p_Aq_AI_{nA}&0\\0&p_Bq_BI_{nB}\\\end{matrix}\right]
300 | \hspace{2cm} (31)
301 | \end{equation}
302 | ```
303 | where $I_{n_A}$ and $I_{n_B}$ are the respective identity matrices.
304 |
305 | The result of the matrix multiplication can be expressed as appropriate sums:
306 |
307 | ``` math
308 | \begin{equation}
309 | X^TWX=\left[\begin{matrix}\sum_{i=1}^{n}{1p_iq_i}&\sum_{i=1}^{n}{x_ip_iq_i}\\\sum_{i=1}^{n}{x_ip_iq_i}&\sum_{i=1}^{n}{x_i^2p_iq_i}\\\end{matrix}\right]=I\left(\beta\right)
310 | \hspace{2cm} (32)
311 | \end{equation}
312 | ```
313 | where 1 is the result of multiplying the column of ones (the $β_0$ column) by itself, and the X terms come from the other products of the $β_0$ and $β_1$ columns.
314 | Notice that this is **exactly the matrix #24**.
315 |
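A minimal sketch of this identity in R, reusing `data` and `m` from the sketch after formula (14) (or, equivalently, from the example at the end of this document):

``` r
# Check of (32): X'WX evaluated at the fitted probabilities equals the inverse of vcov(m)
X <- model.matrix(m)                    # a column of 1s and the 0/1 group indicator
W <- diag(fitted(m) * (1 - fitted(m)))  # diag(p_i * q_i)
t(X) %*% W %*% X                        # the information matrix I(beta)
solve(vcov(m))                          # agrees up to numerical error
```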
316 | -----
317 | **The Rao Score and Information Matrix under $H_0: β_1=0$**
318 |
319 | _The Rao Score_
320 |
322 | Recall the formula #18:
323 |
324 | ``` math
325 | \begin{equation}
326 | U\left(\beta\right)=\left[\begin{matrix}\sum_{i=1}^{n}\left(y_i-p_i\right)\\\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}\\\end{matrix}\right]
327 | \hspace{2cm} (18)
328 | \end{equation}
329 | ```
330 | Again, $n_A$ and $n_B$ denote the respective numbers of rows corresponding to groups A and B,
331 | with X=0 for group A, X=1 for group B, and $n=n_A+n_B$. Also, under $H_0$, $p_A=p_B=p$ (i.e. $\forall_i\ p_i=p$), where p is the pooled probability.
332 |
333 | ``` math
334 | \begin{equation}
335 | U\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}\left(\sum_{i=1}^{n}y_i\right)-(n_{A\ }+n_B)\bullet p\\\left(\sum_{i:\ X_i=1} y_i\right){-\ n}_Bp\\\end{matrix}\right]
336 | \hspace{2cm} (33)
337 | \end{equation}
338 | ```
339 |
340 | Here, $y_i$ is the response vector containing **1**s for the successes in the respective groups. Summing all those **1**s yields the total number of successes, which can be expressed as:
341 |
342 | ``` math
343 | \begin{equation}
344 | \sum_{i=1}^{n} y_i = \sum_{i:\ X_i\in\{0, 1\}} y_i = n_A p_A + n_B p_B \overset{H_0}{\Rightarrow} (n_A + n_B) \cdot p
345 | \hspace{2cm} (34)
346 | \end{equation}
347 | ```
348 | So the first element of the vector zeroes out, and finally:
349 | ``` math
350 | \begin{equation}
351 | U\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}0\\n_B(p_B-p)\\\end{matrix}\right]
352 | \hspace{2cm} (35)
353 | \end{equation}
354 | ```
355 |
356 | _Information Matrix_
357 |
358 | By recalling **matrix #26** and remembering that under $H_0$, $p_A=p_B=p$ we obtain:
359 | ``` math
360 | \begin{equation}
361 | I\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}pq(n_A+n_B)&n_Bpq\\n_Bpq&n_Bpq\\\end{matrix}\right]
362 | \hspace{2cm} (36)
363 | \end{equation}
364 | ```
365 | Let’s also calculate $I^{-1}$
366 |
367 | ``` math
368 | \begin{equation}
369 | \mathrm{\Sigma}={I\left(\beta_0,\ \beta_1=0\right)}^{-1}=\frac{1}{pq(n_A+n_B)\bullet n_Bpq-\left(n_Bpq\right)^2}\left[\begin{matrix}n_Bpq&-n_Bpq\\-n_Bpq&pq(n_A+n_B)\\\end{matrix}\right]
370 | \hspace{1cm} (37)
371 | \end{equation}
372 | ```
373 | After simplifying the denominator term:
374 |
375 | ``` math
376 | \begin{equation}
377 | pq(n_A+n_B)\bullet n_Bpq-\left(n_Bpq\right)^2=p^2q^2n_An_B\ +\ p^2q^2n_B^2-p^2q^2n_B^2\ =p^2q^2n_An_B
378 | \hspace{1cm} (38)
379 | \end{equation}
380 | ```
381 | we finally obtain:
382 |
383 | ``` math
384 | \begin{equation}
385 | \mathrm{\Sigma}(\beta_0,\ \beta_1=0)={I\left(\beta_0,\ \beta_1=0\right)}^{-1}=\frac{1}{p^2q^2n_An_B}\left[\begin{matrix}n_Bpq&-n_Bpq\\-n_Bpq&pq(n_A+n_B)\\\end{matrix}\right]=\left[\begin{matrix}\frac{1}{n_Apq}&-\frac{1}{n_Apq}\\-\frac{1}{n_Apq}&\frac{n_A+n_B}{n_An_Bpq}\\\end{matrix}\right]
386 | \hspace{1cm} (39)
387 | \end{equation}
388 | ```
389 |
390 | **Rao score test under $H_0: β_1=0$**
391 |
392 | The Rao score test (also called the Lagrange multiplier test) under $H_0$ is defined as the following quadratic form:
393 |
394 | ``` math
395 | \begin{equation}
396 | R={U(\beta_0,\ \beta_1=0)}^T\bullet {I(\beta_0,\ \beta_1=0)}^{-1}\bullet U(\beta_0,\ \beta_1=0)
397 | \hspace{2cm} (40)
398 | \end{equation}
399 | ```
400 | But since the first element of U is 0, this reduces to a scalar operation on the second elements: $R = U_2^2\,\Sigma_{22}$:
401 |
402 | ``` math
403 | \begin{equation}
404 | \begin{aligned}
405 | R&=U_2^2\left(\beta_0,\ \beta_1=0\right)\bullet\mathrm{\Sigma}_{22}\left(\beta_0,\ \beta_1=0\right) \\
406 | &=\left[n_B\left(p_B-p\right)\right]^2\bullet\frac{n_A+n_B}{pqn_An_B}=\frac{\left[n_B\left(p_B-p\right)\right]^2}{pq\frac{n_An_B}{n_A+n_B}} \\
407 | &=\frac{n_B^2\left(p_B-\frac{n_Ap_A+n_Bp_B}{n_A+n_B}\right)^2}{pq\frac{n_An_B}{n_A+n_B}}=\frac{{n_B^2\left(\frac{p_Bn_A+p_Bn_B-n_Ap_A-n_Bp_B}{n_A+n_B}\right)}^2}{pq\frac{n_An_B}{n_A+n_B}} \\
408 | &=\frac{\frac{n_B^2\left(p_Bn_A-n_Ap_A\right)^2}{\left(n_A+n_B\right)^2}}{pq\frac{n_An_B}{n_A+n_B}}=\frac{\frac{n_B^2n_A^2\left(p_B-p_A\right)^2}{\left(n_A+n_B\right)^2}}{pq\frac{n_An_B}{n_A+n_B}} \\
409 | &=\frac{\left(p_B-p_A\right)^2}{pq\frac{n_An_B}{n_A+n_B}\frac{\left(n_A+n_B\right)^2}{n_B^2n_A^2}}=\frac{\left(p_B-p_A\right)^2}{pq\frac{n_A+n_B}{n_An_B}}=\frac{{{(p}_B-p_A)}^2}{pq\left(\frac{1}{n_B}+\frac{1}{n_A}\right)} \\
410 | &=\frac{{{(p}_B-p_A)}^2}{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}=z^2\blacksquare
411 | \end{aligned}
412 | \hspace{2cm} (41)
413 | \end{equation}
414 | ```
415 |
416 | This way I have proven the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over the logistic regression with a single binary predictor distinguishing the compared samples.
417 |
418 | ``` r
419 | > wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20)
420 | diff z chi2 se p.value p.value_1 LCI HCI
421 | 1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363
422 |
423 | > prop.test(c(6,10), c(20,20), correct = FALSE)
424 |
425 | 2-sample test for equality of proportions without continuity correction
426 |
427 | data: c(6, 10) out of c(20, 20)
428 | X-squared = 1.6667, df = 1, p-value = 0.1967
429 | alternative hypothesis: two.sided
430 | 95 percent confidence interval:
431 | -0.49724326 0.09724326
432 | sample estimates:
433 | prop 1 prop 2
434 | 0.3 0.5
435 |
436 | > data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6),
437 | + rep("Success", 10), rep("Failure", 20-10))),
438 | + grp = factor(rep(c("B", "A"), each=20)))
439 |
440 | > m <- glm(response ~ grp, data = data, family = binomial(link = "logit"))
441 |
442 | > anova(m, test = "Rao")
443 | Analysis of Deviance Table
444 |
445 | Model: binomial, link: logit
446 | Response: response
447 | Terms added sequentially (first to last)
448 |
449 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi)
450 | NULL 39 53.841
451 | grp 1 1.6805 38 52.160 1.6667 0.1967
452 | ```
453 |
454 | 
455 |
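The Rao statistic above can also be reproduced by hand from the group counts alone (10/20 in group A, 6/20 in group B), as a sketch of formulas (35)-(41):

``` r
# Hand computation of the Rao score statistic
pA <- 10/20; pB <- 6/20; nA <- 20; nB <- 20
p   <- (nA * pA + nB * pB) / (nA + nB)       # pooled p (11): 0.4
U2  <- nB * (pB - p)                         # second element of the score (35)
S22 <- (nA + nB) / (nA * nB * p * (1 - p))   # Sigma[2,2] under H0 (39)
U2^2 * S22                                   # Rao statistic: 1.6667, as in anova() above
(pB - pA)^2 / (p * (1 - p) * (1/nA + 1/nB))  # z^2 form (41): the same value
```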
456 | The implementation of the z statistic:
457 | ``` r
458 | wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level=0.95) {
459 | p1 <- x1/n1
460 | p2 <- x2/n2
461 |
462 | p_pool <- (p1*n1 + p2*n2) / (n1+n2)
463 | se_p1 <- sqrt(p_pool * (1 - p_pool) / n1)
464 | se_p2 <- sqrt(p_pool * (1 - p_pool) / n2)
465 |
466 | se_diff <- sqrt(se_p1^2 + se_p2^2)
467 |
468 | z <- (p1 - p2) / se_diff
469 | p <- 2 * (1 - pnorm(abs(z)))
470 | hCI <- abs(qnorm((1 - conf.level)/2)) * se_diff
471 |
472 | return(data.frame(diff=p1-p2,
473 | z = z,
474 | chi2 = z^2,
475 | se = se_diff, p.value = p, p.value_1 =p/2,
476 | LCI = (p1-p2) - hCI,
477 | HCI = (p1-p2) + hCI,
478 | row.names = NULL))
479 | }
480 | ```
481 |
--------------------------------------------------------------------------------