├── Pearson_chi2_Wald_z_test_proportions.md ├── README.md ├── Testing hypotheses about proportions using logistic regression.md ├── logistic_regression_AME_Wald_z_test_proportions.md └── logistic_regression_Rao_Wald_z_test_proportions.md /Pearson_chi2_Wald_z_test_proportions.md: -------------------------------------------------------------------------------- 1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Pearson’s χ2 (independence) test for a 2×2 contingency table. 2 | #### Adrian Olszewski, 27.03.2025 3 | 4 | The Wald’s z-statistic for the difference in 2 proportions with pooled variances is of the following form: 5 | 6 | ``` math 7 | \begin{equation} 8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}};\ z~N(0,\ 1) 9 | \hspace{2cm} (1) 10 | \end{equation} 11 | ``` 12 | where $\widehat{p_A}$ stands for the estimated probability (sample proportion, %) in the 1st group, 13 | $\widehat{p_B}$ is the estimated probability in the 2nd group, $n_A$ and $n_B$ denote the respective group sizes, and p is the pooled probability $p=\frac{x_A+x_B}{n_A+n_B}=\frac{p_An_A+p_Bn_B}{n_A+n_B}$. 14 | 15 | Traditionally the Wald’s statistic is expressed in the squared form, becoming: $z^2=\chi_{df=1}^2$. 16 | Both forms yield the same p-value. For convenience I will show that this $\chi_{df=1}^2$ statistic is 1:1 equivalent to the $χ^2$ (independence) test for a 2×2 contingency table. 17 | 18 | Let’s define the form of the 2×2 contingency table with the observed (O) counts: 19 | | Status
Group | Success
(S) | Failure
(F) | Total per group
(ni·) | 20 | |-----------------------------------------:|:----------------:|:---------------:|:------------------------:| 21 | | **A** | $n_{11} = O_{SA}$ | $n_{12} = O_{FA}$ | $n_A = n_{1·}= n_{11} + n_{12}$ | 22 | | **B** | $n_{21} = O_{SB}$ | $n_{22} = O_{FB}$ | $n_B = n_{2·}= n_{21} + n_{22}$ | 23 | | **Total per status (n·i)** | $n_{·1} = n_{11} + n_{21}$ | $n_{·2} = n_{12} + n_{22}$ | $n (=n_A + n_B)$ | 24 | 25 | Where $O_{SA}$ stands for “observed number of successes in group A”, $O_{FA}$ stands for “observed number of failures in group A”, and so on. 26 | The test statistic is defined as follows: 27 | 28 | ``` math 29 | \begin{equation} 30 | X^2=\sum_{r=1}^{2}\sum_{c=1}^{2}\frac{\left(O_{rc}-E_{rc}\right)^2}{E_{rc}};\ X^2~\chi^2(df=1) 31 | \hspace{2cm} (2) 32 | \end{equation} 33 | ``` 34 | 35 | Let’s express the observed and expected number of events as products of totals and probabilities, remembering that the expected number (E) is calculated under $H_0$, i.e. using pooled probability p: 36 | 37 | ``` math 38 | \begin{equation} 39 | O = \begin{cases} 40 | O_{11}=O_{SA}=n_Ap_A \\ 41 | O_{12}=O_{FA}=n_A\left(1-p_A\right) \\ 42 | O_{21}=O_{SB}=n_Bp_B \\ 43 | {O_{22}=O}_{FB}=n_B\left(1-p_B\right) 44 | \end{cases} 45 | \hspace{2cm} (3) 46 | \end{equation} 47 | ``` 48 | and 49 | ``` math 50 | \begin{equation} 51 | E = \begin{cases} 52 | E_{11}=E_{SA}=n_Ap \\ 53 | E_{12}=E_{FA}=n_A\left(1-p\right) \\ 54 | E_{21}=E_{SB}=n_Bp \\ 55 | {E_{22}=E}_{FB}=n_B\left(1-p\right) 56 | \end{cases} 57 | \hspace{2cm} (4) 58 | \end{equation} 59 | ``` 60 | 61 | Let’s substitute the O and E elements in the $X^2$ test statistic: 62 | 63 | ``` math 64 | \begin{equation} 65 | \begin{aligned} 66 | X^2&=\frac{\left(n_Ap_A-n_Ap\right)^2}{n_Ap}+\frac{\left(n_A\left(1-p_A\right)-n_A\left(1-p\right)\right)^2}{n_A\left(1-p\right)}+\frac{\left(n_Bp_B-n_Bp\right)^2}{n_Bp}+\frac{\left(n_B\left(1-p_B\right)-n_B\left(1-p\right)\right)^2}{n_B\left(1-p\right)} \\ 67 | &=\frac{\left(n_A\left(p_A-p\right)\right)^2}{n_Ap}+\frac{\left(n_A\left(p-p_A\right)\right)^2}{n_A\left(1-p\right)}+\frac{\left(n_B\left(p_B-p\right)\right)^2}{n_Bp}+\frac{\left(n_B\left(p-p_B\right)\right)^2}{n_B\left(1-p\right)} \\ 68 | &=\frac{{n_A\left(p_A-p\right)}^2}{p}+\frac{{n_A\left(p-p_A\right)}^2}{\left(1-p\right)}+\frac{{n_B\left(p_B-p\right)}^2}{p}+\frac{{n_B\left(p-p_B\right)}^2}{\left(1-p\right)} \\ 69 | &=\frac{\left(1-p\right)n_A\left(p_A-p\right)^2+pn_A\left(p-p_A\right)^2}{p\left(1-p\right)}+\frac{{{\left(1-p\right)n}_B\left(p_B-p\right)}^2+pn_B\left(p-p_B\right)^2}{p\left(1-p\right)} \\ 70 | &=\frac{n_A\left(p_A-p\right)^2\left(1-p+p\right)}{p\left(1-p\right)}+\frac{n_B\left(p_B-p\right)^2\left(1-p+p\right)}{p\left(1-p\right)}=\frac{n_A\left(p_A-p\right)^2}{p\left(1-p\right)}+\frac{n_B\left(p_B-p\right)^2}{p\left(1-p\right)} \\ 71 | &=\frac{n_A\left(p_A-\frac{p_An_A+p_Bn_B}{n_A+n_B}\right)^2+n_B\left(p_B-\frac{p_An_A+p_Bn_B}{n_A+n_B}\right)^2}{p\left(1-p\right)} \\ 72 | &=\frac{n_A\left(\frac{p_An_B-p_Bn_B}{n_A+n_B}\right)^2+n_B\left(\frac{p_Bn_A-p_An_A}{n_A+n_B}\right)^2}{p\left(1-p\right)}=\frac{n_A\left(\frac{n_B\left(p_A-p_B\right)}{n_A+n_B}\right)^2+n_B\left(\frac{n_A\left(p_B-p_A\right)}{n_A+n_B}\right)^2}{p\left(1-p\right)} \\ 73 | 
&=\frac{\frac{n_An_B^2\left(p_A-p_B\right)^2}{\left(n_A+n_B\right)^2}+\frac{n_Bn_A^2\left(p_A-p_B\right)^2}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)}=\frac{\frac{\left(p_A-p_B\right)^2\left(n_An_B^2+n_Bn_A^2\right)}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)}=\frac{\frac{{{(p}_A-p_B)}^2\left(n_An_B\right)\left(n_B+n_A\right)}{\left(n_A+n_B\right)^2}}{p\left(1-p\right)} \\ 74 | \end{aligned} 75 | \hspace{2cm} (5a) 76 | \end{equation} 77 | ``` 78 | 79 | ``` math 80 | \begin{equation} 81 | \begin{aligned} 82 | &=\frac{\frac{{{n_An_B(p}_A-p_B)}^2}{n_A+n_B}}{p\left(1-p\right)}=\frac{{{(p}_A-p_B)}^2}{p\left(1-p\right)\frac{n_A+n_B}{n_An_B}}=\frac{{{(p}_A-p_B)}^2}{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}\blacksquare 83 | \end{aligned} 84 | \hspace{2cm} (5b) 85 | \end{equation} 86 | ``` 87 | This way I have proven that the $z^2$ statistic is equivalent to the Pearson’s χ2 statistic for a 2×2 table. 88 | 89 | ----- 90 | By the way, it is worth noticing that in the χ2 test, the expected frequencies (E) in each column are based on the pooled proportion (p), which is just the weighted average proportion across both groups. 91 | So, in other words, this test evaluates how much each group’s proportion $p_A$ and $p_B$ deviates from this overall average (p). And this effectively compares the two groups directly, which is what the z-test actually does. 92 | 93 | ----- 94 | 95 | ``` r 96 | > (m <- matrix(c(16, 2, 12, 11), 97 | + nrow = 2, ncol = 2, 98 | + dimnames=list(c("A", "B"), 99 | + c("Success", "Failure")))) 100 | Success Failure 101 | A 16 12 102 | B 2 11 103 | > 104 | > prop.test(m, correct = FALSE) 105 | 106 | 2-sample test for equality of proportions without continuity correction 107 | 108 | data: m 109 | X-squared = 6.2859, df = 1, p-value = 0.01217 110 | alternative hypothesis: two.sided 111 | 95 percent confidence interval: 112 | 0.1491317 0.6860332 113 | sample estimates: 114 | prop 1 prop 2 115 | 0.5714286 0.1538462 116 | 117 | > chisq.test(m, correct = FALSE) 118 | 119 | Pearson's Chi-squared test 120 | 121 | data: m 122 | X-squared = 6.2859, df = 1, p-value = 0.01217 123 | ``` 124 | 125 | ![obraz](https://github.com/user-attachments/assets/96158adb-97c5-4299-98f2-16a132760912) 126 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Logistic-regression-is-regression 2 | Despite the popular yet wrong claim that logistic regression "is not a regression", it's one of the key regression and hypothesis testing tools in experimental research (like clinical trials). I will share information from my field to break the bizarre situation in which people from Machine Learning tell me that "what we do every day cannot be done". 3 | 4 | # Logistic regression has been a regression since its birth - and is used this way every day 5 | 6 | Well, it's kinda... embarrassing for me to write about something that is (should be) obvious to anyone working with statistics, but that in the last decade has been distorted by hundreds of thousands of members of the #machinelearning community, so today the lie has replaced the truth...
7 | 8 | I remember the first time when, during some discussion, I said that "I've been using logistic regression for years on a daily basis for 9 | regression and testing hypotheses, but I've never used it for classification" and a Data Scientist (with a PhD degree) told me that I 10 | must have been mistaken, because "*despite its name, logistic regression is not a regression algorithm*". I asked him "*then tell me, please, what 11 | do I do every day at work?* 😉" and he replied "*I have no idea, but this sounds like pure nonsense, because logistic regression predicts only two 12 | binary outcomes, so you understand it cannot be a regression*" 🤦 13 | 14 | 15 | 16 | 17 | 18 |

19 | 20 |

In such moments I wish there existed a triple facepalm too...

21 |

22 | 23 | Earlier, people (mostly researchers and statisticians) had already been 24 | reporting that a similar situation happened to them during interviews 25 | and internet discussions. I did a small investigation, whose results knocked me 26 | off my feet. I "googled" for terms like "*logistic regression is not 27 | (a) regression*", "*logistic regression is a misnomer*" or 28 | "*logistic regression, despite its name*". The number of findings was 29 | huge - they occurred everywhere: in articles (Medium, Quora), tutorials 30 | (also issued by companies offering paid courses), blogs, courses, books 31 | (including ML bestsellers written by people holding a PhD), and YouTube 32 | videos. I also repeated the search on LinkedIn and found an endless flood 33 | of posts repeating this nonsense, just copy-pasted from other posts. 34 | 35 | Not only that! I asked ChatGPT 3 (then 3.5) and got identical results. 36 | No surprise! It was "fed" garbage, so it learned garbage, and today 37 | it helps spread garbage to learners, who usually don't even suspect 38 | something is wrong, so they trust the AI and repeat the nonsense further and 39 | further and... 40 | 41 | ... there is not a single week on LinkedIn without someone repeating it, 42 | earning hundreds of 👍 - proof that hundreds of people liked it (so tens of 43 | thousands saw it) and will likely repeat the same. 44 | 45 |

46 | 47 |

48 | 49 | Finally I decided to write a few words about this "issue". I write 50 | from the perspective of a clinical biostatistician, working in clinical 51 | trials - part of the pharmaceutical industry responsible for both 52 | existing and new therapies (drugs, procedures, devices) evaluation and 53 | approval. Here the logistic regression is the key regression algorithm, 54 | used to answer questions about treatment efficacy and safety based on 55 | the data from clinical trials with binary endpoints (success/failure). 56 | And yes, that's true - I have never used it for classification during 57 | the whole time of my professional work. 58 | 59 | # Birth of the logistic regression and the\... Nobel Prize 60 | 61 | The origins of the logistic function can be traced back to the 19th 62 | century ([[free 63 | PDF]](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=360300)), 64 | where it was employed in a "*model of population growth*". Early 65 | attempts (1930s) to model binary data in the regression manner resulted 66 | in ***probit regression*** model, which constituted a standard for the 67 | next few decades. Researchers found the outcome not much intuitive, so 68 | they searched for a regression model, which coefficients would be easier 69 | to interpret. In already 1944 **Joseph Berkson** started working on the 70 | alternative to the probit model, and the "logit" (by analogy to 71 | "probit") model was born. Unfortunately, the logit model was rejected 72 | by many as *inferior to the probit model*. It took long years, until the 73 | logit model gained similar "trust" (1960-1970), finally refined by 74 | **Sir David Cox** ("Some procedures connected with the logistic 75 | qualitative response curve", 1966 and "[*[The regression analysis of 76 | binary sequences]*](https://www.jstor.org/stable/2983890)", 77 | 1968). 78 | 79 | / BTW, check also other titles of this Great Mind of Statistics! 80 | [[https://www.nuff.ox.ac.uk/Users/Cox/Publications.html]](https://www.nuff.ox.ac.uk/Users/Cox/Publications.html) 81 | / 82 | 83 | Almost in parallel with the *multinomial logit model* (Cox, Theil), 84 | which, finally, in 1973, allowed **Daniel McFadden**, a famous 85 | econometrician, to piece together existing puzzles, including the 86 | **Duncan Luce**'s *choice axiom*, into a whole, which resulted in a 87 | theoretical foundation for the ***logistic regression***. At that time, 88 | McFadden was deeply involved in pioneering work in developing the 89 | theoretical basis for *discrete choice* where he applied the logistic 90 | regression for empirical analysis. His work, making a profound impact on 91 | the analysis of discrete choice problems in economics and other fields, 92 | gave him the Nobel Prize in 2000. 93 | 94 | I think we can fairly say that Daniel McFadden's work on the logistic 95 | (ordinary and multinomial) regression model and the discrete choice 96 | analysis was truly groundbreaking. It played a significant role in 97 | establishing logistic regression as a solid tool in statistical 98 | analysis, not only in econometrics! 99 | 100 | Remember the rejection of the logit model, found inferior to the probit 101 | one? Now the situation reversed, and logistic regression today is the 102 | default approach. 103 | 104 | 1970s were truly fruitful to logistic regression! 
In 1972, **Sir John 105 | Nelder** and **Robert Wedderburn** in their seminal work ([[free 106 | PDF]](http://www.medicine.mcgill.ca/epidemiology/hanley/bios601/Likelihood/NelderWedderburn1972.pdf)) 107 | introduced the idea of a unified framework: the ***Generalized Linear 108 | Model (GLM)***, enabling regression models to cope with response 109 | variables of any type (counts, categories, continuous) and relaxing 110 | the assumption of normally distributed errors. 111 | 112 | / Logistic regression is a special case of the GLM. You can spot it 113 | easily when working with the R statistical package, when you call glm() 114 | and specify the family of the conditional distribution of the response, here 115 | "binomial", with the appropriate link, here "logit": glm(family = 116 | binomial(link = "logit")) - see the short sketch below. / 117 | 118 | Just a decade later, two other big names you know for sure, **Trevor 119 | Hastie** and **Robert Tibshirani**, extended the Generalized Linear Model 120 | (logistic regression is a special case of it) to the **Generalized 121 | Additive Model**. In their articles (e.g. "Generalized Additive Models 122 | for Medical Research", 123 | [[https://doi.org/10.1177/096228029500400]](https://doi.org/10.1177/096228029500400) 124 | ) they mention the role of logistic regression in the identification of and 125 | adjustment for prognostic factors in clinical trials and observational 126 | studies. 127 | 128 | / Did you know that Professor Hastie authored the glm() command in the 129 | S-PLUS statistical suite (the father of R and the origin of the R syntax), 130 | estimating the coefficients of the regression models and performing the 131 | inference? / 132 | 133 | Additional extensions for handling repeated observations were made by 134 | **Kung-Yee Liang** and **Scott L. Zeger** in 1986 via *Generalized 135 | Estimating Equations* (GEE) and by **Breslow, Clayton and others** around 136 | 1993, when the theory of *Generalized Linear Mixed Models* (GLMM) was 137 | born. 138 | 139 | **I can only imagine McFadden's and others' reaction to the nonsense 140 | "logistic regression is not a regression"...** 141 | 142 |
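Here is that short sketch - a minimal, hypothetical example on simulated data (the variable names `x` and `y` are mine), just to show that logistic regression is literally a `glm()` call with the binomial family and the logit link, and that the probit model mentioned earlier differs only by the link:

``` r
# Minimal sketch on simulated data: logistic regression as a GLM
# (binomial family + logit link); probit is the same GLM with a different link.
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.2 * d$x))

logit_fit  <- glm(y ~ x, data = d, family = binomial(link = "logit"))
probit_fit <- glm(y ~ x, data = d, family = binomial(link = "probit"))

coef(logit_fit)   # coefficients on the log-odds scale
coef(probit_fit)  # same kind of model, probit link - different coefficient scale
```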

143 | 144 |

145 | 146 | # Conditional expectation - the key to understanding the GLM 147 | 148 | Every regression describes a relationship between the predictor and some 149 | function of the conditional response. It can be a quantile, 150 | Q_i(Y\|X=x), as in quantile regression. Or some trimmed estimator 151 | of the expected value, as in robust regression. Or - the expected 152 | value of the conditional response (=*conditional expectation*) itself, 153 | as in the classic linear regression: **E(Y\|X=x)**. 154 | 155 | / so often confused with one of its estimation algorithms --> "OLS 156 | regression" - don't repeat that. / 157 | 158 | Now, it's all about the conditional distribution. If it's Gaussian 159 | (normal distribution), you obtain the linear regression. But the GLM 160 | also allows you to use other distributions: Bernoulli (or binomial), 161 | gamma, Poisson, negative binomial, etc. The problem is that then the 162 | conditional expectation is no longer linearly related to the predictor, 163 | and a linear relationship is something we really want. That's why we have the **link 164 | function**, linking the conditional expectation and the predictor for a 165 | given conditional distribution: **g(E(Y\|X=x)) = Xb** (sometimes you 166 | will see this formula reversed: E(Y\|X=x) = g⁻¹(Xb). It's an equivalent 167 | formulation). 168 | 169 | Now, the expected values are "linearized" with respect to the 170 | predictor. For the ordinary linear regression you don't need that, so 171 | the g() is just I() (the identity function, which we omit) - the expected 172 | values lie on a straight line, plane, or hyperplane (depending on how 173 | many predictors you have). 174 | 175 | / The meaning of *conditional expectation* is also perfectly visible when 176 | you do ANOVA. That's a perfect 1:1 example: the levels of the 177 | categorical predictor(s) "form" sub-distributions, and the mean is 178 | calculated in each. Now you also understand what it means: "**expected 179 | value CONDITIONAL on the predictor**"! / 180 | 181 | Below we can observe various conditional distributions and their means. 182 | The means lie on a straight line transformed by the g() function, **the 183 | link**. 184 | 185 | / OK, I know, the illustration isn't perfect, simplifications are made, 186 | but let's agree on its imperfection, as long as it shows the main idea, 187 | huh? / 188 | 189 |

190 | 191 |

beta regression isn't strictly a GLM, but an extension of it

192 |
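To make the g(E(Y\|X=x)) = Xb relationship tangible, here is a minimal sketch on simulated data (the variable names are mine): the linear predictor lives on the link (log-odds) scale, and applying the inverse link to it gives back the conditional expectation - a probability, i.e. a number.

``` r
# Minimal sketch on simulated data: the two scales of a logit GLM.
# predict(type = "link") gives Xb (log-odds); predict(type = "response") gives
# E(Y|X=x) = P(Y=1|X=x), a number between 0 and 1 - not a class label.
set.seed(2)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, size = 1, prob = plogis(0.3 + 0.8 * d$x))

fit <- glm(y ~ x, data = d, family = binomial(link = "logit"))

eta <- predict(fit, type = "link")         # the linear predictor, Xb
p   <- predict(fit, type = "response")     # the conditional expectation, E(Y|X=x)
all.equal(unname(p), unname(plogis(eta)))  # TRUE: applying g^-1 to Xb recovers E(Y|X=x)
```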

193 | 194 | BTW: This is well explained in the book I recommend you to read: 195 | 196 |

197 | 198 |

Peter H. Westfall, Andrea L. Arias, Understanding Regression Analysis: A Conditional Distribution Approach

199 |

199 | 200 | 201 | Now, let's answer a few questions: 202 | 203 | 1. **Is the expected value numerical or categorical?** Of course it's 204 | numerical. It's just an "average". So you instantly see that 205 | logistic regression CANNOT predict a categorical (binary) outcome 206 | itself. Whatever you've been told - it cannot, and it does not. 207 | Period. 208 | 209 | 2. **What kind of conditional distribution does the logistic regression 210 | use?** It uses the Bernoulli distribution of a single-trial 211 | random variable, taking the value 1 (for success) with probability p and 212 | the value 0 (for failure) with probability 1−p. 213 | 214 | 3. **What is the expected value of the Bernoulli distribution?** 215 | It's "p" - the probability of success. 216 | 217 | 4. **So E(Y\|X=x) is numerical?** Yes, probabilities are numerical. 218 | 219 | 5. **Why "Bernoulli" if statistical manuals say "binomial"?** 220 | Bernoulli is the binomial distribution with a single trial; "binomial" is just the more general term. 221 | 222 | **I hope you can see from this that logistic regression, like any other 223 | regression, predicts a numerical outcome, NOT a categorical one.** 224 | 225 | # How is logistic regression turned into a classifier? 226 | 227 | The outcome of the logistic regression, the conditional probability 228 | (this is why logistic regression is also called a "*direct probability 229 | estimator*"), is subjected to an IF-THEN-ELSE rule, which 230 | compares it against some threshold (usually 0.5, but this shouldn't be 231 | taken for granted!) and returns the category: 232 | 233 | **IF (p \< 0.5) THEN A ELSE B** 234 | 235 |

236 | 237 |

238 | 239 | - *Wait, but this is NOT a regression! This USES the regression prediction instead!* 240 | 241 | **Glad you spotted it!** 242 | 243 | Too often people do not, and just repeat that "logistic regression 244 | predicts a binary outcome". And when I tell them "*but what about the 245 | regression term in it, which means that it should predict a numerical 246 | value?*", they respond: "***Oh! It's a misnomer! Despite its name, 247 | logistic regression isn't a regression because it doesn't predict a 248 | numerical outcome!***" 249 | 250 | In other words, they do something like this: 251 | 252 |

253 | 254 |

255 | 256 | \... making a direct jump from binary input to binary output: 257 | 258 |

259 | 260 |

261 | 262 | But notice: they did NOT CHANGE THE NAME accordingly. Instead of 263 | calling it "Logistic Classifier", the ML community left the name 264 | "Logistic Regression". We could say they "appropriated the logistic 265 | regression". 266 | 267 | Consequently, **they have problems with justifying the existing name**, 268 | so they try: 269 | 270 | - "*Oh! This name is a misnomer*" 271 | 272 | - "*It's called regression because the equation has a similar form to 273 | linear regression*" 274 | 275 | - "*It must be said that, despite its name, it's not a regression*" 276 | 277 | Isn't this just crazy? 278 | 279 | 1. Statisticians invent a regression algorithm (that is, one to solve 280 | regression problems) and use it this way for more than half a century 281 | 282 | 2. ML specialists find it useful for classification 283 | 284 | 3. ML specialists treat it as a classifier 285 | 286 | 4. ML specialists do NOT assign an appropriate name to the classifier, 287 | leaving the "regression" in it 288 | 289 | 5. ML specialists deny that logistic regression is a regression 290 | 291 | 6. ML specialists correct statisticians that "LR is a misnomer and 292 | despite its name LR is not a regression" 293 | 294 | Despite the numerous regression-related problems where logistic 295 | regression is used every day, the situation looks like this: 296 | 297 |

298 | 299 |

300 | 301 | So once in a lifetime, let's recall what is the difference between 302 | **logistic regression** and **logistic classifier**: 303 | 304 |

305 | 306 |

307 | 308 | # Maybe... But everyone uses logistic regression for classification! 309 | 310 | Ah, argumentum ad populum ;\] 311 | 312 | OK then: 313 | 314 | - First thing: not "**_everyone_**\*. I understand that ML is a hot topic today (and is here to stay) but it does NOT mean everything revolves around and _nothing else matters_. There are other areas of science and data analysis too. Yeah, really. 315 | 316 | - The fact, that gazillions of people use logistic regression for 317 | classification purposes doesn't remove it's regression nature. 318 | It's just one application. It's like using a hammer for cooling 319 | your forehead - you can call it a "forehead cooler", but it 320 | doesn't change the fact it's still a hammer. 321 | 322 | - You should add "\... in Machine Learning". Outside ML, the 323 | applications are much richer. 324 | 325 | - Believing that there's nothing beyond Machine Learning in this 326 | world doesn't change\... the world. And the reality is that 327 | **experimental research** (including clinical research, physical and 328 | chemical experiments, sociological and psychological studies, 329 | quality assessments), where the regression tools are applied to 330 | binary (and n-ary) endpoints on daily basis, is still the essential 331 | part of science. So no, it's not true that "everyone uses logistic 332 | regression for classification". You should be more specific and 333 | add: "*Everyone in Machine Learning*". 334 | 335 | So while I can understand someone saying that "*in ML, logistic 336 | regression is a classification algorithm*", I cannot agree that 337 | "*logistic regression is not a regression*". A single specific 338 | application, employing also additional steps, and producing a different 339 | (categorized) output does not invalidate the "core" engine. 340 | 341 | The fact that a tomato can be used to cook a soup (involving many steps) 342 | does not mean that "tomato is not a fruit - it is a misnomer, because 343 | tomato is a soup ingredient". It's that simple. 344 | 345 | # Regression-related applications of the logistic regression (and its friends) 346 | 347 | Multiple times I mentioned that logistic regression is used by me and 348 | other statisticians to non-classification, regression tasks. Believe me, 349 | there is NO difference from any other regression! 350 | 351 | - **Assessmen**t = direction and magnitude **of the impact of 352 | predictors on the response** expressed as: log-odds, odds-ratios, or 353 | probability (via estimated marginal means or marginal effects \--\> 354 | for non-identity links) 355 | 356 |

357 | 358 |

359 | 360 | - For categorical predictors: **inference about the main effects** 361 | (=ANOVA), optionally adjusted for numerical covariates (=ANCOVA); 362 | exploration of their **interactions** 363 | 364 | - For categorical predictors: **inference about the simple effects of 365 | interest**, analysed via planned or ad hoc contrasts; optionally 366 | adjusted for numerical covariates 367 | 368 |

369 | 370 |

371 | 372 | - For numerical or ordinal categorical predictors: **testing for 373 | trends** (linear, quadratic, cubic, higher) in proportions. 374 | **Comparisons of trends** between groups. 375 | 376 | - **Replicating the classic statistical tests**: of proportions, 377 | odd-ratios and stochastic superiority (Wald's and Rao z test, chi2, 378 | Cochran-Armitage, Breslow-Day, Cochran-Mantel-Haenszel, McNemar, 379 | Cochran Q, Friedman, Mann-Whitney (-Wilcoxon)) 380 | 381 | - **Extending the above tests** for multiple variables and their 382 | interactions, and numerical covariates. Just check the illustration 383 | below and then [[visit my GitHub for several 384 | examples]](https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/Testing%20hypotheses%20about%20proportions%20using%20logistic%20regression.md): 385 | 386 |

387 | 388 |

logistic regression and friends can replicate lots of classic tests!

389 |

390 | 391 | - Bonus: the model-based approach ([[check my GitHub for some thoughts 392 | and 393 | notes]](https://github.com/adrianolszewski/model-based-testing-hypotheses/blob/main/README.md)) 394 | allows one to employ advanced parametric adjustment for multiple 395 | comparisons via multivariate t distribution, adjust numerical 396 | covariates, employ time-varying covariates, account for repeated and 397 | clustered observations and more! 398 | 399 | - Direct probability estimator used to implement the **inverse 400 | probability weighting** (IPW) and **propensity score matching** 401 | algorithms 402 | 403 | - Logistic regression is very useful in the **assessment of the 404 | Missing-Completely-At-Random (MCAR) pattern** when exploring the 405 | missing observations! 406 | 407 | In my field, clinical trials, I use the logistic regression for: 408 | 409 | - the **assessment of between-arm treatment effect** via comparison of 410 | the log-odds or the % of clinical success at certain timepoints 411 | 412 | - **non-inferiority, equivalence or superiority testing** *(employs 413 | clinical significance)* at selected timepoints via appropriately 414 | defined confidence intervals of difference between %s (average 415 | marginal effect) 416 | 417 | - the **assessment of the impact of predictors on the clinical 418 | success** + provide covariate-adjusted EM-means for their main 419 | effects, interactions and finally their appropriate contrasts 420 | 421 | - the **exploration of interactions** (simple effects), making the 422 | essential part of my daily work 423 | 424 | - **analysing the over-time within-arm trends of % of successes**, 425 | e.g. to assess the treatment or some practice persistence. 426 | 427 |
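One bullet above mentions logistic regression as a direct probability estimator behind inverse probability weighting (IPW) and propensity score matching. As a minimal, hypothetical sketch (simulated data; all variable names are mine), the propensity score is nothing more than the model's predicted probability of treatment:

``` r
# Minimal sketch on simulated data: logistic regression as a direct probability
# estimator - propensity scores and the corresponding IPW weights.
set.seed(4)
n <- 500
d <- data.frame(age = rnorm(n, 50, 10), severity = rnorm(n))
d$treated <- rbinom(n, size = 1, prob = plogis(-0.05 * (d$age - 50) + 0.7 * d$severity))

ps_model <- glm(treated ~ age + severity, data = d, family = binomial(link = "logit"))
d$ps     <- predict(ps_model, type = "response")   # propensity score = P(treated | covariates)

# Inverse probability weights (ATE form): 1/ps for the treated, 1/(1-ps) for the controls
d$ipw <- ifelse(d$treated == 1, 1 / d$ps, 1 / (1 - d$ps))
summary(d$ipw)
```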

428 | 429 |

430 | 431 | # Friends of the logistic regression 432 | 433 | Logistic regression has many friends that were invented to address 434 | various problems related to regression. Let us enumerate them and 435 | briefly describe: 436 | 437 | - **Binary Logistic Regression -** that's our binomial regression 438 | with logit link, a special case of the Generalized Linear Model, 439 | modelling the % of successes. 440 | 441 | - **Multinomial Logistic Regression** (MLR) - helpful, when we deal 442 | with a response consisting of multiple unordered classes (e.g. 443 | colours). 444 | 445 | - **Nested MLR** - will help us when the classes are "organized" in 446 | groups, related in a hierarchy - thus nested. Imagine that a person 447 | chooses a mean of transport between **air** {plane} and **road** 448 | {car, train, bus}. When the road transportation is chosen, then the 449 | further decision is made only between the three alternatives. It's 450 | similar to multi-level models, where the error terms may present 451 | some correlation within the same nest, whereas uncorrelated between 452 | nests. Thank you, **McFadden**, also for this one! [[Read more 453 | here]](https://cran.r-project.org/web/packages/nestedLogit/vignettes/nestedLogit.html) 454 | and [[here (Applied Microeconometrix with 455 | R)]](https://discdown.org/microeconometrics/index.html) 456 | (or just "google" for more). 457 | 458 | - **Ordinal LR** (aka Proportional Odds Model) - allows you to deal 459 | with 2+ ordered classes, {horrible \< poor \< average \< good \< 460 | excellent} or {slow \< average \< fast}, {small \< medium \< big} 461 | and so on. This is the default method of **analysing responses from 462 | pools and questionnaires** (including Likert items), e.g. . Did you 463 | know, that the OLR is related with the *Mann-Whitney (-Wilcoxon) 464 | test*? Use it, if you need a flexible non-parametric test, that: a) 465 | handles multiple categorical variables, b) adjusts for numerical 466 | covariates (like ANCOVA). Don't hesitate to use it with NUMERICAL 467 | variables! Yes, you can always do this, the same way you employ 468 | rank-based methods (e.g. *Conover's AN\[C\]OVA*). Read also the 469 | articles by **Prof. Harrell**, namely: [[Resources for Ordinal 470 | Regression Models]](https://www.fharrell.com/post/rpo/), 471 | [[Equivalence of Wilcoxon Statistic and Proportional Odds 472 | Model]](https://www.fharrell.com/post/powilcoxon/), [[If 473 | You Like the Wilcoxon Test You Must Like the Proportional Odds 474 | Model]](https://www.fharrell.com/post/wpo/), and more. 475 | 476 | - **Generalized OLR** - aka Partial Proportional Odds Model is used 477 | when the proportionality of odds doesn't hold. (PS: read 478 | [[Violation of Proportional Odds is Not 479 | Fatal]](https://www.fharrell.com/post/po/)) 480 | 481 | - **Logistic Quantile Regression** - application similar to the 482 | above - performs logistic quantile regression for bounded responses, 483 | like percentages (0-1), school grades, visual analog scales and so 484 | on. Check [[this 485 | article]](https://www.amarun.org/images/amarun/materiales/articulos/Galaraza_11_2020.pdf) 486 | and manuals for 487 | [[Stata]](https://journals.sagepub.com/doi/pdf/10.1177/1536867X1101100301) 488 | and [[R 489 | (lqr)]](https://cran.r-project.org/web/packages/lqr/). 490 | 491 | - **Conditional Logistic Regression - helpful** when we deal with 492 | stratification and matching groups of data, e.g. 
in observational 493 | studies without randomization, to match subjects by some 494 | characteristics and create homogenous "baseline". It can be used 495 | to reproduce the *Cochran-Mantel-Haenszel test* (via 496 | *clogit(\...strata)* in R) 497 | 498 | - The **binary logistic regression and its multinomial LR and ordinal 499 | LR friends can account for dependent responses** (repeated 500 | measurements, clustered observations) through the *Generalized 501 | Estimating Equations* (GEE) semi-parametric estimation and the 502 | *Generalized Linear Mixed Models* (GLMM). No surprise that logistic 503 | regression is one of the core regression models for longitudinal 504 | clinical trials with binary endpoints. And no, we do NOT classify 505 | there anything ;\] 506 | 507 | - **Alternating Logistic Regression** - it's a quite rare (and 508 | forgotten) model, suitable if we deal with correlated observations, 509 | e.g. when we analyse repeated or clustered data. I mentioned already 510 | two methods: the mixed-effect LR, and GEE LR. The Alternating LR is 511 | the 3rd option, which models the dependency between pairs of 512 | observations by using log odds ratios instead of correlations (which 513 | is done by GEE). It handles ordinal responses too. There were some 514 | past implementations in R, but now they are removed from CRAN. [[SAS 515 | supports it as part of PROC 516 | GEE]](https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_gee_details06.htm). 517 | 518 |
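To make the list of "friends" above less abstract, here is a minimal, hypothetical sketch on simulated data (all variable names are mine) of two of them, using packages that appear elsewhere in this repository: `MASS::polr` for the ordinal (proportional-odds) logistic regression and `geepack::geeglm` for the GEE-estimated logistic regression with repeated observations per subject.

``` r
# Minimal sketch on simulated data: two "friends" of the binary logistic regression.
library(MASS)     # polr(): proportional-odds (ordinal) logistic regression
library(geepack)  # geeglm(): GEE-estimated logistic regression for clustered/repeated data

set.seed(5)
n <- 60
d <- data.frame(id = rep(1:30, each = 2), time = rep(c("pre", "post"), 30), x = rnorm(n))
d$y_bin <- rbinom(n, size = 1, prob = plogis(0.5 * d$x))
d$y_ord <- factor(cut(d$x + rnorm(n), 3, labels = c("low", "mid", "high")), ordered = TRUE)

polr(y_ord ~ x, data = d, Hess = TRUE)                # ordinal LR (cf. the Mann-Whitney link above)

geeglm(y_bin ~ time, id = id, data = d,               # binary LR accounting for the
       family = binomial(), corstr = "exchangeable")  # within-subject correlation
```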

519 | 520 |

Logistic regression has many friends

521 |

522 | 523 | # Literature 524 | 525 | I will populate this chapter with textual references later. For now, 526 | find the "collage" of covers. And believe me, none of these books will 527 | say that "logistic regression is not a regression" :) 528 | 529 |

530 | 531 |

NONE of these great books will give you nonsense like "...is not a regression"

532 |

533 | 534 | \+ recently found an excellent one: 535 | 536 |

537 | 538 |

Norman Matloff, Statistical Regression and Classification: From Linear Models to Machine Learning

539 |

540 | 541 | Other authors also prove it can be done properly: 542 | 543 |

544 | 545 |

Brett Lantz, Machine Learning with R: Learn How to Use R to Apply Powerful Machine Learning Methods and Gain Insight into Real-world Applications

546 |

547 | 548 | ## ad hoc comments from my readers 549 | 550 | - *Q: "Adrian, but in their book, Hastie and Tibshirani put the 551 | logistic regression in the »classification« chapter!"* 552 | 553 | A: Of course they did! It's a book about machine learning, so this kind 554 | of *application* is of interest and highly expectable. **BUT they've 555 | never said it's not a regression model. ** They both wrote also a 556 | series of articles on the application of the proportional hazard models 557 | and the logistic regression in biostatistical (they worked in the 558 | division of biostatistics) applications in the regression manner 559 | (assessment of the prognostic factors, assessment of the treatment 560 | effect) and call it a regression model. 561 | 562 |

563 | 564 |

565 | 566 | Also in the book you mention, on page 121-122 + the following examples 567 | they say: "*Logistic regression models are used mostly as a data 568 | analysis and inference tool, where the goal is to understand the role of 569 | the input variables in explaining the outcome. Typically many models are 570 | fit in a search for a parsimonious model involving a subset of the 571 | variables, possibly with some interactions terms.*" 572 | 573 | - *Q: You said that Prof. Hastie authored the glm() function in S. Any 574 | source?* 575 | 576 | *A:* 577 | Here (just for instance): 578 | 579 |

580 | 581 |

582 | 583 | - *Q: ChatGPT 3.5 says that logistic regression is not a regression!* 584 | 585 | *A:* **ChatGPT will repeat what was trained on**. Don't rely on it 586 | strictly when you are learning a new topic, because what you will be 587 | told strongly depends on how you will ask. It was trained on mixed good 588 | and bad resources, so sometimes the *valid one* is "allowed to speak" 589 | but just a few questions later it may be messing again. This pertains to 590 | ANY kind of topic, not only in statistics. **DO ALWAYS verify the 591 | responses from any AI-based system if you are going to learn from it, 592 | pass your exams or an interview, or do your job.** 593 | 594 | PS: I was told that the newest version of ChatGPT is much better, so 595 | give it a try. 596 | 597 |

598 | 599 |

ChatGPT 3.5 in action

600 |

601 | -------------------------------------------------------------------------------- /Testing hypotheses about proportions using logistic regression.md: -------------------------------------------------------------------------------- 1 | Despite the widespread and nonsensical claim, that "logistic regression is not a regression", it constitutes one of the key regression and hypothesis testing tools used in the experimental research (like clinical trials). 2 | 3 | Let me show you how the logistic regression (with a few extensions) can be used to test hypotheses about fractions (%) of successes, repacling the classic "test for proportions". 4 | Namely, it can replicate the results of: 5 | 6 | 1. [the Wald's (normal approximation) **z test for 2 proportions with non-pooled standard errors**](#wald_2prop_z) (common in clinical trials). Testing is done via EM-means (estimated-marginal means) on the probability scale or the AME (average marginal effect) over the binary predictor distinguishing the two compared groups. 7 | 2. [the Rao's score (normal appr.) **z test for 2 proportions with pooled standard errors**](#rao_2prop_z) (just what the `prop.test()` does in R) 8 | 3. the **chi2 test for multiple (2+) proportions** 9 | 4. **ANOVA-like** (joint) test for multiple caterogical predictors (n-way ANOVA). Also (n-way) ANCOVA if you employ numerical covariates. 10 | 5. [the **Cochran-Mantel-Haenszel (CMH) for stratified/matched data**](#cmh) via _conditional logistic regression_ 11 | 7. [the **Breslow-Day test for odds ratios**](#breslow-day) through Rao's ANOVA --> the interaction term 12 | 8. [the **Cochran-Armitage test for trend in ordered proportions**](#armitage-trend) 13 | 9. [the **McNemar and Cochran Q** test of paired proportions](#mcnemar) via GEE estimation (Generalized Estimating Equations with compound symmetry) 14 | 10. [the **Friedman test**](#mcnemar) - as above 15 | 11. [the **Mann-Whitney-Wilcoxon and Kruskal-Wallis**](#mww) via Ordinal Logistic Regression (and paired Wilcoxon via GEE) 16 | 17 | Actually, the model-based approach to testing hypotheses is not anything new, and lots of other tests can be replicated with the general linear model via Ordinal Least Square (OLS) and Generalized Least Square (GLS) estimation, generalized linear models (GLM) via both Maximum-Likelihood estimation (MLE) and semi-parametric Generalized Estimating Equations (GEE). Let's add to this also the conditional approach via Mixed-Effect models (both general and generalized). And let's not forget about the Quantile Regression (with mixed effects), robust regression models, survival models (Cox, AFT, Andersen-Gill, frailty models) and dozens of others! 18 | 19 | All those models, followed by the _Likelihood Ratio_ (LR), _Rao (score)_, or _Wald's_ approach to testing of model coefficients, especially combined with the idea of _estimated marginal means_ (EM-means) will give you an incredibly flexible testing framework. 20 | 21 | This time we will look at the Logistic Regression, part of the Generalized Linear Model - the binomial regression with logit link. We will also employ certain extensions and generalizations to achieve concrete effects. 
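Before the worked examples, a small orientation sketch may help (it assumes the `unpaired_data` defined at the bottom of this file has been created): the same `glm()` coefficient can be tested with the Wald, likelihood-ratio (LR) and Rao (score) flavours mentioned above, and these are exactly the three testing approaches used throughout this document.

``` r
# Orientation sketch (requires the unpaired_data defined at the bottom of this file):
# three classic ways of testing the treatment effect in one logistic regression model.
fit  <- glm(response ~ trt, data = unpaired_data, family = binomial(link = "logit"))
fit0 <- update(fit, . ~ 1)                     # the null (intercept-only) model

coef(summary(fit))["trtplacebo", ]             # Wald z test of the coefficient
anova(fit0, fit, test = "LRT")                 # likelihood-ratio test
anova(fit0, fit, test = "Rao")                 # Rao (score) test
```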
22 | 23 | ![obraz](https://github.com/adrianolszewski/Logistic-regression-is-regression/assets/95669100/953ab6ff-1930-4a02-9f7b-8be2cb1b7d71) 24 | 25 | --- 26 | 27 | We are going to use 3 data sets (defined at the bottom of this file): 28 | * unpaired 2-group data 29 | 30 | ``` r 31 | > head(unpaired_data) 32 | sex response trt 33 | 1 female 0 active 34 | 2 female 0 active 35 | 3 female 0 active 36 | 4 female 0 active 37 | 5 female 0 active 38 | 6 female 0 active 39 | > tail(unpaired_data) 40 | sex response trt 41 | 101 male 1 placebo 42 | 102 male 1 placebo 43 | 103 male 1 placebo 44 | 104 male 1 placebo 45 | 105 male 1 placebo 46 | 106 male 1 placebo 47 | ``` 48 | 49 | * paired 2-group data 50 | 51 | ``` r 52 | > head(paired_data) 53 | ID Time Treatment Response 54 | 1 1 Pre placebo 0 55 | 2 1 Post placebo 1 56 | 3 2 Pre placebo 0 57 | 4 2 Post placebo 0 58 | 5 3 Pre placebo 0 59 | 6 3 Post placebo 0 60 | > 61 | > tail(paired_data) 62 | ID Time Treatment Response 63 | 35 18 Pre active 0 64 | 36 18 Post active 1 65 | 37 19 Pre active 0 66 | 38 19 Post active 0 67 | 39 20 Pre active 0 68 | 40 20 Post active 0 69 | ``` 70 | 71 | * ordered data 72 | ``` r 73 | > head(ordered_paired_data) 74 | ID Time Response TimeUnord 75 | 1 1 T1 0 T1 76 | 2 2 T1 0 T1 77 | 3 3 T1 0 T1 78 | 4 4 T1 0 T1 79 | 5 5 T1 0 T1 80 | 6 6 T1 0 T1 81 | > tail(ordered_paired_data) 82 | ID Time Response TimeUnord 83 | 25 5 T3 1 T3 84 | 26 6 T3 1 T3 85 | 27 7 T3 1 T3 86 | 28 8 T3 0 T3 87 | 29 9 T3 1 T3 88 | 30 10 T3 1 T3 89 | ``` 90 | 91 | * unpaired 2-group ordinal data (Pain score of the ODI (Oswestry Disability Index) questionnaire; 6-items Likert data. 92 | https://www.lni.wa.gov/forms-publications/F252-130-000.pdf 93 | ``` r 94 | > head(ordinal_data) 95 | ODIPain Arm Age_centered 96 | 1 [2] Moderate pain B -6.15315 97 | 2 [0] No pain B 12.84685 98 | 3 [1] Very mild pain A -9.15315 99 | 4 [2] Moderate pain B 14.84685 100 | 5 [3] Fairly severe pain A 12.84685 101 | 6 [2] Moderate pain B 2.84685 102 | > tail(ordinal_data) 103 | ODIPain Arm Age_centered 104 | 106 [2] Moderate pain A -15.153153 105 | 107 [2] Moderate pain B -11.153153 106 | 108 [2] Moderate pain A -4.153153 107 | 109 [4] Very severe pain B -0.153153 108 | 110 [1] Very mild pain B -4.153153 109 | 111 [1] Very mild pain B -7.153153 110 | ``` 111 | 112 | --- 113 | Loading necessary packages 114 | ```{r} 115 | library(emmeans) 116 | library(broom) 117 | library(survival) 118 | library(marginaleffects) 119 | library(geepack) 120 | ``` 121 | 122 | Defining auxiliary function (to validate the results) 123 | ``` r 124 | wald_z_test <- function(table) { 125 | p1 <- prop.table(table, 1)[1, 1] 126 | p2 <- prop.table(table, 1)[2, 1] 127 | n1 <- rowSums(table)[1] 128 | n2 <- rowSums(table)[2] 129 | se_p1 <- sqrt(p1 * (1 - p1) / n1) 130 | se_p2 <- sqrt(p2 * (1 - p2) / n2) 131 | se_diff <- sqrt(se_p1^2 + se_p2^2) 132 | z <- (p1 - p2) / se_diff 133 | p <- 2 * (1 - pnorm(abs(z))) 134 | return(data.frame(estimate = p1 - p2, z = z, se = se_diff, p.value = p, row.names = NULL)) 135 | } 136 | ``` 137 | 138 | --- 139 | 140 | # Wald's z test for 2 proportions (non-pooled SE) 141 | 142 | We want to reproduce this result: 143 | ``` r 144 | > wald_z_test(xtabs(~ trt + response,data = unpaired_data)) 145 | estimate z se p.value 146 | 1 0.2737968 3.047457 0.08984435 0.002307865 147 | ``` 148 | 149 | We will use this logistic regression (LR) model: 150 | ``` r 151 | > summary(lr_model <- glm(response ~ trt , data = unpaired_data, family = binomial(link = "logit"))) 152 | 153 | Call: 
154 | glm(formula = response ~ trt, family = binomial(link = "logit"), 155 | data = unpaired_data) 156 | 157 | Deviance Residuals: 158 | Min 1Q Median 3Q Max 159 | -1.7011 -1.1620 0.7325 1.0778 1.1929 160 | 161 | Coefficients: 162 | Estimate Std. Error z value Pr(>|z|) 163 | (Intercept) -0.03637 0.26972 -0.135 0.89274 164 | trtplacebo 1.21502 0.42629 2.850 0.00437 ** 165 | --- 166 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 167 | 168 | (Dispersion parameter for binomial family taken to be 1) 169 | 170 | Null deviance: 140.50 on 105 degrees of freedom 171 | Residual deviance: 131.88 on 104 degrees of freedom 172 | AIC: 135.88 173 | 174 | Number of Fisher Scoring iterations: 4 175 | ``` 176 | 177 | ## Wald's z test via LS-means on re-grided scale (probability scale) 178 | ``` r 179 | > pairs(emmeans(lr_model, regrid="response", specs = ~ trt)) 180 | contrast estimate SE df z.ratio p.value 181 | active - placebo -0.274 0.0898 Inf -3.047 0.0023 182 | ``` 183 | Let's look closer at the results: 184 | | Outcome | LS-means | raw z test | comment | 185 | |-----------|----------|------------|---------| 186 | | estimate | -0.2737968 | 0.2737968| 👍; swap factor levels to change the sign or ignore | 187 | | SE | 0.08984432 | 0.08984435 | agreement by 7 dec. digits 👍 | 188 | | statistic | -3.047458 | 3.047457 | sign - as above; agreement by 5 dec. digits 👍 | 189 | | p-value | 0.002307857 | 0.002307865 | aggrement by 7 dec. digits 👍 | 190 | 191 | Excellent agreement! 192 | 193 | ## Wald's z test via AME (average marginal effect) 194 | / How does it work? Please read my explanation and a formal proof here: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_AME_Wald_z_test_proportions.md / 195 | ``` r 196 | > marginaleffects::avg_slopes(lr_model) 197 | 198 | Term Contrast Estimate Std. Error z Pr(>|z|) S 2.5 % 97.5 % 199 | trt placebo - active 0.274 0.0898 3.05 0.00231 8.8 0.0977 0.45 200 | 201 | Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 202 | ``` 203 | Let's look closer at the results: 204 | | Outcome | AME | raw z test | comment | 205 | |-----------|----------|------------|------| 206 | | estimate | 0.2737968 | 0.2737968 | | 👍 | 207 | | SE | 0.08984433 | 0.08984435 | 👍 | 208 | | statistic | 3.047458 | 3.047457 | agreement by 5 dec. digits 👍 | 209 | | p-value | 0.002307859 | 0.002307865 | agreement by 6 dec. digits 👍 | 210 | 211 | Perfect agreement! 212 | --- 213 | 214 | # Rao score z test for 2 proportions (pooled SE) 215 | / How does it work? 
Please read my explanation and a formal proof here: https://github.com/adrianolszewski/Logistic-regression-is-regression/blob/main/logistic_regression_Rao_Wald_z_test_proportions.md / 216 | 217 | We want to reproduce this result: 218 | ``` r 219 | > prop.test(xtabs(~ trt + response,data=unpaired_data), correct = FALSE) 220 | 221 | 2-sample test for equality of proportions without continuity correction 222 | 223 | data: xtabs(~trt + response, data = unpaired_data) 224 | X-squared = 8.4429, df = 1, p-value = 0.003665 225 | alternative hypothesis: two.sided 226 | 95 percent confidence interval: 227 | 0.09770511 0.44988848 228 | sample estimates: 229 | prop 1 prop 2 230 | 0.5090909 0.2352941 231 | ``` 232 | 233 | We will use the same logistic regression (LR) model as previously 234 | 235 | ## Rao score z test via ANOVA with Rao test 236 | ``` r 237 | > anova(glm(response ~ trt , data = unpaired_data, family = binomial(link = "logit")), test = "Rao") 238 | Analysis of Deviance Table 239 | 240 | Model: binomial, link: logit 241 | 242 | Response: response 243 | 244 | Terms added sequentially (first to last) 245 | 246 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi) 247 | NULL 105 140.50 248 | trt 1 8.6257 104 131.88 8.4429 0.003665 ** 249 | --- 250 | Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 251 | ``` 252 | Let's look closer at the results: 253 | | Outcome | ANOVA + Rao test | prop.test() | comment | 254 | |-----------|----------|------------|---------| 255 | | statistic | 8.442898 | 8.442897 | agreement by 5 dec. digits 👍 | 256 | | p-value | 0.003664718 | 0.003664719 | agreement by 8 dec. digits 👍 | 257 | 258 | Perfect agreement! 259 | 260 | --- 261 | 262 | # Breslow-Day test for odds ratios via ANOVA with Rao test 263 | 264 | We want to reproduce this result for treatment and sex: 265 | ``` r 266 | > BreslowDayTest(xtabs(~ trt +response + sex, data=unpaired_data), correct = TRUE) 267 | 268 | Breslow-Day Test on Homogeneity of Odds Ratios (with Tarone correction) 269 | 270 | data: xtabs(~trt + response + sex, data = unpaired_data) 271 | X-squared = 1.4905, df = 1, p-value = 0.2221 272 | ``` 273 | This time add sex to the model and will look at the interaction term 274 | 275 | ``` r 276 | > as.data.frame(anova(glm(response ~ trt * sex , data = unpaired_data, family = binomial(link = "logit")), test="Rao")[4, ]) 277 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi) 278 | trt:sex 1 1.498573 102 130.0512 1.496552 0.2212027 279 | ``` 280 | Let's look closer at the results: 281 | | Outcome | ANOVA + Rao test | Breslow-Day | comment | 282 | |-----------|----------|------------|---------| 283 | | statistic | 1.496552 | 1.490537 | agreement by 2 dec. digits 👍 | 284 | | p-value | 0.2212027 | 0.2221331 | agreement bt 2 dec. digits 👍 | 285 | 286 | Good agreement! 
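For completeness, a small sketch (using the same `unpaired_data` and model as above) showing that the homogeneity-of-odds-ratios question can also be asked with the likelihood-ratio flavour of the interaction test; the result should be of the same order as the Rao and Breslow-Day statistics above, though not numerically identical.

``` r
# Sketch: the same trt:sex interaction tested with the likelihood-ratio flavour.
as.data.frame(anova(glm(response ~ trt * sex, data = unpaired_data,
                        family = binomial(link = "logit")), test = "LRT"))[4, ]
```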
287 | 288 | --- 289 | 290 | # (Cochrane-) Mantel-Haenszel via conditional logistic regression 291 | 292 | We want to reproduce this result for sex strata: 293 | ``` r 294 | > mantelhaen.test(unpaired_data$response, unpaired_data$trt, unpaired_data$sex, exact = F, correct = F) 295 | 296 | Mantel-Haenszel chi-squared test without continuity correction 297 | 298 | data: unpaired_data$response and unpaired_data$trt and unpaired_data$sex 299 | Mantel-Haenszel X-squared = 8.3052, df = 1, p-value = 0.003953 300 | alternative hypothesis: true common odds ratio is not equal to 1 301 | 95 percent confidence interval: 302 | 1.445613 7.593375 303 | sample estimates: 304 | common odds ratio 305 | 3.313168 306 | ``` 307 | And through the model: 308 | ``` r 309 | > summary(clogit(response~trt + strata(sex),data=unpaired_data))$sctest 310 | test df pvalue 311 | 8.30516934 1.00000000 0.00395324 312 | ``` 313 | Let's look closer at the results: 314 | | Outcome | Cond. LR | CMH | comment | 315 | |-----------|----------|------------|---------| 316 | | statistic | 8.30516934 | 8.305169 | 👍 | 317 | | p-value | 0.00395324 | 0.00395324 | 👍 | 318 | 319 | Ideal agreement! 320 | 321 | --- 322 | 323 | # McNemar's, Cochran Q, Friedman tests via GEE estimated LR 324 | We want to reproduce this result for sex strata: 325 | ``` r 326 | > mcnemar.test(x=paired_data[paired_data$Time == "Pre", "Response"], y=paired_data[paired_data$Time == "Post", "Response"], correct = F) 327 | 328 | McNemar's Chi-squared test 329 | 330 | data: paired_data[paired_data$Time == "Pre", "Response"] and paired_data[paired_data$Time == "Post", "Response"] 331 | McNemar's chi-squared = 10.286, df = 1, p-value = 0.001341 332 | 333 | # or this one 334 | 335 | > paired_data %>% rstatix::friedman_test(Response ~ Time |ID) 336 | # A tibble: 1 × 6 337 | .y. n statistic df p method 338 | * 339 | 1 Response 20 10.3 1 0.00134 Friedman test 340 | 341 | # or this one 342 | 343 | > RVAideMemoire::cochran.qtest(Response ~ Time | ID,data=paired_data) 344 | 345 | Cochran's Q test 346 | 347 | data: Response by Time, block = ID 348 | Q = 10.2857, df = 1, p-value = 0.001341 349 | alternative hypothesis: true difference in probabilities is not equal to 0 350 | sample estimates: 351 | proba in group Post proba in group Pre 352 | 0.7 0.1 353 | ``` 354 | 355 | Through the GEE-estimated model: 356 | ``` r 357 | > summary(geepack::geeglm(Response ~ Time, id = ID,data=paired_data, family = binomial(), corstr = "exchangeable")) 358 | 359 | Call: 360 | geepack::geeglm(formula = Response ~ Time, family = binomial(), 361 | data = paired_data, id = ID, corstr = "exchangeable") 362 | 363 | Coefficients: 364 | Estimate Std.err Wald Pr(>|W|) 365 | (Intercept) 0.8473 0.4880 3.015 0.08249 . 366 | TimePre -3.0445 0.9484 10.305 0.00133 ** 367 | --- 368 | Signif. 
codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 369 | 370 | Correlation structure = exchangeable 371 | Estimated Scale Parameters: 372 | 373 | Estimate Std.err 374 | (Intercept) 1 0.7215 375 | Link = identity 376 | 377 | Estimated Correlation Parameters: 378 | Estimate Std.err 379 | alpha -0.1455 0.2819 380 | Number of clusters: 20 Maximum cluster size: 2 381 | 382 | # or in a more compact form: 383 | > coef(summary(geepack::geeglm(Response ~ Time, id = ID,data=paired_data, family = binomial(), corstr = "exchangeable")))[2,] 384 | Estimate Std.err Wald Pr(>|W|) 385 | TimePre -3.045 0.9484 10.31 0.001327 386 | ``` 387 | 388 | Let's look closer at the results: 389 | | Outcome | GEE LR | Tests | comment | 390 | |-----------|----------|------------|---------| 391 | | statistic | 10.31 | 10.2857 | agreement by 1 deci. digits 👍 | 392 | | p-value | 0.001327 | 0.001341 | agreement by 4 dec. digits 👍 | 393 | 394 | Acceptable agreement! 395 | 396 | --- 397 | 398 | # Cochrane-Armitage test for trend via GLM + ANOVA LRT (Likelihood Ratio Test) 399 | We want to reproduce this result for sex strata: 400 | ``` r 401 | > DescTools::CochranArmitageTest(xtabs(~Response + Time,data=ordered_paired_data)) 402 | 403 | Cochran-Armitage test for trend 404 | 405 | data: xtabs(~Response + Time, data = ordered_paired_data) 406 | Z = -3.6, dim = 3, p-value = 0.0003 407 | alternative hypothesis: two.sided 408 | 409 | # or this one 410 | 411 | > rstatix::prop_trend_test(xtabs(~Response + Time,data=ordered_paired_data)) 412 | # A tibble: 1 × 6 413 | n statistic p p.signif df method 414 | * 415 | 1 30 12.9 0.000336 *** 1 Chi-square trend test 416 | ``` 417 | 418 | Through the GLM model: 419 | ``` r 420 | > as.data.frame(anova(glm(Response ~ Time,data=ordered_paired_data, family = binomial()), test="LRT"))[2,] 421 | Df Deviance Resid. Df Resid. Dev Pr(>Chi) 422 | Time 2 14.99 27 26.46 0.0005553 423 | ``` 424 | 425 | Let's look closer at the results: 426 | | Outcome | GLM + LRT ANOVA | Test | comment | 427 | |-----------|----------|------------|---------| 428 | | statistic | 12.86 | 14.99 | same order of magnitude | 429 | | p-value | 0.0005553 | 0.000336 | agreement by 3 dec. digits 👍 | 430 | 431 | Reasonable agreement. (Maybe I'll find a better one). 432 | 433 | --- 434 | 435 | # Mann-Whitney (-Wilcoxon) test of stochastic equivalence (vs. stochastic superiority / dominance) 436 | **Note:** This test DOES NOT TEST MEDIANS in general, unless strong distributional assumptions hold: 437 | 1) IID samples (same dispersion, variance & same shape - if skewed, then in the same direction) 438 | 2) Symmetric around their medians. 439 | For detailed explanations, read my gist and find a rich list of literature (mostly freely accessible) and examples: https://gist.github.com/adrianolszewski/2cec75678e1183e4703589bfd22fa8b2 440 | 441 | We want to reproduce this result: 442 | ``` r 443 | > (wtest <- wilcox.test(as.numeric(ODIPain) ~ Arm, data = ordinal_data, exact = FALSE, correct = FALSE)) 444 | 445 | Wilcoxon rank sum test 446 | 447 | data: as.numeric(ODIPain) by Arm 448 | W = 1472, p-value = 0.68 449 | alternative hypothesis: true location shift is not equal to 0 450 | 451 | > wtest$p.value 452 | [1] 0.679575 453 | ``` 454 | By using the proportional-odds model (ordinal logistic regression) we obtain: 455 | ``` r 456 | > coef(summary(m <- MASS::polr(ODIPain ~ Arm , data = ordinal_data, Hess=T))) 457 | Value Std. 
Error t value 458 | ArmB 0.141709 0.341471 0.414995 459 | [0] No pain|[1] Very mild pain -1.444439 0.299213 -4.827458 460 | [1] Very mild pain|[2] Moderate pain -0.273260 0.259784 -1.051875 461 | [2] Moderate pain|[3] Fairly severe pain 1.361363 0.291704 4.666935 462 | [3] Fairly severe pain|[4] Very severe pain 2.093502 0.345203 6.064551 463 | [4] Very severe pain|[5] Worst imaginable pain 4.072209 0.736078 5.532306 464 | 465 | > pairs(emmeans(m, specs = ~Arm)) 466 | contrast estimate SE df z.ratio p.value 467 | A - B -0.142 0.341 Inf -0.415 0.6781 468 | 469 | # or 470 | > (mtest <- joint_tests(m)) 471 | model term df1 df2 F.ratio p.value 472 | Arm 1 Inf 0.172 0.6781 473 | 474 | mtest$p.value 475 | [1] 0.678146 476 | ``` 477 | 478 | This time, the two outputs (model vs. test) look very different but give a very close p-value! 479 | It's not a coincidence. 480 | You can find detailed explanations and necessary formulas here: [Equivalence of Wilcoxon Statistic and Proportional Odds Model](https://www.fharrell.com/post/powilcoxon/) | [Resources for Ordinal Regression Models](https://www.fharrell.com/post/rpo/) | [If You Like the Wilcoxon Test You Must Like the Proportional Odds Model](https://www.fharrell.com/post/wpo/) 481 | 482 | So, like Prof. Harrell, we will check also the concordance index: 483 | ``` r 484 | # From the Wilcoxon statistic 485 | > (bind_cols(tidy(wilcox.test(as.numeric(ODIPain) ~ Arm, data = ordinal_data, exact = FALSE, correct = FALSE)), 486 | ordinal_data %>% 487 | group_by(Arm) %>% 488 | summarize(n=n()) %>% 489 | summarize("n1*n2" = prod(n))) %>% 490 | mutate(c = statistic / `n1*n2`) -> concor) 491 | 492 | # A tibble: 1 × 6 493 | statistic p.value method alternative `n1*n2` c 494 | 495 | 1 1472. 0.680 Wilcoxon rank sum test two.sided 3080 0.478 496 | 497 | > concor$c 498 | 0.478084 499 | 500 | # From the odds ratio taken from the model: 501 | > (OR <- 1/exp((coef(summary(m)))[1,1])) 502 | [1] 0.867874 503 | 504 | and finally 505 | > (c_mod <- OR^0.66 / (1 + OR ^ 0.66)) 506 | 0.476635 507 | 508 | # So we are off by: 509 | > sprintf("%.2f%%",100*(concor$c - c_mod) / concor$c) 510 | [1] "0.30%" 511 | 512 | # Isn't this IMPRESSIVE? 513 | ``` 514 | 515 | Let's collect the results closer at the results: 516 | | Outcome | OLR | Wilcox | comment | 517 | |-----------|----------|------------|---------| 518 | | concordance | 0.478084 | 0.476635 | agreement by 2 dec. digits 👍 | 519 | | p-value | 0.679575 | 0.678146 | agreement by 2 dec. digits 👍 | 520 | 521 | Very good agreement! 522 | 523 | Later, in a separate gist, I will shoud you, through simulation, that this equivalence holds very well! 524 | 525 | **Think about the consequences. This way obtain the Mann-Whitney (-Wilcoxon) test adjusted for covariates.** 526 | By the way, this is another interesting example, where the result of a completely non-parametric test can be obtained via parametric method. 527 | 528 | --- 529 | * _EM-means_ (estimated marginal means) is another name of the well-known in experimental research _LS-means_ (least-square means) 530 | It's a model-based predicted (estimated) mean. If you remember the definition of regression (NO, not the Machine Learning one...) 531 | then you know that regresion gives you some function of the data conditional to the predictor. 532 | For the linear regression it's E(Y|X=x), for the GLM it is link(E(Y|X=x)), for quantile regression it's Qi(Y|X=x). 
533 | And since the predictor exclusively consists of categorical variables, they form sub-groups in which the (conditional) 534 | means are calculated. If we include also numerical covariates into the model, the predictions will account for it, giving us so-called "covariate-adjusted means". 535 | 536 | ---- 537 | The datasets for your own experiments: 538 | 539 | ``` r 540 | unpaired_data <- structure(list(sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 541 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 542 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 543 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 544 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 545 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 546 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 547 | 2L, 2L, 2L), levels = c("female", "male"), class = "factor"), 548 | response = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 549 | 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 550 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 551 | 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 552 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 553 | 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), trt = structure(c(1L, 554 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 555 | 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 556 | 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 557 | 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 558 | 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 559 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 560 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L 561 | ), levels = c("active", "placebo"), class = "factor")), row.names = c(NA, 562 | -106L), class = "data.frame") 563 | 564 | paired_data <- structure(list(ID = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 565 | 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L, 11L, 12L, 12L, 566 | 13L, 13L, 14L, 14L, 15L, 15L, 16L, 16L, 17L, 17L, 18L, 18L, 19L, 567 | 19L, 20L, 20L), Time = structure(c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 568 | 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 569 | 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 570 | 1L), levels = c("Post", "Pre"), class = "factor"), Treatment = structure(c(2L, 571 | 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 572 | 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 573 | 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("active", "placebo"), class = "factor"), 574 | Response = c(0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 1L, 575 | 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 576 | 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L)), row.names = c(NA, 577 | -40L), class = "data.frame") 578 | 579 | ordered_paired_data <- structure(list(ID = structure(c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 580 | 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 581 | 4L, 5L, 6L, 7L, 8L, 9L, 10L), levels = c("1", "2", "3", "4", 582 | "5", "6", "7", "8", "9", "10"), class = "factor"), Time = structure(c(1L, 583 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 584 | 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("T1", 585 | "T2", "T3"), class = c("ordered", "factor")), Response = c(0L, 586 | 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 587 | 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
0L, 1L, 1L), TimeUnord = structure(c(1L, 588 | 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 589 | 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("T1", 590 | "T2", "T3"), class = "factor")), row.names = c(NA, -30L), class = "data.frame") 591 | 592 | ordinal_data <- structure(list(ODIPain = structure(c(3L, 1L, 2L, 3L, 4L, 3L, 593 | 4L, 2L, 3L, 5L, 2L, 5L, 5L, 6L, 2L, 3L, 1L, 2L, 3L, 3L, 1L, 3L, 594 | 3L, 2L, 2L, 5L, 5L, 2L, 5L, 3L, 5L, 1L, 3L, 3L, 3L, 1L, 5L, 3L, 595 | 5L, 1L, 1L, 2L, 1L, 2L, 3L, 2L, 3L, 1L, 2L, 1L, 2L, 4L, 6L, 4L, 596 | 3L, 3L, 3L, 3L, 1L, 4L, 5L, 4L, 3L, 3L, 1L, 3L, 1L, 4L, 3L, 3L, 597 | 2L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 1L, 2L, 2L, 1L, 3L, 4L, 4L, 3L, 598 | 2L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 1L, 3L, 1L, 3L, 4L, 4L, 3L, 3L, 599 | 1L, 2L, 3L, 3L, 3L, 3L, 5L, 2L, 2L), levels = c("[0] No pain", 600 | "[1] Very mild pain", "[2] Moderate pain", "[3] Fairly severe pain", 601 | "[4] Very severe pain", "[5] Worst imaginable pain"), class = c("ordered", 602 | "factor")), Arm = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 603 | 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 604 | 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 605 | 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 606 | 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 607 | 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 608 | 1L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 609 | 1L, 1L, 2L, 1L, 2L, 2L, 2L), levels = c("A", "B"), class = "factor"), 610 | Age_centered = c(-6.15315315315316, 12.8468468468468, -9.15315315315316, 611 | 14.8468468468468, 12.8468468468468, 2.84684684684684, -10.1531531531532, 612 | -18.1531531531532, -1.15315315315316, 8.84684684684684, -17.1531531531532, 613 | 13.8468468468468, 9.84684684684684, 17.8468468468468, -19.1531531531532, 614 | -7.15315315315316, -10.1531531531532, -19.1531531531532, 615 | -7.15315315315316, 0.846846846846844, -17.1531531531532, 616 | 5.84684684684684, -25.1531531531532, -1.15315315315316, -15.1531531531532, 617 | 4.84684684684684, 1.84684684684684, 12.8468468468468, -11.1531531531532, 618 | 5.84684684684684, -6.15315315315316, -0.153153153153156, 619 | 20.8468468468468, 5.84684684684684, -0.153153153153156, 12.8468468468468, 620 | -19.1531531531532, -11.1531531531532, 1.84684684684684, 0.846846846846844, 621 | -21.1531531531532, 9.84684684684684, 15.8468468468468, 14.8468468468468, 622 | -12.1531531531532, -11.1531531531532, -9.15315315315316, 623 | 5.84684684684684, -4.15315315315316, 12.8468468468468, 1.84684684684684, 624 | -7.15315315315316, -3.15315315315316, 7.84684684684684, 0.846846846846844, 625 | -4.15315315315316, 5.84684684684684, -0.153153153153156, 626 | 1.84684684684684, -7.15315315315316, 1.84684684684684, -9.15315315315316, 627 | 6.84684684684684, 9.84684684684684, 17.8468468468468, 5.84684684684684, 628 | 9.84684684684684, -10.1531531531532, -5.15315315315316, 18.8468468468468, 629 | 21.8468468468468, -0.153153153153156, 2.84684684684684, -8.15315315315316, 630 | -5.15315315315316, 5.84684684684684, 2.84684684684684, -15.1531531531532, 631 | 2.84684684684684, 25.8468468468468, -11.1531531531532, 27.8468468468468, 632 | 2.84684684684684, 20.8468468468468, -0.153153153153156, -2.15315315315316, 633 | 12.8468468468468, -0.153153153153156, 0.846846846846844, 634 | 11.8468468468468, -8.15315315315316, 3.84684684684684, 22.8468468468468, 635 | 5.84684684684684, 12.8468468468468, 4.84684684684684, 11.8468468468468, 636 | 
-5.15315315315316, -17.1531531531532, -7.15315315315316, 637 | -16.1531531531532, 0.846846846846844, -13.1531531531532, 638 | -13.1531531531532, -19.1531531531532, -15.1531531531532, 639 | -11.1531531531532, -4.15315315315316, -0.153153153153156, 640 | -4.15315315315316, -7.15315315315316)), row.names = c(NA, 641 | -111L), class = "data.frame") 642 | ``` 643 | -------------------------------------------------------------------------------- /logistic_regression_AME_Wald_z_test_proportions.md: -------------------------------------------------------------------------------- 1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with unpooled variances and the Average Marginal Effect over logistic regression with a single binary predictor 2 | #### Adrian Olszewski, 02.03.2025 3 | 4 | The Wald’s z-statistic for difference in 2 proportions with unpooled variances is of the following form: 5 | 6 | ``` math 7 | \begin{equation} 8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{\frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B}}} 9 | \hspace{2cm} (1) 10 | \end{equation} 11 | ``` 12 | Where $\widehat{p_1}\$ stands for the estimated probability (sample proportion, %) in the 1st group, $\widehat{p_2}$ is the estimated probability (sample proportion, %) in the 2nd group, $n_1$ and $n_2$ denote respective group sizes. 13 | I will show that this statistic is 1:1 equivalent to the average marginal effect (AME) of the logistic regression with a single binary predictor playing role of indicator for the compared samples. 14 | 15 | To simplify calculations, I will show the equivalence of the nominator and denominator of the z statistic, i.e. the difference between two probabilities and its variance, separately. 16 | 17 | **The equation of logistic regression** 18 | 19 | Let’s start from the equation of the logistic regression with a single binary predictor: 20 | ``` math 21 | \begin{equation} 22 | logit\left(E\left(Y\middle| X\right)=logit\left(\hat{p}(Y=1|X\right)\right)=\ln{\left(\frac{\hat{p}\left(Y=1\middle| X\right)}{1-\hat{p}\left(Y=1\middle| X\right)}\right)}=\beta_0+X_1\beta_1 23 | \hspace{2cm} (2) 24 | \end{equation} 25 | ``` 26 | or equivalently, after applying the inverse-logit, i.e. sigmoid function (let’s also simplify X1 to X) 27 | ``` math 28 | \begin{equation} 29 | \hat{p}\left(Y=1\middle| X\right)=\frac{e^{\beta_0+X\beta_1}}{1+e^{\beta_0+X\beta_1}} 30 | \hspace{2cm} (3) 31 | \end{equation} 32 | ``` 33 | where $$Y_i$$​ are independent Bernoulli random variables with probabilities $$\hat{p_i}$$​. 34 | 35 | **(Average) Marginal Effect** 36 | 37 | Now, let’s introduce the marginal effect (ME) of a regression model. The ME of a given predictor variable X is the slope of the regression surface with respect to X, reflecting the rate at which Y changes at a given point with respect to X, while holding values of all other predictors constant. In other words, ME is an instantaneous rate of change, calculated as partial derivative of the regression equation with respect to X. For example, for an equation: Y = β0 + β1X1+ β2X2 + β3X1X2 corresponding ME(Y, X2) = ∂Y/ ∂X2 = β2 + β3X1 38 | The average marginal effect (AME) calculates ME at every observed value of X and averages across the resulting effect estimates. 
In other words, AME is an average effect of an infinitesimal change in X across all observations: 39 | 40 | ``` math 41 | \begin{equation} 42 | {AME}_{Xk}(model)=\beta_k\times\frac{1}{N}\sum_{i=1}^{N}\frac{\partial E(Y_i|{Xk}_i,\ covariates)}{\partial Xk} 43 | \hspace{2cm} (4) 44 | \end{equation} 45 | ``` 46 |   47 | For a generally defined equation: 48 | ``` math 49 | \begin{equation} 50 | \sigma\left(x\right)=\frac{e^x}{1+e^x} 51 | \hspace{2cm} (5) 52 | \end{equation} 53 | ``` 54 | 55 | the partial derivative (using the quotient rule and rewriting back in terms of σ) is of the form: 56 | ``` math 57 | \begin{equation} 58 | \frac{d\sigma(x)}{dx}=\sigma(x)(1-\sigma\left(x\right)) 59 | \hspace{2cm} (6) 60 | \end{equation} 61 | ``` 62 | 63 | Therefore, 64 | ``` math 65 | \begin{equation} 66 | {AME}_X(model)=\beta\times\frac{1}{N}\sum_{i=1}^{N}{\hat{p}\left(Y=1\middle| X=x_i\right)\times\hat{p}\left(Y=0\middle| X=x_i\right)} 67 | \hspace{2cm} (7) 68 | \end{equation} 69 | ``` 70 | 71 | **AME for the binary predictor** 72 | 73 | For a categorical predictor, however, there is no something like “infinitesimal change”. There is just switch between categories, so the AME becomes a contrast, i.e. a difference between values of the appropriate partial derivative calculated at the selected category levels, thus: 74 | 75 | ``` math 76 | \begin{equation} 77 | {AME}_X\left(model\right)=\hat{p}\left(Y=1\middle| X=B\right)-\hat{p}\left(Y=1\middle| X=A\right)=\ \widehat{p_B}-\widehat{p_A} 78 | \hspace{2cm} (8) 79 | \end{equation} 80 | ``` 81 | 82 | Which means, that the AME for such defined logistic regression corresponds to a difference in two estimated group probabilities, expressed in percentage points. 83 | 84 | **Variance of AME for the binary predictor** 85 | 86 | Now, I will show the equivalence between the variance in the Wald’s z statistic and the variance of the AME for the binary predictor. 87 | 88 | ``` math 89 | \begin{equation} 90 | var\left(AME\right)=var(\widehat{p_B}-\widehat{p_A})=\ \frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B} 91 | \hspace{2cm} (9) 92 | \end{equation} 93 | ``` 94 | Several replacements will save a lot of typing: 95 | - $$P_A=\ \hat{p}\left(Y=1\middle| X=A\right),\ P_B=\ \hat{p}\left(Y=1\middle| X=B\right)$$ 96 | - $$P_i\times\left(1-P_i\right)=P_iQ_i$$ 97 | 98 | Let’s introduce the AME function: 99 | ``` math 100 | \begin{equation} 101 | AME=\ g\left(\beta_0,\beta_1\right)=P_B-P_A 102 | \hspace{2cm} (10) 103 | \end{equation} 104 | ``` 105 | 106 | Let’s also encode the two levels {A, B} using a single binary predictor X such that: A: X=0, B: X=1, let’s express $$P_A$$ and $$P_B$$ in terms of beta coefficients: 107 | ``` math 108 | \begin{equation} 109 | \left\{ 110 | \begin{aligned} 111 | P_A &= P(Y = 1, X = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \\ 112 | P_B &= P(Y = 1, X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} 113 | \end{aligned} 114 | \right. 
115 | \hspace{2cm} (11) 116 | \end{equation} 117 | ``` 118 | so the AME function is now expressed as: 119 | 120 | ``` math 121 | \begin{equation} 122 | g\left(\beta_0,\beta_1\right)=\ \frac{e^{\beta_0+\beta_1}}{1+e^{\beta_0+\beta_1}}-\frac{e^{\beta_0}}{1+e^{\beta_0}} 123 | \hspace{2cm} (12) 124 | \end{equation} 125 | ``` 126 | 127 | The variance of the AME is typically obtained by the _delta method_: 128 | ``` math 129 | \begin{equation} 130 | var\left(AME\right)=var\left(g\left(\beta_0,\beta_1\right)\right)\approx{\nabla g}^T\left(\beta_0,\beta_1\right)\times\Sigma\times\nabla g\left(\beta_0,\beta_1\right) 131 | \hspace{2cm} (13) 132 | \end{equation} 133 | ``` 134 | 135 | Let's first obtain the derivatives. 136 | Recall that: 137 | 138 | ``` math 139 | \begin{equation} 140 | \begin{aligned} 141 | \text{for } \sigma(x) &= \frac{e^x}{1 + e^x} \\ 142 | \frac{d\sigma(x)}{dx} &= \sigma(x)(1 - \sigma(x)) = \frac{e^x}{1 + e^x} \times \frac{1}{1 + e^x} = \frac{e^x}{(1 + e^x)^2} 143 | \end{aligned} 144 | \hspace{2cm} (14) 145 | \end{equation} 146 | ``` 147 | Therefore, 148 | 149 | ``` math 150 | \begin{equation} 151 | \nabla g\left(\beta_0,\beta_1\right) = 152 | \left[ 153 | \begin{matrix} 154 | \frac{\partial g}{\partial\beta_0} \\ 155 | \frac{\partial g}{\partial\beta_1} 156 | \end{matrix} 157 | \right] 158 | = 159 | \left[ 160 | \begin{matrix} 161 | P_B\left(1 - P_B\right) - P_A\left(1 - P_A\right) \\ 162 | P_B\left(1 - P_B\right) 163 | \end{matrix} 164 | \right] 165 | = 166 | \left[ 167 | \begin{matrix} 168 | P_B Q_B - P_A Q_A \\ 169 | P_B Q_B 170 | \end{matrix} 171 | \right] 172 | = 173 | \left[ 174 | \begin{matrix} 175 | \frac{e^{\beta_0+\beta_1}}{\left(1+e^{\beta_0+\beta_1}\right)^2} - \frac{e^{\beta_0}}{\left(1+e^{\beta_0}\right)^2} \\ 176 | \frac{e^{\beta_0+\beta_1}}{\left(1+e^{\beta_0+\beta_1}\right)^2} 177 | \end{matrix} 178 | \right] 179 | \hspace{1.5cm} (15) 180 | \end{equation} 181 | ``` 182 | Now, we need the variance-covariance matrix, i.e. 183 | ``` math 184 | \begin{equation} 185 | \Sigma=\left[\begin{matrix}var(\beta_0)&cov(\beta_0,\beta_1)\\cov(\beta_0,\beta_1)&var(\beta_1)\\\end{matrix}\right] 186 | \hspace{2cm} (16) 187 | \end{equation} 188 | ``` 189 | 190 | This can be obtained by inverting the Fisher information matrix given by: 191 | ``` math 192 | \begin{equation} 193 | \Sigma=I^{-1}=\left(X^TWX\right)^{-1} 194 | \hspace{2cm} (17) 195 | \end{equation} 196 | ``` 197 | 198 | where X is the design matrix with 2 columns (a $β_0$ column of ones and a $β_1$ column indicating when X=1), with $n_A$ and $n_B$ rows corresponding to group A and B, respectively. 
199 | 200 | ``` math 201 | \begin{equation} 202 | X=\left[\begin{matrix}1&A=0\\1&A=0\\\vdots&\vdots\\1&A=0\\1&B=1\\\vdots&\vdots\\1&B=1\\\end{matrix}\right] 203 | \hspace{2cm} (18) 204 | \end{equation} 205 | ``` 206 | and W is the diagonal matrix of weights, of the block-diagonal form: 207 | ``` math 208 | \begin{equation} 209 | W=diag\left(P_i\times\left(1-P_i\right)\right)=\left[\begin{matrix}P_AQ_A&0&\ldots&0&0&0&\ldots&0\\0&P_AQ_A&\cdots&0&0&0&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&P_AQ_A&0&0&\ldots&0\\0&0&\ldots&0&P_BQ_B&0&\ldots&0\\0&0&\ldots&0&0&P_BQ_B&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&0&0&0&\ldots&P_BQ_B\\\end{matrix}\right] 210 | \hspace{2cm} (19) 211 | \end{equation} 212 | ``` 213 | which can be simplified to: 214 | 215 | ``` math 216 | \begin{equation} 217 | W=diag\left(P_i\times\left(1-P_i\right)\right)=\left[\begin{matrix}P_AQ_AI_{nA}&0\\0&P_BQ_BI_{nB}\\\end{matrix}\right] 218 | \hspace{2cm} (20) 219 | \end{equation} 220 | ``` 221 | where $I_nA$ and $I_nB$ are respective identity matrices. 222 | 223 | The multiplication result can be abbreviated to appropriate sums: 224 | ``` math 225 | \begin{equation} 226 | X^TWX=\left[\begin{matrix}\Sigma1P_iQ_i&\Sigma X P_iQ_i\\\Sigma X P_iQ_i&\Sigma X^2P_iQ_i\\\end{matrix}\right] 227 | \hspace{2cm} (21) 228 | \end{equation} 229 | ``` 230 | 231 | where **1** is the result of multiplying 1 x 1 (the $β_0$ vector), and **X**s refer to the other products of the $β_0$ and $β_1$ vectors. Since these vectors consist only of 0 and 1, and 0 refers to the “A” group and 1 refers to the “B” group, their products keep only the “B” part. 232 | Therefore, the final result is: 233 | ``` math 234 | \begin{equation} 235 | X^TWX=\left[\begin{matrix}{n_AP}_AQ_A+{n_BP}_BQ_B&{n_BP}_BQ_B\\{n_BP}_BQ_B&{n_BP}_BQ_B\\\end{matrix}\right] 236 | \hspace{2cm} (22) 237 | \end{equation} 238 | ``` 239 | 240 | So the inverse can be computed (remembering that both covariance elements are equal): 241 | ``` math 242 | \begin{equation} 243 | \Sigma=\left(X^TWX\right)^{-1}=\frac{1}{\left({n_AP}_AQ_A+{n_BP}_BQ_B\right)\times{n_BP}_BQ_B-\left({n_BP}_BQ_B\right)^2}\left[\begin{matrix}{n_BP}_BQ_B&-{n_BP}_BQ_B\\-{n_BP}_BQ_B&{n_AP}_AQ_A+{n_BP}_BQ_B\\\end{matrix}\right] 244 | \hspace{2cm} (23) 245 | \end{equation} 246 | ``` 247 | 248 | After simplifying the denominator term: 249 | ``` math 250 | \begin{equation} 251 | \left({n_AP}_AQ_A+{n_BP}_BQ_B\right)\times{n_BP}_BQ_B-\left({n_BP}_BQ_B\right)^2={n_AP}_AQ_A\times{n_BP}_BQ_B+\left({n_BP}_BQ_B\right)^2-\left({n_BP}_BQ_B\right)^2={n_AP}_AQ_A\times{n_BP}_BQ_B 252 | \hspace{1cm} (24) 253 | \end{equation} 254 | ``` 255 | finally: 256 | 257 | ``` math 258 | \begin{equation} 259 | \Sigma=\frac{1}{{n_AP}_AQ_A\times{n_BP}_BQ_B}\left[\begin{matrix}{n_BP}_BQ_B&-{n_BP}_BQ_B\\-{n_BP}_BQ_B&{n_AP}_AQ_A+{n_BP}_BQ_B\\\end{matrix}\right]=\left[\begin{matrix}\frac{1}{{n_AP}_AQ_A}&-\frac{1}{{n_AP}_AQ_A}\\-\frac{1}{{n_AP}_AQ_A}&\frac{{n_AP}_AQ_A+{n_BP}_BQ_B}{{n_AP}_AQ_A\times{n_BP}_BQ_B}\\\end{matrix}\right] 260 | \hspace{2cm} (25) 261 | \end{equation} 262 | ``` 263 | 264 | By recalling formula #13 and matrix #15, we can express the variance of AME as: 265 | ``` math 266 | \begin{equation} 267 | var\left(AME\right)=\left(\frac{\partial g}{\partial\beta_0}\right)^2var\left(\beta_0\right)+\left(\frac{\partial g}{\partial\beta_1}\right)^2var\left(\beta_1\right)+2\ast\frac{\partial g}{\partial\beta_0}\frac{\partial g}{\partial\beta_1}covar\left(\beta_0,\ \beta_1\right) 268 | 
\hspace{2cm} (26) 269 | \end{equation} 270 | ``` 271 | 272 | which expands to: 273 | 274 | ``` math 275 | \begin{equation} 276 | var\left(AME\right)=\frac{\left(P_BQ_B-P_AQ_A\right)^2}{{n_AP}_AQ_A}+\frac{\left(P_BQ_B\right)^2\times\left({n_AP}_AQ_A+{n_BP}_BQ_B\right)}{{n_AP}_AQ_A\times{n_BP}_BQ_B}-2\ast\frac{\left(P_BQ_B-P_AQ_A\right)\times P_BQ_B}{{n_AP}_AQ_A} 277 | \hspace{2cm} (27) 278 | \end{equation} 279 | ``` 280 | 281 | Let's replace $P*Q$ with V to simplify (note: I occasionally use "×" to denote arithmetic multiplication to facilicate reading; A×B is just AB): 282 | 283 | ``` math 284 | \begin{equation} 285 | \begin{aligned} 286 | var(AME) &= \frac{(V_B - V_A)^2}{n_A V_A} + \frac{V_B^2 \times (n_A V_A + n_B V_B)}{n_A V_A \times n_B V_B} - 2 \times \frac{(V_B - V_A) \times V_B}{n_A V_A} \\[10pt] 287 | &= \frac{(V_B - V_A)(V_B - V_A - 2V_B)}{n_A V_A} + \frac{V_B \times (n_A V_A + n_B V_B)}{n_A V_A \times n_B} \\[10pt] 288 | &= \frac{(V_B - V_A)(-V_B - V_A)}{n_A V_A} + \frac{V_B n_A V_A + V_B^2 n_B}{n_A V_A \times n_B} \\[10pt] 289 | &= \frac{V_A^2 - V_B^2}{n_A V_A} + \frac{V_B}{n_B} + \frac{V_B^2}{n_A V_A} \\[10pt] 290 | &= \frac{V_A}{n_A} + \frac{V_B}{n_B} 291 | \hspace{1cm} (28) 292 | \end{aligned} 293 | \end{equation} 294 | ``` 295 | Therefore, 296 | ``` math 297 | \begin{equation} 298 | var\left(AME\right)=\frac{V_A}{n_A}+\frac{V_B}{n_B}=\frac{P_AQ_A}{n_A}+\frac{P_BQ_B}{n_B}=\frac{\widehat{p_A}\left(1-\widehat{p_A}\right)}{n_A}+\frac{\widehat{p_B}\left(1-\widehat{p_B}\right)}{n_B}\blacksquare 299 | \hspace{2cm} (29) 300 | \end{equation} 301 | ``` 302 | This way I have shown the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with unpooled variances and the average marginal effect of the logistic regression with a single binary predictor distinguishing the compared samples. 303 | Also, since the Estimated Marginal Means (EM-Means) on so defined logistic regression and “re-grided” to the probability scale represent the estimated probabilities, the contrast comparing them through the Wald’s approach yields exactly the same result. 304 |   305 | Although some minor discrepancies exist due to Maximum Likelihood Estimation, even for so small samples (N=10 and 20) the agreement is just perfect. 
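Before the package-based checks below, here is a minimal R sketch (my own addition, not part of the original material; the variable names `pA`, `grad`, `Sigma`, `var_ame` are ad hoc) that plugs the sample proportions into formulas (8), (15), (25) and (13). For this saturated single-predictor model the ML-fitted probabilities coincide with the raw sample proportions, so the hand-computed standard error should agree with the `marginaleffects` and `emmeans` output shown next.

``` r
# A hand computation of the AME and its delta-method variance, assuming the same
# counts as in the examples below: x_A = 6 of n_A = 20, x_B = 10 of n_B = 20.
nA <- 20; xA <- 6
nB <- 20; xB <- 10
pA <- xA / nA; qA <- 1 - pA
pB <- xB / nB; qB <- 1 - pB

# AME of a binary predictor = difference of estimated probabilities, formula (8)
ame <- pB - pA

# Gradient of g(beta0, beta1), formula (15), and covariance matrix Sigma, formula (25)
grad  <- c(pB * qB - pA * qA, pB * qB)
Sigma <- matrix(c( 1 / (nA * pA * qA),                    -1 / (nA * pA * qA),
                  -1 / (nA * pA * qA), 1 / (nA * pA * qA) + 1 / (nB * pB * qB)),
                nrow = 2, byrow = TRUE)

# Delta-method variance, formula (13); it should reduce to pA*qA/nA + pB*qB/nB, formula (29)
var_ame <- drop(t(grad) %*% Sigma %*% grad)
c(AME = ame, SE = sqrt(var_ame), SE_formula_29 = sqrt(pA * qA / nA + pB * qB / nB))
```

Both `SE` values are the unpooled standard error that appears in the denominator of formula (1).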
306 | 307 | **Average Marginal Effect** 308 | ```r 309 | > wald_z_test_non_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20) 310 | diff z chi2 se p.value p.value_1 LCI HCI 311 | 1 -0.2 -1.318761 1.73913 0.1516575 0.187249 0.09362452 -0.4972433 0.09724326 312 | > 313 | > data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6), 314 | + rep("Success", 10), rep("Failure", 20-10))), 315 | + grp = factor(rep(c("B", "A"), each=20))) 316 | > 317 | > m <- glm(response ~ grp, data = data, family = binomial(link = "logit")) 318 | > data.frame(marginaleffects::avg_slopes(m)) %>% mutate(across(where(is.numeric), ~round(., 6))) 319 | term contrast estimate std.error statistic p.value s.value conf.low conf.high 320 | 1 grp B - A -0.2 0.151657 -1.318762 0.187249 2.416973 -0.497243 0.097243 321 | ``` 322 | ![obraz](https://github.com/user-attachments/assets/4f8db144-1a41-4baa-bf10-df8e062bf6ff) 323 | 324 | **EM-means** 325 | ```r 326 | > library(emmeans) 327 | > update(pairs(emmeans(m, specs = ~grp, regrid="response")), infer = c(TRUE, TRUE)) %>% 328 | + data.frame() %>% 329 | + mutate(across(where(is.numeric), ~round(., 6))) 330 | contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value 331 | 1 A - B 0.2 0.151657 Inf -0.097243 0.497243 1.318761 0.187249 332 | ``` 333 | 334 | ![obraz](https://github.com/user-attachments/assets/56c91bf1-f22f-49a3-96ba-38a0470710b7) 335 | 336 | The implementation of the z statistic: 337 | ```r 338 | wald_z_test_non_pooled <- function(x1, n1, x2, n2, conf.level=0.95) { 339 | p1 <- x1/n1 340 | p2 <- x2/n2 341 | 342 | se_p1 <- sqrt(p1 * (1 - p1) / n1) 343 | se_p2 <- sqrt(p2 * (1 - p2) / n2) 344 | 345 | se_diff <- sqrt(se_p1^2 + se_p2^2) 346 | 347 | z <- (p1 - p2) / se_diff 348 | p <- 2 * (1 - pnorm(abs(z))) 349 | hCI <- abs(qnorm((1 - conf.level)/2)) * se_diff 350 | 351 | return(data.frame(diff=p1-p2, 352 | z = z, chi2 = z^2, 353 | se = se_diff, 354 | p.value = p, p.value_1 =p/2, 355 | LCI = (p1-p2) - hCI, 356 | HCI = (p1-p2) + hCI, 357 | row.names = NULL)) 358 | } 359 | ``` 360 | 361 | -------------------------------------------------------------------------------- /logistic_regression_Rao_Wald_z_test_proportions.md: -------------------------------------------------------------------------------- 1 | # Proving the equivalence between the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over logistic regression with a single binary predictor 2 | #### Adrian Olszewski, 21.03.2025 3 | 4 | The Wald’s z-statistic for difference in 2 proportions with pooled variances is of the following form: 5 | 6 | ``` math 7 | \begin{equation} 8 | z=\ \frac{\widehat{p_B}-\widehat{p_A}}{\sqrt{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}};\ z~N(0,\ 1) 9 | \hspace{2cm} (1) 10 | \end{equation} 11 | ``` 12 | Where $\widehat{p_A}$ stands for the estimated probability (sample proportion, %) in the 1st group, 13 | $\widehat{p_B}$ is the estimated probability in the 2nd group, $n_A$ and $n_B$ denote respective group sizes, and p is the pooled probability $p=\frac{x_A+x_B}{n_A+n_B}$. 14 | 15 | Traditionally the Wald’s statistic is expressed in the squared form, becoming: $z^2=\chi_{df=1}^2$. 16 | Both forms yield the same p-value. For convenience I will show that this $\chi_{df=1}^2$ statistic is 1:1 equivalent to the Rao score test over the logistic regression with a single binary predictor playing the role of an indicator for the compared samples. 
17 | To simplify calculations, I will derive the formula for pooled probability p and the overall statistic form separately. 18 | 19 | **The equation of logistic regression** 20 | 21 | Let’s start from the equation of the logistic regression with a single binary predictor: 22 | 23 | ``` math 24 | \begin{equation} 25 | logit\left(E\left(Y\middle| X\right)=logit\left(\hat{p}(Y=1|X\right)\right)=\ln{\left(\frac{\hat{p}\left(Y=1\middle| X\right)}{1-\hat{p}\left(Y=1\middle| X\right)}\right)}=\beta_0+X_1\beta_1 26 | \hspace{2cm} (2) 27 | \end{equation} 28 | ``` 29 | 30 | or equivalently, after applying the inverse-logit, i.e. sigmoid function (let’s also simplify X1 to just X) 31 | 32 | ``` math 33 | \begin{equation} 34 | \hat{p}\left(Y=1\middle| X\right)=\frac{e^{\beta_0+X\beta_1}}{1+e^{\beta_0+X\beta_1}}=\frac{e^\eta}{1+e^\eta} 35 | \hspace{2cm} (3) 36 | \end{equation} 37 | ``` 38 | where Yi are independent Bernoulli random variables with probabilities pi. The second, simpler form with η=β0+Xβ1 will facilitate later calculations. 39 | 40 | Let’s also encode the two levels {A, B} using a single binary predictor X such that: A: X=0, B: X=1, let’s simplify notation and express $p_A$ and $p_B$ in terms of beta coefficients: 41 | 42 | ``` math 43 | \begin{equation} 44 | \begin{cases} 45 | \hat{p}_A = P(Y = 1, X = 0) = \frac{e^{\beta_0}}{1 + e^{\beta_0}} \\ 46 | \hat{p}_B = P(Y = 1, X = 1) = \frac{e^{\beta_0 + \beta_1}}{1 + e^{\beta_0 + \beta_1}} 47 | \end{cases} 48 | \hspace{2cm} (4) 49 | \end{equation} 50 | ``` 51 | 52 | Let’s also skip the hat notation for estimated p and use simply $p_A$ and $p_B$ until stated otherwise. 53 | 54 | **Pooled probability** 55 | 56 | Under $H_0$, i.e. $β_1=0$, $p_A = p_B$, and the best estimator of this common proportion is the pooled proportion p. Let’s find its form. 57 | We need to assume that the data consists of two independent binomial samples: 58 | ``` math 59 | \begin{equation} 60 | \begin{cases} 61 | {group\ A=X}_A~Binomial\left(n_A,p_A\right) \\ 62 | {group\ B=X}_B~Binomial\left(n_B,p_B\right) 63 | \end{cases} 64 | \hspace{2cm} (5) 65 | \end{equation} 66 | ``` 67 | 68 | So the likelihood function is: 69 | 70 | ``` math 71 | \begin{equation} 72 | L(p_A,p_B)=\left(\begin{matrix}n_A\\x_A\\\end{matrix}\right)p_A^{x_A}\left(1-p_A\right)^{n_A-x_A}\bullet\left(\begin{matrix}n_B\\x_B\\\end{matrix}\right)p_B^{x_B}\left(1-p_B\right)^{n_B-x_B} 73 | \hspace{2cm} (6) 74 | \end{equation} 75 | ``` 76 | where $x_A$ and $x_B$ are the observed number of successes in groups A and B, respectively, and $n_A$ and $n_B$ are the total sample sizes in each group. 77 | 78 | Under the null hypothesis $H0: p_A=p_B=p_{pooled}$. The pooled p is of some form yet unknown. 79 | Knowing that: 80 | 81 | ``` math 82 | \begin{equation} 83 | \begin{aligned} 84 | L(p_{pooled}) &= \binom{n_A}{x_A} \binom{n_B}{x_B} p_{pooled}^{x_A + x_B} (1 - p_{pooled})^{(n_A - x_A) + (n_B - x_B)} \\ 85 | &= \binom{n_A}{x_A} \binom{n_B}{x_B} p_{pooled}^{x_A + x_B} (1 - p_{pooled})^{(n_A + n_B - x_A - x_B)} 86 | \end{aligned} 87 | \hspace{2cm} (7) 88 | \end{equation} 89 | ``` 90 | we obtain a single binomial likelihood where the total number of successes is $x_A+x_B$ and the total number of trials is $n_A+n_B$, 91 | i.e. 
92 | 93 | ``` math 94 | \begin{equation} 95 | $X_A+X_B~Binomial\left(n_A+n_B,p_{pooled}\right) 96 | \hspace{2cm} (8) 97 | \end{equation} 98 | ``` 99 | 100 | **Note**: The binomial coefficients are multiplied rather than pooled, as they originally come from two independent samples, the true $H_0$ is an assumption, not the property of the data. Actually, the choice doesn’t matter, as this term will be zeroed when taking the derivative. 101 | 102 | Let’s simplify notation and replace $p_{pooled}$ with just p. Now, the log-likelihood, $log(L(p))=\ell\(p)$, is defined as: 103 | 104 | ``` math 105 | \begin{equation} 106 | \ell(p)=const\bullet(x_A+x_B)\bullet log(p)+(n_A+n_B-x_A-x_B)\bullet log(1-p) 107 | \hspace{2cm} (9) 108 | \end{equation} 109 | ``` 110 | 111 | (I wrote _const_ to highlight that this term will disappear after taking the derivative). 112 | Now by taking ${\frac{d\ell\left(p\right)}{dp}}$ and setting it to 0, we obtain (provided that $p\notin0,1$): 113 | 114 | ``` math 115 | \begin{equation} 116 | \begin{aligned} 117 | \frac{d\ell\left(p\right)}{dp}=\frac{x_A+x_B}{p}-\frac{n_A+n_B-x_A-x_B}{1-p}=0, \\ 118 | \left(x_A+x_B\right)\left(1-p\right)=\left(n_A+n_B-x_A-x_B\right)\bullet p \\ 119 | x_A+x_B={(n}_A+n_B)\bullet p\\ 120 | p\ =\ \frac{x_A+x_B}{n_A+n_B} 121 | \end{aligned} 122 | \hspace{2cm} (10) 123 | \end{equation} 124 | ``` 125 | 126 | Or, alternatively, since $p_i={\frac{x_i}{n_i}}$: 127 | 128 | ``` math 129 | \begin{equation} 130 | p=\frac{p_An_A+p_Bn_B}{n_A+n_B} 131 | \hspace{2cm} (11) 132 | \end{equation} 133 | ``` 134 | 135 | **Log-likelihood of the logistic regression with a single binary predictor** 136 | The log-likelihood function is of the form: 137 | 138 | ``` math 139 | \begin{equation} 140 | \ell(\beta)=log(L(\beta))=log\left(\prod_{i=1}^{n}p_i^{y_i}\bullet\left(1-p_i\right)^{1-y_i}\right)=\sum_{i=1}^{n}{{[y}_ilog}(p_i)+(1-y_i)log(1-p_i)] 141 | \hspace{2cm} (12) 142 | \end{equation} 143 | ``` 144 | where β is the vector of estimated coefficients, i.e. $(β_0, β_1)$. 145 | 146 | Let’s express p and 1-p in terms of $η=β_0+Xβ_1$: 147 | 148 | ``` math 149 | \begin{equation} 150 | \begin{cases} 151 | p=\frac{e^\eta}{1+e^\eta} \\ 152 | 1-p=\frac{1}{1+e^\eta} 153 | \end{cases} 154 | \hspace{2cm} (13) 155 | \end{equation} 156 | ``` 157 | Then: 158 | ``` math 159 | \begin{equation} 160 | \begin{aligned} 161 | \ell\left(\beta\right)&=\sum_{i=1}^{n}\left[y_ilog\left(\frac{e^{\eta_i}}{1+e^{\eta_i}}\right)+\left(1-y_i\right)log\left(\frac{1}{1+e^{\eta_i}}\right)\right] \\ 162 | &=\sum_{i=1}^{n}\left[y_i\left(log\left(e^{\eta_i}\right)-log\left(1+e^{\eta_i}\right)\right)-\left(1-y_i\right)log\left(1+e^{\eta_i}\right)\right] \\ 163 | &=\sum_{i=1}^{n}\left[y_i\eta_i\ -y_ilog\left(1+e^{\eta_i}\right)-\left(1-y_i\right)log\left(1+e^{\eta_i}\right)\right] \\ 164 | &=\sum_{i=1}^{n}{y_i\eta_i-log\left(1+e^{\eta_i}\right)} 165 | \end{aligned} 166 | \hspace{2cm} (14) 167 | \end{equation} 168 | ``` 169 | 170 | **Gradient, Score, Hessian, Fisher Information, Covariance…** 171 | 172 | We will need both the gradient and Hessian of the log-likelihood. For future use, we will call the gradient as _Rao score function_ denoted by “U” and the Hessian as “H”. 
173 | First, let’s find the form of U(β): 174 | ``` math 175 | \begin{equation} 176 | U\left(\beta\right)=\left[\begin{matrix}\frac{\partial\ell\left(\beta\right)}{\partial\beta_0}\\\frac{\partial\ell\left(\beta\right)}{\partial\beta_1}\\\end{matrix}\right] 177 | \hspace{2cm} (15) 178 | \end{equation} 179 | ``` 180 | 181 | By noticing that ${\frac{\partial\eta_i}{\partial\beta_0}}=1 , {\frac{\partial\eta_i}{\partial\beta_1}}=x_i$ and remembering that $p=\frac{e^\eta}{1+e^\eta}$ we obtain: 182 | 183 | ``` math 184 | \begin{equation} 185 | \frac{\partial\ell\left(\beta\right)}{\partial\beta_0}=\sum_{i=1}^{n}{\left(y_i\frac{\partial\eta_i}{{\partial\beta}_0}-\frac{1}{1+e^{\eta_i}}e^{\eta_i}\frac{\partial\eta_i}{{\partial\beta}_0}\right)=\sum_{i=1}^{n}\left(y_{i\bullet}1-\frac{e^{\eta_i}}{1+e^{\eta_i}}\bullet1\right)=\sum_{i=1}^{n}\left(y_i-p_i\right)} 186 | \hspace{2cm} (16) 187 | \end{equation} 188 | ``` 189 | and 190 | 191 | ``` math 192 | \begin{equation} 193 | \frac{\partial\ell\left(\beta\right)}{\partial\beta_1}=\sum_{i=1}^{n}{\left(y_i\frac{\partial\eta_i}{{\partial\beta}_1}-\frac{1}{1+e^{\eta_i}}e^{\eta_i}\frac{\partial\eta_i}{{\partial\beta}_1}\right)=\sum_{i=1}^{n}\left(y_ix_i-\frac{e^{\eta_i}}{1+e^{\eta_i}}x_i\right)=\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}} 194 | \hspace{2cm} (17) 195 | \end{equation} 196 | ``` 197 | So finally: 198 | ``` math 199 | \begin{equation} 200 | U\left(\beta\right)=\left[\begin{matrix}\sum_{i=1}^{n}\left(y_i-p_i\right)\\\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}\\\end{matrix}\right] 201 | \hspace{2cm} (18) 202 | \end{equation} 203 | ``` 204 | 205 | Now, the Hessian: 206 | ``` math 207 | \begin{equation} 208 | H\left(\beta\right)=\left[\begin{matrix}\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0^2}&\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0\beta_1}\\\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1\beta_0}&\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1^2}\\\end{matrix}\right] 209 | \hspace{2cm} (19) 210 | \end{equation} 211 | ``` 212 | The partial derivatives are as follows: 213 | ``` math 214 | \begin{equation} 215 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0^2}=\frac{\partial}{\partial\beta_0}\sum_{i=1}^{n}\left(y_i-p_i\right)=\sum_{i=1}^{n}{-\frac{\partial p_i}{\partial\beta_0}}=-\sum_{i=1}^{n}{\frac{\partial p_i}{\partial\beta_0}=-}\sum_{i=1}^{n}{\frac{\partial p_i}{\partial\eta_0}\frac{\partial\eta_0}{\partial\beta_0}=}-\sum_{i=1}^{n}{p_i\left(1-p_i\right)} 216 | \hspace{1cm} (20) 217 | \end{equation} 218 | ``` 219 | ``` math 220 | \begin{equation} 221 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1^2}=\frac{\partial}{\partial\beta_1}\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}=\sum_{i=1}^{n}{x_i\left(-\frac{\partial p_i}{\partial\beta_1}\right)}=-\sum_{i=1}^{n}{x_ip_i\left(1-p_i\right)}x_i=-\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)} 222 | \hspace{1cm} (21) 223 | \end{equation} 224 | ``` 225 | ``` math 226 | \begin{equation} 227 | \frac{\partial^2\ell\left(\beta\right)}{\partial\beta_0\beta_1}=\frac{\partial^2\ell\left(\beta\right)}{\partial\beta_1\beta_0}=\frac{\partial}{\partial\beta_1}\sum_{i=1}^{n}\left(y_i-p_i\right)=-\sum_{i=1}^{n}\frac{\partial p_i}{\partial\beta_1}=-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i 228 | \hspace{1cm} (22) 229 | \end{equation} 230 | ``` 231 | Therefore: 232 | ``` math 233 | \begin{equation} 234 | 
H\left(\beta\right)=\left[\begin{matrix}-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}&-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i\\-\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i&-\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)}\\\end{matrix}\right] 235 | \hspace{1cm} (23) 236 | \end{equation} 237 | ``` 238 | Let’s also determine the Fisher Information matrix: 239 | ``` math 240 | \begin{equation} 241 | I\left(\beta\right)=-E(H(\beta))=\left[\begin{matrix}\sum_{i=1}^{n}{p_i\left(1-p_i\right)}&\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i\\\sum_{i=1}^{n}{p_i\left(1-p_i\right)}x_i&\sum_{i=1}^{n}{x_i^2p_i\left(1-p_i\right)}\\\end{matrix}\right]=\left[\begin{matrix}\sum_{i=1}^{n}{p_iq_i}&\sum_{i=1}^{n}{p_iq_i}x_i\\\sum_{i=1}^{n}{p_iq_i}x_i&\sum_{i=1}^{n}{p_iq_i}x_i^2\\\end{matrix}\right] 242 | \hspace{1cm} (24) 243 | \end{equation} 244 | ``` 245 | where $q_i=1-p_i$. 246 | 247 | This can be further expanded, by substituting sums with appropriate (per respective group, A and B) counts of elements, 248 | remembering that $n=n_A+n_B$, A: X=0, B: X=1, $p_A=p(Y=1|X=A)$ and $p_B=p(Y=1|X=B)$, and: 249 | 250 | ``` math 251 | \begin{equation} 252 | \sum_{i=1}^{n}{p_i=}\sum_{i:\ X_i=0} p_i+\sum_{i:\ X_i=1} p_i=\sum_{i=1}^{n_A}p_i+\sum_{i=n_A+1}^{{n_B+n}_B}p_i=n_Ap_A\ +\ n_Bp_B 253 | \hspace{2cm} (25) 254 | \end{equation} 255 | ``` 256 | So the final, useful form is: 257 | ``` math 258 | \begin{equation} 259 | I\left(\beta\right)=\left[\begin{matrix}\sum_{i:\ X_i=0}{p_iq_i}+\sum_{i:\ X_i=1}{p_iq_i}&\sum_{i:\ X_i=1}{1p_iq_i}\\\sum_{i:\ X_i=1}{1p_iq_i}&\sum_{i:\ X_i=1}{1p_iq_i}\\\end{matrix}\right]=\left[\begin{matrix}n_Ap_Aq_A+n_Bp_Bq_B&n_Bp_Bq_B\\n_Bp_Bq_B&n_Bp_Bq_B\\\end{matrix}\right] 260 | \hspace{1cm} (26) 261 | \end{equation} 262 | ``` 263 | This matrix will be used to find the covariance one: 264 | ``` math 265 | \begin{equation} 266 | {I\left(\beta\right)}^{-1}\ =\ \Sigma\left(\beta\right)=\left[\begin{matrix}var(\beta_0)&cov(\beta_0,\beta_1)\\cov(\beta_0,\beta_1&var(\beta_1)\\\end{matrix}\right] 267 | \hspace{2cm} (27) 268 | \end{equation} 269 | ``` 270 | 271 | ----- 272 | **Another way to obtain I(β)** 273 | The covariance matrix can also be obtained from: 274 | ``` math 275 | \begin{equation} 276 | I\left(\beta\right)=X^TWX 277 | \hspace{2cm} (28) 278 | \end{equation} 279 | ``` 280 | where X is the design matrix with 2 columns ($β_0$ of 1s and $β_1$ indicating when X=1), with $n_A$ and $n_B$ number of rows corresponding to group A and B, respectively. 
281 | 282 | ``` math 283 | \begin{equation} 284 | X=\left[\begin{matrix}1&A=0\\1&A=0\\\vdots&\vdots\\1&A=0\\1&B=1\\\vdots&\vdots\\1&B=1\\\end{matrix}\right]\ 285 | \hspace{2cm} (29) 286 | \end{equation} 287 | ``` 288 | Now, W is the diagonal matrix of weights, of the block-diagonal form: 289 | ``` math 290 | \begin{equation} 291 | W=diag\left(p_i\times\left(1-p_i\right)\right)=\left[\begin{matrix}p_Aq_A&0&\ldots&0&0&0&\ldots&0\\0&p_Aq_A&\cdots&0&0&0&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&p_Aq_A&0&0&\ldots&0\\0&0&\ldots&0&p_Bq_B&0&\ldots&0\\0&0&\ldots&0&0&p_Bq_B&\ldots&0\\\vdots&\vdots&\ddots&\vdots&\vdots&\vdots&\ddots&\vdots\\0&0&\ldots&0&0&0&\ldots&p_Bq_B\\\end{matrix}\right] 292 | \hspace{2cm} (30) 293 | \end{equation} 294 | ``` 295 | which can be expressed in a simpler form as: 296 | 297 | ``` math 298 | \begin{equation} 299 | W=diag\left(p_i\times\left(1-p_i\right)\right)=\left[\begin{matrix}p_Aq_AI_{nA}&0\\0&p_Bq_BI_{nB}\\\end{matrix}\right] 300 | \hspace{2cm} (31) 301 | \end{equation} 302 | ``` 303 | where $I_nA$ and $I_nB$ are respective identity matrices. 304 | 305 | The result of the matrix multiplication can be expressed as appropriate sums: 306 | 307 | ``` math 308 | \begin{equation} 309 | X^TWX=\left[\begin{matrix}\sum_{i=1}^{n}{1p_iq_i}&\sum_{i=1}^{n}{x_ip_iq_i}\\\sum_{i=1}^{n}{x_ip_iq_i}&\sum_{i=1}^{n}{x_i^2p_iq_i}\\\end{matrix}\right]=I\left(\beta\right) 310 | \hspace{2cm} (32) 311 | \end{equation} 312 | ``` 313 | where 1 is the result of multiplying 1 x 1 (the $β_0$ vector), and X refer to the other products of the $β_0$ and $β_1$ vectors. 314 | Notice, that this is **exactly the matrix #24**. 315 | 316 | ----- 317 | **The Rao Score and Information Matrix under $H_0: β_1=0$** 318 | 319 | _The Rao Score_ 320 | 321 | The Rao Score and Information Matrix under $H0: β_1=0$ 322 | Recall the formula #18: 323 | 324 | ``` math 325 | \begin{equation} 326 | U\left(\beta\right)=\left[\begin{matrix}\sum_{i=1}^{n}\left(y_i-p_i\right)\\\sum_{i=1}^{n}{x_i\left(y_i-p_i\right)}\\\end{matrix}\right] 327 | \hspace{2cm} (18) 328 | \end{equation} 329 | ``` 330 | Again, $n_A$ and $n_B$ depict the respective number of rows corresponding to group A and B, 331 | for group A: X=0 and for B: X =1 and $n=n_A+n_B$. Also, under $H_0$, $p_A=p_B=p$ (i.e. $\forall_i{\ p}_i=p$), where p is the pooled probability. 332 | 333 | ``` math 334 | \begin{equation} 335 | U\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}\left(\sum_{i=1}^{n}y_i\right)-(n_{A\ }+n_B)\bullet p\\\left(\sum_{i:\ X_i=1} y_i\right){-\ n}_Bp\\\end{matrix}\right] 336 | \hspace{2cm} (33) 337 | \end{equation} 338 | ``` 339 | 340 | Here, $y_i$ is the response vector containing **1**s in respective group. 
Summing all those **1**s yields the total number of successes in this group, which can be expressed as: 341 | 342 | ``` math 343 | \begin{equation} 344 | \sum_{i=1}^{n} y_i = \sum_{i:\ X_i\in\{0, 1\}} y_i = n_A p_A + n_B p_B \overset{H_0}{\Rightarrow} (n_A + n_B) \cdot p 345 | \hspace{2cm} (34) 346 | \end{equation} 347 | ``` 348 | So, the first element of the vector becomes 0 out and finally: 349 | ``` math 350 | \begin{equation} 351 | U\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}0\\n_B(p_B-p)\\\end{matrix}\right] 352 | \hspace{2cm} (35) 353 | \end{equation} 354 | ``` 355 | 356 | _Information Matrix_ 357 | 358 | By recalling **matrix #26** and remembering that under $H_0$, $p_A=p_B=p$ we obtain: 359 | ``` math 360 | \begin{equation} 361 | I\left(\beta_0,\ \beta_1=0\right)=\left[\begin{matrix}pq(n_A+n_B)&n_Bpq\\n_Bpq&n_Bpq\\\end{matrix}\right] 362 | \hspace{2cm} (36) 363 | \end{equation} 364 | ``` 365 | Let’s also calculate $I^{-1}$ 366 | 367 | ``` math 368 | \begin{equation} 369 | \mathrm{\Sigma}={I\left(\beta_0,\ \beta_1=0\right)}^{-1}=\frac{1}{pq(n_A+n_B)\bullet n_Bpq-\left(n_Bpq\right)^2}\left[\begin{matrix}n_Bpq&-n_Bpq\\-n_Bpq&pq(n_A+n_B)\\\end{matrix}\right] 370 | \hspace{1cm} (37) 371 | \end{equation} 372 | ``` 373 | After simplifying the denominator term: 374 | 375 | ``` math 376 | \begin{equation} 377 | pq(n_A+n_B)\bullet n_Bpq-\left(n_Bpq\right)^2=p^2q^2n_An_B\ +\ p^2q^2n_B^2-p^2q^2n_B^2\ =p^2q^2n_An_B 378 | \hspace{1cm} (38) 379 | \end{equation} 380 | ``` 381 | we finally obtain: 382 | 383 | ``` math 384 | \begin{equation} 385 | \mathrm{\Sigma}(\beta_0,\ \beta_1=0)={I\left(\beta_0,\ \beta_1=0\right)}^{-1}=\frac{1}{p^2q^2n_An_B}\left[\begin{matrix}n_Bpq&-n_Bpq\\-n_Bpq&pq(n_A+n_B)\\\end{matrix}\right]=\left[\begin{matrix}\frac{1}{n_Apq}&-\frac{1}{n_Apq}\\-\frac{1}{n_Apq}&\frac{n_A+n_B}{n_An_Bpq}\\\end{matrix}\right] 386 | \hspace{1cm} (39) 387 | \end{equation} 388 | ``` 389 | 390 | **Rao score test under $H_0: β_1=0$** 391 | 392 | The Rao score test (called also Lagrange multiplier test) under $H_0$ is defined as the following quadratic form: 393 | 394 | ``` math 395 | \begin{equation} 396 | R={U(\beta_0,\ \beta_1=0)}^T\bullet {I(\beta_0,\ \beta_1=0)}^{-1}\bullet U(\beta_0,\ \beta_1=0) 397 | \hspace{2cm} (40) 398 | \end{equation} 399 | ``` 400 | But since the first element of U is 0, this reduces to just scalar operation: $U^2/I = U^2Σ$: 401 | 402 | ``` math 403 | \begin{equation} 404 | \begin{aligned} 405 | R&=U\left(\beta_0,\beta_1=0\right)\mathrm{\Sigma}\left(\beta_0,\beta_1=0\right)= \\ 406 | &=\left[n_B\left(p_B-p\right)\right]^2\bullet\frac{n_A+n_B}{pqn_An_B}=\frac{\left[n_B\left(p_B-p\right)\right]^2}{pq\frac{n_An_B}{n_A+n_B}} \\ 407 | &=\frac{n_B^2\left(p_B-\frac{n_Ap_A+n_Bp_B}{n_A+n_B}\right)^2}{pq\frac{n_An_B}{n_A+n_B}}=\frac{{n_B^2\left(\frac{p_Bn_A+p_Bn_B-n_Ap_A-n_Bp_B}{n_A+n_B}\right)}^2}{pq\frac{n_An_B}{n_A+n_B}} \\ 408 | &=\frac{\frac{n_B^2\left(p_Bn_A-n_Ap_A\right)^2}{\left(n_A+n_B\right)^2}}{pq\frac{n_An_B}{n_A+n_B}}=\frac{\frac{n_B^2n_A^2\left(p_B-p_A\right)^2}{\left(n_A+n_B\right)^2}}{pq\frac{n_An_B}{n_A+n_B}} \\ 409 | &=\frac{\left(p_B-p_A\right)^2}{pq\frac{n_An_B}{n_A+n_B}\frac{\left(n_A+n_B\right)^2}{n_B^2n_A^2}}=\frac{\left(p_B-p_A\right)^2}{pq\frac{n_A+n_B}{n_An_B}}=\frac{{{(p}_B-p_A)}^2}{pq\left(\frac{1}{n_B}+\frac{1}{n_A}\right)} \\ 410 | &=\frac{{{(p}_B-p_A)}^2}{p\left(1-p\right)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)}=z^2\blacksquare 411 | \end{aligned} 412 | \hspace{2cm} (41) 413 | \end{equation} 414 | ``` 415 | 416 | This way I have proven 
the equivalence of the 2-sample Wald’s z-statistic for comparing proportions with pooled variances and the Rao score test over the logistic regression with a single binary predictor distinguishing the compared samples. 417 | 418 | ``` r 419 | > wald_z_test_pooled(x1 = 6, n1 = 20, x2 = 10, n2 = 20) 420 | diff z chi2 se p.value p.value_1 LCI HCI 421 | 1 -0.2 -1.290994 1.666667 0.1549193 0.1967056 0.0983528 -0.5036363 0.1036363 422 | 423 | > prop.test(c(6,10), c(20,20), correct = FALSE) 424 | 425 | 2-sample test for equality of proportions without continuity correction 426 | 427 | data: c(6, 10) out of c(20, 20) 428 | X-squared = 1.6667, df = 1, p-value = 0.1967 429 | alternative hypothesis: two.sided 430 | 95 percent confidence interval: 431 | -0.49724326 0.09724326 432 | sample estimates: 433 | prop 1 prop 2 434 | 0.3 0.5 435 | 436 | > data <- data.frame(response = factor(c(rep("Success", 6), rep("Failure", 20-6), 437 | + rep("Success", 10), rep("Failure", 20-10))), 438 | + grp = factor(rep(c("B", "A"), each=20))) 439 | 440 | > m <- glm(response ~ grp, data = data, family = binomial(link = "logit")) 441 | 442 | > anova(m, test = "Rao") 443 | Analysis of Deviance Table 444 | 445 | Model: binomial, link: logit 446 | Response: response 447 | Terms added sequentially (first to last) 448 | 449 | Df Deviance Resid. Df Resid. Dev Rao Pr(>Chi) 450 | NULL 39 53.841 451 | grp 1 1.6805 38 52.160 1.6667 0.1967 452 | ``` 453 | 454 | ![obraz](https://github.com/user-attachments/assets/183ecccc-14d1-4596-bfb8-0aa174621fee) 455 | 456 | The implementation of the z statistic: 457 | ``` r 458 | wald_z_test_pooled <- function(x1, n1, x2, n2, conf.level=0.95) { 459 | p1 <- x1/n1 460 | p2 <- x2/n2 461 | 462 | p_pool <- (p1*n1 + p2*n2) / (n1+n2) 463 | se_p1 <- sqrt(p_pool * (1 - p_pool) / n1) 464 | se_p2 <- sqrt(p_pool * (1 - p_pool) / n2) 465 | 466 | se_diff <- sqrt(se_p1^2 + se_p2^2) 467 | 468 | z <- (p1 - p2) / se_diff 469 | p <- 2 * (1 - pnorm(abs(z))) 470 | hCI <- abs(qnorm((1 - conf.level)/2)) * se_diff 471 | 472 | return(data.frame(diff=p1-p2, 473 | z = z, 474 | chi2 = z^2, 475 | se = se_diff, p.value = p, p.value_1 =p/2, 476 | LCI = (p1-p2) - hCI, 477 | HCI = (p1-p2) + hCI, 478 | row.names = NULL)) 479 | } 480 | ``` 481 | --------------------------------------------------------------------------------
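As a final numeric cross-check of this derivation, here is a minimal R sketch (my own addition; the names `U`, `Sigma`, `Rao` are ad hoc) that evaluates the score vector (35), the inverse information matrix (39) and the quadratic form (40) directly for the counts used above, and compares the result with the closed form obtained in (41). It should reproduce the value 1.6667 reported by `prop.test()` and `anova(..., test = "Rao")`.

``` r
# Rao score statistic assembled "by hand" from formulas (35), (39) and (40),
# assuming the same counts as above: x_A = 6 of n_A = 20, x_B = 10 of n_B = 20.
nA <- 20; xA <- 6
nB <- 20; xB <- 10
pA <- xA / nA
pB <- xB / nB
p  <- (xA + xB) / (nA + nB)   # pooled probability, formulas (10)-(11)
q  <- 1 - p

U     <- c(0, nB * (pB - p))  # score vector under H0, formula (35)
Sigma <- matrix(c( 1 / (nA * p * q),            -1 / (nA * p * q),
                  -1 / (nA * p * q), (nA + nB) / (nA * nB * p * q)),
                nrow = 2, byrow = TRUE)          # inverse information, formula (39)

Rao <- drop(t(U) %*% Sigma %*% U)                # quadratic form, formula (40)
z2  <- (pB - pA)^2 / (p * q * (1 / nA + 1 / nB)) # squared pooled-variance z, formula (41)
c(Rao = Rao, z2 = z2)  # both should equal 1.6667, matching prop.test() and anova(..., test = "Rao")
```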