This article with introduce the Simpson’s paradox and offers an solution through Pearl’s Graph Model. It also want to explore the philosophical aspect of causal inference through this problem.

The Simpson’s Paradox

The Paradox

One of the well known example of Simpson’s Paradox is the admission bias of UCB. If we calculate the admission rate of men and women to UCB respectively, we will find the data “indicates” the school has a small “preference” to male. However, if the rates are examined at each individual department, we find most departments are strongly biased against men.

People tend to think it is because those departments in favor of men recruit most of the student thus in general the school seems biased against men.

But the amazing aspect of this paradox is, even if in every department a preference for women is recognized, the whole admission rate for men may also exceeds women.

This phenomenon is often encountered in medical-science statistics. Here is an example

  Treatment Control
High GPA 0.93 (81/87) 0.87 (234/270)
Low GPA 0.73 (192/263) 0.69 (55/80)
Both 0.78 (275/350) 0.83 (268/350)

It seems like Treatment is effective when applied to both H and L group of people, however, if all samples are considered in the same time, we should better leave people untreated.

Here comes the problem, will you apply the treatment to a patient? How will you decide whether the treatment is effective?

A misleading hint: according to the naïve Bayesian point of view, if you don’t know the GPA of the patient, don’t let him take the treatment. Once you know, however his academic performance is, he should taken the treatment.