Correlation to Causation
Correlation
When we say that there is a correlation between X and Y, what do we mean?
We mean that X and Y have happened together. (Happening together means happening simultaneously or one after the other.)
Please note the verb tense - “have happened.” We’re talking about the past.
Correlation is talked about ONLY w.r.t. PAST DATA.
Thus, when we say that there is a correlation between X and Y, we mean that in the past, X and Y have happened together.
A common misconception
Given the term “correlation,” many people believe that correlation between X and Y means that there is a relation between X and Y.
And what does “relation” mean?
For many, it means that either X is the cause of Y or Y is the cause of X.
Thus, many people believe that a correlation between X and Y means that either X is the cause of Y or Y is the cause of X.
This is WRONG.
A correlation between X and Y means that X and Y have happened together. There may not be any causal relationship between the two. Their happening together could be pure coincidence.
For example: there is a very strong correlation between the number of people who drowned by falling into a pool and the number of films Nicolas Cage appeared in. You can look at the following graph to see how closely these two numbers move together.
[Attachment: chart.png]
Do you think there is any causal relationship between these two numbers?
I hope not.
This correlation is pure coincidence. Many other variables are strongly correlated but have no causal relationship between them. You can visit this link to look at more such weird correlations.
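To make "strong correlation" concrete, here is a minimal sketch of how a correlation coefficient can be computed from past data. The two series below are made-up, hypothetical numbers (not the drowning or film figures from the chart), and the helper name `pearson` is my own choice for illustration. The point is only that the calculation looks at how two sets of past numbers move together; nothing in it knows or cares whether one thing caused the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much the two series move together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical yearly counts for two unrelated things (invented for illustration).
series_a = [3, 5, 4, 6, 8, 7]
series_b = [30, 52, 41, 58, 83, 69]

r = pearson(series_a, series_b)
print(f"r = {r:.2f}")  # a value close to 1 means the series moved together
```

A value of r near 1 tells us only that the two series rose and fell together in the recorded past. Whether that is causation, a third factor, or pure coincidence is a question the number cannot answer.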
Causation
We say X is the cause of Y when we mean that X led to Y. In other words, X was the reason Y happened. A natural consequence of this relation is that if X had not happened, Y would not have happened.
My hypothesis is that we can never see causation; we can see only correlation. I’ve carried this hypothesis for a few months and haven’t come up with a counter-example. If you have a counter-example, please let me know.
If you believe in my hypothesis, take a deep breath to reflect on its immensity - our lives revolve around causal relationships, and here I am, saying that we can never see causation. All causal relationships are a product of our mental analysis, not of our observations.
An important idea to understand here is that we talk about causality ONLY w.r.t. past events. For us to talk about the cause of an event, the event must have already happened. If the event hasn’t even happened, there is no sense in talking about the cause of that event.
Some of you may be wondering, “What about the statements ‘X will cause Y’ and ‘X causes Y’?” These statements are NOT talking about causality; these statements present sufficient conditions. I've explored the difference between causal statements and sufficient condition statements in this article.
In a nutshell:
The statement “X caused Y” can be challenged by saying that “Z caused Y” because the first statement is calling X the cause of Y whereas the second statement calls Z the cause of Y. In this case, we’re saying that only one cause of Y exists.
If there are multiple causes, then both statements have to be modified to say “X partly caused Y” and “Z partly caused Y.” In this case, these statements won’t impact each other since there can be two part-causes. Stating another part cause (Z) will not cast doubt on the idea that X is a part cause of Y.
The statement “X causes Y” is NOT IMPACTED by saying that “Z causes Y.” Why?
Because ‘X causes Y’ means that X leads to Y or that X is enough to get to Y, i.e., X is a sufficient condition for Y.
‘Z causes Y’ means that Z leads to Y or that Z is enough to get to Y, i.e., Z is a sufficient condition for Y.
The idea that Z is enough to get to Y doesn’t cast doubt on the idea that X is enough to get to Y since there can be two different ways to get to Y. The presence of one way doesn’t cast doubt on the presence of another way.
‘X is the cause of Y’ means the same as ‘X caused Y’.
‘X caused Y’ means that X led to Y or that X is the reason Y happened.
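If it helps to see why two sufficient-condition statements don't clash, here is a small sketch that treats “X causes Y” as “whenever X holds, Y holds” and checks every combination of true/false for X, Z, and Y. The function name `is_sufficient` and the idea of enumerating “worlds” are my own illustration, not something taken from an official source.

```python
from itertools import product

def is_sufficient(cause, effect):
    """'cause is sufficient for effect' fails only when cause holds and effect does not."""
    return (not cause) or effect

# Every combination of truth values for X, Z, and Y.
worlds = list(product([False, True], repeat=3))

# Worlds in which both "X causes Y" and "Z causes Y" hold as sufficient conditions.
consistent = [
    (x, z, y)
    for x, z, y in worlds
    if is_sufficient(x, y) and is_sufficient(z, y)
]

print(len(consistent), "of", len(worlds), "worlds satisfy both statements")
for x, z, y in consistent:
    print(f"X={x}, Z={z}, Y={y}")
```

Since several combinations satisfy both statements at once, including the one in which X, Z, and Y are all true, accepting “Z causes Y” never forces us to give up “X causes Y.”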
Correlation to Causation
A common jump in logic in our lives and on the GMAT is a jump from correlation to causation. We see that X and Y have happened together, and we conclude that X caused Y.
For example:
I had pani puri yesterday from a place I had never eaten at before. I fell ill this morning.
“Oh! That pani puri is the culprit.”
This is an example of the jump from correlation to causation. Two events happened together - my eating pani puri and my falling ill. There was a correlation.
I concluded causation - my eating pani puri caused my illness.
Another example:
I have observed that whenever I go to a party, you don’t go, and whenever I don’t go, you go. Clearly, you have a problem with me.
This is again a jump from correlation to causation. Two events happened together:
1. My going to the party
2. Your not going to the party
I look at the correlation and jump to the causation that my going to the party is the CAUSE of your not going to the party.
------------------------------------------------------------------------------------------------------------------------------------------------
A typical format of a correlation-causation jump is:
X and Y happened together. Therefore, X caused Y.
The premise is the correlation between X and Y. The conclusion presents causation.
Take a moment to think about HOW the premise supports the conclusion.
HOW does ‘correlation between X and Y’ support ‘X caused Y’?
The answer is that ‘X caused Y’ is a possible explanation for ‘X and Y happened together.’
Here’s what’s going on in the author’s mind:
I see that X happened. I also see that Y happened. Why did they occur together?
Oh! I see. X must have caused Y. That’s why Y occurred after X.
------------------------------------------------------------------------------------------------------------------------------------------------
In a way, the logic used here is the same as the logic used in the argument below:
Raj couldn’t solve an easy GMAT question. Therefore, Raj is not intelligent.
What if I told you that Raj is a Sanskrit scholar and has never studied English?
Then, you wouldn’t believe the argument that Raj is not intelligent. You’d think that perhaps Raj is intelligent but can’t solve GMAT questions because he doesn’t know English.
My statement weakened the argument.
But exactly how did I weaken the argument?
By presenting an alternate reason for the premise.
As I said, this argument follows the same logic - the premise presents a scenario, and the conclusion is a possible explanation for the scenario.
“Raj is not intelligent” is a possible explanation for the fact that he couldn’t solve an easy GMAT question.
When I gave you an alternate explanation for the premise, the conclusion was weakened.
This argument structure, in which the conclusion is one possible explanation for the premise, is fairly common in life and on the GMAT.
------------------------------------------------------------------------------------------------------------------------------------------------
Now, let’s come back to the correlation-causation argument.
The conclusion “X caused Y” is one possible explanation for “X and Y have happened together.”
What could be other explanations for “X and Y have happened together”?
1. Y caused X - we call it reverse causality
2. Z caused X and Y - we call it the third factor
3. Pure coincidence (As we discussed above, the correlation could be a result of randomness.)
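The third-factor explanation can be seen in a small simulation. This is only a sketch under assumptions I am making up for illustration: Z is a hidden factor, X and Y are each built as Z plus their own independent noise, and neither X nor Y has any effect on the other. The helper `pearson` and all variable names are hypothetical.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10_000

# Z is the hidden third factor; X and Y each depend on Z plus their own noise.
# By construction, X never influences Y and Y never influences X.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)

# Subtract Z's contribution (we know it exactly here because we built the data).
# What remains of X and Y is just their independent noise.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson(x_rest, y_rest)

print(f"corr(X, Y)                = {r_xy:.2f}")
print(f"corr(X, Y) with Z removed = {r_rest:.2f}")
```

In this setup X and Y come out clearly correlated even though neither causes the other, and the correlation all but vanishes once Z's contribution is taken out. That is exactly what a third-factor weakener suggests about an argument.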
Presenting or suggesting an alternate explanation in a correlation-causation argument will weaken the argument. Three popular ways of weakening a correlation-causation argument are:
1. Suggesting reverse causality - Y caused X
2. Suggesting a third factor - Z caused X and Y
3. Suggesting pure coincidence
A few other common ways to weaken a correlation-causation argument:
4. Alternate cause for Y - (In this case, the correlation between X and Y is understood to be pure coincidence.)
5. Even when X didn’t happen, Y happened - (This leads to the thought, “If Y could happen without X’s happening, then X is perhaps not the cause of Y.”)
6. Even when X happened, Y didn’t happen - (This leads to the thought, “If Y didn’t happen even in the presence of X, then X is perhaps not the cause of Y. Something else is the cause of Y. That something was perhaps not there. That’s why Y didn’t happen.”)
7. Logically, X could not have led to Y - In this case, we logically argue that X could not have been the cause of Y.
For every way to weaken, there is a way to strengthen. Here are a few common ways to strengthen a correlation-causation argument:
1. Eliminating reverse causality - Y didn’t cause X.
2. Eliminating a third factor - There’s no Z that caused both X and Y.
3. Eliminating pure coincidence - The correlation between X and Y is not a result of pure coincidence.
4. Eliminating an alternate cause - There is no other cause of Y.
5. When X didn’t happen, Y didn’t happen - The absence of X was accompanied by the absence of Y in the past.
6. When X happened, Y also happened - The presence of X was accompanied by the presence of Y in the past.
7. Logically, X could have led to Y - In this case, we logically argue how X could be the cause of Y.
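Structures 5 and 6 on both lists are really about four kinds of past cases: X with Y, X without Y, Y without X, and neither. Here is a minimal sketch that counts them. The observations are hypothetical, and the function name `tabulate` and the labels are my own.

```python
from collections import Counter

def tabulate(observations):
    """Count how often each (x_happened, y_happened) combination occurred."""
    counts = Counter(observations)
    return {
        "X and Y": counts[(True, True)],         # Strengthen Structure 6
        "X without Y": counts[(True, False)],    # Weaken Structure 6
        "Y without X": counts[(False, True)],    # Weaken Structure 5
        "neither": counts[(False, False)],       # Strengthen Structure 5
    }

# Hypothetical past observations, each recorded as (X happened?, Y happened?).
observations = [
    (True, True), (True, True), (True, True),
    (False, False), (False, False),
    (False, True),  # Y happened even though X did not
]

table = tabulate(observations)
for label, count in table.items():
    print(f"{label}: {count}")
```

Cases in the “X and Y” and “neither” cells fit the claim that X caused Y, while any case in the other two cells gives us a reason to doubt it. None of the cells proves or disproves causation on its own; they only strengthen or weaken the argument.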
Application on Official Questions
Question 1
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 5, i.e., when X didn’t happen, Y happened. (No defrosting on the back window; still the same speed of ice melting)
Question 2
Please attempt the question before reading any further.
In this question, in which we are supposed to weaken the causality between marriage and long life, the correct option follows Weaken Structure 5, i.e., when X didn’t happen, Y happened. (When the marriage didn’t happen, people still lived as long)
Question 3
Please attempt the question before reading any further.
In this Strengthen question, the correct option follows Strengthen Structure 5, i.e., when X didn’t happen, Y didn’t happen. (When antibodies didn’t form, keratitis didn’t happen)
Question 4
Please attempt the question before reading any further.
In this Strengthen question, the correct option follows Strengthen Structure 5, i.e., when X didn’t happen, Y didn’t happen. (When antibodies were different because of different surface proteins of the virus, keratitis didn’t happen)
Question 5
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 5, i.e., when X didn’t happen, Y happened. (No alteration in the intensity of light, yet biological functions follow a 24-hour rhythm)
Question 6
Please attempt the question before reading any further.
In this Strengthen question, in which we’re supposed to strengthen the causality “Presence of diallyl sulfide → No mosquitoes,” the correct option follows Strengthen Structure 5, i.e., when X didn’t happen, Y didn’t happen. (Insects that are not repelled by diallyl sulfide were present in the flooded fields.)
Question 7
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows a twisted form of Weaken Structure 2, i.e., Z caused X and Y. (Medical conditions and treatments → Absence of mental sharpness and absence of social contact)
Question 8
Please attempt the question before reading any further.
This is a WEAKEN EXCEPT question. Thus, we have 4 weakeners in the options.
Options A, B, and C follow Weaken Structure 4. They provide alternate causes for more severe accidents in the US.
Option E follows Weaken Structure 5, i.e., even when X didn’t happen (no difference in seat belt now), Y happened (same severity of accidents as earlier).
Question 9
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 6, i.e., when X happened, Y didn’t happen.
Question 10
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 4, i.e., alternate cause for Y. (The correct option provides another reason why stores are stocking products produced in concentrated form.)
Question 11
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 1, i.e., Y caused X. (The correct option indicates that unhappy marriages → mismatched sleeping and waking cycles.)
Question 12
Please attempt the question before reading any further.
In this WEAKEN question, the correct option follows Weaken Structure 2, i.e., Z caused both X and Y. (The correct option indicates that the presence of heart disease → medicines for heart disease → both weakened immune system and deaths)
------------------------------------------------------------------------------------------------------------------------------------------------
Here are a few common ways in which a causal conclusion is phrased:
1. X caused Y, OR Y was caused by X
Examples:
One summer, floods covered low-lying garlic fields situated in a region with a large mosquito population. Since mosquitoes lay their eggs in standing water, flooded fields would normally attract mosquitoes, yet no mosquitoes were found in the fields. Diallyl sulfide, a major component of garlic, is known to repel several species of insects, including mosquitoes, so it is likely that diallyl sulfide from the garlic repelled the mosquitoes.
In response to viral infection, the immune systems of mice typically produce antibodies that destroy the virus by binding to proteins on its surface. Mice infected with the herpesvirus generally develop keratitis, a degenerative disease affecting part of the eye. Since proteins on the surface of cells in this part of the eye closely resemble those on the herpesvirus surface, scientists hypothesize that these cases of keratitis are caused by antibodies to the herpesvirus.
2. X explains Y, OR Y can be explained by X
Example:
Many consumers are concerned about the ecological effects of wasteful packaging. This concern probably explains why stores have been quick to stock new cleaning products that have been produced in a concentrated form. The concentrated form is packaged in smaller containers that use less plastic and require less transportation space.
3. Y happened because X happened
Example:
The ice on the front windshield of the car had formed when moisture condensed during the night. The ice melted quickly after the car was warmed up the next morning because the defrosting vent, which blows on the front windshield, was turned on full force.
------------------------------------------------------------------------------------------------------------------------------------------------
There are also many arguments in which causality is not a part of the conclusion but a part of the assumption. In other words, there are many arguments in which causality is assumed. These arguments are also impacted by many of the same ways in which a correlation-causation argument is impacted.
Here are two common argument structures in which causality is assumed:
1. X and Y have happened. Therefore, X causes/leads to Y.
As we have discussed above, “X causes Y” is not a causal statement; it presents a sufficient condition. Every causal statement is in the past tense.
In this argument structure, the author jumps from the correlation between X and Y to saying that X causes Y. What is his assumption?
He assumes that X caused Y in the past. Think about it: If we say that X didn’t cause Y, then we’ll not be able to say ‘X causes Y’ on the basis of the correlation.
Here’s what’s going on:
The author looks at the correlation between X and Y. Looking at the correlation, he assumes that X must have caused Y. Based on this assumption, he concludes that X causes Y (in general).
Here’s an official question that follows this structure:
It is widely assumed that people need to engage in intellectual activities such as solving crossword puzzles or mathematics problems in order to maintain mental sharpness as they age. In fact, however, simply talking to other people - that is, participating in social interaction, which engages many mental and perceptual skills - suffices. Evidence to this effect comes from a study showing that the more social contact people report, the better their mental skills.
Here, the second statement, “simply talking to other people suffices to maintain mental sharpness as people age,” is based on the correlation presented in the third statement. The author assumes that the correlation presented in the third statement results from the causation “more social contact led to better mental skills.” Thus, the causation is a part of the assumption here.
Here’s another official question:
When limitations were in effect on nuclear-arms testing, people tended to save more of their money, but when nuclear arms testing increased people tended to spend more of their money. The perceived threat of nuclear catastrophe, therefore, decreases the willingness of people to postpone consumption for the sake of saving money.
Here, the second statement, “X leads to Y,” is based on the correlation presented in the first statement. The author looks at a correlation between nuclear-arms testing and how much money people are spending/saving. The author then makes a general statement that the perceived threat leads to people’s spending more money. The author assumes that the correlation presented in the first statement is the result of a causal relationship “more nuclear arms testing led to more spending by people.” This causal relationship is assumed. Of course, this causal relationship need not be true. The correlation could hold for other reasons.
2. Y happened. Therefore, X must have happened.
This structure is of the form: Effect. Therefore, Cause.
The author sees that an effect has happened. The author concludes that the cause must have happened.
The obvious gap in the argument is that the effect could have been caused by some cause other than the one presented in the conclusion.
We looked at an example of this argument structure above:
Raj couldn’t even solve an easy GMAT question. Therefore, Raj is not intelligent.
Here, the conclusion is a possible cause of the premise. The author looks at the situation presented in the premise. The author then concludes a particular cause of the premise. This argument can be weakened by indicating another cause of the premise. We looked at one possible weakener - Raj is a Sanskrit scholar and is not fluent in the English language.
Here are a couple of official questions following the same structure:
Argument 1:
In virtually any industry, technological improvements increase labor productivity, which is the output of goods and services per person-hour worked. In Parland's industries, labor productivity is significantly higher than it is in Vergia's industries. Clearly, therefore, Parland's industries must, on the whole, be further advanced technologically than Vergia's are.
Here,
Conclusion: Parland's industries must, on the whole, be further advanced technologically than Vergia's are.
Premise: In virtually any industry, technological improvements increase labor productivity, which is the output of goods and services per person-hour worked. In Parland's industries, labor productivity is significantly higher than it is in Vergia's industries.
The conclusion is a possible cause of the premise “In Parland's industries, labor productivity is significantly higher than it is in Vergia's industries.”
This argument follows the structure: Effect. Therefore, Cause.
The argument can be weakened by presenting an alternate cause of the effect.
Here’s the link to this question.
Argument 2:
In the past most airline companies minimized aircraft weight to minimize fuel costs. The safest airline seats were heavy, and airlines equipped their planes with few of these seats. This year the seat that has sold best to airlines has been the safest one - a clear indication that airlines are assigning a higher priority to safe seating than to minimizing fuel costs.
Here,
Conclusion: Airlines are assigning a higher priority to safe seating than to minimizing fuel costs.
Premises: In the past, most airline companies minimized aircraft weight to minimize fuel costs. The safest airline seats were heavy, and airlines equipped their planes with few of these seats. This year the seat that has sold best to airlines has been the safest one.
The conclusion is a possible cause of the premise “This year the seat that has sold best to airlines has been the safest one.”
The argument can be weakened by presenting an alternate cause of the effect. The correct option in the question does EXACTLY that.
Here’s the link to this question.
------------------------------------------------------------------------------------------------------------------------------------------------
Here’s what we learnt from this article:
1. Correlation between X and Y means that X and Y have happened together.
2. Correlation is always about past events.
3. Correlation between X and Y DOES NOT MEAN that there is a causal relation between X and Y.
4. Causation means X led to Y, or X was the reason Y happened.
5. Causation is talked about ONLY for past events.
6. “X causes Y,” “X leads to Y,” and “X will cause Y” are NOT causal statements. Each statement presents a sufficient condition - X is sufficient to get to Y.
7. “X caused Y” can be weakened by indicating that “Z caused Y.”
8. “X causes Y” CANNOT be weakened by indicating that “Z causes Y.”
9. “X partly caused Y” cannot be weakened by indicating that “Z partly caused Y.”
10. A very common logical jump in life and on the GMAT is a jump from CORRELATION between X and Y TO CAUSATION (X caused Y).
11. “X caused Y” is a possible explanation for the correlation between X and Y.
12. Any other explanation for the correlation between X and Y weakens the correlation-causation argument.
13. We learned seven common ways to strengthen and weaken a correlation-causation argument. Do you remember them? Do you understand WHY each one creates the impact it creates?
14. We looked at 12 official questions built around correlation-causation.
15. We looked at three ways a causal conclusion is phrased. Do you remember them?
16. We looked at two ways causality is assumed in the argument. Do you remember them?
17. One of the two ways causality is assumed in the argument is: Effect. Therefore, Cause.
Appeal - Help me make this article better
You can make this article better by:
1. Sharing more official or unofficial questions around the concept of correlation-causation
2. Telling me which parts of the article are not clear or need more elaboration
3. Pointing out spelling, grammatical, or logical mistakes in the article
If you have any questions or doubts regarding anything covered in the article, please feel free to ask. I’ll be happy to help.
I’ll be happy to hear from you in the comments. Please be aware that I also LOVE reading appreciative comments.

In the next part of this article, we’ll see how this concept of correlation causation plays out in our lives. I’ll offer a few more official questions built around causality.
Edit 1: I've posted Part-2 of this article. Here's the link to the article.