If you’ve ever taken a statistics class and maybe even if you haven’t, you’ve likely heard the phrase “correlation is not causation” followed by examples where two nonsensical variables are correlated. If you want to see some humorous examples of nonsensical correlations, check out the fantastic Spurious Correlations website built by Tyler Vigen. If you haven’t seen the website lately, you’ll want to check out his newer section of AI-generated papers based on the spurious correlations.
So if correlation is not causation, then how do you get to causation? As people analytics matures beyond operational reporting, data pulls, and dashboards, the natural progression is to move beyond correlation and build predictive models based on causal relationships.
Take for example employee turnover as a common people analytics metric. Perhaps turnover is higher for certain groups of people in your organization. In other words, you’ve identified a correlation between that group and turnover. However, that does not mean we have causation that allows us to answer the natural question “How can we reduce turnover?” With that question, you endeavor to identify some causes of increased turnover so that you can take action to eliminate or counteract those causes.
Before getting into people analytics, I spent years working in manufacturing and new product development. In those environments, the search for causation involves adjusting machine settings and design attributes to determine the optimal manufacturing processes and product designs. The search was often successful with the use of powerful experimental design methods. However, this search for causation can be significantly more challenging when working in people analytics. Why?
Let’s start with the biggest reason, then I’ll share three more common ones. I’ll use turnover as an example to illustrate each. After talking through those reasons, I’ll share some ideas for getting past these challenges.
Biggest Reason 1) Non-Actionable Variables
First, many of the variables I see related to people data are not actionable. Consider the typical attributes of an employee that are collected and stored in an HR system. That would include the person’s job title, division, business unit, tenure, age, gender, ethnicity, work location, and more. These attributes are not actionable. In other words, you can’t just change the setting on a person like you would change the setting on a piece of manufacturing equipment. We’re not in a position to push a button and change someone’s tenure, age, gender, ethnicity, and other similar demographic variables.
Likewise, variables like work location and job type are not actionable. I don’t think anyone would be willing to just move everyone from the high-turnover location to the low-turnover location and expect turnover to improve. Have you seen some of the reactions to those forced return-to-office mandates? If anything, those mandates might have the opposite effect. Likewise, if you found that all the engineers had significantly lower turnover rates, you wouldn’t change everyone’s job code to engineer.
If you build a model using only these non-actionable variables, the results will not give you specific actions you can take.
2) Confounding Variables
A second reason is that your variables themselves are correlated with each other. The statistical term is confounding. As an illustrative example, suppose you find that the people who got promoted in the last year are also the same people who are the most highly compensated. If you find that this promoted, highly compensated group has lower turnover rates, is it the promotion that is the cause or is it the increased compensation? Perhaps it is a combination of both?
It becomes challenging to determine which matters most without doing something to reduce the confounding of the variables you are studying.
Running experiments is the primary way to reduce confounding. However, that is challenging in its own right because experiments on people/HR problems are difficult to do.
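To make the promotion-versus-compensation example concrete, here is a minimal sketch in Python. The counts are hypothetical, invented purely for illustration, and they are deliberately constructed so that only compensation drives turnover while promotion merely travels along with it. The names COUNTS, turnover_rate, and rate_by_promotion are my own for this sketch, not part of any HR system or library.

```python
# Hypothetical counts, invented for illustration only (not real HR data).
# Key: (promoted, high_pay) -> (headcount, leavers)
# By construction, turnover depends on pay (10% vs 30%), not on promotion,
# but promoted employees are far more likely to be in the high-pay group.
COUNTS = {
    (True, True): (400, 40),
    (True, False): (100, 30),
    (False, True): (100, 10),
    (False, False): (400, 120),
}


def turnover_rate(cells):
    """Pooled turnover rate over an iterable of (headcount, leavers) cells."""
    cells = list(cells)
    headcount = sum(h for h, _ in cells)
    leavers = sum(l for _, l in cells)
    return leavers / headcount


def rate_by_promotion(promoted, high_pay=None):
    """Turnover for a promotion group, optionally within one pay stratum."""
    return turnover_rate(
        cell
        for (p, hp), cell in COUNTS.items()
        if p == promoted and (high_pay is None or hp == high_pay)
    )


# Naive comparison: ignores pay entirely.
naive_gap = rate_by_promotion(False) - rate_by_promotion(True)

# Stratified comparison: compare promoted vs. not promoted within each pay group.
stratified_gaps = {
    high_pay: rate_by_promotion(False, high_pay) - rate_by_promotion(True, high_pay)
    for high_pay in (True, False)
}

print(f"Promoted turnover:     {rate_by_promotion(True):.0%}")
print(f"Not promoted turnover: {rate_by_promotion(False):.0%}")
print(f"Naive gap:             {naive_gap:.0%}")
for high_pay, gap in stratified_gaps.items():
    label = "high pay" if high_pay else "low pay"
    print(f"Gap within {label}: {gap:.0%}")
```

In this made-up data the promoted group appears to have much lower turnover (70 leavers out of 500 versus 130 out of 500), yet the gap disappears once you compare people within the same pay group. Real data will rarely separate this cleanly, and stratifying only helps for the confounders you have actually measured.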
3) Lack of Data
Third, you don’t have the data on the actionable variables. Perhaps you’ve dug deeper to move beyond those non-actionable demographic variables. You have some hypotheses that you want to explore that are actionable. But then you realize you don't have the data you need.
For example, suppose you believe that having a good leader makes a difference in whether or not an employee stays. But you don’t have a way to identify good leaders. Maybe you do surveys of direct reports about their leaders, but you find that the direct reports are not as inclined to be fully transparent about what is happening. Or maybe the ones you’d like to talk to about their leaders are already gone before you can reach them.
In addition, some actionable data is creepy to collect. You would run into privacy or ethical concerns if you were to analyze employee emails and chats with text analytics to see what they are saying about their leader. Perhaps you don’t have data collection systems in place to know whether or not employees have regular meetings with their leader. There are many reasons why you don't have the data you need.
4) Noisy Data
Finally, there is simply too much that is unexplainable. Human beings have a beautiful set of diverse backgrounds and experiences. You can have two people with very different experiences at work even though they are the same age and gender, work in the same type of job for the same pay and benefits, and work on the same team with the same leader. One may be much more likely to leave the organization because of a different set of career goals or life changes, such as the need to care for aging parents or other personal reasons.
No matter how much organizations try to create an ideal environment for their employees, some turnover happens because of factors beyond the employer’s control. The quest for causation is hampered by this noise in the data.
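One way to get a feel for how much noise remains is to ask what share of the individual stay-or-leave variation a single variable explains. The sketch below is a rough illustration under stated assumptions: two equally sized hypothetical groups with turnover rates of 10% and 30%, and a function name, explained_share, that I made up for this example. It uses the standard split of a binary outcome's variance into a between-group part and a within-group part.

```python
def explained_share(groups):
    """Share of variance in a binary stay/leave outcome explained by group.

    groups: list of (headcount, turnover_rate) pairs, one per group.
    Assumes the overall turnover rate is strictly between 0 and 1.
    """
    total = sum(n for n, _ in groups)
    overall = sum(n * p for n, p in groups) / total
    # Variance of a 0/1 outcome with mean `overall`.
    total_var = overall * (1 - overall)
    # Headcount-weighted variance of the group rates around the overall rate.
    between_var = sum(n * (p - overall) ** 2 for n, p in groups) / total
    return between_var / total_var


# Hypothetical groups: 500 people at 10% turnover, 500 people at 30% turnover.
share = explained_share([(500, 0.10), (500, 0.30)])
print(f"Share of individual variation explained by group: {share:.1%}")
```

Even though one group leaves at three times the rate of the other, group membership accounts for only about 6% of the person-to-person variation in this made-up example. The rest is everything else going on in people's lives and careers.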
Implications
So what can we do to move beyond correlation? What do you do if you see correlations in these non-actionable variables? That’s where you dig a little deeper to find the actionable variables behind them. There may be underlying reasons that explain the differences in the demographic variables. So instead of looking at differences by job family, look at the time to promotion to a given level within a job family. Instead of looking at the work location, explore the different types of work arrangements and the options for flexibility in those arrangements. Instead of looking at the business units, explore the engagement levels and employee survey results for those business units. Instead of looking at gender and ethnicity differences, explore whether you have gender pay equity gaps or different utilization of leadership development programs for different ethnicities.
I’ve recapped these examples in the table below.
Push for the additional data sources that allow you to de-confound variables and fill in the gaps where you don’t have data. Don’t settle for the easiest data to get. You get what you are willing to pay for. Make the business case for why the additional actionable data will help you move to another level in your people analytics journey.
If you are struggling to find those causal effects, you are in good company. It is hard to do the experiments that allow you to determine causality. Recognize that this is what science is all about: finding explanations for the unexplainable. That is both the challenge and the opportunity. Just recognizing that you only have correlations is a start. The correlations can point to the underlying variables that most likely matter. Moving beyond those superficial correlations to find the true causes is a quest worth undertaking.
Thank you, Willis. Often, I encounter dashboards that focus on "non-actionable" variables, which can obscure the important, actionable insights we need. Additionally, gathering data on non-actionable variables is easier than establishing relationships and delving deeper into actionable variables.
I wonder about the actionable-or-not distinction. Take age, gender, and other demographics: if women are leaving at a significantly higher rate, perhaps a review of workplace culture is in order. For example, is there some source of sexism that is making women more likely to leave?