Author(s): Fatbardha Maloku* and Besnik Maloku
Employee turnover is often perceived as detrimental to organizational efficiency, but is this always the case? This research explores the multifaceted factors influencing employee attrition and examines whether lower turnover invariably leads to greater efficiency. Utilizing the "IBM HR Analytics Employee Attrition and Performance" dataset, which includes variables such as employee age, department, education level, job satisfaction, gender, job role, marital status, and overtime hours, we conduct a comprehensive descriptive analysis to predict employee retention. By understanding these factors, HR and management can make informed decisions to mitigate attrition. This study aims to identify novel strategies to reduce employee turnover, providing actionable insights and predictions to help IBM retain talent and maintain productivity and success.
In general, it is considered that lower organizational turnover leads to greater efficiency, but is this always the case? Are there any other elements or reasons that have an impact on employee attrition, either directly or indirectly? In this study paper, the answers to these open questions will be revealed. The purpose of this research is to look into the factors that influence employee turnover. We'll then use the descriptive analysis' data to determine whether or not an employee will stay with the company. Employee attrition is the biggest difficulty for any company, yet HR or management may make prompt decisions to keep personnel. The same issue is happening within the IBM company also. The main attributes of the entire process are included within the “IBM HR Analytics Employee Attrition and Performance” dataset. This set contains variables such as the age of employees, the department they belonga to, the education level, job satisfaction level, gender, job role, marital status, and the overtime worked hours. In this study, we'll investigate new approaches to decrease the attrition rate as well as give suggestion and predictions on how IBM can continue being productive and successful by retaining their talent.
High employee turnover is often considered detrimental to organizational efficiency, but there is ambiguity regarding whether reducing turnover universally enhances organizational effectiveness. This research aims to investigate the complex factors influencing employee attrition and assess whether lower turnover consistently correlates with improved efficiency.
Employee turnover is a pervasive concern in organizational management, impacting productivity and stability. Traditional views suggest that minimizing turnover enhances operational efficiency by stabilizing workforce continuity and reducing recruitment costs. However, recent studies indicate nuances in this relationship, questioning whether low turnover always equates to optimal organizational performance. This study utilizes the "IBM HR Analytics Employee Attrition and Performance" dataset, encompassing variables such as employee demographics, job characteristics, and work-related factors. Through comprehensive descriptive analysis and predictive modeling, the research seeks to uncover underlying patterns and predictive indicators of employee retention. By gaining insights into these factors, HR professionals and management can implement targeted strategies to mitigate attrition, thereby fostering a stable and productive workforce environment at IBM.
The aim of this project is to analyze the factors influencing employee attrition using the "IBM HR Analytics Employee Attrition and Performance" dataset. By conducting a thorough examination of variables such as employee demographics, job characteristics, and work-related factors, the study seeks to identify key determinants of turnover. Ultimately, the project aims to provide actionable insights and predictive models that can assist IBM in implementing effective strategies to reduce employee turnover, enhance retention rates, and sustain organizational productivity and success.
The dataset used in this analysis was sourced from Kaggle.com and comprises a total of 35 columns. For this study, our focus is on key variables believed to influence employee attrition rates. The following essential variables are considered:
In addition to these variables, the dataset includes several others that will be explored to uncover factors strongly associated with employee attrition. This research aims to identify significant predictors and further investigate their relationships to inform strategies for reducing attrition rates and enhancing organizational retention practices.
During my analysis of each variable in the dataset, I focused on addressing the fundamental question: Is there a strategy to reduce employee attrition rates? In essence, I sought to uncover trends that may contribute to both employee attrition and performance. Any patterns identified are likely indicative of broader trends affecting organizations where employees choose to leave, rather than isolated factors specific to individual cases. It is crucial to acknowledge that historical data, while informative, does not guarantee future outcomes. These patterns may have evolved over time, but their persistence in the future is uncertain. Based on these considerations, the research proceeded with the following methodologies, detailed below. Initially, a descriptive analysis model was applied to Kaggle.com's historical dataset. This stage involved a thorough examination of each variable to identify key factors influencing employee attrition rates, providing insights for organizations aiming to retain their top talent over extended periods. Subsequently, a predictive analytics model was implemented to assess how findings from the descriptive study could forecast future employee attrition and performance rates within a company.
The initial phase of our analysis will involve conducting a descriptive analysis of the dataset. This includes checking for missing data, assessing the current data types of each variable, and generating summary statistics to gain a comprehensive understanding of the dataset. Visualizations will be utilized to better understand the distribution of each variable. Subsequently, we will implement predictive modeling to identify highly correlated factors that predict employee attrition rates within the company. Additionally, ANOVA tests will be conducted across multiple models to determine which model yields optimal results aligned with our research objectives.
The distribution of employee’s age is shown below.
Figure 1: Distribution of Employees Age
The average age of IBM employees in this graph goes from thirty to fifty years old. We have a few individuals in their twenties as well as a few other individuals over the age of sixty.
The distribution of the number of employees per business travel sector is shown below. The bulk of employees, as seen on the graph below belong to “Travel Rarely” group. We have some individuals that belong to the “Travel Frequently” group. Lastly, we also have a group of individuals who do not travel as much as the previous two groups, and they belong the “Non-Traveler” group.
Figure 2: Distribution of the Number of Employees per Business Travel sector
The distribution of employees per department is shown below.
Figure 3: Distribution of the Number of Employees per Department
We examined the impact of the Gender variable on attrition. The graph below illustrates the distribution of attrition by gender and employee count. From the graphic, it is evident that females with lower scores are more likely to leave their jobs. In contrast, males tend to remain in their jobs with slightly higher scores.
Figure 4: Distribution of Attrition per Gender and Employee Count
The graph below depicts the distribution of attrition rates across different employee job roles and the total count of employees in the organization. It illustrates that the "Yes" values for attrition are lower among research directors compared to the human resources role. This visualization indicates that research directors tend to exhibit lower turnover rates compared to other job titles within the organization.
Figure 5: Distribution of the Attrition variable per Gender and Employee Count
Next, we'll look at the distribution of older vs. younger generations within organizations and compare the two groups' ages to see how that distribution appears.
Figure 6: Box Plot Distribution of the Attrition Rate Over Age
People of younger ages are more willing to leave a job than those of older ages, as shown in the graph above. After that, we'll make a single-variable plot of the Age variable. The graph looks like this after graphing the values:
Figure 7: Univariate Plot of Age
Next, we have the Bivariate Plot of the performance rating variables and the Years in the Current Years of the employees. From the visualization we can see that the distribution of the variables is proportional.
Figure 8: Bivariate Plot of Performance Rating Score and the Number of Years in the Current Role
We have generated a multivariate plot using the attrition and performance rating dataset. The graph below illustrates an example of such a plot featuring numerical variables. This includes Age, Employee Number, Environment Satisfaction, Monthly Income, Performance Rating, Work Life Balance, Years at Company, Years in Current Role, Years Since Last Promotion, Years with Current Manager, and Stock Option Level variables.
Figure 9: Multivariate Plot of the Attrition and Performance Score
The predictive analysis will be carried out utilizing the two criteria listed below:
The correlation and regression model has been used to test the relationship between variables such as Age, Employee Number, Environment Satisfaction, Monthly Income, Performance Rating, Work Life Balance, Years at Company, Years in Current Role, Years Since Last Promotion, Years with Current Manager, and Stock Option Level etc. By exploring these potential explanatory variables, we will know what variables help the attrition rate in an organization.
Below we have created a heatmap about the correlation between the attrition values and the other explanatory variables such as (Age, Total Number of Employees, Environment Satisfaction, Monthly Income, Work Life Balance, Years at the Company, Years Since Last Promotion, Stock Level options)
Figure 10: Heatmap Between the Explanatory Variables
As demonstrated in the heatmap above, the attrition rate is strongly connected with the variables like years with current manager, years since last promotion, years in current role and a total number of years at company variables. We also see that the attrition rate is less so connected with the age variable and the employee satisfaction variable. We may also observe that the variables with lighter colors in the heatmap are less highly associated with one another than those with deeper.
We produced a correlation chart with numbers to determine the real values of the correlation between the explanatory components, as shown below. The correlation coefficient between different variables varies a lot, as shown in the graph below. The Years With Current Manager variable has a correlation coefficient of 0.77, and the Years In Current Role variable has a correlation coefficient of 0.76. Following those two factors is Years Since Last Promotion, which has a 0.62 association coefficient. Stock Option Level and Work Life Balance variables are not substantially connected, as shown in the graph below.
Figure 11: Correlation Numeric Values Between Explanatory Variables
The Regression Model was chosen for the following reasons:
Figure 12: Model 1: Attrition Rate ~ Age
Figure 13: Model 2: Attrition ~ YearsAtCompany
Figure 14: Model 3: Attrition ~ Job Satisfaction
Figure 14: Model 4: Attrition ~ YearsWithCurrManager
Figure 14: Model 5: Attrition ~ YearsInCurrentRole + YearsWithCurrManager + YearsSinceLastPromotion + YearsAtCompany
Figure 15: Regression Model of Residuals vs Fitted
Figure 16: Regression Model of Theoretical Quantiles
Figure 17: Regression Model of Scale-Location
Figure 18: Regression Model of Residuals vs Leverage
Figure 19: ANOVA for Model 1 & Model 2
Figure 20: ANOVA for Model 3 & Model 4
The regression models that were ran on the dataset yielded the following findings. The model results in each of these five models will be summarized, and we'll observe how these discoveries affect IBM's total attrition rate.
After we run the regression model of user rating score against the attrition rate against age, we got an R-squared value of 0.024. From this model we understand that the model is correct for approximately 2%.
We then run the regression model of attrition against the YearsAtCompany variable, and we got an R-squared value of 0.017. From this model we understand that the R-squared value is approximately 1% correct for our use case.
Lastly, we run the regression model of Attrition against the JobSatisfaction variable, we got a p-value of 0.0005 and an R-squared value of 0.009. From this model we understand that the p-value is within our limit, however, is very small. The same thing happens with our R-squared value also, which indicates that our model is 0.9% right. This indicates that the model needs more work, therefore other models have been considered.
Next, we run the regression model of attrition against the YearsWithCurrManager variable, where we got a very low p-value and an R-squared value of 0.023 From this model we understand that the R-squared value indicates that our model is ~2%.
The last model, we run is the regression model of Attrition against the YearsInCurrentRole, YearsWithCurrManager, YearsSinceLastPromotion and YearstAtCompany variable. Since all these values where highly correlated, we decided to investigate the regression model and see what we get for the p-value and the R-squared value. In this model, we got a very low p-value and an R-squared value of 0.033. From this model we understand that the R-squared value indicates that our model is approximately 3% correct understand that the R-squared value indicates that our model is approximately 3% correct.
In the previous phase of the analysis, we created a correlation and regression model. YearsWithCurrManager is the most highly associated value, according to the correlation model. We also discovered that the YearsWithCurrManager variable, as well as YearsSinceLastPromotion,YearsInCurrentRole,YearsAtCompay, are all closely linked. Then I ran the regression model and found that the combined variables
In summary, we utilized strategic tools to assess the current market landscape and potential factors influencing future employee attrition rates using the dataset sourced from Kaggle. com. Through our analysis, several key insights into the determinants of attrition have emerged. Employee turnover poses a significant challenge for organizations, but proactive decisions by HR and management can mitigate this issue. This holds true for IBM as well. Our dataset analysis allowed us to delve into explanatory variables and their impact on attrition rates. Notably, YearsWithCurrManager emerged as the most influential variable affecting attrition, based on our investigation. Additionally, YearsInCurrentRole exhibited the second highest R-squared value at 0.02, followed by YearsSinceLastPromotion at 0.01. When combining YearsInCurrentRole, YearsWithCurrManager, YearsSinceLastPromotion, and YearsAtCompany, the cumulative R-squared of 0.03 underscores the multifaceted nature of factors influencing an employee's decision to remain with or depart from the company [1].