Employee Attrition Modeling - Part 4

The live interactive app can be found here: 
https://sflscientific.shinyapps.io/employee_attrition_app/ 

Please contact info@sflscientific.com if you have trouble viewing this page.

 

Part 4:

Construction of the R Shiny App

R Shiny is a powerful yet intuitive tool for creating interactive web applications in R. An application has two main components: the user interface and the server. The user interface controls the appearance and layout of the app, while the server performs the calculations and contains the instructions for building the outputs that the interface displays.
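
The snippet below is a minimal, self-contained sketch of how the two pieces fit together in single-file form; it is for orientation only, and the app's actual UI code, built with shinyUI(), is shown in the snippets that follow.

library(shiny)

# Minimal single-file sketch of the two components (illustrative only).
ui <- fluidPage(
    titlePanel("Employee Attrition"),
    sidebarLayout(
        sidebarPanel(
            selectInput("Gender", "Select a gender",
                        choices = c("Female", "Male", "all"), selected = "all")
        ),
        mainPanel(plotOutput("plot"))
    )
)

server <- function(input, output) {
    output$plot <- renderPlot({
        hist(rnorm(100))  # placeholder; the real app renders model results here
    })
}

shinyApp(ui = ui, server = server)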

The following is a code snippet used to set up a section of the user interface. In this case, we created a simple sidebar to allow the user to select from certain sub-groups of employees. 

shinyUI(fluidPage(
    theme = shinytheme("cerulean"),
    column(width=12,
    a(img(src="SFL-LOGO.png",align="right",width="300px", padding="50px 50px 50px 50px"),href="https://www.sflscientific.com"),
    titlePanel("Employee Attrition")),
    sidebarLayout(
        sidebarPanel(
            helpText("We recommend 3 or fewer options at a time."),
            width = 2,
            selectInput("Age", label = "Select an age range", choices = c("< 35", "35-45", "> 45", "all"), selected = "all"),
            selectInput("Gender", label = "Select a gender", choices = c("Female", "Male", "all"), 
                        selected = "all"), selectInput("Education", label = "Select an education level",
                        choices = c("1", "2", "3", "4", "all"), selected = "all"),
            selectInput("MonthlyIncome", label = "Select a monthly income range", 
                        choices = c("< 2500", "2500-5000", "5001-7500", "7501-10000", "> 10001", "all"), selected = "all"),
            selectInput("MaritalStatus", label = "Select a marital status", choices = c("Single", "Married", "Divorced", "all"),
                        selected = "all")
        )

On the server side, the options chosen in the sidebar are stored in input variables that can then be applied to the model. The two components work as a cohesive unit: the user interface receives changes from the user and passes them along to the server to execute.

When dealing with employee attrition in particular, it is useful to look at specific groups of employees, so we added several options that allow the user to subset the data into such groupings.
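
On the server side, that subsetting might be implemented roughly as follows; this is a sketch under stated assumptions, where the data frame name (hr_data), its column names, and the exact filtering rules are illustrative rather than the app's actual code.

# Server-side sketch (hr_data and its columns are assumed for illustration):
# turn the sidebar selections into a reactive subset that downstream plots
# and models can read.
server <- function(input, output) {

    filtered_data <- reactive({
        df <- hr_data
        if (input$Gender != "all")        df <- df[df$Gender == input$Gender, ]
        if (input$MaritalStatus != "all") df <- df[df$MaritalStatus == input$MaritalStatus, ]
        if (input$Age != "all") {
            df <- switch(input$Age,
                         "< 35"  = df[df$Age < 35, ],
                         "35-45" = df[df$Age >= 35 & df$Age <= 45, ],
                         "> 45"  = df[df$Age > 45, ])
        }
        df
    })

    # every output below this point calls filtered_data() to get the
    # currently selected group of employees
}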

We have already walked through how the learning process works for the three machine learning algorithms. In this part, we compare the performance of those three models on the dataset within the Shiny app, using various curves.

The first tab describes the problem we want to solve, as discussed in Part 1. After users choose the subset of data they want to look at, the second tab shows the correlation matrix of all the variables in the dataset.

tabPanel("Data Exploration", width = 12,
                    column(width = 8, class = "well", h4("Correlation Matrix"),
                        plotOutput("corr_plot", click = "corr_plot_click"),
                        div(style = "height:110px;background-color: white;"),
                        style = "background-color:white;"
                    ),
                    
                    column(width = 4, class = "well", h4("Correlation plot of chosen variables:"), 
                        plotOutput("corr_click_info", height="280px"),
                        style = "background-color:white;"
                    )
                )
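
On the server side, the correlation matrix might be rendered along the following lines; the corrplot package and the numeric-column selection are assumptions for illustration, and filtered_data() is the hypothetical subset helper sketched earlier.

library(corrplot)  # assumed here; any correlation-plot routine would work

output$corr_plot <- renderPlot({
    df       <- filtered_data()                  # reactive subset from the sidebar
    num_cols <- df[, sapply(df, is.numeric)]     # correlations need numeric columns
    corrplot(cor(num_cols), method = "circle")
})
# a click on the matrix arrives as input$corr_plot_click and drives the
# "corr_click_info" panel on the right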

In the next three tabs, the Shiny app presents visualizations comparing the three machine learning models: SVM, XGBoost, and Logistic Regression.

We include the ROC curve, the precision curve, and the recall curve in these visualizations:

tabPanel("Training (ROC)", 
                    column(width = 8, class = "well", h4("ROC Curve"),
                        plotOutput("plot"),
                        style = "background-color:white;",
                        sliderInput("thresh", label = "", min = 0, max = 1, value = c(0.5)),
                    ),
                    column(width = 4, class = "well", 
                        tabsetPanel(
                            tabPanel("XGBoost", h4("Confusion Matrix (XGBoost)"), plotOutput("confusionMatrix"), style = "background-color:white;"),
                            tabPanel("SVM", h4("Confusion Matrix (SVM)"), plotOutput("confusionMatrix_svm"), style = "background-color:white;"),
                            tabPanel("Logistic Regression", h4("Confusion Matrix (Logistic Regression)"), 
                            plotOutput("confusionMatrix_lr"), style = "background-color:white;")
                        ), style = "background-color:white;"
                    )
                ),
                tabPanel("Training (Precision)", 
                    column(width = 8, class = "well", h4("Precision vs Cutoff Curve"),
                        plotOutput("plot_precision"),
                        style = "background-color:white;",
                        sliderInput("thresh_precision", label = "", min = 0, max = 1, value = c(0.5)),
                    ),
                    column(width = 4, class = "well", 
                        tabsetPanel(
                            tabPanel("XGBoost", h4("Confusion Matrix (XGBoost)"), plotOutput("confusionMatrix_precision"), style = "background-color:white;"),
                            tabPanel("SVM", h4("Confusion Matrix (SVM)"), plotOutput("confusionMatrix_svm_precision"), style = "background-color:white;"),
                            tabPanel("Logistic Regression", h4("Confusion Matrix (Logistic Regression)"), 
                            plotOutput("confusionMatrix_lr_precision"), style = "background-color:white;")
                        ), style = "background-color:white;"
                    )
                ),
                tabPanel("Training (Recall)", 
                    column(width = 8, class = "well", h4("Recall vs Cutoff Curve"),
                        plotOutput("plot_recall"),
                        style = "background-color:white;",
                        sliderInput("thresh_recall", label = "", min = 0, max = 1, value = c(0.5))
                    ),
                    column(width = 4, class = "well", 
                        tabsetPanel(
                            tabPanel("XGBoost", h4("Confusion Matrix (XGBoost)"), plotOutput("confusionMatrix_recall"), style = "background-color:white;"),
                            tabPanel("SVM", h4("Confusion Matrix (SVM)"), plotOutput("confusionMatrix_svm_recall"), style = "background-color:white;"),
                            tabPanel("Logistic Regression", h4("Confusion Matrix (Logistic Regression)"), 
                            plotOutput("confusionMatrix_lr_recall"), style = "background-color:white;")
                        ), style = "background-color:white;"
                    )
                )
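
Each curve tab is paired with confusion matrices evaluated at the cutoff chosen on its slider. Below is a hedged server-side sketch for the ROC tab using the pROC package (our choice for illustration; the app may compute its curves differently), with test_labels and xgb_probs standing in for the held-out labels and the XGBoost scores from the earlier parts of this series.

library(pROC)  # assumed for illustration; the app may build its curves differently

# test_labels and xgb_probs are placeholders for the held-out labels and the
# XGBoost scores produced earlier in the series.
output$plot <- renderPlot({
    plot(roc(response = test_labels, predictor = xgb_probs), main = "ROC Curve")
})

output$confusionMatrix <- renderPlot({
    # classify at the cutoff chosen with the "thresh" slider
    pred <- factor(as.integer(xgb_probs >= input$thresh), levels = c(0, 1))
    cm   <- table(Predicted = pred, Actual = test_labels)
    mosaicplot(cm, main = sprintf("Confusion Matrix (cutoff = %.2f)", input$thresh))
})

The precision and recall tabs follow the same pattern with their own sliders (thresh_precision and thresh_recall) and per-model confusion-matrix outputs.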

A Little Guidance

In the app, we limited the subsetting options to five factors: age, gender, education level, monthly income, and marital status; this allows the user to see the details for a particular subset of the dataset. We suggest using three or fewer of these filters at a time due to the size of the dataset.

After choosing a subset, we can go to the ROC, precision, and recall tabs to change the threshold for attrition. For example, an aggressive HR department can set the threshold above which remedial action is required to a lower value. This will produce more false positives, but it also ensures that we catch the majority of true attrites.
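
As a toy illustration of that trade-off (the scores and labels below are made up, not the app's data), lowering the cutoff raises recall at the cost of precision:

# made-up scores and labels, purely to illustrate the cutoff trade-off
probs <- c(0.10, 0.35, 0.55, 0.80, 0.40, 0.65)
truth <- c(0,    1,    0,    1,    0,    1)

for (cutoff in c(0.5, 0.3)) {
    pred      <- as.integer(probs >= cutoff)
    recall    <- sum(pred == 1 & truth == 1) / sum(truth == 1)
    precision <- sum(pred == 1 & truth == 1) / max(sum(pred == 1), 1)
    cat(sprintf("cutoff %.1f: recall %.2f, precision %.2f\n", cutoff, recall, precision))
}
# cutoff 0.5: recall 0.67, precision 0.67
# cutoff 0.3: recall 1.00, precision 0.60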

After adjusting those parameters, users can go to the prediction tab by clicking XGBoost. From there, users can see not only the distribution of predicted probabilities but also the features that matter most in deciding an employee's attrition. In a similar manner, you can also check the SVM and Logistic Regression distributions.

Finally, a 3D distribution can also be shown by clicking on the desired variable. Users can go to ‘Explanation of Results’ for a deeper understanding of the results.

 

For more details on this or any potential analyses, please visit us at http://sflscientific.com or contact mluk@sflscientific.com.

--

Contributors: Michael Luk, Zijian Han, Jinru Xue, Han Lin