Ashraf Miah
Jul 28, 2021

--

Your approach to dealing with the unbalanced dataset is pretty good, but there is a subtlety which may be missed.

It may be worth emphasising that the process you used to create the new dataset gave a 50:50 split between resignations and non-resignations, rather than the original 6% resignations, and that this is why you didn't need a stratified split of the data.
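To make the point concrete, here is a minimal sketch using only the Python standard library. The data is hypothetical (1000 rows with 6% resignations), and the helper names `random_split`, `stratified_split` and `minority_share` are my own for illustration, not anything from your article. It compares a plain random split and a stratified split on the original ratio with a plain random split on a 50:50 undersampled dataset.

```python
import random


def minority_share(labels):
    """Fraction of samples labelled 1 (a resignation)."""
    return sum(labels) / len(labels)


def random_split(labels, test_fraction, rng):
    """Plain shuffle-and-cut split; class proportions are left to chance."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * test_fraction)
    return shuffled[cut:], shuffled[:cut]  # (train, test)


def stratified_split(labels, test_fraction, rng):
    """Split each class separately so both parts keep the class ratio."""
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [y for y in labels if y == cls]
        rng.shuffle(members)
        cut = round(len(members) * test_fraction)
        test.extend(members[:cut])
        train.extend(members[cut:])
    return train, test


rng = random.Random(0)

# Hypothetical data: 1000 rows, 6% resignations (label 1).
original = [1] * 60 + [0] * 940

# Undersample the majority class to build a 50:50 dataset.
balanced = [1] * 60 + rng.sample([0] * 940, 60)

_, test_plain = random_split(original, 0.2, rng)
_, test_strat = stratified_split(original, 0.2, rng)
_, test_bal = random_split(balanced, 0.2, rng)

print("original, plain split:     ", minority_share(test_plain))
print("original, stratified split:", minority_share(test_strat))
print("balanced, plain split:     ", minority_share(test_bal))
```

On the 6% data the plain split leaves the number of resignations in the test set to chance, whereas the stratified split pins it to the original proportion. On the balanced data a plain split will typically land close to 50:50 anyway, which is why stratification becomes much less important there.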

--

Ashraf Miah

Data Scientist and Chartered Aeronautical Engineer (MEng CEng EUR ING MRAeS) with over 15 years' experience in the Aerospace, Defence and Rail Industry.