An Analance business case
When was the last time operations came to a halt because of unplanned server downtime? Do you remember having to work extra hours to make up for the lost productivity? Well, those are the moments when we point fingers at the Infrastructure team for the unexpected server failures, or for not being better prepared.
Organizations have tried to solve this by moving from corrective maintenance to preventive maintenance, that is, from reacting to problems to proactively replacing components before the end of their useful life. The drawback is poor asset utilization: parts are often replaced while they still have service life left.
This is where predictive maintenance analytics comes in. It helps organizations strike a balance: high asset utilization and lower operational costs, plus a productivity boost from just-in-time component replacement.
How does Predictive Maintenance work?
Customer servers: A Predictive Maintenance case study
Now the question is: how do you implement predictive analytics in this business operation? To predict probable downtimes for a customer’s servers, we used Analance Advanced Analytics for data modeling and Analance Business Intelligence for reporting, dashboarding, and alerts.
We first monitored the customer’s server utilization and laid out a solution plan that forecast key system metrics such as CPU, disk, and memory utilization. We then combined the values of these metrics in a multiclass classification analysis to predict possible server downtime.
We also defined server utilization thresholds. Whenever usage exceeded a threshold, automatic alerts were sent to the relevant stakeholders for corrective action.
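A threshold check like this can be sketched in a few lines of Python. The metric names, the threshold values, and the `notify` callback are hypothetical placeholders rather than Analance settings; in a real deployment `notify` might send an e-mail or post to a chat channel.

```python
from typing import Callable, Dict, List

# Hypothetical utilization thresholds, in percent (not Analance defaults).
THRESHOLDS: Dict[str, float] = {"cpu_pct": 85.0, "memory_pct": 90.0, "disk_pct": 80.0}


def breached_metrics(sample: Dict[str, float],
                     thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Return the names of metrics whose value is strictly above its threshold."""
    return [name for name, limit in thresholds.items()
            if name in sample and sample[name] > limit]


def check_and_alert(host: str, sample: Dict[str, float],
                    notify: Callable[[str], None],
                    thresholds: Dict[str, float] = THRESHOLDS) -> List[str]:
    """Send one alert message per breached metric and return the breached names."""
    breaches = breached_metrics(sample, thresholds)
    for name in breaches:
        notify(f"[{host}] {name}={sample[name]:.1f}% exceeds {thresholds[name]:.1f}%")
    return breaches


if __name__ == "__main__":
    sent: List[str] = []  # stand-in for a real notification channel
    check_and_alert("web-01", {"cpu_pct": 92.5, "memory_pct": 71.0, "disk_pct": 80.0}, sent.append)
    print(sent)
```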
We had data from an ELK stream whose indexes were dedicated to capturing the system’s key metric values in real time. We sourced this data through Analance’s Elasticsearch connector, which makes live streaming data available inside the platform.
- Data source – Metricbeat, ELK stream
- Connector – Analance Elasticsearch Live
The data contained real-time values for all the major system metrics, such as CPU, memory (RAM), inbound and outbound network traffic, and number of processes.
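To turn raw Metricbeat documents into a per-minute table suitable for forecasting, a preparation step along these lines could be used. The field paths are modelled on Metricbeat's system module and should be checked against the actual index mapping; the function names are hypothetical, and the documents are assumed to have already been fetched from Elasticsearch.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Any, Dict, Iterable, List, Optional

# Assumed field paths, modelled on Metricbeat's system module; verify them
# against the real index mapping. Metricbeat reports these as 0-1 fractions.
FIELDS = {
    "cpu_pct": "system.cpu.total.norm.pct",
    "memory_pct": "system.memory.actual.used.pct",
    "disk_pct": "system.filesystem.used.pct",  # one value per mount point
}


def get_path(doc: Dict[str, Any], dotted: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    node: Any = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def to_minute_rows(docs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Average each metric per host and per minute, converting fractions to percent.

    Disk values from several mount points in the same minute are simply
    averaged together, which is a simplifying assumption.
    """
    buckets: Dict[tuple, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        ts = datetime.fromisoformat(doc["@timestamp"].replace("Z", "+00:00"))
        key = (get_path(doc, "host.name"), ts.replace(second=0, microsecond=0))
        for column, path in FIELDS.items():
            value = get_path(doc, path)
            if value is not None:
                buckets[key][column].append(100.0 * float(value))
    rows: List[Dict[str, Any]] = []
    for (host, minute), metrics in sorted(buckets.items(),
                                          key=lambda kv: (str(kv[0][0]), kv[0][1])):
        row: Dict[str, Any] = {"host": host, "minute": minute}
        row.update({column: mean(values) for column, values in metrics.items()})
        rows.append(row)
    return rows
```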
Running the forecast
Once the data was prepared, the next step was to forecast the key metrics chosen for the prediction. Analance offered two ways to run the forecast: use one of its 41 prebuilt machine learning algorithms, or add a custom script written in Jupyter and integrated into Analance.
Since we needed complex multivariate forecasting, which is not part of the prebuilt set, we decided to write the scripts in Python.
We used the vector autoregression (VAR) model from statsmodels (statsmodels.tsa.vector_ar.var_model) for the multivariate forecast, working in Jupyter notebooks, which was the most convenient IDE for the required modules. Once the script was written and verified, we checked that its performance met our requirements. The forecast accuracy was good and was accepted.
The scripts were then imported into Analance Advanced Analytics through the Custom Notebooks option, so that the output of the forecasting model could serve as the data source for the next steps.
We forecast CPU, disk, and memory utilization and monitored them for possible surges. The forecasted values were then combined into a single dataset and used as input to the outage classification.
Analance Advanced Analytics has eight prebuilt classification algorithms. We ran the classification model with all of them in ensemble mode and compared the results to find the best-performing model for this scenario.
The Multiclass Random Forest algorithm had the highest accuracy and the best balance of precision, recall, and F1 score among the classification models.
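One way to combine separately produced forecasts into a single table is an inner join on timestamp, sketched below with a hypothetical `combine_forecasts` helper. The join strategy is an assumption; forward-filling gaps would be a reasonable alternative.

```python
from typing import Dict, List, Mapping


def combine_forecasts(forecasts: Mapping[str, Mapping[str, float]]) -> List[Dict[str, object]]:
    """Join per-metric forecasts on timestamp, keeping only timestamps every metric covers.

    `forecasts` maps a metric name to a {timestamp: forecast value} mapping.
    """
    if not forecasts:
        return []
    common = set.intersection(*(set(series) for series in forecasts.values()))
    rows: List[Dict[str, object]] = []
    for ts in sorted(common):
        row: Dict[str, object] = {"timestamp": ts}
        row.update({metric: series[ts] for metric, series in forecasts.items()})
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(combine_forecasts({
        "cpu_pct": {"10:00": 50.0, "10:01": 55.0},
        "memory_pct": {"10:00": 70.0, "10:01": 72.0, "10:02": 74.0},
        "disk_pct": {"10:01": 81.0, "10:00": 80.0},
    }))
```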
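Analance's prebuilt classifiers and its ensemble mode are not reproduced here. As a stand-in, the sketch below uses scikit-learn to compare several classifiers by cross-validated macro-averaged F1 score (which combines precision and recall across all three classes) and picks the best one. The synthetic features, label rule, and candidate list are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


def make_dataset(n: int = 600, seed: int = 0):
    """Synthetic forecast features (CPU, memory, disk %) with noisy 3-class labels."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(20.0, 100.0, size=(n, 3))
    peak = X.max(axis=1) + rng.normal(0.0, 3.0, size=n)
    y = np.digitize(peak, [80.0, 92.0])  # 0 = Normal, 1 = Warning, 2 = Outage
    return X, y


def compare_models(X, y, cv: int = 5):
    """Return the best model name and the mean macro-F1 score of every candidate."""
    candidates = {
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "logistic_regression": LogisticRegression(max_iter=1000),
        "naive_bayes": GaussianNB(),
    }
    scores = {name: cross_val_score(model, X, y, cv=cv, scoring="f1_macro").mean()
              for name, model in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = compare_models(*make_dataset())
    print(best, {name: round(s, 3) for name, s in scores.items()})
```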
Setting up custom alerts
The algorithm classifies the utilization metric values as Normal (Green), Warning (Yellow), or Outage (Red), based on the forecasted values and the maintenance rules.
When the model predicts a Warning (Yellow) or an Outage (Red), Service Managers are alerted in real time about the server's condition. These smart alerts let Service Managers act early to prevent outages and avoid unexpected downtime.
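The maintenance rules themselves are not given in the case study. A rule-based labeler of this shape, with hypothetical threshold values, could supply the Normal/Warning/Outage labels used to train and check the classifier.

```python
from enum import Enum
from typing import Mapping


class Status(Enum):
    NORMAL = "Green"
    WARNING = "Yellow"
    OUTAGE = "Red"


# Hypothetical rule thresholds in percent; real values would come from the maintenance rules.
WARNING_AT = {"cpu_pct": 80.0, "memory_pct": 85.0, "disk_pct": 85.0}
OUTAGE_AT = {"cpu_pct": 95.0, "memory_pct": 97.0, "disk_pct": 95.0}


def label(forecast: Mapping[str, float]) -> Status:
    """Return the most severe status triggered by any forecast metric.

    Metrics missing from `forecast` are treated as 0% and never trigger a rule.
    """
    if any(forecast.get(m, 0.0) >= limit for m, limit in OUTAGE_AT.items()):
        return Status.OUTAGE
    if any(forecast.get(m, 0.0) >= limit for m, limit in WARNING_AT.items()):
        return Status.WARNING
    return Status.NORMAL


if __name__ == "__main__":
    print(label({"cpu_pct": 82.0, "memory_pct": 60.0, "disk_pct": 70.0}))
```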
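A minimal alert dispatcher might look like the following. Alerting only when a server's predicted state changes, rather than on every prediction, is an assumed design choice meant to limit alert fatigue; the class name and message format are hypothetical.

```python
from typing import Callable, Dict

ALERT_STATES = {"Warning", "Outage"}


class AlertDispatcher:
    """Notify on Warning/Outage predictions, once per change of state per server."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._notify = notify
        self._last: Dict[str, str] = {}  # last predicted state per server

    def update(self, server: str, predicted: str) -> bool:
        """Record a new prediction; return True if an alert was sent."""
        previous = self._last.get(server, "Normal")
        self._last[server] = predicted
        if predicted in ALERT_STATES and predicted != previous:
            self._notify(f"{server}: predicted {predicted} (was {previous})")
            return True
        return False


if __name__ == "__main__":
    dispatcher = AlertDispatcher(print)
    for state in ["Normal", "Warning", "Warning", "Outage"]:
        dispatcher.update("db-02", state)
```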
Visualizing insights at a glance
The solution goes beyond real-time predictions and alerts. The model can be deployed directly into the Analance Business Intelligence module, where a live dashboard shows the current levels and predicted values of all key metrics.
Analance live dashboards give Service Engineers and Managers real-time server performance metrics. They can compare predictions with actual performance to make sure everything is up and running optimally.
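The case study does not say which error measures the dashboard displays. Mean absolute error (MAE) and mean absolute percentage error (MAPE) are common choices for comparing predictions with actual values; here is a small sketch with a hypothetical `forecast_errors` helper.

```python
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error and mean absolute percentage error of a forecast."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_err) / len(abs_err)
    # MAPE is undefined when an actual value is zero, so those points are skipped.
    pct = [e / abs(a) for e, a in zip(abs_err, actual) if a != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "mape_pct": mape}


if __name__ == "__main__":
    print(forecast_errors([50.0, 60.0, 80.0], [55.0, 60.0, 72.0]))
```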
Predictive maintenance is the next step up in maintenance because it:
- Improves quality
- Reduces unplanned downtime
- Reduces maintenance costs
- Increases overall equipment effectiveness
- Supports on-time delivery
- Creates a less stressful working environment