Forbes

Artificial Intelligence Is Set to Change the Face of IT Operations

By Janakiram MSV

July 26, 2017

Artificial Intelligence will have a profound impact on the IT industry. The Machine Learning algorithms and models that bring AI to the forefront only get better with more data. If these algorithms can learn from existing medical reports and help doctors with diagnosis, the same approach can be used to improve IT operations. After all, enterprise IT deals with enormous volumes of data acquired from servers, operating systems, applications and users. These datasets can be used to create ML models that assist system administrators, DevOps teams and IT support departments.

Here are a few areas of enterprise IT that AI will significantly impact.

Log Analysis

Analyzing logs is the most obvious use case for AI-driven operations. Every layer of the stack – hardware, operating systems, servers, applications – generates a data stream that can be collected, stored, processed, and analyzed by ML algorithms. Today IT teams use this data to maintain audit trails and to perform root cause analysis (RCA) of events caused by a security breach or a system failure. Traditional log management platforms such as Splunk, Elasticsearch, Datadog, and New Relic are augmenting their platforms with Machine Learning. By bringing AI to log analysis, IT can proactively find anomalies in the systems before a failure is reported.
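As a rough illustration of what finding anomalies in a log stream can mean, the sketch below flags intervals whose error count deviates sharply from the recent past. It is a simple statistical baseline rather than the method used by any of the platforms named above, and the function name, window size, threshold and sample counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def find_anomalies(error_counts, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(error_counts)):
        history = error_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            if error_counts[i] != mu:
                anomalies.append(i)
            continue
        if abs(error_counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical errors-per-minute extracted from application logs.
counts = [4, 5, 3, 4, 6, 5, 4, 48, 5, 4]
print(find_anomalies(counts))  # prints [7]
```

A production system would learn richer patterns than a rolling average, but the principle is the same: establish what normal looks like from history and raise a flag when the stream departs from it.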

Sensing the opportunity in bringing ML to log management, a few startups are building AI-driven log analysis platforms. These intelligent tools can correlate data from networking gear, servers and applications to pinpoint an issue in real time.

Going forward, the software will become smart enough to self-diagnose and self-heal to recover from failures. ML algorithms will be embedded right into the source of data including operating systems, databases, and application software.

Capacity Planning

IT architects spend a considerable amount of time planning the resource needs of applications. It can be very challenging to define the server specifications for a complex, multi-tier application deployment. Each physical layer of the application needs to be matched with the number of CPU cores, the amount of RAM, storage capacity and network bandwidth.

In public cloud environments, this translates to identifying the right VM type for each tier. Some of the mature IaaS offerings such as Amazon EC2, Azure VMs and Google Compute Engine offer dozens of VM types, which makes the choice difficult. Cloud providers regularly add new VM families to support emerging workloads like big data, game rendering, parallel processing, and data warehousing.

Machine Learning can come to the rescue of infrastructure architects by helping them define the right specifications of hardware or choose the appropriate instance type in the public cloud. The algorithms learn from existing deployments and their performance to recommend the optimal configuration for each workload.
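One simple way to picture learning from existing deployments is a nearest-neighbor lookup: find the past workloads that most resemble the new one and recommend the size that worked for them. The sketch below assumes a made-up history of workloads described by two features, and the size labels "small", "medium" and "large" are placeholders rather than real instance types from any provider.

```python
import math
from collections import Counter

# Hypothetical history: (peak requests/sec, working set in GB) mapped to
# the instance size that performed well for that deployment.
HISTORY = [
    ((50, 2), "small"),
    ((60, 3), "small"),
    ((400, 8), "medium"),
    ((450, 12), "medium"),
    ((2000, 48), "large"),
    ((2500, 64), "large"),
]

def recommend(workload, history=HISTORY, k=3):
    """Recommend a size by majority vote among the k most similar workloads."""
    # Scale each feature by its maximum so requests/sec does not dominate GB.
    maxima = [max(features[i] for features, _ in history)
              for i in range(len(workload))]

    def distance(a, b):
        return math.sqrt(sum(((x - y) / m) ** 2
                             for x, y, m in zip(a, b, maxima)))

    nearest = sorted(history, key=lambda item: distance(item[0], workload))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(recommend((420, 10)))  # prints medium
```

A real recommendation engine would draw on far more signals, such as I/O patterns, latency targets and price, but the idea of generalizing from deployments that already perform well carries over.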

It’s a matter of time before the public cloud providers add an intelligent VM recommendation engine for each running workload. This move will reduce the burden on IT architects by assisting them in identifying the right configuration and specifications.

Infrastructure Scaling

Thanks to the elasticity of the cloud, administrators can define auto scaling for applications. Auto scaling can be configured to be proactive or reactive. In proactive mode, admins schedule the scale-out operation ahead of a particular event. For example, if a direct mailer campaign triggered every weekend results in additional load, they can configure the infrastructure to scale out on Friday evening and scale in on Sunday. In reactive mode, the underlying monitoring infrastructure tracks key metrics such as CPU utilization and memory usage to initiate a scale-out operation. When the load returns to normal, the scale-in operation takes place, bringing the infrastructure back to its original form.
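The two modes can be combined in a single decision function, sketched below. The schedule raises the minimum instance count for the weekend campaign, while CPU thresholds add or remove one instance at a time. Every number here is an assumption chosen for illustration (the 18:00 boundaries, the 75 and 30 percent thresholds, the instance counts), and the function does not correspond to any cloud provider's API.

```python
from datetime import datetime

def desired_instances(now, cpu_percent, current, base=2, weekend_extra=2,
                      scale_out_at=75.0, scale_in_at=30.0, max_instances=10):
    """Return the instance count the group should move to."""
    # Proactive: hold extra capacity from Friday 18:00 until Sunday 18:00.
    weekday = now.weekday()  # Monday is 0, Sunday is 6
    in_campaign = ((weekday == 4 and now.hour >= 18)
                   or weekday == 5
                   or (weekday == 6 and now.hour < 18))
    floor = base + weekend_extra if in_campaign else base

    # Reactive: step out or in by one instance based on CPU utilization.
    target = current
    if cpu_percent > scale_out_at:
        target = current + 1
    elif cpu_percent < scale_in_at:
        target = current - 1

    # Never drop below the scheduled floor or exceed the hard cap.
    return max(floor, min(max_instances, target))

# Friday evening with moderate load: the schedule alone lifts 2 instances to 4.
print(desired_instances(datetime(2017, 7, 28, 19, 0), 50.0, current=2))
```

Both branches depend on rules that someone has to write and maintain by hand, which is the limitation that predictive scaling aims to remove.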

With Machine Learning, IT admins can configure predictive scaling that learns from the previous load conditions and usage patterns. The system will become intelligent enough to decide when to scale with no explicit rules. This design complements capacity planning by adjusting the runtime infrastructure needs more accurately.
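A very small version of learning from previous load conditions is to average the load seen at each hour of the day and provision for the expected value before that hour arrives. The sketch below assumes hypothetical samples of (hour, requests per second), a per-instance capacity of 100 requests per second and 20 percent headroom. A real predictive scaler would use a proper forecasting model, so this is only meant to show the shape of the idea.

```python
import math
from collections import defaultdict

def hourly_profile(samples):
    """Build a mapping of hour of day to mean observed load."""
    totals = defaultdict(lambda: [0.0, 0])
    for hour, load in samples:
        totals[hour][0] += load
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}

def instances_for(hour, profile, per_instance_capacity=100.0,
                  headroom=1.2, minimum=1):
    """Instances to have ready for `hour`, based on the learned profile."""
    expected = profile.get(hour, 0.0)  # hours never observed fall back to minimum
    needed = math.ceil(expected * headroom / per_instance_capacity)
    return max(minimum, needed)

# Hypothetical history: busy at 09:00, quiet at 03:00.
history = [(9, 400), (9, 500), (9, 450), (3, 40), (3, 60)]
profile = hourly_profile(history)
print(instances_for(9, profile), instances_for(3, profile))  # prints 6 1
```

No threshold or calendar entry appears anywhere in this code. The schedule emerges from the data, and it shifts on its own as usage patterns change.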

In the coming months, public cloud providers will start adding predictive scaling to their IaaS offerings.

Cost Management

Assessing the cost of infrastructure plays a crucial role in IT architecture. Especially in the public cloud, cost analysis and forecasting are complex. Cloud providers charge for a variety of components including the usage of VMs, storage capacity, IOPS, internal and external bandwidth, and API calls made by applications.

Machine Learning can accurately forecast the cost of infrastructure. By analyzing the workloads and their usage patterns, it becomes possible to provide a breakdown of the cost across various components, applications, departments, and subscription accounts. This would help business units secure IT budgets more accurately.
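The simplest possible forecast fits a straight line through past monthly bills and extends it by one month, and a breakdown is just a grouping of line items by component. The sketch below does both with invented figures. It stands in for the much richer models a cloud platform would use, and the component names and amounts are assumptions for illustration only.

```python
def forecast_next(monthly_costs):
    """Extrapolate next month's cost with a least-squares linear trend."""
    n = len(monthly_costs)
    if n < 2:
        raise ValueError("need at least two months of history")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(monthly_costs) / n
    slope = (sum((x - x_mean) * (y - y_mean)
                 for x, y in zip(xs, monthly_costs))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return intercept + slope * n

def breakdown(line_items):
    """Sum costs per component from (component, cost) billing line items."""
    totals = {}
    for component, cost in line_items:
        totals[component] = totals.get(component, 0.0) + cost
    return totals

# Hypothetical bills for the last four months, in dollars.
print(forecast_next([100.0, 110.0, 120.0, 130.0]))

# Hypothetical line items for one month.
print(breakdown([("vm", 70.0), ("storage", 20.0), ("vm", 15.0),
                 ("bandwidth", 5.0)]))
```

Grouping the same line items by application, department or subscription account instead of by component gives the other views mentioned above.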

Intelligent cost management will become a de facto feature of public cloud platforms.

Energy Efficiency

Large enterprises and infrastructure providers are continuing to invest in massive data centers. One of the most complex challenges of managing data centers is power management. The increase in energy costs combined with environmental responsibility has put pressure on the data center industry to improve its operational efficiency.

By applying Machine Learning to power management, data center administrators can dramatically reduce energy usage. Google is pioneering AI-driven power management through DeepMind, a UK-based company that the search giant acquired in 2014 for $600 million. Google claims that it managed to reduce the amount of energy used for cooling by up to 40 percent. The graph below shows how the PUE (Power Usage Effectiveness) was adjusted based on the ML recommendations.

 

[Figure: ML-based Energy Management]

AI-driven power management will become accessible to enterprises to bring energy efficiency into data center management.
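For readers unfamiliar with the PUE metric mentioned above, it is the ratio of the total energy a facility consumes to the energy consumed by the IT equipment alone, so a value closer to 1.0 means less overhead. The short calculation below shows how a 40 percent cut in cooling energy would move PUE for a hypothetical facility. The kilowatt-hour figures are invented for the example and are not Google's numbers.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh

# Hypothetical energy use over some period, in kWh.
it_load = 1000.0
cooling = 300.0
other_overhead = 100.0

before = pue(it_load + cooling + other_overhead, it_load)
# Apply a 40 percent reduction to the cooling component only.
after = pue(it_load + cooling * (1 - 0.40) + other_overhead, it_load)

print(round(before, 2), round(after, 2))  # prints 1.4 1.28
```

Because cooling is only one part of the overhead, a 40 percent saving in cooling translates into a smaller percentage change in the overall PUE.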

Performance Tuning

After an application is deployed in production, a considerable amount of time is spent tuning its performance. Database engines that handle a significant number of transactions, in particular, experience reduced performance over time. DBAs step in to drop and rebuild indices and clear the logs to free up space. Almost every workload, including web applications, mobile applications, big data solutions, and line-of-business applications, needs tweaking to achieve optimal performance.

Machine Learning can deliver auto tuning of workloads. By analyzing the logs and the time taken for common tasks such as processing a query or responding to a request, the algorithm can apply an accurate fix to the problem. This augments log management by taking action instead of escalating the issue to the team. It will directly impact the cost of support and of running enterprise IT help desks.
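The loop described here, measuring how long common tasks take and acting when they slow down, can be sketched in a few lines. The example below compares recent query latencies against a baseline and invokes a remediation callback for any query that has degraded. The query names, timings and the 1.5x tolerance are hypothetical, and the callback stands in for whatever fix applies, such as rebuilding an index.

```python
from statistics import median

def needs_tuning(baseline_ms, recent_ms, tolerance=1.5):
    """True when the recent median latency exceeds the baseline median
    by more than the given factor."""
    return median(recent_ms) > tolerance * median(baseline_ms)

def auto_tune(query_timings, remediate, tolerance=1.5):
    """Call remediate(name) for each degraded query; return the names acted on.

    query_timings maps a query name to (baseline_ms, recent_ms) lists.
    """
    tuned = []
    for name, (baseline, recent) in sorted(query_timings.items()):
        if needs_tuning(baseline, recent, tolerance):
            remediate(name)  # act directly instead of escalating to a person
            tuned.append(name)
    return tuned

# Hypothetical latencies in milliseconds gathered from database logs.
timings = {
    "orders_by_customer": ([20, 22, 21], [60, 65, 58]),
    "product_lookup": ([5, 6, 5], [5, 6, 6]),
}
actions = []
print(auto_tune(timings, actions.append))  # prints ['orders_by_customer']
```

A fixed tolerance is the crudest possible detector. The point of applying ML is to learn what normal latency looks like for each workload and to pick the fix most likely to help.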

Artificial Intelligence will have an enormous impact on the L1 and L2 IT support roles. Most of the issues that are escalated to them will be tackled by intelligent algorithms.

 

This article was written by Janakiram Msv from Forbes and was legally licensed through the NewsCred publisher network. Please direct all licensing questions to legal@newscred.com.