If operations is a classic big data problem, cloud operations is a huge data problem. We all understand that the volume of logs, alerts and metrics generated by SaaS applications, together with the increasing complexity of hybrid infrastructure, requires you to step up your monitoring strategy. And just like with any other big data problem, it only makes sense to leverage AI to meet the observability imperative.
Given the sheer volume of IT monitoring data that you deal with every day as a DevOps, IT or SRE professional, traditional, reactive monitoring tools and approaches won't cut it much longer. Infusing AI is not about magically identifying and automatically solving all your problems. But given how critical it is to deliver a phenomenal user experience for SaaS, you can leverage machine learning models to provide you with insights rather than data: not only to detect abnormal behavior effectively, but also to predict potential issues, map them to the associated services and help you intelligently prioritize preventive, troubleshooting and remediation efforts.
Building and operating a global cloud infrastructure at large scale is a complex task involving hundreds of ever-evolving service components. I am happy to share some real-world examples of how AI is leveraged at Azure Marketplace and LinkedIn scale to monitor wisely, predict capacity and save costs, so you can think about how to take these ideas home and apply them in your own production environments.