{"id":5874,"date":"2024-07-23T15:20:03","date_gmt":"2024-07-23T15:20:03","guid":{"rendered":"https:\/\/www.horsesforsources.com\/?p=5874"},"modified":"2024-07-23T15:20:03","modified_gmt":"2024-07-23T15:20:03","slug":"crowdstrikes-error_072324","status":"publish","type":"post","link":"https:\/\/www.horsesforsources.com\/crowdstrikes-error_072324\/","title":{"rendered":"CrowdStrike is accountable for ITIL failure but Microsoft must manage its SaaS ecosystem more diligently"},"content":{"rendered":"
Last Friday\u2019s cluster &^%$ of IT outages plaguing companies globally will likely result in several billion dollars of economic impact. However, for CIOs, the problem wasn\u2019t a security issue. Instead, this was an IT service management (ITSM) issue that caused massive disruption for companies relying on Microsoft\u2019s Windows platform.<\/p>\n
Software-as-a-service (SaaS) has become mainstream, with our research showing 68% of enterprise software is being delivered using this model today. Because SaaS lets software vendors maintain, upgrade, and improve their solutions through cloud delivery, regular updates are par for the course. However, as the CrowdStrike outages illustrate, many IT departments are getting too complacent, allowing their SaaS vendors to have full control of application management, updates, and automated delivery, especially when it comes to security updates. In addition, tech giants like Microsoft must be more diligent with their SaaS ecosystem partners.<\/p>\n
<\/p>\n
Friday, July 19th<\/sup>, wasn\u2019t the first significant IT outage. For example, Rogers, the second-largest telecom services provider in Canada, significantly impacted its customers in July 2022 with a router update from Cisco<\/a>. And in 2020, SolarWinds, an IT management software vendor, dealt its customers a similar blow through an update. In the case of SolarWinds, the event was traced back to a bad actor implanting malicious code in the update.<\/p>\n While companies depend on security patches to safeguard their systems, applications, and data, blindly trusting a loose federation of software companies to be mutually compliant is increasingly risky. Any IT leader worth their salt must have a process that not only governs software but also subjects new software and patches to a modicum of testing for compliance and stability. One only needs to crack open that dusty volume on the ITIL (Information Technology Infrastructure Library) framework to recall the importance of having a standardized process for quality assurance, testing, and deployment.<\/p>\n What caused the CrowdStrike mayhem was the release of a virus definition. Because the update is pushed automatically and accepted automatically by CrowdStrike\u2019s antivirus software, Falcon, it triggered the \u2018blue screen of death\u2019 as soon as it took effect.<\/p>\n Automatic updates from SaaS vendors are common, and EDR and antivirus firms push out a significant number of virus definition updates per week, sometimes even per day, depending on the severity of a virus they’ve discovered. This is all done to meet device-level security requirements set by standards like SOC 2. However, when CrowdStrike released its update early on Friday, July 19, it resulted in a global Windows meltdown for nearly every firm running CrowdStrike\u2019s Falcon product.<\/p>\n The ONLY explanation for this is CrowdStrike’s fundamental failure to follow basic ITSM or ITIL practices. 
ITIL is an industry-recognized five-step framework outlining a set of best practices and guidelines for managing and delivering IT software and services. ITIL offers software development teams a systematic approach to IT service management (ITSM), with a focus on aligning their services with the needs of the business and ensuring the quality of the products they deliver.<\/p>\n In the case of CrowdStrike, its development team likely glossed over Step 3, Service Transition. While the team likely followed its standard operating procedure for writing the virus definition update, it appears to have dropped the ball here, whether for some unknown reason or out of hubris. As a reminder, in service transition, standard ITIL practices dictate that the developer ensure the software (package, feature, or update) undergoes a validation and test step. Surely CrowdStrike has a stage gate for this, doesn\u2019t it?<\/p>\n This step would have put the update through a quick battery of code testing, integration testing with the Windows OS, and finally system testing across the antivirus, Windows, and any additional services that might be called. Given that the failure happened after the update hit the CrowdStrike Falcon software, causing Windows clients globally to fail, it is pretty clear there was a lack of quality or system testing prior to release.<\/p>\n Therefore, one can only assume that CrowdStrike\u2019s developers made the poor decision to skip testing and trust that their update would just work. This is a black mark on CrowdStrike\u2019s quality control, assuming it has any, and should lead many CIOs to ask their CrowdStrike rep, \u201cWTF, don\u2019t you test these?!\u201d<\/p>\nCrowdStrike is at fault due to negligent processes<\/span><\/h3>\n