It goes to show just how interconnected and automated cloud-based systems are, and what can go wrong with them. A DNS outage, it has been revealed, was the cause of the latest Azure outage.
Which, sadly, also led to data loss for some customers.
According to reports, this latest incident followed a DNS outage at CenturyLink, a company that provides DNS services to Microsoft. CenturyLink suffered a software defect, and the effects carried over to the Azure cloud platform.
That, in turn, indirectly led to the deletion of live customer information, as an automated process ran with no human intervention.
CenturyLink is no stranger to incidents like these, as it seems to have been experiencing DNS problems lately. It also suffered a DNS outage in December that reportedly affected emergency services and sparked an FCC investigation.
All this, soon after its $34 billion acquisition of large network operator Level 3 in 2017.
As for Azure, a combination of DNS problems and automated scripts was to blame for this latest fiasco.
Microsoft apparently deleted a number of Transparent Data Encryption (TDE) databases in Azure, which held live customer information. TDE databases dynamically encrypt the information they store, and decrypt the data when customers access it.
Keeping the data encrypted at rest is done to prevent an intruder with access to the database from reading the information.
Azure users store their own encryption keys in Microsoft’s Key Vault encryption key management system, in a process that is called Bring Your Own Key (BYOK).
Redmond explained in a letter sent to customers that the culprit was a script that drops TDE database tables when their corresponding keys can no longer be accessed in the Key Vault. The company quickly restored the tables from a five-minute snapshot backup.
But this still meant that the transactions that customers had processed within five minutes of the table drop had to be dealt with manually.
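To make the failure mode concrete, here is a minimal Python sketch of the logic described above. It is not Microsoft's script: the names `KeyStatus`, `naive_should_drop`, `guarded_should_drop` and `unrecovered` are all hypothetical, and the guarded variant is only one possible safeguard, not a fix Microsoft has confirmed. The point is that a script that treats "the vault cannot be reached" the same as "the key is gone" will drop tables during a DNS outage, and that anything committed after the last snapshot is not covered by the restore.

```python
from enum import Enum, auto


class KeyStatus(Enum):
    """Hypothetical outcomes of looking up a customer's key in the vault."""
    AVAILABLE = auto()
    DELETED = auto()       # the vault answered: the key no longer exists
    UNREACHABLE = auto()   # the vault could not be contacted (e.g. DNS failure)


def naive_should_drop(status: KeyStatus) -> bool:
    # Assumed behaviour: any failure to access the key counts as "key gone".
    return status is not KeyStatus.AVAILABLE


def guarded_should_drop(status: KeyStatus) -> bool:
    # Hypothetical safeguard: act only when the vault positively reports
    # that the key was deleted, never on a connectivity failure.
    return status is KeyStatus.DELETED


def unrecovered(transaction_times, snapshot_time, drop_time):
    """Return transactions committed after the snapshot, up to the drop.

    These are the ones a snapshot restore cannot bring back.
    """
    return [t for t in transaction_times if snapshot_time < t <= drop_time]


if __name__ == "__main__":
    status = KeyStatus.UNREACHABLE          # what a DNS outage looks like
    print(naive_should_drop(status))        # True
    print(guarded_should_drop(status))      # False

    # Times in seconds; snapshot taken at t=0, tables dropped at t=300.
    print(unrecovered([-30, 45, 120, 290, 310], 0, 300))  # [45, 120, 290]
```

In this toy model, the three transactions that land inside the five-minute window are the ones that would need manual handling after the restore.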
Azure users can at least take comfort in the fact that Microsoft is offering multiple months of free Azure service to those affected.
Still, this serves as an example of just how much can go wrong in the complicated cloud systems of today.