Mohan, a senior IT architect at ABC Bank, was looking at the first-quarter cloud bills for one of the bank’s newest payment transaction applications, which he had helped architect and design. He could not believe his eyes and was aghast at the dollar figure in front of him. Just a few months earlier, he and the joint functional/technical team had been highly commended for delivering and deploying to production the bank’s most business-critical, high-volume SaaS application, one that ran almost entirely on the public cloud. This was easily the bank’s largest cloud-native application ever to go live. Mohan had always had a strong eye for the future and was passionate about building software that would give the business the flexibility to handle future needs. He had personally led the creation of the microservices architecture and domain-driven design, leveraging the best that cloud-native architecture has to offer. So you can imagine his shock at how expensive the application had turned out to be, far exceeding the budgeted cloud spend. To add to his worries, only the previous week he had sat through some tough meetings with the IT team to review a significant number of functional and non-functional quality issues in the software.
He sat down in his cubicle, coffee mug in hand, and started to review in detail the cloud bill, the detailed cloud monitoring reports, the application’s architecture documents, and the quality reports.
What do you think went wrong?
After a detailed analysis, Mohan was able to narrow down the main problem. Ironically, his passion for designing a high degree of future flexibility into the application (and the related infrastructure) had led to two issues. A) Too many public API calls (to internet-facing endpoints) were hitting the API gateway, and the resulting egress traffic produced a much higher cloud bill than anticipated. B) The flexibility came at a cost: it increased the complexity of the application, which led to integration and testing problems that he was able to correlate directly with the significant quality issues. Mohan also realized that another unwanted side effect of having so many API calls was heightened cybersecurity risk, which added further quality issues.
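To make issue A concrete, here is a minimal back-of-the-envelope sketch of how public API calls and egress traffic can add up. Everything in it is an assumption for illustration: the `Rates` fields, the `monthly_api_cost` function, and the idea that the bill has just two components (a per-request gateway charge and a per-GB egress charge). Real cloud pricing is tiered and provider-specific, so plug in the rates from your own bill rather than treating this as an actual pricing model.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical unit prices; take real values from your own cloud bill."""
    per_million_requests: float  # charge per 1,000,000 API gateway requests
    per_gb_egress: float         # charge per GB of data leaving the cloud


def monthly_api_cost(transactions: int,
                     public_calls_per_txn: int,
                     avg_response_kb: float,
                     rates: Rates) -> float:
    """Estimate the monthly cost driven by public API calls.

    Assumes every public call is billed as one gateway request and
    that each response leaves the cloud as egress traffic.
    """
    total_calls = transactions * public_calls_per_txn
    request_cost = total_calls / 1_000_000 * rates.per_million_requests
    egress_gb = total_calls * avg_response_kb / 1_000_000  # KB -> GB (decimal)
    egress_cost = egress_gb * rates.per_gb_egress
    return request_cost + egress_cost


if __name__ == "__main__":
    # Placeholder rates, chosen only to make the arithmetic easy to follow.
    rates = Rates(per_million_requests=1.0, per_gb_egress=0.1)

    # Same business volume, two designs: many fine-grained public calls
    # per transaction versus a few coarse-grained ones.
    chatty = monthly_api_cost(1_000_000, 10, 100, rates)
    coarse = monthly_api_cost(1_000_000, 2, 100, rates)
    print(f"chatty design: {chatty:.2f}  coarse design: {coarse:.2f}")
```

Under these simplified assumptions the cost grows linearly with the number of public calls per transaction, so a design that exposes five times as many calls for the same business volume costs roughly five times as much on these two line items.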
While Mohan’s detailed analysis of the cloud spend and monitoring reports shed light on the various cloud cost elements, it also taught him to be careful in future about the potentially expensive tradeoff between flexibility and complexity. He realized that it is important to start simple and let the application’s architecture and design evolve along an architectural runway that is more closely aligned with how the business evolves. That way, the team can match returns (business value) with costs much more closely, especially as the business scales up. Technical debt and over-engineering alike exacerbate the cloud spend problem as the business and technology scale up.
For a broader discussion on the topic of continuous cloud cost optimization, see my latest blog here.