Armchair Architects: Architectural Erosion and Technical Debt
Published Mar 09 2023 07:16 PM

Welcome back to Armchair Architects, part of the Azure Enablement Show. Today we're going to talk about architecture erosion and technical debt with our Armchair Architects, Uli Homann and Eric Charran.


What does architecture erosion mean?


The way Eric thinks about architectural erosion is this: architects and engineers work together to construct a system or solution and launch it into production. It performs well for a period of time, and then change happens. It might be a change in requirements, a change in infrastructure, or a change in customer habits, such as DevOps signals showing that people are using one feature and not another. At that moment there are two views. One is velocity: how do we implement this change and make the application experience do what the customer or end user wants as quickly as possible? The other is the strategic picture of managing things like technical debt: if I do something tactical, I'm probably going to do it fast and cheap and not necessarily the “right way.” That shortcut then weighs on the architectural patterns, longevity, scalability, and similar qualities, and it goes onto my pile of technical debt.


This would be architectural erosion in Eric's mind. You're doing the right thing, but you're not necessarily following all the steps and all the processes to do it correctly, and you can't necessarily do that all the time. Over time, though, the application falls out of sync with the architectural principles that were originally put in place when it was built, and that happens as architectures live. The focus is on what to do when the application drifts away from good patterns and starts to accrue too much technical debt.


The impact of technical and business disruptions on architecture erosion.


Uli Homann, the other Armchair Architect in this session, mentions another key question that you need to answer: “Is this actually a technology disruption or a business disruption?” Sometimes you suddenly see a complete paradigm shift. For example, in the 1960s most of the weaving and cloth making, which used to be a huge industry in both the US and Europe, moved to Asia. Suddenly, technical architectures that were supporting a local industry and a local manufacturing site were completely upended. Another example is US steel. The US steel companies were dominant, and then Asian and European companies came in with two innovations. First, they standardized the steel rod, so it was much easier to transport, manufacture, and manipulate. The second innovation was a concept called a mini mill: instead of having to send all the material to the US steel companies' centralized locations, they had small local mills that worked directly with the automaker or whoever the steel consumer was. That was a huge disruption on the business side, which then obviously drives the technical conversation. Then the Internet shows up as a technical disruptor, and suddenly all of your architecture is effectively yesterday's news. So how do you deal with that?


For Uli, that is erosion rather than technical debt. Technical debt is the accumulation of poor behaviors because you're shipping applications. It is saying, “Oh, I made a short-term decision here, I wanted to fix it, and I didn't get to it because we had to do new features. Let's do the clean-up work.” Erosion is, “Oh damn, the world has completely shifted,” as in the examples of US steel, the Internet showing up, or 5G connectivity now showing up everywhere. What does that mean for software architectures and solutions that you built based upon a certain paradigm? Uli thinks that's really what erosion means: all of a sudden the carpet got ripped out from underneath you and many elements of the business environment have changed. For Uli that's very common, and it's one of the reasons that in the early 2000s, when service-oriented architecture (SOA) was the craze, he authored an article about business architectures.


SOA was another very big technical disruption, one that effectively allowed us to ask, “Hey, how do we build the right services, and what does this mean?” It turns out that the business architecture, meaning how you organize yourself from a business perspective, is actually a very good architecture paradigm that has more longevity than purely technical perspectives. That's why Uli recommends thinking about a half-life, as with radioactive decay: architectures have a half-life, and you ask, “OK, is it five years, 10 years, three years?” It really depends on what your basis is, how you organize yourself, and other factors. That's erosion and erosion protection. Technical debt is where you have made small choices, or choices that you knew weren't ideal, but you didn't have the time to do it right, so you decided to live with it. As long as you track it and then address it, Uli thinks you're going to be OK.
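To make "track it and then address it" a little more concrete, here is a minimal sketch of a technical debt register in Python. It is not something described in the conversation: the `DebtItem` class, its field names, the sample entries, and the 60-day threshold are all hypothetical and only illustrate one way a team might record known shortcuts and surface the ones that have been open too long.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DebtItem:
    """One known shortcut. Every field name here is a hypothetical choice."""
    summary: str
    reason: str                       # the business reason the shortcut was taken
    recorded_on: date
    resolved_on: Optional[date] = None

    def age_in_days(self, today: date) -> int:
        """Days elapsed since the shortcut was recorded."""
        return (today - self.recorded_on).days


def overdue_items(register: list[DebtItem], today: date, max_age_days: int) -> list[DebtItem]:
    """Return unresolved items older than max_age_days, oldest first."""
    stale = [
        item for item in register
        if item.resolved_on is None and item.age_in_days(today) > max_age_days
    ]
    return sorted(stale, key=lambda item: item.recorded_on)


# Illustrative entries only; none of these come from the article.
register = [
    DebtItem("Hard-coded retry count", "ship feature before launch", date(2023, 1, 10)),
    DebtItem("Skipped input validation", "demo deadline", date(2023, 2, 20),
             resolved_on=date(2023, 3, 1)),
    DebtItem("Duplicated pricing logic", "urgent customer request", date(2023, 3, 5)),
]

today = date(2023, 4, 15)
for item in overdue_items(register, today, max_age_days=60):
    print(f"{item.summary}: open for {item.age_in_days(today)} days ({item.reason})")
```

Recording the reason next to each item keeps the "wrong thing for the right reason" context visible when the team later decides which items to pay down.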


Three levels of erosion: large technology disruptions, slow erosion, and living architecture.


Eric thinks it's all part of the same challenge: how do I manage the solution over time? The conversation has identified the right pivot points. First, there are large disruptions in technology that make things possible now that weren't possible before. Second, there is slow erosion: I'm going to make tactical decisions that might not be the best right now, but I promise to double back and fix them later. Lastly, there's the architecture itself, which is what we originally started out with, but this thing is alive. The architecture on day one won't be the same on day 60 or day 90; it has to change over time in response to the first two. On top of that, we now have CI/CD pipelines and GitOps increasing the velocity of changes and the speed with which changes can manifest in an architecture. The concern is that if we have a superhighway of changes being introduced into solutions, that could itself be a source of erosion. The thing architects need to do here is be aware of these things. Number one, your architecture should be alive. It's not like you paint the picture, hang it on the wall, and say, “Alright, what's the next one?” You have to check back in with your architecture to determine how tactical decisions, technical debt, and disruptions can and should change and erode it. Then ask how you need to change the architecture to match where things need to be in the future. And if there is technical debt, architects need to shine daylight on it. Make sure you understand that the choices you made are completely in line with business objectives: we know we had to do the wrong thing for the right reason, and now we have to spend some sprints going back, fixing that, and making sure the solution actually aligns with the architecture.


The business model takes longer to change than your technical model.


There is one important factor to consider: what business model and business perspective are you basing your thinking on? That's important because ultimately the business model takes longer to change. In the US steel example the change did happen, but it took five to six years or more. These things don't happen overnight. So, if you have a business perspective and a business architecture approach, your technical model will most likely be more survivable than if you look at pure tech alone. Technology changes faster than business models change.


This leads to the question: is there anything you can do to protect against architectural erosion and technical debt? What else can you do to make sure that you're not suddenly impacted?


How to prevent architectural erosion.


The best way to manage the future is to invent it. If you're looking at the sea, changes can happen. One of the things that Microsoft is doing, Google is doing, and IBM has done for a long time is running a research organization. The reason to have a research organization is to pick up very early signals that are not yet clear. Academic communities are often, though not always, the people signaling and heralding a change, because they can see what's going on. So, if you have your own research organization, or you are plugged into research and working with academia, you often get signals that you would miss if you weren't paying attention. So pay attention, go out there, listen, see what's happening, and then draw your own conclusions about what it means for you. Do I want to take advantage of this? When do I want to take advantage of it? Can I take advantage of it? Those kinds of questions are important for both the business side of the house and the technical architecture side of the house. Do pay attention, whether that means having a formal research organization or reading trade articles and articles on the Internet.


If you are seeing architecture erosion, here's how you know that you're in this situation and what you do to get out of it. You know you're in it when you look at an entirely tangled ball of yarn, string, and rubber bands and say, “We've got to start from scratch.” That's how you know that you have lost control and you have rampant erosion: this is a platform that you can no longer save. You have to start over, so you service the existing platform, keep it on dial tone, and build a brand-new platform.


The challenge is that architects who have been around a while see multiple cycles like this, where the platform just gets too big and you have to burn it down and create another one. To break out of that cycle, you must be aware of architectural erosion in all three of the dimensions we've talked about in this session. As software people, we tend to feel that if nobody's mad at us, then things are probably going OK. But this is like a slow poison pill that you have to pay attention to, and the only way to pay attention to it is through rigor: architectural reviews, calibration sessions, and meeting with your engineering counterparts and correlating that back to the business architecture. The goal is to make sure that things are not going out of shape or out of phase, both from a technical debt perspective and in terms of what the solution is actually there to do at the end of the day.
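One way to add some of that rigor between review sessions is an automated conformance check that compares the dependencies actually observed in a codebase against the intended layering. The conversation does not describe this technique; the sketch below is only an illustration, and the layer names, the `ALLOWED` rules, and the `observed` dependency list are all hypothetical.

```python
# Hypothetical layering rules: each layer may depend only on the layers listed.
ALLOWED = {
    "ui": {"services"},
    "services": {"domain"},
    "domain": set(),
}


def find_violations(dependencies, allowed=ALLOWED):
    """Return the (source, target) pairs that break the layering rules."""
    violations = []
    for source, target in dependencies:
        if source == target:
            continue  # dependencies inside one layer are always fine
        if target not in allowed.get(source, set()):
            violations.append((source, target))
    return violations


# Illustrative input; in practice this would come from a dependency scan.
observed = [
    ("ui", "services"),
    ("services", "domain"),
    ("ui", "domain"),        # shortcut that bypasses the service layer
    ("domain", "services"),  # dependency pointing the wrong way
]

for source, target in find_violations(observed):
    print(f"{source} -> {target} is not allowed by the intended architecture")
```

A check like this only catches structural drift. It does not replace reviews that tie the solution back to the business architecture.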


Recommended Next Steps:


If you’d like to learn more about the general principles prescribed by Microsoft, we recommend the Microsoft Cloud Adoption Framework for platform and environment-level guidance, along with the Azure Well-Architected Framework. You can also register for an upcoming workshop led by Azure partners on cloud migration and adoption topics; these workshops incorporate click-through labs to ensure effective, pragmatic training.


To hear the whole conversation, you can watch the video below.


Last update: May 03 2023 02:28 PM