In part 1 of this blog we talked about data and how it's changing, and we realized there's more to be said there. So, let's dive back into that conversation with our host David Blank-Edelman and our armchair architects, Uli Homann and Eric Charran.
David starts by asking about the difference between synchronous processing and real-time processing of data.
Uli discusses the pros and cons of ‘real-enough-time’.
In the early 1990s, Uli was introduced to the term "real-enough-time" by Norm Judah at an event, and he wondered what it meant. Norm then provided a couple of examples of where architectures go wrong: teams introduce a very complex technical solution to a very simple business problem. The idea of real-enough-time is that you look for the best point in time, based on the business scenario, to capture and execute activities. The example Uli shared: instead of doing a two-phase commit between a mainframe and a SQL Server database, the business requirement would have been perfectly served by a file upload processed every 24 hours. The team put themselves and their systems under extreme technology pressure because they interpreted the requirement as "no, it needs to be in real time," and therefore two-phase commit and all those other good things. Whereas the business owner said, "24 hours is good enough if it's synchronized," and that's the real meaning of "real-enough-time."
This is where Uli always starts his thinking: the later you can process, the easier it is on your systems and the more insights you can generate. The hotter you make the data, the smaller the scope you have, and therefore the data is far less accurate and far less contextualized than all the other things we talked about. So you need to be careful as you work with hotter and hotter data coming right off the system. Now let's bring it back to real-time and synchronous processing of data, which David wanted to dive into.
Uli discusses real time and synchronous processing.
A lot of times when people hear "real time," they think they need to make sure the data gets immediately stored in the other system they're talking to. They might even want to do transactions across systems, like a two-phase commit, which is the worst thing you can do for scalability. There are scenarios where two-phase commit does make sense, but they are very rare, and it certainly kills scalability. You can kiss your scalability goodbye when you do that. What teams don't realize is that once you go synchronous and hold locks on the other side, the system must grow bigger and becomes much more complicated. You have effectively created a single system out of the system that calls and the system that responds, and updating either one becomes a real complex exercise because the calls happen all the time and the interaction is very stateful. Therefore, we want people to think about real-enough-time: do you really need to be this close to the processing, or is it OK to stretch things out?
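To make the lock-holding problem concrete, here is a toy sketch (the scenario and names are illustrative assumptions, not from the discussion): the caller holds a local lock for the full duration of a synchronous call to another system, so the caller's throughput is capped by the responder's latency and the two systems behave as one.

```python
import threading
import time

# Hypothetical shared resource on the caller's side.
inventory_lock = threading.Lock()

def slow_remote_system(order):
    # Stand-in for "the other side" of a synchronous cross-system call;
    # any latency here is felt directly by the caller.
    time.sleep(0.1)
    return f"recorded {order}"

def synchronous_update(order):
    # The caller holds its lock for the entire remote call. Every other
    # caller now waits on the remote system's latency too: the two
    # systems have effectively fused into one distributed monolith.
    with inventory_lock:
        return slow_remote_system(order)
```

Decoupling the two systems (for example, with a queue, as in the shopping-basket example later) means the lock is never held across the remote call.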
Uli uses eCommerce as an example of how to design the best system.
To give you another example using ecommerce: if you visit a website and fill up your shopping basket, the business transaction systems that record purchases don't need to know about it yet. The system doesn't need to lock inventory for you yet, as long as it processes your request in a reasonable amount of time. Good systems take your shopping basket and treat it as a completely isolated system. It has nothing to do with the record-keeping system where sales are recorded, inventory is decremented, and other activities like that take place.
Then, when you're ready to check out and buy the items in your basket, a batch process (or an asynchronous process, to use modern language) processes your shopping basket. In that way you split the system: you have a real-time system, the shopping basket you interact with, but it's a very isolated one, and then you have the big system that manages inventory, orders, fulfillment, and other associated activities, which processes your shopping basket at its own pace as soon as you click "buy." That is a smart design; that's how you scale.
Because if you don't design it that way, and instead create a shopping basket that goes into the inventory system and locks inventory, then when a user browses for an hour, fills the basket, and decides not to click "buy," you have locked everything they looked at for an hour. That's obviously a bad business design, and it's terrible systems design. That's the real danger of equating "real time" with synchronous, instead of asking "how do I isolate the system, how do I partition the system, how do I buy myself time?" At the end of the day I don't need many resources to process a user's shopping basket, because the basket itself is very small and I can deal with it separately. Once the shopping basket is done, it's a persistable data set: I can put it into a queue and process it at leisure.
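The basket-to-queue split described above can be sketched in a few lines. This is a minimal illustration, not any real ecommerce system's code: the `Basket`, `checkout`, and `fulfillment_worker` names are assumptions, and a real system would use a durable queue rather than an in-process one.

```python
import queue
import threading
from dataclasses import dataclass, field

@dataclass
class Basket:
    user: str
    items: dict = field(default_factory=dict)  # item name -> quantity

    def add(self, item, qty=1):
        self.items[item] = self.items.get(item, 0) + qty

# The basket is a small, persistable data set; checkout just hands it off.
order_queue = queue.Queue()
processed = []

def checkout(basket):
    # No inventory locks are taken here; we only enqueue the basket.
    order_queue.put(basket)

def fulfillment_worker():
    # The back-end system drains the queue at its own pace, updating
    # inventory, orders, and fulfillment without blocking the shopper.
    while True:
        basket = order_queue.get()
        if basket is None:  # sentinel to stop the worker
            break
        processed.append(basket)
        order_queue.task_done()

worker = threading.Thread(target=fulfillment_worker)
worker.start()

b = Basket("alice")
b.add("book")
b.add("pen", 2)
checkout(b)  # returns immediately; the user never waits on inventory

order_queue.put(None)
worker.join()
print(processed[0].items)  # {'book': 1, 'pen': 2}
```

The key point is in `checkout`: the interactive, real-time part of the system touches only the basket, while the heavyweight record-keeping work happens on the other side of the queue.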
All good systems are designed that way, because at the end of the day you book on hope. For example, when you buy an e-ticket for a flight, the airline doesn't really lock the seat inventory for you. It says, "the probability is 99% that I will get that seat, so I will tell the user 'yes, you got the seat,'" and if it can't get the seat, it says "sorry, this seat is no longer available" and offers you other available seats.
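The airline example is optimistic confirmation with a compensation path: promise the seat without locking it, and apologize with alternatives if the promise doesn't hold. Here is a minimal sketch under that assumption; `SeatInventory` and its methods are hypothetical names, not a real booking API.

```python
class SeatInventory:
    """Hypothetical seat inventory; purely illustrative."""

    def __init__(self, seats):
        self.available = set(seats)

    def try_assign(self, seat):
        # The authoritative check, done when the booking settles,
        # not while the customer browses.
        if seat in self.available:
            self.available.remove(seat)
            return True
        return False

def book(inventory, seat):
    # Optimistically say "yes" instead of holding a lock; settle
    # against the real inventory, and compensate if the bet fails.
    if inventory.try_assign(seat):
        return f"Confirmed: seat {seat}"
    # Compensation path: the optimistic promise did not hold.
    alternatives = sorted(inventory.available)
    return f"Sorry, seat {seat} is no longer available. Alternatives: {alternatives}"

inv = SeatInventory({"1A", "1B", "1C"})
first = book(inv, "1A")
second = book(inv, "1A")
print(first)   # Confirmed: seat 1A
print(second)  # Sorry, seat 1A is no longer available. Alternatives: ['1B', '1C']
```

The design trade is deliberate: a small chance of an apology in exchange for never holding inventory locks while customers make up their minds.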
Eric discusses some do’s and don’ts for new architects.
Eric wanted to wrap up this topic with some dos and don'ts for new architects, as well as for folks who have worked in this space for a while. The first don't: don't create what he calls a "distributed monolith," as Uli described. If you take two systems and synchronize their transaction processing, you have basically created a monolith that is inflexible. It's just distributed, and it can perform even more poorly than a single giant monolith. We know today that monoliths have challenges; microservices are the preferred approach, and asynchronous processing is the preferred approach. The second don't: never ask "do you need this in real time?" because the answer is always going to be "yes." It's like asking "should I delete this file?" The answer is always going to be "no, don't delete it." So every businessperson, or person in charge of requirements, will say "I need this in real time, as it happens."
Eric had the pleasure of working with Norm Judah, and Norm would ask, "how real is real?" Is it real enough for you if it's five seconds? If it's 10 minutes? Often the answer is, "oh yeah, 10 minutes is fine." So go through that conversation to really feel out how much "real time" actually means. Then there are two concepts to keep separate. Do create real-time stream analytics, which is not transaction processing on each event, but rather opening a window into the stream, like the street example, and seeing what's flying by right now and what you can tell from it. That is completely different from processing each event in real time, which is what leads to the distributed monolith problem. Eric thinks folks stepping into architecture for the first time may not have the scars that Uli and he have from dealing with these distributed monoliths. Transaction processing in real time will hopefully be fast, but you need eventual consistency in that model because all sorts of things have to happen. Architectural patterns like CQRS, the ambassador pattern, and the orchestrator pattern are designed to help with that. So reach for those when you must do something with a message, not just see what's happening with the message.
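The "window into the stream" idea can be sketched as a tumbling window: no transaction is opened per event; you just aggregate what flies by. This is a minimal, assumed illustration (the event values and window size are made up), not a real stream-processing engine.

```python
class TumblingWindow:
    """Minimal tumbling-window aggregate over an event stream."""

    def __init__(self, size):
        self.size = size
        self.buffer = []
        self.results = []

    def observe(self, value):
        # Observing an event does no transactional work; we only
        # buffer it and emit an aggregate when the window closes.
        self.buffer.append(value)
        if len(self.buffer) == self.size:
            self.results.append(sum(self.buffer) / self.size)
            self.buffer = []

# Hypothetical sensor readings "flying by" on the stream.
w = TumblingWindow(size=3)
for reading in [10, 20, 30, 40, 50, 60]:
    w.observe(reading)
print(w.results)  # [20.0, 50.0]
```

Contrast this with per-event transaction processing: here a slow consumer only delays an aggregate, it never holds locks in, or couples itself to, the system producing the events.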