Programming and architectural paradigms
This chapter is to be included in version 1.0 of my book Architectural Metapatterns: the Pattern Language of Software Architecture. Any feedback is warmly welcome.
The latest version (0.9) of the book is available for free from Leanpub and GitHub.
Sharing a database is the greatest sin when you architect Microservices. But Space-Based Architecture is built around shared data. How do these approaches coexist? Do Microservices make any sense if blatantly violating their rules still results in successful projects?
There is a clue in another programming paradox. First there was C. Then C++ came to kill C. Then we got Rust to kill C++. Now we have C, C++ and Rust, all of them perfectly alive.
Technologies are specialized
When a new technology emerges, it must show its superiority over existing mature methods. In most cases that is achieved by specialization. Is a car superior to a donkey? It depends. Probably yes, when there are good roads, plenty of gas and spare parts. A car is narrowly specialized, thus some areas succeeded in adopting cars, while others still rely on donkeys.
The same holds true for programming languages and architectures. C is good when you work close to hardware and need complete control over whatever happens in the system. C++ is great at partitioning business logic, but it lost the simplicity of its predecessor. Rust will likely shine in communication libraries, which are often targeted by hackers, though we have yet to see its wide adoption. The usefulness (and choice) of a tool or programming language depends on the circumstances.
Let’s turn our attention to your average code. It is likely to mix:
- Object-oriented programming that divides the application into a tree of loosely interacting pieces.
- Functional programming, with the output of one function becoming the input to another, method chaining included.
- Procedural programming, where multiple functions access the same set of data, which also happens inside classes whose many methods operate on their private data members.
Each programming paradigm fits its own kind of tasks. Moreover, the same three approaches reemerge at the system level:
Object-oriented (centralized, shared nothing) paradigm — orchestration
Many software projects are so complex that it is impossible for a programmer to keep all the details of the requirements or implementation in their mind. Still, those details must be written down and run as code.
The good old way out of the trouble is called divide and conquer. The global task is divided into several subtasks, and each subtask is subdivided again and again — till the resulting pieces are either simple enough to solve directly or too messy to allow for further subdivision. Basically, we need to split our domain’s control, logic and data into a single hierarchy of moderately sized components.
We have heard a lot about keeping logic and data together: an object (or actor, or module, or service, whatever we call it) must own its data to ensure its consistency and hide the complexity of the component’s internals from its users. If the encapsulation of an object’s data is violated, the object’s code can neither trust nor restructure it. On the other hand, if the data is bound to the logic that accesses it, the entire thing becomes a useful black box which one does not need to look inside to be able to operate.
Adding control to the blend is more subtle, but no less crucial than the encapsulation discussed above. If an object commands another thing to do something, it must receive the result of the delegated action to know how to proceed with its own task. Returning control after the action is conducted enables the separation of high-level supervising (orchestrating, integrating) logic from low-level algorithms which it drives, adding depth to the structure.
Object-oriented design is ubiquitous, thanks to its ability to address complex domains, insofar as the whole can be reduced to self-contained pieces. This paradigm, when applied to distributed systems, gives birth to Microservices, Orchestrated Services and Service-Oriented Architecture.
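Within a single process, the same idea can be sketched in a few lines. The example below is only an illustration under assumed names: `Inventory`, `Payments` and `OrderOrchestrator` are hypothetical components, not taken from any particular system. Each component owns its data, and the orchestrator receives the result of every delegated call before deciding how to proceed.

```python
from typing import Dict


class Inventory:
    """Owns the stock data; other components reach it only through methods."""

    def __init__(self, stock: Dict[str, int]) -> None:
        self._stock = dict(stock)  # private copy: item name -> units available

    def reserve(self, item: str) -> bool:
        """Take one unit off the shelf and report whether that was possible."""
        if self._stock.get(item, 0) > 0:
            self._stock[item] -= 1
            return True
        return False

    def release(self, item: str) -> None:
        """Undo a reservation."""
        self._stock[item] = self._stock.get(item, 0) + 1


class Payments:
    """Owns the account balances."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = dict(balances)

    def charge(self, customer: str, amount: int) -> bool:
        """Withdraw the amount if the balance allows it."""
        if self._balances.get(customer, 0) >= amount:
            self._balances[customer] -= amount
            return True
        return False


class OrderOrchestrator:
    """High-level logic: call a component, inspect the result, choose the next step."""

    def __init__(self, inventory: Inventory, payments: Payments) -> None:
        self._inventory = inventory
        self._payments = payments

    def place_order(self, customer: str, item: str, price: int) -> str:
        if not self._inventory.reserve(item):
            return "out of stock"
        if not self._payments.charge(customer, price):
            # Control came back with a failure, so the orchestrator compensates.
            self._inventory.release(item)
            return "payment declined"
        return "confirmed"
```

The orchestrator never touches `_stock` or `_balances` directly; it only knows the outcome of each call, which is what keeps the supervising logic separate from the low-level algorithms.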
Functional (decentralized, streaming) paradigm — choreography
Sometimes you don’t need that level of fine-tuning for the behavior of the system you build. It operates as an assembly line with high throughput and little variance: its logic is divided into steps that resemble a conveyor’s workstations, through which identically structured pieces of data flow just like goods on a conveyor belt. In that case there is very little to control: if an item is good, it goes further, otherwise it just falls off the line. Here the control resides in the graph of connections, the domain logic is subdivided, and the data is copied between the components.
Functional or pipelined design is famous for its simplicity and high performance, as the majority of processing steps can be scaled. However, its straightforward application lacks the depth required for handling complex processes, which translate into webs of relations between hundreds of functions present on the same level of detail. It is also inefficient for choose-your-own-adventure-style (control) systems, where too many conveyor belts, each of them too short, would be required, negating the paradigm’s benefits. And it may not be the right tool for making small changes in large sets of data, as you’ll likely need to copy the whole dataset between the functions.
In distributed systems the functional paradigm is disguised as Choreographed Event-Driven Architecture, Data Mesh and various batch or stream processing [DDIA] pipelines.
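Inside one process, the assembly line can be approximated with chained generators. This is a minimal sketch under assumptions: the step names (`parse`, `validate`, `price`), the record layout and the `unit_price` value are all made up for illustration. Note that no step decides what runs next; the wiring in `run_pipeline` is the only place where control lives, and each step emits a fresh copy of the record.

```python
from typing import Dict, Iterable, Iterator, List


def parse(lines: Iterable[str]) -> Iterator[Dict]:
    """Workstation 1: turn 'name, qty' text into a record."""
    for line in lines:
        name, _, qty = line.partition(",")
        yield {"name": name.strip(), "qty": qty.strip()}


def validate(records: Iterable[Dict]) -> Iterator[Dict]:
    """Workstation 2: good items go further, bad ones fall off the line."""
    for record in records:
        if record["name"] and record["qty"].isdigit():
            # Emit a copy with the quantity converted to an integer.
            yield {**record, "qty": int(record["qty"])}


def price(records: Iterable[Dict], unit_price: int = 3) -> Iterator[Dict]:
    """Workstation 3: add a total, again as a new copy of the record."""
    for record in records:
        yield {**record, "total": record["qty"] * unit_price}


def run_pipeline(lines: Iterable[str]) -> List[Dict]:
    """The graph of connections: the output of one step feeds the next."""
    return list(price(validate(parse(lines))))
```

Calling `run_pipeline(["bolt, 2", "broken", "nut, 5"])` would drop the malformed middle line and return priced records for the other two.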
Procedural (data-centric) paradigm — shared data
The final approach is integration through data. There are cases where the domain data and business logic differ in structure — you cannot divide your project into objects because each of the many pieces of its logic needs to access several (seemingly unrelated) parts of its data.
In the data-centric paradigm logic and data are structured independently. In procedural programming, as in the object-oriented paradigm, control is implemented inside the logic, thus making the logic layer hierarchical (orchestrated). Another, less common option uses Observer [GoF] to provide data change notifications, making application logic decentralized (choreographed).
The data-centric approach works well for moderately-sized projects with a stable data model (like the reservation of seats in trains or a game of chess). The best-known distributed data-centric architectures include Services with a Shared Database and Space-Based Architecture.
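A toy version of the seat reservation case may make the shape clearer. Everything here is an assumption made for illustration: `SeatStore`, `reserve`, `free_seats` and `passenger_manifest` are hypothetical names. Several independent procedures read and write the same data structure, and an optional Observer-style subscription shows the less common, notification-driven variant.

```python
from typing import Callable, Dict, Iterable, List, Optional

Listener = Callable[[str, Optional[str]], None]


class SeatStore:
    """The shared data: seat id -> passenger name, or None when the seat is free."""

    def __init__(self, seat_ids: Iterable[str]) -> None:
        self.seats: Dict[str, Optional[str]] = {seat: None for seat in seat_ids}
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Observer variant: register a callback for data changes."""
        self._listeners.append(listener)

    def write(self, seat: str, passenger: Optional[str]) -> None:
        """Change the data and notify every subscriber."""
        self.seats[seat] = passenger
        for listener in self._listeners:
            listener(seat, passenger)


# Independent procedures; each one accesses the same shared store.

def reserve(store: SeatStore, seat: str, passenger: str) -> bool:
    """Book a seat if it exists and is still free."""
    if seat not in store.seats or store.seats[seat] is not None:
        return False
    store.write(seat, passenger)
    return True


def free_seats(store: SeatStore) -> List[str]:
    return sorted(seat for seat, who in store.seats.items() if who is None)


def passenger_manifest(store: SeatStore) -> List[str]:
    return sorted(who for who in store.seats.values() if who is not None)
```

The logic (booking, availability, manifest) is organized by use case while the data is organized by seat, so neither structure dictates the other.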
Composite cases
The three programming paradigms often work together:
- An ordinary class is object-oriented on the outside but procedural inside: each of its methods can access any private data member. Moreover, the code inside methods may chain function calls, locally switching to the functional paradigm.
- Cell-Based Architecture tends to use choreography (pub/sub) between cells [DEDS] and orchestration or shared database inside them.
- A system of Services (or Space-Based Architecture) may be integrated through both orchestrator and shared database (or Processing Grid and Data Grid, correspondingly).
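The first composite case in the list above, an ordinary class, can be sketched as follows. The `Basket` class and its members are hypothetical, chosen only to show the three paradigms side by side in a few lines.

```python
from typing import List, Tuple


class Basket:
    """Object-oriented on the outside: callers see only the public methods."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, int, float]] = []  # (name, quantity, unit price)
        self._discount = 0.0  # fraction between 0 and 1

    # Procedural on the inside: every method may touch any private member.
    def add(self, name: str, quantity: int, unit_price: float) -> None:
        self._items.append((name, quantity, unit_price))

    def apply_discount(self, fraction: float) -> None:
        self._discount = fraction

    def total(self) -> float:
        # Functional locally: the output of one call is the input of the next.
        line_totals = map(lambda item: item[1] * item[2], self._items)
        return round(sum(line_totals) * (1 - self._discount), 2)
```

Here `add` and `apply_discount` write to different private members, while `total` reads both and chains `map`, `sum` and `round` to compute its answer.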
Reality is more complex
We have reviewed a few cases directly supported by common programming languages. However, there is a wide variety of possible combinations of (at least) the following dimensions, each making a unique programming paradigm:
- Synchronous (method calls) vs asynchronous (messaging), with closely related:
  - Imperative vs reactive.
  - Blocking vs non-blocking.
- Centralized (orchestrated) vs decentralized (choreographed) flow.
- Shared data (tuple space) vs shared nothing (messaging).
- Commands (actors) vs notifications (agents).
- One-to-one (channels) vs many-to-one (mailboxes) vs one-to-many (multicast) vs many-to-many (gossip) communication.
Some of the combinations look impossible or impractical, others are narrowly specialized and thus uncommon, while many more are commonplace. Discussing all of them would require insights from people who have used them in practice and would fill a dedicated book.
Summary
We have deconstructed the most common programming paradigms into their driving forces and shown how those forces shape distributed architectures:
- An object-oriented system relies on hierarchical decomposition of a complex domain, just like SOA and orchestrated (Micro-)services.
- Functional programming streams data through a sequence of transformations — the idea behind Choreographed Event-Driven Architecture and Data Mesh.
- Procedural style lets any piece of logic access the entire project’s data, which also happens in Space-Based Architecture and Services with a Shared Database.
References
[DDIA] Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. Martin Kleppmann. O’Reilly Media, Inc. (2017).
[DEDS] Designing Event-Driven Systems: Concepts and Patterns for Streaming Services with Apache Kafka. Ben Stopford. O’Reilly Media, Inc. (2018).
[GoF] Design Patterns: Elements of Reusable Object-Oriented Software. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Addison-Wesley (1994).