System Design Concepts Every Software Engineer Should Know

Rabi Siddique
7 min read · Jan 28, 2023


As software engineers, we are constantly faced with the challenge of designing systems that can handle large amounts of data, scale seamlessly, and perform efficiently. The ability to design such systems is a crucial skill for any software engineer, but it can be overwhelming to know where to start. In this blog post, we will explore fundamental concepts that every software engineer should know regarding system design.

Scalability

Scalability is a system’s ability to handle increasing load or traffic. Understanding the different scaling approaches, namely horizontal and vertical scaling, is important.

Horizontal scaling is a common approach to scaling systems by adding more machines to handle the load. For example, a web application receiving more traffic can add more servers to handle the increase in requests. However, it also comes with challenges, such as managing and coordinating multiple machines, load balancing, and ensuring data consistency across them.

Vertical scaling, on the other hand, increases the resources of a single machine. For example, a database server can be upgraded with more memory and CPU power to handle a higher load. This approach is less complex and can be done quickly, but it also has limitations. The cost of adding more resources to a single machine increases as the load increases. Also, there is a physical limit to how many resources can be added to a single machine. At some point, a machine will reach its maximum capacity, and further scaling will be impossible without replacing the entire machine.

Load Balancing

Load balancing is the practice of distributing workloads across multiple machines to optimise resource usage and ensure that no single machine is overwhelmed. This is particularly important in distributed systems where a single machine cannot handle the entire load.

Several load-balancing algorithms can be used, each with its advantages and disadvantages. Some common algorithms include:

  • Round-robin: Requests are distributed evenly across all machines. This simple algorithm is easy to implement, but it doesn’t consider the current load on each machine.
  • Least connections: Requests are sent to the machine with the fewest current connections. This algorithm considers the current load on each machine, but it doesn’t consider the processing power of each machine.
  • IP Hash: The IP address of the client is used to determine which machine to send the request to. This ensures that requests from the same client are always sent to the same machine, but it doesn’t consider the current load on each machine.
  • Least response time: The load balancer measures the response time of each machine and sends requests to the machine with the lowest response time. This algorithm takes into account the current load and performance of each machine.
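As a rough illustration of the first two strategies, here is a minimal sketch in Python. The Server class and its connection counter are simplified stand-ins for the state a real load balancer would track.

```python
import itertools

class Server:
    """A simplified stand-in for a backend machine."""
    def __init__(self, name):
        self.name = name
        self.active_connections = 0

class RoundRobinBalancer:
    """Cycles through servers in order, ignoring their current load."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Picks the server with the fewest active connections."""
    def __init__(self, servers):
        self.servers = servers

    def pick(self):
        return min(self.servers, key=lambda s: s.active_connections)

servers = [Server("app-1"), Server("app-2"), Server("app-3")]
rr = RoundRobinBalancer(servers)
lc = LeastConnectionsBalancer(servers)

for _ in range(4):
    target = rr.pick()            # round-robin order: app-1, app-2, app-3, app-1, ...
    target.active_connections += 1

print(lc.pick().name)             # least connections: prints app-2, one of the least-loaded servers
```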

Load balancers can be implemented in hardware or software. Hardware load balancers are typically faster and more powerful, but they are also more expensive and difficult to maintain. Software load balancers are typically less expensive and easier to maintain, but they may not be able to handle as much traffic.

Reverse and Forward Proxy

A reverse proxy is a proxy server that sits in front of one or more web servers and directs incoming client requests to the appropriate web server. Reverse proxies are often used to provide additional security, performance, and scalability to web applications.

For example, a reverse proxy can be used to offload SSL/TLS encryption from the web server so that the web server can focus on processing requests. It can also balance incoming requests across multiple web servers, ensuring that no single web server is overwhelmed. Reverse proxies can also be used to hide the internal IP addresses of web servers, providing an additional layer of security by making it more difficult for attackers to target specific web servers.
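To make the idea concrete, here is a toy reverse proxy in Python. The backend address http://localhost:8080 is a placeholder assumption, and real deployments usually rely on dedicated reverse proxies such as nginx or HAProxy rather than hand-rolled code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKEND = "http://localhost:8080"   # assumed internal web server; never exposed to clients

class ReverseProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the incoming request path to the hidden backend and relay the answer.
        with urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
            self.send_response(upstream.status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

if __name__ == "__main__":
    # Clients talk to port 8000 (the proxy) and never see the backend's address.
    HTTPServer(("", 8000), ReverseProxyHandler).serve_forever()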

On the other hand, a forward proxy is a type of proxy server that sits between clients and the internet. It acts as an intermediary for client requests, forwarding them to the appropriate server. Forward proxies are often used to control access to the internet, provide additional security, and improve performance.

For example, a forward proxy can block access to certain websites, such as social media or streaming sites, to increase organisational productivity. It can also be used to cache frequently requested web pages, reducing the number of requests that need to be sent to the internet, thus improving performance. Forward proxies can also be used to hide the IP addresses of clients, providing an additional layer of security by making it more difficult for attackers to target specific clients.

Availability and Fault Tolerance

Availability refers to the ability of a system to remain operational and respond to requests. It is typically measured as the percentage of time that the system is operational.
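To make those percentages concrete, here is a quick back-of-the-envelope calculation of how much downtime per year each common availability target allows.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours

for nines in ["99%", "99.9%", "99.99%", "99.999%"]:
    availability = float(nines.strip("%")) / 100
    downtime_hours = HOURS_PER_YEAR * (1 - availability)
    print(f"{nines} availability -> about {downtime_hours:.2f} hours of downtime per year")

# 99%    -> ~87.6 hours/year
# 99.9%  -> ~8.76 hours/year
# 99.99% -> ~0.88 hours/year (roughly 53 minutes)
```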

Fault tolerance, on the other hand, refers to the ability of a system to continue functioning despite the failure of one or more of its components. This can include hardware failures, software bugs, or network outages.

The two concepts are related in that increasing a system’s fault tolerance can also increase its availability. For example, if a system has multiple redundant components, a failure in one component can be handled by the other components, allowing the system to continue functioning. Additionally, load-balancing techniques can help distribute workloads across multiple machines, increasing the system’s fault tolerance and availability.

However, it is also important to note that increasing a system’s availability and fault tolerance often comes at a price, such as added complexity and higher infrastructure cost.

Caching

Caching is an essential technique for improving system performance by reducing the load on underlying data stores and increasing the speed of data access. It stores frequently accessed data in a high-speed storage layer, such as memory, so that it can be quickly retrieved without going through the process of fetching it from the original data source. This can significantly improve system responsiveness and reduce the load on servers and databases.

One common example of caching is a web application that caches the results of database queries. When a user requests a page that requires data from a database, the web application checks whether the data is already in the cache. If it is, the data is returned from the cache, which is much faster than fetching it from the database. If the data is not in the cache, it is retrieved from the database, stored in the cache, and then returned to the user. This way, the web application can serve many requests for the same data without having to fetch it from the database each time.
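This read path is often called the cache-aside pattern. Here is a minimal sketch of it; the in-memory dict stands in for a dedicated cache such as Redis, and fetch_user_from_db is a hypothetical placeholder for the slow database query.

```python
cache = {}  # in-memory stand-in for a dedicated cache such as Redis

def fetch_user_from_db(user_id):
    # Hypothetical slow database query, shown here only as a placeholder.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    # 1. Check the cache first.
    if user_id in cache:
        return cache[user_id]          # cache hit: no database round trip
    # 2. On a miss, fall back to the database and populate the cache.
    user = fetch_user_from_db(user_id)
    cache[user_id] = user
    return user

get_user(42)   # miss: goes to the "database", then caches the result
get_user(42)   # hit: served straight from the cache
```

A real cache would also need an expiry or invalidation policy so that stale data does not linger after the underlying record changes.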

Another example of caching is content delivery networks (CDNs), which cache static assets such as images, videos, and scripts on edge servers worldwide. This allows users to access the data from a server that is geographically closer to them, which can significantly reduce the time it takes to download the data.

Consistency

Consistency is a fundamental concept in distributed systems, as it ensures that data is consistent across all nodes in the system. It can be divided into levels, such as strong and eventual consistency.

Strong consistency means that all nodes in the system always have the same, up-to-date version of the data. This can be achieved through techniques such as two-phase commit or Paxos. A traditional relational database is an example of a system that uses strong consistency.

Eventual consistency, on the other hand, is when all nodes will eventually have the same data, but it may take some time for the changes to propagate. This can be achieved through techniques such as conflict resolution or vector clocks. An example of a system that uses eventual consistency is a distributed key-value store like Cassandra.
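As a rough sketch of one of the techniques mentioned above, here is a minimal vector clock: each node keeps a counter per node, bumps its own counter on local writes, and merges clocks when replicas exchange data, which lets the system tell causally ordered updates apart from concurrent ones.

```python
class VectorClock:
    def __init__(self, node_id, nodes):
        self.node_id = node_id
        self.clock = {n: 0 for n in nodes}

    def local_event(self):
        # A write on this node bumps only its own counter.
        self.clock[self.node_id] += 1

    def merge(self, other):
        # When replicas sync, take the element-wise maximum of the two clocks.
        for node, count in other.clock.items():
            self.clock[node] = max(self.clock[node], count)

    def happened_before(self, other):
        # True if every counter here is <= the other's and the clocks differ.
        return (all(self.clock[n] <= other.clock[n] for n in self.clock)
                and self.clock != other.clock)

nodes = ["A", "B"]
a, b = VectorClock("A", nodes), VectorClock("B", nodes)
a.local_event()              # A writes: {"A": 1, "B": 0}
b.merge(a)                   # B syncs with A
b.local_event()              # B writes: {"A": 1, "B": 1}
print(a.happened_before(b))  # True: A's state is causally older than B's
```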

Latency

Latency, often reported as response time, is the time it takes for a request to be processed and a response to be returned. It is a critical aspect of system design, particularly for systems that handle real-time data, as it directly impacts the user experience.

In real-time systems, such as stock trading or video conferencing, low latency is essential to ensure that requests are processed and responses are returned quickly. For example, in a stock trading system, low latency ensures that trades are executed with minimal delay, reducing the risk of market fluctuations. Similarly, low latency in an online video streaming service ensures that the video starts playing quickly and with minimal buffering. On the other hand, high latency can cause delays, leading to frustration, lost revenue, or even safety concerns.

To achieve low latency, system designers employ various techniques such as reducing the distance between clients and servers, caching, and load balancing. For example, using Content Delivery Networks (CDNs) to place the servers closer to the clients or using in-memory databases to reduce the time needed to retrieve data from disk. Additionally, using techniques such as multithreading and asynchronous programming can also help to reduce latency by allowing multiple requests to be processed simultaneously.
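As a sketch of the asynchronous approach mentioned above, the snippet below handles ten requests concurrently instead of one after another; asyncio.sleep stands in for real I/O such as a database query or a downstream API call.

```python
import asyncio
import time

async def handle_request(request_id):
    # asyncio.sleep stands in for slow I/O (database query, downstream API call, ...).
    await asyncio.sleep(0.1)
    return f"response-{request_id}"

async def main():
    start = time.perf_counter()
    # Process ten requests concurrently; total time is ~0.1s instead of ~1.0s sequentially.
    responses = await asyncio.gather(*(handle_request(i) for i in range(10)))
    print(f"{len(responses)} responses in {time.perf_counter() - start:.2f}s")

asyncio.run(main())
```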

Throughput

Throughput is a measure of how many requests a system can handle in a given period of time. It is important in system design, particularly for high-traffic systems. A system with high throughput can handle many requests quickly and efficiently without delays.

For example, in an e-commerce website, high throughput ensures that many customers can purchase products simultaneously without delays. In contrast, high throughput in a social media platform ensures that many users can post, view, and interact with content in real-time without delays. Systems often employ load balancing, horizontal scaling, and caching techniques to achieve high throughput. These techniques help distribute the load across multiple machines, increase the resources available to handle requests, and store frequently accessed data in a high-speed storage layer.
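To see what the number actually looks like, here is a toy measurement: a batch of simulated requests is processed by a worker pool and the completed requests are divided by the elapsed time. The 0.05-second sleep and the pool size are arbitrary assumptions standing in for real work.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    time.sleep(0.05)          # simulated work per request (I/O, computation, ...)
    return request_id

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(handle_request, range(200)))
elapsed = time.perf_counter() - start

# 200 requests handled by 20 workers: throughput is completed requests / elapsed time.
print(f"throughput: {len(results) / elapsed:.0f} requests/second")
```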

Partition Tolerance

Partition tolerance is a key aspect of distributed systems. It refers to the ability of the system to maintain its functionality even when network partitions occur. According to the CAP theorem, when a network partition does occur, a distributed system cannot remain both consistent and available at the same time. Therefore, a designer must make a trade-off and decide which is more important for the specific use case.

For example, in a system where high availability is critical, availability is typically favoured over strict consistency during a partition. A distributed storage system that stores critical and sensitive data should be partition tolerant so that the data is still accessible even when network partitions occur. This is particularly important in scenarios where data loss or unavailability could have significant consequences, such as financial transactions or healthcare data.

Thank you for reading. I hope this post is helpful to you. If you have any further questions, don’t hesitate to reach out. I’m always happy to help.

Let’s connect:
LinkedIn
Twitter
