Cloud Architecture Patterns
These days I am reading a few books and articles, and going through a few online sessions, to learn how to design effective cloud applications. In this post I am going to talk about the general idea behind a few major cloud design patterns, and I will keep updating it as I learn more. The post is a little lengthy, but I have tried to keep it interesting so that a reader won't get bored.
To understand some of the concepts behind these patterns, let's refer to the architecture diagram below.
Scalability
The first things that come to mind while talking about the cloud are accessibility and scalability (a measure of the number of users an application can support effectively at the same time). The point at which the application cannot handle additional users can be called the limit of its scalability. We know that for the deployment of any application there are a few resources to keep in mind, i.e. CPU, memory, disk capacity and throughput, and network bandwidth. Scalability is all about handling these resources effectively.
Further, it can be divided into two types: vertical scalability and horizontal scalability.
To understand these two concepts, let's assume that each machine on which the application is deployed is a node; in a distributed environment the application is deployed on multiple nodes (as shown in the Application Tier of the diagram).
Vertical Scalability :- increasing the overall application capacity by increasing the resources within existing nodes is called vertical scaling. It is achieved through hardware improvements, i.e. adding memory, increasing the number of CPU cores, or other single-node changes.
Horizontal Scalability :- increasing the overall application capacity by adding nodes (like adding a fourth node to the cluster) is referred to as horizontal scaling. Each additional node adds equivalent capacity, such as the same amount of memory and the same CPU. Horizontal scaling is more complex than vertical scaling, as it is development and architecture focused.
While studying these types, I came across the example quoted below:
"Consider a roadway for automobile travel. If the roadway was unable to support the desired volume of traffic, we could improve matters in a number of possible ways. One improvement would be to upgrade the road materials (“ the hardware”) from a dirt road to pavement to support higher travel speeds. This is vertically scaling up; the cars and trucks (“ the software”) will be able to go faster. Alternatively, we could widen the road to multiple lanes. This is horizontally scaling out; more cars and trucks can drive in parallel. And of course we could both upgrade the road materials and add more lanes, combining scaling up with scaling out."
Horizontal Scaling is more efficient with homogeneous nodes.
Horizontal Scaling Compute Pattern
This pattern primarily provides stateless, autonomous compute nodes. Consider the case of an online retail store, where a user navigates through various items, adds items to the cart, checks out, and finally purchases the items online. Now, as we know, the cart and user navigation are mainly stored in the user session, correct? What will happen if node one holds the user session (cart info) and the purchase request goes to node two, or vice versa? There are many such combinations that will result in loss of information while using horizontal scaling. This compute pattern is all about handling these conditions effectively.
In addition, this pattern also deals with a) cost-effective scaling of compute nodes, b) application capacity requirements that exceed what a single node can provide, c) application capacity requirements that vary seasonally, monthly, or daily (considering the situation above, if user traffic is higher during a specific week, say for Thanksgiving, it is a good idea to add new nodes to improve performance, and to remove the additional nodes after the demand drops; this not only supports the business but also keeps the budget managed), and d) application compute nodes that require minimal downtime.
Horizontal scaling is achieved by adding or releasing compute nodes, typically through a cloud management interface or API. A few factors are required: a) the desired number of nodes (the minimum number of nodes for the deployment), and b) the limit of scalability of an individual node. The rules are roughly as follows (a small sketch of this decision logic appears after the list):
- If the desired number of nodes is larger than the current number of nodes, nodes are added.
- If the desired number of nodes is equal to the current number of nodes, but their combined capacity is less than the end-user traffic, new nodes are added.
- If the desired number of nodes is less than the current number of nodes, and user traffic can be sustained by the desired number of nodes, the additional nodes are released. (Yes, cloud scaling is reversible, i.e. additional nodes can be removed from the cluster if they are not needed.)
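To make the rules above a bit more concrete, here is a minimal sketch of that decision logic in Python. The function names, the capacity-per-node figure, and the way traffic is measured are all assumptions made for illustration; a real deployment would read these values from monitoring and call the cloud provider's scaling API instead of printing.

```python
# A minimal, illustrative sketch of the node-count decision rules above.
# All names and numbers here are hypothetical; a real system would read
# traffic from monitoring and call the cloud provider's scaling API.

def desired_node_count(current_traffic, capacity_per_node, min_nodes):
    """Return how many nodes we want, never going below the configured minimum."""
    needed = -(-current_traffic // capacity_per_node)  # ceiling division
    return max(needed, min_nodes)

def scale(current_nodes, current_traffic, capacity_per_node=500, min_nodes=2):
    desired = desired_node_count(current_traffic, capacity_per_node, min_nodes)
    if desired > current_nodes:
        print(f"scale out: add {desired - current_nodes} node(s)")
    elif desired < current_nodes:
        print(f"scale in: release {current_nodes - desired} node(s)")
    else:
        print("no change: current capacity matches demand")
    return desired

scale(current_nodes=3, current_traffic=2100)  # -> scale out: add 2 node(s)
```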
Managing Session State :- Coming back to our actual problem scenario of handling the session, where we need to direct a visiting user to one node or another. As shown above, with multiple nodes the routing of a user request is decided by the load balancer, so is the load balancer responsible for handling user redirection?
There are a few ways to handle this, depending on the requirement.
a). Using sticky sessions in the web tier. Some web applications use sticky sessions, which assign each user to a specific web server node on their first visit. Once assigned, that node satisfies all of the user's page requests for the duration of that visit. This is supported in two places: the load balancer ensures that each user is directed to their assigned node, while the web server nodes store session state for users between pages.
"The Amazon Web Services elastic load balancer supports sticky sessions, although the Windows Azure load balancer does not. It is possible to implement sticky sessions using Application Request Routing (ARR) on Internet Information Services (IIS) in Windows Azure."
b). Using cookies. Actually, this relates to the first point: if the user information is small, it can also be kept in a cookie and passed along with every request, so that the response from the autonomous nodes will be the same for all requests. But a cookie should not be very large (in bytes), so what do we do if the user session holds a lot of data?
c). The cookie is still used, not to store all the session state, but to hold an application-generated session identifier that links to server-side session state. Using the session identifier, session data can be retrieved and re-hydrated at the beginning of each request and saved again at the end. A few options for storing this state are a NoSQL data store, cloud storage, or a distributed cache (if the cache is not distributed, one would need to implement cache replication; Terracotta is a popular choice for distributed caching, and I will write a separate blog to describe cache replication techniques and the advantages of Terracotta).
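As a rough illustration of option (c), here is a minimal sketch that keeps only a session identifier in the cookie and stores the session data in a shared store. Redis is used purely as an example of a distributed store; the cookie name, key format, and expiry below are assumptions, not something prescribed by the pattern.

```python
# Minimal sketch of option (c): a small session-id cookie plus a shared
# server-side store. Redis is used here only as an example; the cookie name,
# key format, and expiry are illustrative assumptions.
import json
import uuid
import redis

store = redis.Redis(host="localhost", port=6379)
SESSION_TTL_SECONDS = 1800  # 30 minutes

def load_session(session_id):
    """Re-hydrate session state at the start of a request."""
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw else {}

def save_session(session_id, session_data):
    """Persist session state at the end of a request, with an expiry."""
    store.setex(f"session:{session_id}", SESSION_TTL_SECONDS,
                json.dumps(session_data))

def handle_request(cookies):
    # Any node can serve the request: the cookie only carries the identifier.
    session_id = cookies.get("SESSION_ID") or str(uuid.uuid4())
    session = load_session(session_id)
    session.setdefault("cart", []).append("item-123")   # example mutation
    save_session(session_id, session)
    return {"Set-Cookie": f"SESSION_ID={session_id}"}
```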
Operational Logs and Metrics :- Managing operational data is another challenge encountered while working with horizontal scaling. Apache (or IIS) server logs, event logs, performance counters, application debug messages, and custom logs are the primary data sources.
- On AWS, a centralized log server can be set up to gather logs and data from the individual instances (Logstash is a popular choice), and Kibana and CloudWatch can be used to prepare metrics and graphs along with alerting and monitoring capabilities.
- On Azure, the Windows Azure Diagnostics (WAD) monitor is a platform service that can be used to gather data from all of the individual instances and store it centrally in a single Windows Azure Storage account. Another source of operational data is the Windows Azure Storage Analytics feature, which includes metrics and access logs from Azure Storage Blobs, Tables, and Queues.
Queue-Centric Workflow Pattern
Does anyone have an idea about the chain-of-responsibility design pattern? If yes, you are already most of the way to understanding this one!
Yes, it is actually similar (not the same) to the chain-of-responsibility design pattern: we add asynchronous communication from the web tier to the application tier, which improves responsiveness for the user. The basic idea of the queue-centric workflow pattern is loose coupling, focusing on asynchronous delivery of command requests sent from the user interface to a back-end service for processing.
Note: "Amazon’s Scalable Storage Service (S3) is eventually consistent and the documentation warns of this possibility. Both Windows Azure Storage Queues and Windows Azure ServiceBus Queues are immediately consistent, so this edge case does not apply."
If you are aware of HTTP specifications, they have idempotent operations, PUT, GET, DELETE. What does it meant ? It means we can operate of a resource by DELETE from once or 1000 number of times, the end result is equivalent; i.e. Success with resource is gone. This impotency is the key concept to over come from our problem. Do you know, anytime a message is dequeued, cloud queue service provides dequeue count value along with the message. i.e. first time when a message is dequeued its dequeue count is one, and by checking this value application code can tell weather this this is first processing attempt or repeat. Idempotent handling is the first step in dealing with repeat messages, if the message repeats excessively above the threshold it will be called as poison message. Now two decisions needs to be taken for poison messages: how to detect one, and what to do with it.
- Your poison message detection logic must include a rule that considers any message that keeps reappearing to be treated as a poison message when it shows up the Nth time. Choosing a value for N is a business decision that balances the cost of wastefully processing a poison message with the risk of not processing a valid message.
- Your poison message detection logic must include a rule that considers any message that keeps reappearing to be treated as a poison message when it shows up the Nth time. Choosing a value for N is a business decision that balances the cost of wastefully processing a poison message with the risk of not processing a valid message.
These are two standard actions, but they may defer ab per the business scenarios. To get more knowledge about this design pattern one should prefer to read about "Comets".
Auto-Scaling Pattern
We just read about horizontal scaling and its benefits, Auto-Scaling Pattern deals with automation of scale-in & scale-out the nodes to optimize resources used by cloud application (which saves money) and to minimize human interventions (which saves time and reduces errors) with an assumption that the environment is friendly to reversible scaling. Following a specific statement from a book, "This pattern embraces the inherent increase in overlap across the development and operations activities in the cloud. This overlap is known as DevOps, a portmanteau of “development” and “operations.”
Yes, its actually similar (not the same) to that of chain of responsibility design pattern, where we can add asynchronous communication from web tier to application tier, which will increase your response time. The basic idea of having queue-centric workflow patterns deals with loose coupling and focuses on asynchronous delivery of command request sent from user interface to back-end service for processing.
"The web tier does not use this pattern for read-only page view requests; this pattern is for making updates."
But that is not all! Consider a scenario where, in an online retail store, a user is trying to make a payment for a purchase. The purchase request goes into a queue, and because of our horizontal scaling (multiple nodes), the task of deducting the amount from the bank account could be picked up by two (or more) nodes. That could result in a big failure, correct?
To overcome this problem, let's dig into the problem first: what is a queue, and how does it work? In simple words, a queue is just a data structure which stores data and behaves in FIFO (first in, first out) order (not an accurate definition). The sender adds messages to the queue (enqueues messages) and the receiver removes those commands from the queue (dequeues messages). Let's see how the receiver processes a message:
a) Get the next available message from the queue
b) Process the message
c) Delete the message from the queue
The implementation first dequeues the message and only later deletes it. Why the two-phase removal? This is to ensure at-least-once processing: when a message is dequeued, it is not removed entirely from the queue but is instead hidden. The message is hidden for a specified amount of time (the duration is specified during the dequeue operation, and can be increased later). We call this period the invisibility window. While a message is within its invisibility window, it is not available for dequeuing. The invisibility window comes into play only when processing takes longer than allowed, and this automatic reappearance of the message on the queue is one key to overcoming failure. Delay in processing could be caused by any number of scenarios, like a database connectivity reset, a network failure on the node, a node crash, or a programming defect. Now that we have a good understanding of the problem and the failure modes, let's try to find a good solution.
Note: "Amazon’s Scalable Storage Service (S3) is eventually consistent and the documentation warns of this possibility. Both Windows Azure Storage Queues and Windows Azure ServiceBus Queues are immediately consistent, so this edge case does not apply."
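To make the two-phase removal concrete, here is a minimal sketch of the dequeue/process/delete loop using Amazon SQS via boto3. The queue URL and the 60-second invisibility window are placeholders, and process() stands in for whatever back-end work the message triggers.

```python
# Minimal sketch of the dequeue -> process -> delete loop against Amazon SQS.
# QUEUE_URL and the 60-second invisibility window are placeholder values;
# process() stands in for the real back-end work.
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders"  # placeholder

def process(body):
    print("processing:", body)  # real work (e.g. charge the payment) goes here

def poll_once():
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        VisibilityTimeout=60,     # the "invisibility window" described above
        WaitTimeSeconds=10,       # long polling
    )
    for message in response.get("Messages", []):
        process(message["Body"])
        # Delete only after successful processing; if process() fails or the
        # node crashes, the message reappears once the window expires.
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=message["ReceiptHandle"])
```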
If you are aware of the HTTP specification, it defines idempotent operations: PUT, GET, DELETE. What does that mean? It means we can DELETE a resource once or a thousand times and the end result is equivalent: success, the resource is gone. This idempotency is the key concept for overcoming our problem. Also, any time a message is dequeued, the cloud queue service provides a dequeue count along with the message; the first time a message is dequeued its dequeue count is one, and by checking this value the application code can tell whether this is the first processing attempt or a repeat. Idempotent handling is the first step in dealing with repeat messages; if a message repeats excessively, above a threshold, it is called a poison message. Two decisions then need to be made for poison messages: how to detect one, and what to do with it (a small sketch follows the two points below).
- Your poison message detection logic should include a rule that treats any message that keeps reappearing as a poison message when it shows up for the Nth time. Choosing a value for N is a business decision that balances the cost of wastefully processing a poison message against the risk of not processing a valid message.
- Once detected, the poison message should be taken out of the normal flow rather than retried forever: log its contents, raise an alert, and remove it from the queue (for example by moving it to a separate store or dead-letter queue for later inspection).
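Building on the SQS sketch above, here is a minimal illustration of poison message detection using the dequeue count (ApproximateReceiveCount in SQS). The threshold N and the dead-letter queue URL are assumptions chosen for illustration.

```python
# Sketch of poison-message detection on top of the SQS loop above.
# MAX_DEQUEUE_COUNT (the "N" from the text) and the dead-letter queue URL
# are illustrative assumptions.
MAX_DEQUEUE_COUNT = 5
DEAD_LETTER_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-poison"  # placeholder

def poll_once_with_poison_check():
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=1,
        VisibilityTimeout=60,
        AttributeNames=["ApproximateReceiveCount"],  # the dequeue count
    )
    for message in response.get("Messages", []):
        dequeue_count = int(message["Attributes"]["ApproximateReceiveCount"])
        if dequeue_count >= MAX_DEQUEUE_COUNT:
            # Poison message: sideline it and alert instead of retrying forever.
            sqs.send_message(QueueUrl=DEAD_LETTER_URL, MessageBody=message["Body"])
            print("ALERT: poison message sidelined after", dequeue_count, "attempts")
        else:
            process(message["Body"])  # idempotent handler from the sketch above
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=message["ReceiptHandle"])
```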
These are the two standard actions, but they may differ as per the business scenario. To learn more about this design pattern, one could also read about "Comets".
Auto-Scaling Pattern
We just read about horizontal scaling and its benefits. The Auto-Scaling Pattern deals with automating the scale-in and scale-out of nodes to optimize the resources used by a cloud application (which saves money) and to minimize human intervention (which saves time and reduces errors), with the assumption that the environment is friendly to reversible scaling. Quoting a specific statement from the book: "This pattern embraces the inherent increase in overlap across the development and operations activities in the cloud. This overlap is known as DevOps, a portmanteau of “development” and “operations.”"
So, auto-scaling can be done by following a schedule for known events and by creating rules that react to environmental signals. There are a few auto-scaling tools available, like WASABi for Microsoft Azure (configuration can be found here), AWS Auto Scaling (here), the one I like the most, Apache Stratos (open source, here), and there has also been some work done by Facebook (which can be found here).
But that is not all: this can also be implemented programmatically, as in the sketch below.
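For example, here is a minimal sketch of a reactive scaling rule written against the AWS Auto Scaling API via boto3; the group name, CPU thresholds, and node bounds are placeholder assumptions, and the CPU figure would normally come from a monitoring service such as CloudWatch.

```python
# Minimal sketch of a programmatic, reactive scaling rule using boto3.
# The group name, CPU thresholds, and node bounds are placeholder assumptions.
import boto3

autoscaling = boto3.client("autoscaling")
GROUP_NAME = "web-tier-asg"  # placeholder
MIN_NODES, MAX_NODES = 2, 10

def adjust_capacity(current_nodes, avg_cpu_percent):
    """Scale out when CPU is hot, scale in when it is idle, within bounds."""
    if avg_cpu_percent > 75:
        desired = min(current_nodes + 1, MAX_NODES)
    elif avg_cpu_percent < 25:
        desired = max(current_nodes - 1, MIN_NODES)
    else:
        desired = current_nodes
    if desired != current_nodes:
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=GROUP_NAME,
            DesiredCapacity=desired,
        )
    return desired
```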
Stratos documentation can be found here.
Eventual Consistency (last stable data)
The Eventual Consistency pattern builds on the CAP theorem, which describes three possible guarantees for data in a distributed environment: Consistency, Availability, and Partition tolerance.
Consistency means everyone gets the same answer.
Availability means clients have ongoing access (the system keeps responding).
Partition tolerance means the system keeps operating correctly even when the network between nodes is disrupted.
The CAP theorem tells us that we must pick two of the three guarantees, which can be written in shorthand as CA, AP, and CP. The three combinations result in different behaviors. The one we focus on here is AP (availability and partition tolerance), also known as eventually consistent. Eventually consistent databases still support some level of atomicity.
MapReduce Pattern
To understand this pattern, let's first understand the basics of Hadoop. Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
Note: This snippet is taken from an online training.
- HDFS (Hadoop Distributed File System), as the name suggests, is a file system designed for storing very large files with a streaming data access pattern, running on clusters of commodity hardware.
- HDFS is further divided into two parts: a primary Name Node and the following Data Nodes, where the actual data is kept distributed across the data nodes, and the mapping of the data (i.e. which data resides on which data node) is kept in memory on the name node.
- In other words, the name node is the admin node, and the data nodes are the slave nodes.
- Similarly, to operate on this distributed file system, there is a Job Tracker (responsible for scheduling jobs and creating and assigning tasks), which is treated as the admin, and Task Trackers, which are the actual task programs executed on the individual data nodes.
- These actual task programs are what is known as "MapReduce". The map (of map-reduce) is a program which runs on the data nodes to perform calculations on the data, and the reduce (of map-reduce) combines the results of the maps and passes them back to the Job Tracker node.
"MapReduce requires writing two functions: a mapper and a reducer. These functions accept data as input and then return transformed data as output. The functions are called repeatedly, with subsets of the data, with the output of the mapper being aggregated and then sent to the reducer. These two phases sift through large volumes of data a little bit at a time."
Database Sharding Pattern
If someone wants to speed up reads on a database by 4X, what do you do? Add more CPUs, memory, or disks? That will probably work. But what do you do if you need to increase the write throughput? Database sharding is one way to solve this. Sharding is just another term for horizontal database partitioning: data is divided across multiple containers (each container has a similar structure to hold the data, and the tuples (rows) are distributed across the containers).
Sharding can be of three types:
- Range sharding starts with ranges, i.e. let's say all the rows with keys from 0 to 1000 go into one container, 1001 to 2000 into another container, and so on.
- List sharding creates buckets from intrinsic data properties, e.g. grouping US states by region, grouping users by department, and so on.
- Hash sharding applies a hash function to a key and uses the output to find the appropriate container, i.e. assuming 4 servers, the hash function returns 0, 1, 2, or 3.
In sharding, not all data is sharded; some data is replicated to all of the shard nodes (containers). Each row carries a shard key, which determines which shard node stores that particular database row; identifying the right shard from the key is known as shard identification. A small sketch of hash sharding follows.
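Here is a minimal sketch of hash sharding: the shard key is hashed and the result picks one of the shard nodes. The shard count and hostnames are placeholder assumptions.

```python
# Minimal sketch of hash sharding: route each row to a shard by hashing its key.
# The shard count and hostnames are placeholder assumptions.
import hashlib

SHARDS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
    "db-shard-3.example.internal",
]

def shard_for(shard_key):
    """Map a shard key (e.g. a customer id) to one of the shard nodes."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    index = int(digest, 16) % len(SHARDS)   # returns 0, 1, 2, or 3 for 4 shards
    return SHARDS[index]

print(shard_for("customer-42"))  # every lookup for this key hits the same shard
```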
Busy Signal Pattern
This pattern deals with the condition where a cloud service responds to a programmatic request with a busy signal rather than success. The main thing needed to handle such conditions is a retry. The retry can be made in any of the following ways:
- Retry immediately (no delay).
- Retry after delay (fixed or random delay).
- Retry with increasing delays (linear or exponential backoff) with a maximum delay.
- Throw an exception in your application.
After a specific number of retries, the application should throw an exception, issue an alert, and log the error, as in the sketch below.
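Here is a minimal sketch of retry with exponential backoff and a maximum delay; the attempt count, base delay, and the TransientBusyError type are assumptions standing in for whatever busy or throttling error the actual cloud service raises.

```python
# Minimal sketch of retry with exponential backoff and a maximum delay.
# The attempt count, base delay, and TransientBusyError type are assumptions.
import time
import random

class TransientBusyError(Exception):
    """Stands in for whatever busy/throttling error the cloud service raises."""

def call_with_retries(operation, max_attempts=5, base_delay=0.5, max_delay=8.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientBusyError:
            if attempt == max_attempts:
                # Out of retries: surface the failure so the caller can alert and log.
                raise
            delay = min(base_delay * (2 ** (attempt - 1)), max_delay)
            time.sleep(delay + random.uniform(0, 0.1))  # small jitter
```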