Cloud Design Pattern - Throttling Pattern


feng1456

2015-11-30 10:24:09


1.Preface

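In the previous post we discussed the Static Content Hosting pattern, one of the cloud design patterns, and saw how a storage service can host static content to improve website performance. This post continues the theme of performance and covers the Throttling pattern.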

2.Concept

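Control the consumption of resources by an application, by individual tenants, or by an entire service. This pattern lets the application keep providing service while still meeting its service level agreement (SLA), even when additional load is placed on its resources. The load on a cloud application varies considerably over time, depending on the number of active users and the operations they perform. For example, load is higher during business hours, and it rises again at the end of the month, when large volumes of data have to be aggregated for reporting. Traffic can also spike suddenly and cause performance to drop sharply: requests see long latencies or even fail with timeouts, which users will not accept.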

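There are many strategies for coping with a sudden increase in requests, and the right one depends on the business goals of the application. One of the most common is autoscaling, which adds resources to match demand. However, autoscaling is not instantaneous: it takes a certain amount of time to provision the additional capacity.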

3.Solution

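An alternative strategy to autoscaling is to allow applications to use resources only up to some soft limit, and to throttle them when that limit is reached. Once the limit is reached, the system can also scale out horizontally and then raise the limit again, so throttling bridges the gap while new capacity comes online. To make this work, the application must monitor its resource usage; when usage exceeds the configured value, the system restricts access for some users or tenants so that it can continue to meet its SLA. For more information, see the Instrumentation and Telemetry Guidance.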

The system could implement several throttling strategies, including:

1) Rejecting requests from an individual user who has already accessed system APIs more than n times per second over a given period of time. This requires that the system meters the use of resources for each tenant or user running an application. For more information, see the Service Metering Guidance.
2) Disabling or degrading the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources. For example, if the application is streaming video output, it could switch to a lower resolution.
3) Using load leveling to smooth the volume of activity (this approach is covered in more detail by the Queue-based Load Leveling pattern). In a multitenant environment, this approach will reduce the performance for every tenant. If the system must support a mix of tenants with different SLAs, the work for high-value tenants might be performed immediately. Requests for other tenants can be held back, and handled when the backlog has eased. The Priority Queue pattern could be used to help implement this approach.
4) Deferring operations being performed on behalf of lower priority applications or tenants. These operations can be suspended or curtailed, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later.
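Strategies 3 and 4 can be combined in a small Python sketch. It is an illustration under stated assumptions, not a reference implementation: the names `DeferringScheduler`, `Job` and `SystemBusyError`, the convention that a lower number means a higher-value tenant, and the fixed backlog cap are all hypothetical. An in-memory heap stands in for the durable queue that the Queue-based Load Leveling and Priority Queue patterns would normally use.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


class SystemBusyError(Exception):
    """Hypothetical error telling a lower-priority tenant the system is busy (strategy 4)."""


@dataclass(order=True)
class Job:
    priority: int                      # lower value = higher-value tenant (assumed convention)
    seq: int                           # tie-breaker: FIFO order within one priority
    tenant: str = field(compare=False)
    payload: str = field(compare=False)


class DeferringScheduler:
    """Hypothetical scheduler: runs high-value work at once, holds back the rest."""

    def __init__(self, max_backlog: int, immediate_priority: int = 0) -> None:
        self.max_backlog = max_backlog                # how much deferred work is accepted
        self.immediate_priority = immediate_priority  # priorities <= this run immediately
        self._backlog: list[Job] = []                 # min-heap ordered by (priority, seq)
        self._seq = count()
        self.processed: list[str] = []

    def submit(self, tenant: str, priority: int, payload: str) -> None:
        if priority <= self.immediate_priority:
            self.processed.append(payload)            # strategy 3: high-value work runs now
            return
        if len(self._backlog) >= self.max_backlog:
            raise SystemBusyError(f"{tenant}: system busy, retry later")  # strategy 4
        heapq.heappush(self._backlog, Job(priority, next(self._seq), tenant, payload))

    def drain(self, budget: int) -> None:
        """Process up to `budget` deferred jobs, most important first, once load eases."""
        for _ in range(min(budget, len(self._backlog))):
            self.processed.append(heapq.heappop(self._backlog).payload)


if __name__ == "__main__":
    s = DeferringScheduler(max_backlog=2)
    s.submit("gold", 0, "a")        # runs immediately
    s.submit("bronze", 2, "b")      # deferred
    s.submit("silver", 1, "c")      # deferred, but ahead of "b"
    try:
        s.submit("bronze", 2, "d")  # backlog is full
    except SystemBusyError as err:
        print(err)                  # bronze: system busy, retry later
    s.drain(budget=10)
    print(s.processed)              # ['a', 'c', 'b']
```

Rejecting with an explicit exception once the backlog is full, rather than queueing without bound, is what keeps the deferred work from eventually exhausting the resources that throttling is meant to protect.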

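The figure above shows the case in which an application's resource usage breaks through the threshold set by the system. In practice, once usage passes the threshold, the system should usually scale automatically (adding storage or compute capacity) and then raise the threshold to a higher value; when resource usage falls off, the threshold can be lowered again.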
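The threshold adjustment described above can be sketched as a small control loop. Everything in it is assumed for illustration: the class `SoftLimitController`, the per-instance capacity, the 80% headroom, the rule of scaling in when load drops below half the limit, and the simplification that a newly added instance is ready by the next sample. A real system would feed it from telemetry and call its platform's autoscaling mechanism instead of incrementing a counter.

```python
from dataclasses import dataclass


@dataclass
class SoftLimitController:
    """Hypothetical controller that moves a soft limit up and down with capacity."""
    capacity_per_instance: int   # requests/s one instance can serve (assumed figure)
    headroom_pct: int = 80       # soft limit = 80% of total capacity (assumed)
    instances: int = 1
    min_instances: int = 1
    max_instances: int = 10

    @property
    def soft_limit(self) -> int:
        return self.instances * self.capacity_per_instance * self.headroom_pct // 100

    def observe(self, current_load: int) -> bool:
        """Feed one load sample; return True if requests must be throttled right now."""
        throttle = current_load > self.soft_limit   # judged against the capacity we have now
        if throttle and self.instances < self.max_instances:
            # Scale out. Simplification: the new instance is assumed ready by the next
            # sample, so the raised limit only takes effect from then on.
            self.instances += 1
        elif current_load < self.soft_limit // 2 and self.instances > self.min_instances:
            self.instances -= 1                      # usage is low: scale in, lower the limit
        return throttle


if __name__ == "__main__":
    c = SoftLimitController(capacity_per_instance=100)
    for load in (120, 120, 30):
        print(load, c.observe(load), c.soft_limit)
    # 120 True 160   (throttled while a second instance is added)
    # 120 False 160
    # 30 False 80    (scaled back in)
```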

4.Issues and Considerations

Throttling an application, and the strategy to use, is an architectural decision that impacts the entire design of a system. Throttling should be considered early on in the application design because it is not easy to add it once a system has been implemented.
Throttling must be performed quickly. The system must be capable of detecting an increase in activity and reacting accordingly. The system must also be able to revert to its original state quickly after the load has eased. This requires that the appropriate performance data is continually captured and monitored.
If a service needs to temporarily deny a user request, it should return a specific error code so that the client application understands that the operation was refused because of throttling. The client application can then wait for a period before retrying the request.
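On the client side, this advice can be sketched as a retry helper. It is not any particular SDK: `ThrottledError` and `call_with_retry` are hypothetical names. The sketch borrows the HTTP convention in which status 429 Too Many Requests (RFC 6585) signals throttling and a Retry-After value tells the client how long to wait. The `sleep` parameter is injectable so the waits can be observed without actually pausing.

```python
import time
from typing import Callable


class ThrottledError(Exception):
    """Hypothetical error raised when a request is refused because of throttling."""
    status_code = 429   # HTTP "Too Many Requests" (RFC 6585)

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"throttled, retry after {retry_after}s")
        self.retry_after = retry_after   # seconds the service asks the client to wait


def call_with_retry(op: Callable[[], str], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call `op`, retrying only on ThrottledError and waiting as long as the service asks."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return op()
        except ThrottledError as err:
            if attempt >= max_attempts:
                raise                    # give up and surface the throttling error
            sleep(err.retry_after)
            attempt += 1


if __name__ == "__main__":
    replies = iter([ThrottledError(0.5), ThrottledError(1.0), "ok"])

    def flaky_service() -> str:
        reply = next(replies)
        if isinstance(reply, Exception):
            raise reply
        return reply

    waits: list[float] = []
    print(call_with_retry(flaky_service, sleep=waits.append))  # ok
    print(waits)                                               # [0.5, 1.0]
```

Only the throttling error triggers a retry; any other failure propagates immediately, because retrying it would not be justified by the service's "wait and try again" signal.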
Throttling can be used as an interim measure while a system autoscales. In some cases it may be better to simply throttle, rather than to scale, if a burst in activity is sudden and is not expected to be long lived because scaling can add considerably to running costs.
If throttling is being used as a temporary measure while a system autoscales, and if resource demands grow very quickly, the system might not be able to continue functioning—even when operating in a throttled mode. If this is not acceptable, consider maintaining larger reserves of capacity and configuring more aggressive autoscaling.

5.When to Use This Pattern

To ensure that a system continues to meet service level agreements.
To prevent a single tenant from monopolizing the resources provided by an application.
To handle bursts in activity.
To help cost-optimize a system by limiting the maximum resource levels needed to keep it functioning.

6.Example

Figure 3 illustrates how throttling can be implemented in a multi-tenant system. Users from each of the tenant organizations access a cloud-hosted application where they fill out and submit surveys. The application contains instrumentation that monitors the rate at which these users are submitting requests to the application.

In order to prevent the users from one tenant affecting the responsiveness and availability of the application for all other users, a limit is applied to the number of requests per second that the users from any one tenant can submit. The application blocks requests that exceed this limit.
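The per-tenant limit in this example could be enforced with one token bucket per tenant, sketched below in Python. The article does not say which rate-limiting algorithm the survey application uses, so the token bucket, the class name `TenantRateLimiter`, its `rate`/`burst` parameters and the tenant names are all assumptions. The clock is injectable so the behavior can be shown deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float   # requests the tenant may still make right now
    last: float     # time of the previous refill


class TenantRateLimiter:
    """Hypothetical per-tenant token bucket: `rate` requests/s, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, tenant: str) -> bool:
        now = self.clock()
        b = self._buckets.get(tenant)
        if b is None:
            b = self._buckets[tenant] = _Bucket(tokens=float(self.burst), last=now)
        # Refill in proportion to elapsed time, capped at the burst size.
        b.tokens = min(float(self.burst), b.tokens + (now - b.last) * self.rate)
        b.last = now
        if b.tokens >= 1.0:
            b.tokens -= 1.0
            return True
        return False    # over this tenant's limit: block the request


if __name__ == "__main__":
    now = [0.0]                                   # fake clock for a deterministic demo
    limiter = TenantRateLimiter(rate=2, burst=2, clock=lambda: now[0])
    print([limiter.allow("tenant-a") for _ in range(3)])  # [True, True, False]
    print(limiter.allow("tenant-b"))                      # True: separate budget
    now[0] = 0.5
    print(limiter.allow("tenant-a"))                      # True: one token refilled
```

Because each tenant has its own bucket, a tenant that exhausts its budget is blocked without affecting anyone else. If the application runs on several instances, the counters would typically need to live in a shared store; otherwise each instance enforces its own separate limit.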

7.Related Reading

The following patterns and guidance may also be relevant when implementing this pattern:

Instrumentation and Telemetry Guidance. Throttling depends on gathering information on how heavily a service is being used. The Instrumentation and Telemetry Guidance describes how to generate and capture custom monitoring information.
Service Metering Guidance. This guidance describes how to meter the use of services in order to gain an understanding of how they are used. This information can be useful in determining how to throttle a service.
Autoscaling Guidance. Throttling can be used as an interim measure while a system autoscales, or to remove the need for a system to autoscale. The Autoscaling Guidance contains more information on autoscaling strategies.
Queue-based Load Leveling pattern. Queue-based load leveling is a commonly used mechanism for implementing throttling. A queue can act as a buffer that helps to even out the rate at which requests sent by an application are delivered to a service.
Priority Queue Pattern. A system can use priority queuing as part of its throttling strategy to maintain performance for critical or higher value applications, while reducing the performance of less important applications.