Sunday, September 13, 2020

In-House Version Test framework

Every e-commerce company has many data science models that are trained and experimented on with live production traffic. Any new experiment introduced via A/B testing needs its impact on the business evaluated before it is productized. Most product teams and engineers prefer simplified distribution handling, i.e., a convenient and straightforward user interface to view and change the configuration dynamically.

 

There are many A/B testing (version test) frameworks available on the market with per-call pricing models, but almost all companies invest in an in-house version test framework and build their own microservices for version tests. Today, we will not comment on publicly available version test providers but will talk about a few primary use cases and their challenges.

 

Every version test setup usually has a few critical use cases.

  1. A microservice that can serve the current distribution for a given version test.
  2. A client that can be embedded into multiple technologies to access the current distribution of version tests.
  3. The evaluation of the provided distribution, i.e., for the given traffic distribution of a version test, deciding which variant a request should use.
  4. A simplified user interface to add/update/disable any version test, accessible to management or product so they can change the distribution whenever required.

 

The implementation differs from company to company, but there are a few challenges. It looks simple to have an HTTP client that accesses the microservice, with the microservice returning the evaluated variant based on the distribution (shown below).

 

 

But at thousands of requests per second, there will be many network calls, resulting in increased resource utilization and the burden of managing another stack for the version test framework. As a further improvement, we can decouple the evaluation from the microservice, i.e., the microservice only provides the distribution, and the evaluation is done in the service client, as shown below.
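To make the client-side evaluation step concrete, here is a minimal sketch in Python. It assumes the distribution arrives as a mapping of variant name to integer weight and that a stable identifier (a user or session ID) is available for bucketing. The function name `evaluate_variant` and the SHA-256 hashing scheme are illustrative choices, not something prescribed by any particular framework.

```python
import hashlib
from typing import Dict


def evaluate_variant(test_name: str, user_id: str, distribution: Dict[str, int]) -> str:
    """Deterministically map a user to a variant using weighted buckets.

    `distribution` maps variant name -> integer weight (for example percentages).
    This is a hypothetical helper; the weights and names are assumptions.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must have a positive total weight")

    # Hash test name + user id so the same user always lands in the same
    # bucket for a given test, but in different buckets across tests.
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total

    # Walk the cumulative weights until the bucket falls inside a variant's range.
    cumulative = 0
    for variant, weight in distribution.items():
        cumulative += weight
        if bucket < cumulative:
            return variant
    raise AssertionError("unreachable: bucket is always below the total weight")
```

Because the evaluation is a pure function of the distribution and the identifier, it needs no network call of its own; only fetching the distribution does.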

 

 

But even after decoupling the evaluation, we get no benefit in terms of fewer calls to the newly created version test service. A few companies cache the distribution locally for 10-15 minutes depending on the requirement, i.e., they cache the distribution locally, use the service client for evaluation, and once the TTL of the locally cached copy expires, fetch the new allocation for another 10-15 minutes.
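The TTL-based caching described above could look roughly like the following sketch. The class name `TtlDistributionCache`, the `fetch` callable (standing in for the HTTP call to the version test service), and the 600-second default are assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional


class TtlDistributionCache:
    """Caches one distribution and refetches it only after the TTL expires."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, int]],    # hypothetical call to the service
        ttl_seconds: float = 600.0,             # 10 minutes, per the text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Dict[str, int]] = None
        self._expires_at = 0.0

    def get(self) -> Dict[str, int]:
        now = self._clock()
        # Refetch on first use or once the cached copy is past its TTL.
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The `clock` parameter is only there so the expiry logic can be exercised without waiting; in normal use the default monotonic clock is enough.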

 

This yields a small optimization, but what if a version test is used to split traffic between old and new components? What if the downstream third-party stack fails and you want to shut down the variant traffic immediately? 10-15 minutes can cause a lot of damage in those situations.

 

Hence, as a more optimized version, I propose a pub/sub model: for any add, update, or deactivation of a version test, a message is published to all consumers. The service client still caches the distribution and evaluates locally, but it also subscribes to a topic whose messages trigger a fetch of the latest distribution from the service. The local distribution is updated as soon as any change is made to a version test.
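Here is a rough sketch of that flow in Python. To keep it self-contained, it uses a tiny in-process bus as a stand-in for a real broker such as Redis pub/sub; `InMemoryBus`, `VersionTestService`, `SubscribingClient`, and the topic name are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

Distribution = Dict[str, int]


class InMemoryBus:
    """In-process stand-in for a pub/sub broker (an assumption, not Redis)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        for handler in list(self._subscribers[topic]):
            handler(message)


class VersionTestService:
    """Holds distributions and announces every change on a topic."""

    TOPIC = "version-test-updates"  # hypothetical topic name

    def __init__(self, bus: InMemoryBus) -> None:
        self._bus = bus
        self._distributions: Dict[str, Distribution] = {}
        self.fetch_count = 0  # counts calls made by clients

    def upsert(self, test_name: str, distribution: Distribution) -> None:
        self._distributions[test_name] = dict(distribution)
        self._bus.publish(self.TOPIC, test_name)

    def deactivate(self, test_name: str) -> None:
        self._distributions.pop(test_name, None)
        self._bus.publish(self.TOPIC, test_name)

    def fetch(self, test_name: str) -> Optional[Distribution]:
        self.fetch_count += 1
        found = self._distributions.get(test_name)
        return dict(found) if found is not None else None


class SubscribingClient:
    """Caches distributions locally and refreshes them on update messages."""

    def __init__(self, service: VersionTestService, bus: InMemoryBus) -> None:
        self._service = service
        self._cache: Dict[str, Distribution] = {}
        bus.subscribe(VersionTestService.TOPIC, self._on_update)

    def _on_update(self, test_name: str) -> None:
        # The message only names the test; the latest state comes from the service.
        fresh = self._service.fetch(test_name)
        if fresh is None:
            self._cache.pop(test_name, None)
        else:
            self._cache[test_name] = fresh

    def distribution(self, test_name: str) -> Optional[Distribution]:
        if test_name not in self._cache:
            fresh = self._service.fetch(test_name)
            if fresh is not None:
                self._cache[test_name] = fresh
        return self._cache.get(test_name)
```

With this shape, the client calls the service only when a version test actually changes (or on a cache miss), so an emergency shutdown of a variant reaches consumers without waiting for a TTL to run out.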

 

 

TestService

 

If you haven't explored Redis's pub/sub capabilities, please take a look, and let us know if you need any help. You might be wondering why I used Redis. The answer is to keep the explanation simple; feel free to evaluate any open-source pub/sub implementation. We know Redis is a key-value store, and most version test configurations are more document-like. But if you simplify the schema further, you can partition it by business domain, e.g., frontend version tests, backend version tests, or data science version tests, where the service stack differs between teams. Redis hash-based storage can help you store this information more appropriately. Redis is not just a key-value data store; it has many other capabilities. I am currently evaluating tree-based storage with Redis and a GraphQL-based interface for storing and accessing the metadata. I will share more as soon as I make progress on it.
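As an illustration of the per-domain hash layout, here is a small sketch. It assumes one hash per business domain (key `versiontest:<domain>`), one field per version test, and the configuration serialized as JSON; that key scheme and the helper names are hypothetical. The `InMemoryHashStore` class mimics only the `hset`, `hget`, and `hgetall` calls of a Redis client so the example runs without a server. With the redis-py client you would pass a `redis.Redis(decode_responses=True)` instance instead, so that fields and values come back as strings.

```python
import json
from typing import Any, Dict, Optional


class InMemoryHashStore:
    """Stand-in offering the hset/hget/hgetall subset of a Redis client."""

    def __init__(self) -> None:
        self._hashes: Dict[str, Dict[str, str]] = {}

    def hset(self, name: str, key: str, value: str) -> int:
        fields = self._hashes.setdefault(name, {})
        is_new = key not in fields
        fields[key] = value
        return 1 if is_new else 0  # like HSET: number of fields newly added

    def hget(self, name: str, key: str) -> Optional[str]:
        return self._hashes.get(name, {}).get(key)

    def hgetall(self, name: str) -> Dict[str, str]:
        return dict(self._hashes.get(name, {}))


def domain_key(domain: str) -> str:
    # Hypothetical key scheme: one hash per business domain.
    return f"versiontest:{domain}"


def save_test(store: Any, domain: str, test_name: str, config: Dict[str, Any]) -> None:
    """Store one version test as a JSON-encoded field of its domain hash."""
    store.hset(domain_key(domain), test_name, json.dumps(config))


def load_test(store: Any, domain: str, test_name: str) -> Optional[Dict[str, Any]]:
    raw = store.hget(domain_key(domain), test_name)
    return json.loads(raw) if raw is not None else None


def load_domain(store: Any, domain: str) -> Dict[str, Dict[str, Any]]:
    """Return every version test of a domain in a single lookup."""
    return {name: json.loads(raw) for name, raw in store.hgetall(domain_key(domain)).items()}
```

Grouping by domain means a frontend client can load only the frontend hash, instead of scanning every version test in the store.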

 

One last thing that is in trend today is a user interface for easy access to the current distribution, with the ability to make changes. In most companies this feature is a tiebreaker, as management/product does not want to rely on an engineer's availability to update the distribution in the service data store. A simple graphical user interface that lists all distributions, filters by name, and offers add/update/deactivate functionality will support management.

 

This design gives you a straightforward service for distribution handling, fewer calls to the service (easier maintenance), fast evaluation with the help of a local cache and a subscription to update messages, and a minimal graphical user interface for management.

 

Let me know if you need further explanation of how to implement evaluation, a detailed description of the version test service implementation, or an assessment of the proper datastore, and we can continue the discussion on those topics.
