System Design: Failure Encoding and Synthesis

January 29, 2022 8-minute read

Introduction

The everyday task of handling failure is rarely given much thought. We grab whatever the language of choice makes available and treat failures as backstops. Because of the general principles adopted by the industry, this works most of the time. When it fails, those failures are handled by the surrounding software and life goes on. But what happens when you want to encode failure in a more systematic way? For a system, collections of failures can converge to represent important emergent behavior. This post aims to discuss the benefits of encoding failure at all of its levels, and asserts that synthesis of failure is a differentiating factor in system design.

A Seemingly Simple Problem

Failure is a concept every programmer must deal with. In programming, there are many reasons why something can fail. A program is forced to either handle that failure or crash. Both cases can be acceptable depending on the task at hand and the impact of the failure. From exceptions, to crashing with fast restarts, and even pushing failure outside of the algebraic expression of the domain, there have been herculean attempts at what I would call “dealing with failure”. I use the term “dealing” intentionally here, because I feel that’s the appropriate term. Dealing with failure is the minimum required for your program to either continue in a reasonable state or crash. You can find this in most languages under the term “exception handling”, and it goes something like this:

try {
    methodCall();
} catch (Exception e) {
    // log or print error
    // return some kind of state that returns things to normal or do nothing (void)
}

This pattern has served software over the years and accomplishes what it was designed to solve. It’s often misused or even abused, but that’s not the topic of conversation for this post. The point is that there are well established patterns for handling both expected and unexpected failures in software, and it’s perfectly reasonable to continue using them.

It’s not until we ask more questions about failure that we discover how complicated the topic really gets. When presented with a failure, what does that failure mean to the program or system? We have made attempts at creating classifications of failure that help guide developers to think about the semantic meaning of a failure, but that freedom unfortunately creates boundless abundance that doesn’t allow us to categorically reason about failure. Given that the very definition of “categorically” includes “without exception”, I understand this is taking artistic liberty, but my point is less literal. It is true that some exceptions are truly exceptional, and put the system in a state where it can no longer continue. It is true that some exceptions are non-terminal in nature and represent an unassisted path to recovery. It is also true that most exceptions have the possibility of being either of these. Unfortunately, this means that the representation of failure is multivalent, and until you express it as such, your system can’t truly understand the meaning of failure or how it should proceed.
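To make “multivalent” a little more concrete, here is a minimal C++ sketch. The names Disposition, Failure, missing_file, and can_continue are hypothetical and exist only for illustration. It assumes that the same underlying error, a missing file, is terminal for one caller and recoverable for another, and shows that the program can only act on that difference once it is encoded in the value itself.

```cpp
#include <cassert>
#include <string>

// Hypothetical classification, for illustration only.
enum class Disposition { Terminal, Recoverable };

struct Failure {
  std::string what;         // human-readable description of what went wrong
  Disposition disposition;  // what the failure means to the caller
};

// The same underlying error can carry different meanings: a missing optional
// cache file is recoverable, a missing required config file is terminal.
inline Failure missing_file(const std::string &path, bool required) {
  return Failure{"file not found: " + path,
                 required ? Disposition::Terminal : Disposition::Recoverable};
}

// The system can now ask the failure how to proceed instead of guessing.
inline bool can_continue(const Failure &failure) {
  return failure.disposition == Disposition::Recoverable;
}

// Usage sketch: identical error text, two different meanings.
inline void disposition_example() {
  assert(!can_continue(missing_file("app.conf", true)));
  assert(can_continue(missing_file("cache.bin", false)));
}
```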

Example: Making an HTTP Request

I’m going to start with a foundation present in the vast majority of software today: HTTP requests. The curl library, libcurl, sits at the core of many libraries, and I would consider it one of the most widely used libraries in the world. This example makes a GET request to http://example.com and prints the response body.

#include <iostream>
#include <string>
#include <curl/curl.h>

static size_t write_callback(void *contents, size_t size, size_t nmemb, void *userp) {
  ((std::string *)userp)->append((char *)contents, size * nmemb);
  return size * nmemb;
}

int main() {
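  CURL *curl = curl_easy_init();
  if (curl == nullptr) {
    // curl_easy_init returns NULL when the handle cannot be created
    std::cout << "ERROR: unable to initialize libcurl" << std::endl;
    return 1;
  }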
  std::string body_buffer;

  curl_easy_setopt(curl, CURLOPT_URL, "http://example.com");
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_callback);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body_buffer);

  CURLcode res = curl_easy_perform(curl);
  if (res != CURLE_OK) {
    std::cout << "ERROR: " << curl_easy_strerror(res) << std::endl;
  }

  std::cout << body_buffer << std::endl;

  curl_easy_cleanup(curl);

  return 0;
}

You could accomplish the same result by running

curl -s http://example.com

There are one-line examples for most modern programming languages. The choice to start with the C API is intentional. Because exceptions are not part of the equation, we can extract the cases where failure should be represented and start to tease apart representation without exceptions. You will notice the use of C++ in this example; that’s because we are going to push quickly into C++ to build the additional abstraction layers.

As with any C code, there are no exceptions present. There is, however, error handling code. We can see that there’s a natural seam created by testing the value of CURLcode when the operation completes. This represents catastrophic failure, where, for some reason, the request was unable to obtain a response from the target. While important, it’s not the only type of failure we can observe in this scenario. There are at least two additional layers of failure to account for. First, we should expect the status code of the response to match our expected value. Attempting to process a response body when the status code doesn’t match will likely create an exceptional case and represents a semantic failure. We can test this by adding the following code to our example:

long status_code = 0;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &status_code);

if (status_code != 200) {
  std::cout << "ERROR: expected response code of 200, but got " << status_code << std::endl;
}

Finally, there can be failure in processing the response body, but that is not the responsibility of the code that produces the raw response. At this point we have set a precedent for checking an important piece of information. We now have two distinct types of failure that are categorically different. The consumer can choose how to handle them, but the representation should demand separate paths. The first represents an absolute failure of the request. It’s terminal in the HTTP request sense, but not necessarily to the system consuming the result. We can call this “connection” failure to summarize it in a way that doesn’t lead consumers to attempt to treat it as terminal. The second is an expectation failure on a completed response from the server. A fully formed HTTP response can still tell us that the request did not succeed and must be treated differently. It can also represent a different branch to take in the logic of the consumer. We can call this “semantic” failure to summarize it in a way that lets the consumer know we have a complete response, but it should be handled separately from the normal path.
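As a rough sketch of that separation, the two checks from the libcurl example can be folded into one value that records which path was taken. Everything named here (ConnectionFailure, SemanticFailure, Success, Outcome, classify) is hypothetical and not part of libcurl. The sketch assumes the caller passes in the text from curl_easy_strerror when the transfer did not return CURLE_OK, or an empty string when it did.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical types for illustration; none of these come from libcurl.
struct ConnectionFailure { std::string reason; };                 // no response was obtained
struct SemanticFailure { long status_code; std::string body; };   // complete response, unexpected status
struct Success { long status_code; std::string body; };           // complete response, expected status

using Outcome = std::variant<ConnectionFailure, SemanticFailure, Success>;

// transport_error is assumed to be empty when curl_easy_perform returned
// CURLE_OK, and otherwise to hold the text from curl_easy_strerror.
inline Outcome classify(const std::string &transport_error, long status_code,
                        std::string body, long expected_status = 200) {
  if (!transport_error.empty()) {
    return ConnectionFailure{transport_error};
  }
  if (status_code != expected_status) {
    return SemanticFailure{status_code, std::move(body)};
  }
  return Success{status_code, std::move(body)};
}

// Usage sketch: every outcome lands in exactly one alternative.
inline void classify_example() {
  assert(std::holds_alternative<ConnectionFailure>(classify("could not resolve host", 0, "")));
  assert(std::holds_alternative<SemanticFailure>(classify("", 404, "Not Found")));
  assert(std::holds_alternative<Success>(classify("", 200, "<html></html>")));
}
```

Keeping the classification free of libcurl calls is a deliberate choice in this sketch: the decision about which kind of failure occurred can then be tested without a network.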

Thinking in Types

If we were to think about the cases above in pseudo types it might look something like:

HttpResponse -- placeholder for a complete response
ConnectionFailure = ConnectionFailure String
SemanticFailure = SemanticFailure HttpResponse
HttpFailure = ConnectionFailure | SemanticFailure
HttpSuccess = HttpSuccess HttpResponse
HttpResult = HttpFailure | HttpSuccess

You might even be able to think of the signature (completely ignoring an obvious effect type) as:

HttpRequest -> Either (Either ConnectionFailure SemanticFailure) HttpSuccess

Which you could break down to

HttpRequest -> Either HttpFailure HttpSuccess

And arrive at

HttpRequest -> HttpResult

Unwrapping this at the call site would now demand three places where the consumer needs to decide what to do. There may be additional things the caller needs to consider, but from the perspective of the HTTP response, the provider of this response can’t produce any more fidelity without some serious feature envy.
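One way to picture those three decision points is with plain std::variant and std::visit. This is only a sketch: the struct fields, the overloaded helper, and the describe function are assumptions made for illustration, and the nesting mirrors the HttpResult and HttpFailure pseudo types rather than any particular library.

```cpp
#include <cassert>
#include <string>
#include <variant>

// C++ stand-ins for the pseudo types; field names are illustrative.
struct HttpResponse { long status; std::string body; };
struct ConnectionFailure { std::string reason; };
struct SemanticFailure { HttpResponse response; };
struct HttpSuccess { HttpResponse response; };

using HttpFailure = std::variant<ConnectionFailure, SemanticFailure>;
using HttpResult = std::variant<HttpFailure, HttpSuccess>;

// Helper that merges several lambdas into a single visitor.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// The caller has to say what each of the three paths means.
inline std::string describe(const HttpResult &result) {
  return std::visit(
      overloaded{
          [](const HttpFailure &failure) -> std::string {
            return std::visit(
                overloaded{
                    [](const ConnectionFailure &f) -> std::string {
                      return "connection failure: " + f.reason;
                    },
                    [](const SemanticFailure &f) -> std::string {
                      return "semantic failure: status " + std::to_string(f.response.status);
                    }},
                failure);
          },
          [](const HttpSuccess &s) -> std::string {
            return "success: " + s.response.body;
          }},
      result);
}
```

Leaving out any one of the three lambdas makes the corresponding std::visit call fail to compile, which is the property the signature is meant to enforce.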

Applying These Abstractions To libcurl

We now move back to the C++ example we started with. This post will not explore the depth offered by libcurl, but you can get a better sense of the picture by looking at either the libcurl documentation or the code behind this example, simple_http. For now, we will take the same request from above and apply these ideas:

#include <iostream>
#include <string>
#include <variant>
#include <nlohmann/json.hpp>
#include <simple_http.hpp>

struct Failure final {
  std::string url;
  std::string value;
};

struct Id final {
  int value;
};

int main() {
  SimpleHttp::Client client;
  SimpleHttp::HttpUrl url{"https://jsonplaceholder.typicode.com/users"};
  auto result = client.get(url).template match<std::variant<Failure, Id>>(
      [&url](const SimpleHttp::HttpFailure &failure){
        return failure.template match<std::variant<Failure, Id>>(
            [&url](const SimpleHttp::HttpConnectionFailure &connectionFailure) {
              return Failure{url.value(),connectionFailure.value()};
            },
            [&url](const SimpleHttp::HttpResponse &semanticFailure) {
              return Failure{url.value(), semanticFailure.body.value()};
            }
        );
      },
      [](const SimpleHttp::HttpSuccess &success){
        auto parsed = nlohmann::json::parse(success.body().value());
        return Id{parsed[0]["id"]};
      }
  );

  if (std::holds_alternative<Failure>(result)) {
    Failure failure = std::get<Failure>(result);
    std::cout << "Failed HTTP request for <" << failure.url << ">: " << failure.value << std::endl;
  } else {
    std::cout << "Result: " << std::get<Id>(result).value << std::endl;
  }
}

Here we have the same general idea as above, but expressed in C++. There’s a much larger post about representing an Either type in C++, but that’s for another day. Under the hood, simple_http simply uses std::variant to represent both the SimpleHttp::HttpResult type and the SimpleHttp::HttpFailure type. The match function on both forces the caller to provide handlers for all three logical paths. As you can see from the template expression, the caller must also unify all paths into some result type. It uses the natural semantics of Either and assumes the left value of the variant holds the failure. This example also demonstrates that because you are forced to unify the witness type of the result, it’s natural and common to have the result of the request itself be something akin to an algebraic data type.
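For a feel of how a match function like this can sit on top of std::variant, here is a small standalone sketch. It is not simple_http’s implementation: the Either class, its left and right factories, and the parse_id helper are all hypothetical. The Failure and Id structs are redefined locally so the snippet stands alone. The idea it illustrates is that both handlers are mandatory arguments and both return values are converted to a single type named by the caller.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>

// Hypothetical Either-like wrapper; not simple_http's actual implementation.
template <class L, class R>
class Either {
public:
  static Either left(L value) {
    return Either(std::variant<L, R>(std::in_place_index<0>, std::move(value)));
  }
  static Either right(R value) {
    return Either(std::variant<L, R>(std::in_place_index<1>, std::move(value)));
  }

  // Both handlers are required, and each result is converted to Out, so
  // every path is unified into one type chosen by the caller.
  template <class Out, class OnLeft, class OnRight>
  Out match(OnLeft &&on_left, OnRight &&on_right) const {
    if (storage_.index() == 0) {
      return std::forward<OnLeft>(on_left)(std::get<0>(storage_));
    }
    return std::forward<OnRight>(on_right)(std::get<1>(storage_));
  }

private:
  explicit Either(std::variant<L, R> storage) : storage_(std::move(storage)) {}
  std::variant<L, R> storage_;  // index 0 holds the failure, by convention
};

struct Failure { std::string message; };
struct Id { int value; };

// Usage sketch: the caller picks std::variant<Failure, Id> as the unified type.
inline std::variant<Failure, Id> parse_id(const Either<std::string, int> &input) {
  return input.match<std::variant<Failure, Id>>(
      [](const std::string &error) { return Failure{error}; },
      [](int id) { return Id{id}; });
}
```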

Wrap-Up

The examples here come from a small HTTP library I have been working on recently, simple_http. I found myself writing the same shallow wrappers around libcurl for almost every project, wishing I had something better, and typically taking a half-hearted stab at failure synthesis under the finite lens of the problem at hand. This project is an attempt to ground all of these half-hearted attempts and create a common algebra for handling the deceptively complicated problem of making HTTP requests. Rather than take up space to explain the library in detail, I’ll highlight the unit and integration tests for the project that I think represent the major themes. I’m always happy to consider contributions if you find value in any of these ideas.