In my last post I described how you can think of your organization’s data infrastructure as a grid of blocks defined by category of use case and stage of the pipeline. Each block can be further broken down into three layers: Control, Compute and Storage. Last time I briefly described these layers, then discussed different high-level options for the Control layer. In this post, I’ll do the same for the Compute layer, which is where the computer does the work described by the user in the control layer.
Where it Happens
The Compute layer is where the code/logic that responds to user requests from the Control layer is run. Technically, some logic happens in all three layers, whether it's to render images and text in the Control layer or to respond to queries in the Storage layer. But it's useful to separately consider the logic that doesn't fall under these two, particularly for compute-intensive applications like machine learning, where you'll often want to run a large chunk of the computation in a separate place from the other layers.
In other words, the Compute layer is often split between more than one of the options I'll describe below: one for the Control layer, which creates the user interface and possibly handles some other things; one defined by the storage decision, which manages data access; and one or more that handle more intensive computations.
Most of the compute options that are defined by the control and storage layers can coordinate with any of the other options as long as the software is written to allow it. But it’s always important to consider how your decisions will impact your ability to split up and coordinate between multiple options.
Local Compute
Local compute means that the code/logic is running on the laptop or desktop computer that the user is interacting with for the Control layer. As noted above, this is where some or all of the Control layer will run, since the last mile involves displaying something on the user's screen and getting input from their keyboard, mouse, etc. Some programs may run entirely locally for all three layers, though it's becoming increasingly common for programs to use one of the other options below for the non-control parts of their Compute layer.
The benefit of local compute is that it's simple, which makes it both easier to implement and easier to maintain and debug. In particular, you don't need to worry about network outages or situations where the user doesn't have internet access. Even when the network is working, the program doesn't need to wait for messages to be passed back and forth, making response times faster.
On the other hand, the computing power available with local compute is limited to the user's desktop or laptop, as opposed to the options below that let them employ one or more computers on the network that could be significantly faster and have special hardware like GPUs. Moreover, if the user is trying to run other programs at the same time (or has too many browser tabs open) then adding another compute-intensive job will slow everything down.
This is an example of a common trade-off between speed and latency: roughly speaking, latency is the time it takes for a simple operation to complete, while speed determines how much longer things take as the operation becomes more complex. So local compute is good for latency, but often not good for speed.
Typically there will be some parts of a use case where latency is most important and others where speed matters more. The latter steps can often be moved to one of the other compute options that optimize for speed rather than latency.
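To make that trade-off concrete, here's a minimal sketch of routing work based on job size. The threshold and the two stand-in functions are assumptions for illustration; in a real system the remote branch would be an RPC or REST call to one of the options described below:

```python
# Hypothetical sketch: route work based on the latency/speed trade-off.
# Small jobs run locally (low latency); large jobs go to a remote
# backend (higher startup cost, but far more capacity).

def run_locally(n):
    """Stand-in for a quick local computation."""
    return sum(i * i for i in range(n))

def run_remotely(n):
    """Stand-in for dispatching the same work to a remote service.
    In a real system this would be a network call, not a local one."""
    return sum(i * i for i in range(n))  # same result, different machine

REMOTE_THRESHOLD = 100_000  # assumed cutoff; tune for your workload

def dispatch(n):
    # Below the threshold, the network round trip would dominate,
    # so local compute wins on latency.
    if n < REMOTE_THRESHOLD:
        return run_locally(n)
    return run_remotely(n)
```

The cutoff is the whole design decision: it should sit roughly where the remote option's round-trip overhead is paid back by its extra speed.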
Remote Dedicated Session
A remote dedicated session is an interaction in which a process is started on a computer somewhere on the network, optionally interacts back and forth with a single user, then shuts down when the user is done. The process may involve requisitioning one or more new computers or (more often) a new virtual machine, but it can also be just a new thread on an already-running machine. If it's a new machine within a cloud provider then you can get effectively unlimited computing power while only paying for the time that the session is active. Using remote sessions with on-prem servers doesn't provide the same cost savings, but may still make sense.
A remote session is similar to local compute in that a single user is interacting with the system. In fact, you can sometimes use the same code on a remote session as on your local machine, with a thin wrapper that communicates with the local Control layer. You can think of that remote session as a way of directly transferring portions of logic from local compute to a more powerful remote system.
Because of this, dedicated sessions are often the simplest of the remote options and the easiest to implement. The main drawback is that the user needs to explicitly turn the remote session on and off, which can add noticeable delays. In particular, requisitioning a new machine or VM can take a few minutes. If a user is planning on using the session for a few hours this is probably not a problem. But if they just want to do something short and quick, this delay may be long enough to convince them not to bother.
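As a rough sketch of what that wrapper might look like, the hypothetical class below takes caller-supplied `provision` and `release` callbacks standing in for cloud-provider API calls, and uses a context manager so the session is always shut down (and billing stopped) when the user is done:

```python
# Hypothetical sketch of a thin wrapper around a dedicated remote session:
# requisition a machine, run work on it, tear it down when done.

class RemoteSession:
    def __init__(self, provision, release):
        self._provision = provision   # e.g. start a VM, returns a handle
        self._release = release       # e.g. terminate the VM
        self._handle = None

    def __enter__(self):
        # This is the slow step: starting a fresh VM can take minutes.
        self._handle = self._provision()
        return self

    def run(self, fn, *args):
        # In practice this would ship `fn` to the remote machine;
        # here we call it directly to keep the sketch self-contained.
        return fn(*args)

    def __exit__(self, *exc):
        self._release(self._handle)   # stop paying once we're done

# Usage: the with-block guarantees the session is released, even on error.
started, stopped = [], []
with RemoteSession(lambda: started.append("vm-1") or "vm-1",
                   stopped.append) as session:
    result = session.run(pow, 2, 10)
```

The point of the sketch is that the work itself (`pow` here) is ordinary local-style code; only the thin start/stop shell is session-aware.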
If starting a session involves just requisitioning a new thread on an existing VM then the startup time will be much faster. But when you run out of capacity on the existing VMs, the next user will have to wait for a new VM to start up. You can mitigate this by starting the new instance before you run out of capacity. But if overall usage is minimal then keeping an extra VM running all the time can be very expensive relative to what you actually need. So this strategy is most effective when you consistently have a baseline level of usage.
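The pre-provisioning strategy above can be sketched as a simple capacity rule. This is a hypothetical illustration; `SLOTS_PER_VM` and `WARMUP_BUFFER` are assumed values you would tune to your VM startup times and traffic:

```python
# Hypothetical sketch of the warm-capacity strategy: start a new VM
# *before* the existing ones fill up, so the next user rarely waits
# for a cold start.

SLOTS_PER_VM = 8     # assumed number of sessions each VM can host
WARMUP_BUFFER = 2    # keep at least this many free slots at all times

def vms_needed(active_sessions):
    """How many VMs should be running (or starting) right now."""
    # Capacity for current sessions plus a small buffer of free slots.
    wanted_slots = active_sessions + WARMUP_BUFFER
    return -(-wanted_slots // SLOTS_PER_VM)  # ceiling division
```

Note that even with zero active sessions this rule keeps one VM running, which is exactly the baseline cost the paragraph above warns about when overall usage is minimal.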
Off-the-shelf software packages that use remote sessions will provide the wrapper for starting and stopping them, but if you’re building something custom then this can add a fair amount of complexity. The trade-off is often between building this control logic or rewriting the application itself to fit into one of the other remote options described below.
Remote Service
A remote service is a system that runs continuously on one or more machines somewhere on the network, communicating with multiple users at the same time. The system may be split across a number of machines that increases or decreases based on the amount of usage it's getting, but this scaling will be based on an estimate of what users will need rather than direct control from users.
The machines can be in an on-premise data center or rented from a cloud provider. Or they could be managed by a Platform as a Service (PaaS) provider that manages a lot of the deployment and scaling that you might need to do yourself on a Cloud provider. The discussion below won’t differentiate between these, though the ability to save costs by scaling instances to meet demand is typically only available with a cloud provider or PaaS.
With a remote service, users don't need to wait for a new instance to spin up. The service may still keep track of some notion of user sessions, but only for internal bookkeeping. This greatly simplifies the logic around increasing or decreasing the number of instances to meet demand by decoupling it from the internal details of the service.
However, some of this complexity just gets pushed into the service itself, which must now simultaneously handle instructions from multiple independent users. While a local application can often be turned into a remote session application with mostly superficial changes, making it a remote service typically requires a complete rewrite. This includes breaking up interactions into discrete API calls and explicitly storing and retrieving information about each user’s session between API calls.
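As an illustration of that rewrite, the hypothetical handler below treats each interaction as one discrete, stateless API call, with a plain dict standing in for the external session store (a database, Redis, etc.) a real service would use:

```python
# Hypothetical sketch of a remote-service API call: per-user state is
# loaded and saved explicitly on every request, never held in local
# variables between requests.

session_store = {}  # stand-in for a real external session store

def handle_append(user_id, item):
    """One API call: load the user's state, update it, save it back."""
    state = session_store.get(user_id, {"items": []})  # retrieve
    state["items"].append(item)                        # do the work
    session_store[user_id] = state                     # persist
    return state["items"]

# Two independent users can interleave calls without interfering.
handle_append("alice", "a1")
handle_append("bob", "b1")
handle_append("alice", "a2")
```

Because nothing survives in the process between calls, any instance of the service can handle any user's next request, which is what makes the instance count easy to scale.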
In terms of cost, running a remote service requires having one or more compute instances running 24/7, whether or not there are users. You can have the system increase the number of instances when traffic picks up, but you’ll always need a bit more capacity than you’re using so you’re ready for the next new user.
The extra capacity is an additional cost, but allows the system to handle spikes in traffic. On the other hand, by sharing instances between multiple users, a remote service can end up costing less than remote sessions if you have enough users enough of the time. Moreover, if you run multiple services in VMs or containers on a common set of shared machines, then you can share the extra capacity too, reducing the overall need.
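The capacity math here can be sketched with a simple autoscaling rule. All the constants are assumptions for illustration; the shape of the rule is what matters:

```python
# Hypothetical sketch of scaling a remote service with headroom:
# enough instances for current traffic plus spare capacity, never
# dropping below a minimum that runs 24/7.

import math

REQUESTS_PER_INSTANCE = 100   # assumed capacity of one instance
HEADROOM = 1.25               # keep ~25% spare for traffic spikes
MIN_INSTANCES = 1             # something must always be running

def target_instances(current_rps):
    needed = current_rps * HEADROOM / REQUESTS_PER_INSTANCE
    return max(MIN_INSTANCES, math.ceil(needed))
```

Even at zero traffic you pay for `MIN_INSTANCES`, and the headroom multiplier is the extra cost that buys resilience to spikes; sharing machines across services effectively shares that multiplier too.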
Ultimately, both remote sessions and remote services require you to manage the complexity of supporting multiple users and provisioning capacity, but they give you different levers for doing so. Services tend to make more sense when you have many users, each requiring a small amount of compute time. But the decision will depend on a number of factors in the larger context.
Software as a Service (SaaS)
Technically, this is a business model that incorporates one or more of the remote compute options above, but it's important to consider. In this model, a third-party provider of a particular application both makes the decision between remote sessions or a remote service, and manages the complexity of each.
The benefits of SaaS are that you don’t need to make these decisions and manage the infrastructure directly. You effectively share the operating costs with all the other customers, potentially including the extra capacity reserved for spikes in usage.
The downsides of SaaS include the extra complexity of having the services run in a different network, though many SaaS platforms have ways of running inside your own (cloud) network. There's also the complexity of integrating a SaaS system with the rest of your system, but that's the same for any other off-the-shelf software.
Because the SaaS provider decides on the compute option for you, you could argue that it doesn’t matter. But it can have an impact on both the user experience and potentially on your costs, depending on the payment model. So you should consider this when evaluating different options. If you can find a SaaS option that closely matches your requirements and has good integration options (such as a user-accessible REST API), it can often be a better decision than managing the infrastructure yourself.
Serverless
Like SaaS, serverless is a way to have someone else manage your compute infrastructure, which may be based on remote sessions, remote services or a mix of the two. But while a SaaS product manages everything from compute instances to the application, serverless frameworks only manage the instances, allowing you to define the application yourself. It's "serverless" in the sense that you don't have to think about the server, even though it's there under the hood.
This will typically only be an option if you're building your own custom software. Serverless infrastructure usually makes the most sense as a replacement for a remote service. This usually requires splitting the service into small pieces – individual API calls – each of which gets deployed separately. There are frameworks that manage deployment as if it were a single service, saving the overhead of deployment, monitoring, etc., but these tend to work best for simple applications. For more complex services, the complexity of splitting your logic into small serverless pieces often outweighs this benefit.
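For illustration, here's a sketch of one such small piece as a standalone function. The event-dict-in, response-dict-out signature loosely mirrors common function-as-a-service platforms, but the exact shape varies by provider, so treat this as a hypothetical:

```python
# Hypothetical sketch of a single API call deployed as its own
# serverless function: no shared process state, one job per function.

def handler(event, context=None):
    """Sum a list of numbers supplied in the request event."""
    try:
        numbers = event["numbers"]
        return {"status": 200, "body": {"total": sum(numbers)}}
    except (KeyError, TypeError):
        # Malformed input: a serverless function must validate its own
        # event, since there's no surrounding service to do it.
        return {"status": 400, "body": {"error": "expected a 'numbers' list"}}
```

Each such function is deployed, scaled and billed independently, which is exactly the per-piece overhead that makes this splitting costly for complex services.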
For each block in the grid defined by use case categories and pipeline stages, there are a number of different options for the Compute layer, each with its own pros and cons. For example, you could have all compute run locally, along with the Control layer. Or you could move some or all of the logic that isn't directly necessary for the Control layer into a remote session or remote service, which you can manage yourself or delegate to a SaaS product or serverless infrastructure. There are a number of factors that feed into all of these decisions, and when making them it's important to think about your organization's overall software platform rather than just the individual use case or pipeline stage.