When you make a deployment, whether by running now with our CLI, making an API call, or using our GitHub or GitLab integrations, we synchronize the source code and optionally build it to your specification.

The outputs of this build process can be static files or serverless Lambdas. Static files will be served directly from our edge caches, while Lambdas contain code that gets executed dynamically and on-demand.

Lambdas allow your code to be executed on-demand and scale automatically. They minimize cost in terms of infrastructure spend and engineering time, largely by removing operational overhead and reducing the surface area for errors.

ZEIT Now automatically provisions Lambdas with the underlying cloud infrastructure provider for dynamic code execution (such as deployments that expose APIs), to run static site generators, or to compile code in the cloud.

Example

Suppose you have these files in your deployment's directory:

  • package.json
  • static/logo.svg
  • pages/index.js
  • api/date.js

Because pages/index.js is a Next.js page that does not need server-side rendering, it will be statically exported automatically.

In this case, we will want our Next.js page and static/logo.svg to be served statically from our CDN.

Note: If a Next.js page includes server-side code, it will be built as a Lambda (one per page).

The api/date.js API endpoint (using Node.js) will be invoked dynamically as it is requested.

The API file needs only to export a function:

module.exports = (req, res) => {
  res.send(new Date())
}

An example /api/date.js serverless function.
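
To get a feel for how such a function behaves, it can be called locally with a stand-in response object. The sketch below is only an illustration: createStubResponse is a hypothetical helper that records what the handler sends, and it is not the response object the platform actually provides.

```javascript
// The handler from above, as it would be exported from api/date.js.
const handler = (req, res) => {
  res.send(new Date())
}

// Hypothetical stand-in for the platform's response object: it only records
// what the handler sends. It is not the real `res` implementation.
function createStubResponse() {
  const res = { body: undefined }
  res.send = (body) => {
    res.body = body
    return res
  }
  return res
}

const res = createStubResponse()
handler({ method: 'GET', url: '/api/date' }, res)
console.log(res.body instanceof Date) // true
```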

When you deploy an application using a similar structure, with static files and dynamic functions, the build processes happen in parallel and each yields independent outputs, which the deployment then serves.

Within a Directory Listing or source view, built Lambdas (represented by the icon λ) show up as "files" alongside static ones.

When a request is received by a Lambda, the code package, deployed to the specified regions, is executed and its response returned.

It is helpful to think of the generated Lambdas as "lightweight binaries" that get executed (or, as it is typically called, "invoked") on-demand.

With ZEIT Now, applications that are typically built as servers and containers can seamlessly be compiled into serverless functions under the hood without any code changes.

Conceptual Model

Lambdas as Threads

Cloud infrastructure providers typically give us access to several layers of abstraction that map to the traditional counterparts:

  • Set of Computers → Cluster
  • Computer → VM Instance (e.g., EC2)
  • Process → Container
  • Thread → Function

The ZEIT Now Platform focuses on the last one: the serverless function. If we think about it in terms of primitives, it closely mirrors a thread of computation.

We have typically been able to provision entire computers or orchestrate processes, but a faster, more granular and more parallelizable primitive like the thread has only recently become available.

Versus Processes and Containers

Lambdas have a lot of benefits over booting up and managing containers or processes, let alone computers or clusters.

One of the main benefits is that the developer only needs to reason about one concurrent computation happening in the context of their code, despite many of them being able to run in parallel.

In the example above, api/date.js handles an incoming request, runs some computation, and responds. If another request comes to the same path, the system will automatically spawn a new isolated function, thus scaling automatically.

In contrast, processes and containers tend to expose an entire server as their entrypoint. That server would then define code for handling many types of requests.

The code you specify for handling those requests would share one common context and state. Scaling becomes harder because it is hard to decide how many concurrent requests the process can handle.

Furthermore, if a systemic error occurs when handling an incoming request or event, the entire process or container would be affected, and with it many other requests being handled inside it.
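
For contrast with api/date.js, here is a rough sketch of the server-style entrypoint described above. The route names and the sharedState object are hypothetical and chosen for illustration; the function is called with minimal stubs instead of being passed to a real HTTP server, so the sketch stays runnable on its own.

```javascript
// Hypothetical server-style entrypoint: one function routes every request,
// and all routes share the same module-level state.
const sharedState = { requestsHandled: 0 }

function serverEntrypoint(req, res) {
  sharedState.requestsHandled += 1 // visible to every route

  if (req.url === '/api/date') {
    res.end(new Date().toISOString())
  } else if (req.url === '/api/hello') {
    res.end('hello')
  } else {
    res.statusCode = 404
    res.end('not found')
  }
}

// In a real server this function would be passed to http.createServer().
// Here it is called directly with a stub request and response.
function call(url) {
  const res = { statusCode: 200, body: '', end(body) { this.body = body } }
  serverEntrypoint({ url }, res)
  return res
}

call('/api/date')
call('/api/hello')
console.log(sharedState.requestsHandled) // 2
```

Every route reads and writes the same state, which is the shared context that a per-request function avoids having to reason about.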

Per-Entrypoint Granularity

ZEIT Now makes it trivial to build one serverless function per potential entrypoint of your application. If you are familiar with frontend development, you can think of this as code-splitting for the backend.

In the example above, the entrypoint we specified results in a single packaged function that gets uploaded and deployed with the cloud infrastructure provider.

This has numerous benefits:

  • Ensures smaller package sizes which in turn make cold bootup times very fast
  • Makes builds faster by analyzing and re-building only changes
  • Improves security by increasing isolation

Lifecycle and Scalability

In order to scale, it is widely accepted and understood that more copies of your software need to be spawned and the incoming traffic balanced between them. This is known as horizontal scalability.

With traditional process-based architectures, scaling like this is possible but has significant drawbacks and challenges:

  • Auto-scaling algorithms are based on high-level metrics like CPU or memory usage. They might scale too eagerly or not quickly enough, and therefore fail to serve some portion of the traffic in an acceptable amount of time.
  • The developer loses the ability to understand and predict exactly how their software will scale, since the metrics mentioned above are very dynamic.
  • The developer might have to spend a lot of time and resources carefully running simulations, setting up metric dashboards, and ensuring the algorithms are fine-tuned.

Typically, when you spawn a server, worker, container, or process, the requests all end up combined into one process, spawned across several cores and computers.

A typical request model.

With the ZEIT Now Lambda architecture, the mental model is much simpler. For each incoming request, a new invocation happens.

The Now Lambda request model.

If a request happens quickly thereafter, the Lambda is re-used for a subsequent invocation. After a while, only enough Lambdas will stay around to meet the needs of incoming traffic.

If no more traffic comes, Lambdas (unlike nearly all alternative systems) will scale to zero. This means that you only pay precisely for the level of scale that your traffic demands.
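
The request model above can be illustrated with a toy simulation. This is a deliberately simplified sketch and not how the platform schedules Lambdas: it assumes each instance handles one request at a time, that an idle instance is always re-used, and it ignores the expiry of idle instances (scaling to zero). The function and field names are made up for the example.

```javascript
// Toy model: each request is { start, duration } in milliseconds.
// An instance handles one request at a time; an idle instance is re-used,
// otherwise a new one is spawned (a cold boot).
function simulate(requests) {
  const busyUntil = [] // busyUntil[i] = time at which instance i becomes free
  let coldBoots = 0

  const ordered = [...requests].sort((a, b) => a.start - b.start)
  for (const { start, duration } of ordered) {
    const idle = busyUntil.findIndex((freeAt) => freeAt <= start)
    if (idle === -1) {
      busyUntil.push(start + duration)
      coldBoots += 1
    } else {
      busyUntil[idle] = start + duration
    }
  }
  return { instances: busyUntil.length, coldBoots }
}

const result = simulate([
  { start: 0, duration: 100 },
  { start: 10, duration: 100 },  // overlaps the first: new instance
  { start: 150, duration: 100 }, // first instance is idle again: re-used
])
console.log(result) // { instances: 2, coldBoots: 2 }
```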

Cold and Hot Boots

When a Lambda boots up from scratch, that is known as a cold boot. When it is re-used, we say the Lambda was warm (or hot).

Re-using a Lambda means the underlying container that hosts it does not get discarded. State gets preserved (such as temporary files, memory caches, sub-processes). This empowers the developer not just to minimize the time spent in the booting process, but to also take advantage of caching data (in memory or /tmp filesystem) and memoizing expensive computations.
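
A small sketch of what preserved state looks like in practice: code at module scope runs once per cold boot, so a counter kept there can tell a cold invocation from a warm one. The response is written with the standard Node.js res.end; the invoke helper is a hypothetical local stand-in for repeated invocations of the same container.

```javascript
// Module scope runs once per cold boot and is kept while the
// underlying container is re-used.
let invocationCount = 0

const handler = (req, res) => {
  invocationCount += 1
  res.end(JSON.stringify({
    coldBoot: invocationCount === 1,
    invocationCount,
  }))
}

// Local simulation of two invocations hitting the same container,
// using a minimal stub response.
function invoke() {
  let body
  handler({}, { end: (b) => { body = b } })
  return JSON.parse(body)
}

console.log(invoke()) // { coldBoot: true, invocationCount: 1 }
console.log(invoke()) // { coldBoot: false, invocationCount: 2 }
```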

It is important to note that Lambdas, even while the underlying container is hot, cannot leave tasks running. If a sub-process is running by the time the response is returned, the entire container is frozen. When a new invocation happens, if the container is re-used, it is unfrozen, which allows sub-processes to continue running.
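
One way to stay within this constraint is to finish any work before responding, instead of leaving it running in the background. The sketch below assumes an async handler is acceptable; recordVisit is a hypothetical task standing in for something like writing an audit record.

```javascript
// Hypothetical task; stands in for e.g. writing an audit record.
const completed = []
function recordVisit(path) {
  return new Promise((resolve) => {
    setTimeout(() => {
      completed.push(path)
      resolve()
    }, 10)
  })
}

const handler = async (req, res) => {
  // Await the task before responding, so nothing is still pending
  // when the container may be frozen.
  await recordVisit(req.url)
  res.end('ok')
}

// Local run with a minimal stub response.
const res = { body: undefined, end(body) { this.body = body } }
const pending = handler({ url: '/api/date' }, res)
pending.then(() => console.log(completed, res.body)) // [ '/api/date' ] ok
```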

To minimize the latency difference between a cold and a hot instantiation, ZEIT Now aids by:

  • Designing a build system that has a per-entrypoint granularity. This means that when a user hits a particular endpoint, only the code needed to handle that request has to be fetched and booted.
  • Imposing a limit on max bundle size for each Lambda generated by your deployment's files and file types. Cloud providers allow for a variety of different sized functions, but we have picked one that is aligned with making cold instantiation nearly instant for user-facing workloads (such as serving HTTP traffic).

Capabilities and Limits

CPU / Memory

ZEIT Now configures Lambdas to have 3008 MB of available memory, both for builds (those triggered by static site generators or Builds) and for output Lambdas. The amount of CPU allocated is proportional to the max memory limit.

To understand how the CPU allocation will work for your application, we recommend testing and benchmarking.

Maximum Request Body Size

Each request body has a maximum payload limit of 5 MB.
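
A client can check a payload against this limit before sending it. The sketch below is an illustration under a stated assumption: the limit is read as 5 * 1024 * 1024 bytes, although the exact byte count is not given here. The function names are hypothetical.

```javascript
// Assumption: the limit is read as 5 * 1024 * 1024 bytes; the exact
// byte count enforced by the platform is not stated in this document.
const MAX_BODY_BYTES = 5 * 1024 * 1024

// Size in bytes of a value once serialized as a JSON request body.
function jsonBodySize(payload) {
  return Buffer.byteLength(JSON.stringify(payload), 'utf8')
}

function fitsBodyLimit(payload) {
  return jsonBodySize(payload) <= MAX_BODY_BYTES
}

console.log(fitsBodyLimit({ message: 'hello' })) // true
console.log(fitsBodyLimit({ blob: 'x'.repeat(6 * 1024 * 1024) })) // false
```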

Maximum Bundle Size

The maximum Lambda size is 50 MB. If a deployment build tries to create a larger Lambda as an output, the build will fail and therefore so will your deployment.

Maximum Execution Duration

A Lambda can execute for at most 5 minutes. This is not currently configurable. If the limit is reached for a particular invocation, it will result in a 504 error.

In-memory Caches

It is possible to take advantage of global state from invocation to invocation if the underlying container is re-used.
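
As a minimal sketch, a module-level Map can memoize an expensive computation across invocations. It assumes the request exposes parsed query parameters as req.query (the stub below supplies one); slowSquare is a hypothetical stand-in for real work. The cache starts empty on every cold boot, so it should only ever be treated as an optimization.

```javascript
// Module-level cache: survives between invocations only while the
// underlying container is re-used, and starts empty on every cold boot.
const cache = new Map()

// Hypothetical expensive computation used for illustration.
function slowSquare(n) {
  let total = 0
  for (let i = 0; i < n; i++) total += n
  return total
}

const handler = (req, res) => {
  const n = Number(req.query.n) // assumes a parsed `query` object on the request
  const hit = cache.has(n)
  if (!hit) cache.set(n, slowSquare(n))
  res.end(JSON.stringify({ n, result: cache.get(n), cached: hit }))
}

// Local simulation of two invocations in the same container.
function invoke(n) {
  let body
  handler({ query: { n: String(n) } }, { end: (b) => { body = b } })
  return JSON.parse(body)
}

console.log(invoke(4)) // { n: 4, result: 16, cached: false }
console.log(invoke(4)) // { n: 4, result: 16, cached: true }
```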

Filesystem

The filesystem is writable inside /tmp, and its contents persist if the underlying container is re-used. The /tmp directory can hold up to 512 MB of storage.
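
The same idea works with files. The sketch below caches a computed value in the temporary directory and reads it back on later invocations. It uses os.tmpdir() in place of a hard-coded /tmp so that it also runs on a local machine (an assumption for portability); the file name and expensiveComputation are hypothetical.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// On the platform the writable directory is /tmp; os.tmpdir() is used here
// (an assumption) so the sketch also runs on a local machine.
const CACHE_FILE = path.join(os.tmpdir(), 'expensive-result.json') // hypothetical name

// Hypothetical expensive computation used for illustration.
function expensiveComputation() {
  return { computedAt: Date.now() }
}

function getResult() {
  try {
    // The file is present only if an earlier invocation in this
    // container wrote it.
    const value = JSON.parse(fs.readFileSync(CACHE_FILE, 'utf8'))
    return { source: 'cache', value }
  } catch (err) {
    // Missing or unreadable file: compute the value and store it.
    const value = expensiveComputation()
    fs.writeFileSync(CACHE_FILE, JSON.stringify(value))
    return { source: 'computed', value }
  }
}
```

Since the file disappears whenever the container is discarded, getResult always falls back to recomputing the value.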

Logging

All output streams are captured and persisted. Each function can store up to 1,000 lines. Each deployment will aggregate up to 1,000 lines. Each line is defined as a string chunk of up to 1 KB.