AWS Serverless Architecture: A Practical Introduction

How serverless differs from the conventional model

In the conventional model you provision capacity, and once you provision it you own its idle time. The unit of thought is the server, or the container, or the cluster. It runs continuously, and you pay for the wall-clock hours it exists rather than the work it actually performs. A box that handles ten requests at 3 a.m. costs exactly as much as it does handling four hundred at noon.

Serverless changes the unit. The thing you think about is no longer the host but the function: a small, single-purpose piece of code that runs in response to an event and then stops. On AWS, that compute primitive is Lambda, a service that executes your function on demand, allocates an execution environment for the work, and stops charging you the moment the work finishes. The platform may keep that environment around to reuse for a later invocation, but that is its decision, not a resource you manage. There is no host for you to patch or size.
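As a minimal sketch of that unit, a Python Lambda function is an ordinary function that receives the triggering event and a context object. The name `handler` and the `name` field read from the event are illustrative assumptions, not anything the platform prescribes.

```python
import json


def handler(event, context):
    """Entry point invoked once per event; nothing here runs between events."""
    # `event` is the decoded trigger payload; `name` is a hypothetical field.
    name = event.get("name", "world")
    # For synchronous invocations the return value goes back to the caller.
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {name}"})}


if __name__ == "__main__":
    # Local smoke test: no AWS account is needed, so the context is just None.
    print(handler({"name": "lambda"}, None))
```

Everything outside this function (hosts, processes, scaling) is the part you no longer write.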

The name causes confusion, so define it plainly. Serverless does not mean there is no server; it means the server is no longer your unit of responsibility. The provider owns provisioning, scaling, and availability. The work does not disappear, though; it shifts. Capacity planning becomes concurrency planning, and server management becomes event-flow design.

The cost argument

The structural difference is simple to state. Conventional billing charges you for provisioned time. Serverless billing charges you for consumed time. Idle capacity, the dead hours between traffic peaks, costs nothing because nothing is running.

Consider a service that handles a few thousand requests an hour with short bursts of activity and long quiet stretches. On an always-on instance, you pay for all twenty-four hours, including the many during which the CPU is close to asleep. On per-invocation billing, you pay for the milliseconds your function actually executes and nothing for the gaps. The idle hours simply fall off the bill. (Treat this as an illustrative shape, not a quoted figure; the real numbers depend entirely on your workload.)

The variable-load case is where this argument is strongest. Spiky, unpredictable, or low-baseline workloads are exactly the workloads that punish you for provisioning to a peak you hit rarely. They are also the workloads where per-invocation pricing rewards you most. The honest qualifier belongs right next to the claim: per-invocation pricing stays cheap until volume becomes both constant and high, and at that point a reserved instance or a container can cost less. Cost is a function of the shape of your load, not a verdict you can pronounce once and reuse everywhere.
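The break-even point in that qualifier can be worked out with simple arithmetic. The sketch below assumes a two-part per-invocation price (a fee per request plus a fee per GB-second of execution) and a flat hourly rate for the always-on alternative. Every number in it is a placeholder chosen for round arithmetic, not a quoted price, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def always_on_monthly_cost(hourly_rate):
    """An always-on instance bills for every hour, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


def cost_per_invocation(avg_duration_s, memory_gb, price_per_request, price_per_gb_second):
    """Request fee plus compute fee for one execution."""
    return price_per_request + avg_duration_s * memory_gb * price_per_gb_second


def per_invocation_monthly_cost(requests, unit_cost):
    """Consumed-time billing: the bill is proportional to the work done."""
    return requests * unit_cost


def break_even_requests(hourly_rate, unit_cost):
    """Monthly request volume at which both models cost the same."""
    return always_on_monthly_cost(hourly_rate) / unit_cost


if __name__ == "__main__":
    # Placeholder inputs: 100 ms per call, 0.5 GB of memory, made-up prices.
    unit = cost_per_invocation(0.1, 0.5, 2e-7, 2e-5)
    fixed = always_on_monthly_cost(0.10)
    for requests in (2_000_000, 100_000_000):
        print(requests, per_invocation_monthly_cost(requests, unit), fixed)
    print("break-even requests per month:", round(break_even_requests(0.10, unit)))
```

With these placeholder inputs the low-volume case is far cheaper per invocation and the high-volume case is not, which is the shape of the argument rather than a forecast for any real workload.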

Concurrency without an autoscaling group

Conventional systems scale in units of servers. To absorb a fraction more load, you add a whole instance, then wait for it to boot, register, and warm up. Serverless scales in units of invocations. Each concurrent request gets its own execution environment, and the platform creates those environments as demand arrives.

Concurrency, here, means the number of function executions running at the same instant. When a hundred requests land together, the platform runs a hundred environments; when four arrive, it runs four. There is no autoscaling group to tune, no warm-up lag measured in minutes, and no scale-in policy to second-guess at midnight. Horizontal scaling is the default behaviour rather than a system you assemble.
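A rough way to estimate that number ahead of time is to multiply the arrival rate by the average execution time (Little's law). The sketch below assumes steady traffic, and the function name is hypothetical; real traffic is burstier than an average suggests.

```python
import math


def environments_needed(requests_per_second, avg_duration_s):
    """Estimate concurrent executions as arrival rate times time in flight."""
    in_flight = requests_per_second * avg_duration_s
    # At least one environment is needed as soon as any request arrives.
    return max(1, math.ceil(in_flight))


if __name__ == "__main__":
    # 200 requests per second, each running for a quarter of a second.
    print(environments_needed(200, 0.25))
```

The useful consequence is that halving a function's duration halves the concurrency it needs for the same traffic.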

That default has edges worth knowing before you rely on it. A new execution environment carries initialisation latency, the cold start, which is real and measurable and shows up at the tail of your latency distribution. Concurrency is not infinite either; each account has a concurrency limit per Region, raisable on request but always finite, so unbounded scaling is a myth worth discarding early. And scaling the compute does not scale its dependencies. A function that fans out to a thousand concurrent executions will happily open a thousand concurrent connections to a database that was built for fifty.
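The cold start has a practical consequence for how a function is laid out. In Python, code at module scope runs once when an execution environment is created, while the handler runs on every invocation, so expensive setup usually belongs at module scope. The sketch below simulates that with a hypothetical `_load_config` helper standing in for real setup such as creating clients or opening connections.

```python
import time

_INIT_COUNT = 0  # counts how many times the expensive setup has run


def _load_config():
    """Stand-in for expensive setup (clients, connections, large files)."""
    global _INIT_COUNT
    _INIT_COUNT += 1
    time.sleep(0.01)  # simulated initialisation latency
    return {"table": "orders"}  # hypothetical configuration


# Module scope: runs once per execution environment, i.e. on a cold start.
CONFIG = _load_config()


def handler(event, context):
    # Invocations that land on an existing environment reuse CONFIG as-is.
    return {"table": CONFIG["table"], "initialisations": _INIT_COUNT}


if __name__ == "__main__":
    print(handler({}, None))
    print(handler({}, None))  # same process, so no second initialisation
```

Reuse is an optimisation, not a guarantee: a new environment repeats the setup, so nothing correctness-critical should depend on the module-level state surviving.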

Long-running workflows with Step Functions

A fair objection arrives at this point. Lambda functions have a maximum execution duration (15 minutes at the time of writing), so what happens to a process that legitimately takes hours, or one that must pause and wait for a human to click approve?

The answer is AWS Step Functions, a state machine service that orchestrates a sequence of steps and holds the state between them. Each individual step stays short and well within the Lambda duration limit, while the workflow as a whole can run for hours or days. You stop writing one long function and start describing a workflow as a diagram of states: tasks that do work, choices that branch, parallel paths that run together, waits that pause. An order-processing pipeline, a document-ingestion flow, a multi-stage approval all fit this shape. A step can wait hours for an external system or a human decision without consuming any compute while it waits, because waiting is a state, not a running process.
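A workflow like this is written in Amazon States Language, a JSON document of named states. The sketch below builds a small definition as a Python dictionary with one of each state type named above except Parallel. The ARNs, account number, and state names are hypothetical, and the fixed one-hour `Wait` is only a stand-in for a pause; waiting on a human decision would normally use a callback-style task instead, which is not shown here.

```python
import json

# Hypothetical function ARNs, for illustration only.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-order"

definition = {
    "Comment": "Order pipeline sketch",
    "StartAt": "ValidateOrder",
    "States": {
        # Task: a short unit of work, well inside the Lambda duration limit.
        "ValidateOrder": {"Type": "Task", "Resource": VALIDATE_ARN, "Next": "CoolingOff"},
        # Wait: the workflow pauses here without any function running.
        "CoolingOff": {"Type": "Wait", "Seconds": 3600, "Next": "IsApproved"},
        # Choice: branch on a field of the workflow's JSON state.
        "IsApproved": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.approved", "BooleanEquals": True, "Next": "ChargeOrder"}
            ],
            "Default": "Rejected",
        },
        "ChargeOrder": {"Type": "Task", "Resource": CHARGE_ARN, "End": True},
        "Rejected": {"Type": "Fail", "Error": "OrderRejected", "Cause": "Approval was not granted"},
    },
}


def transition_targets(state):
    """Collect every state name that this state can hand control to."""
    targets = [state[key] for key in ("Next", "Default") if key in state]
    targets.extend(choice["Next"] for choice in state.get("Choices", []))
    return targets


def undefined_targets(machine):
    """Return referenced state names that are missing from the definition."""
    wanted = {machine["StartAt"]}
    for state in machine["States"].values():
        wanted.update(transition_targets(state))
    return sorted(wanted - set(machine["States"]))


if __name__ == "__main__":
    assert undefined_targets(definition) == []
    print(json.dumps(definition, indent=2))
```

The small `undefined_targets` check is a local convenience for catching a misspelled state name before deployment; it is not a substitute for the service's own validation.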

The utility analogy behind per-use billing, compute metered like mains electricity, reaches its edge here, and it is worth admitting rather than stretching. Mains electricity has no concept of a process that pauses halfway and resumes later; Step Functions is precisely that concept. Orchestration is the part of serverless the utility metaphor cannot carry, so the metaphor stops at the meter and the state machine takes over from there.

Almost any AWS resource as a trigger

In a conventional application, the web server is the single front door. Every request that causes work enters through it. In a serverless system, the event source is the front door, and there are many doors at once.

A function can be triggered by an HTTP request arriving at API Gateway, by an object landing in an S3 bucket, by a row changing in a DynamoDB table, by a scheduled timer firing through EventBridge, or by a message appearing on a queue. The architectural consequence is that logic attaches directly to the event that should cause it. There is no polling loop asking “has anything happened yet,” no cron job running on a box someone has to remember exists, and no glue service whose entire purpose is to notice a change and pass it along. A file uploaded to storage is itself the trigger that resizes it, scans it, and sends it for indexing. The upload is the event, and the event is the cause.
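Each of those sources delivers a differently shaped event to the function. The sketch below tells them apart by the fields each one carries, using heavily trimmed payloads. In practice you would usually deploy one function per trigger rather than one dispatcher, so treat `classify_event` and the route table as a hypothetical illustration of the event shapes, not a recommended design.

```python
RECORD_SOURCES = {"aws:s3": "s3", "aws:sqs": "sqs", "aws:dynamodb": "dynamodb"}


def classify_event(event):
    """Guess the trigger type from the fields present in the event."""
    records = event.get("Records")
    if records:
        # S3, SQS and DynamoDB stream events arrive as a list of records.
        return RECORD_SOURCES.get(records[0].get("eventSource", ""), "unknown")
    if event.get("source") == "aws.events":
        # Scheduled EventBridge rules identify themselves in `source`.
        return "schedule"
    if "httpMethod" in event or "http" in event.get("requestContext", {}):
        # API Gateway proxy events (REST and HTTP API payload formats).
        return "http"
    return "unknown"


# Hypothetical per-source actions; real handlers would do the actual work.
ROUTES = {
    "s3": lambda e: "resize " + e["Records"][0]["s3"]["object"]["key"],
    "sqs": lambda e: f"process {len(e['Records'])} message(s)",
    "dynamodb": lambda e: f"react to {len(e['Records'])} row change(s)",
    "schedule": lambda e: "run scheduled job",
    "http": lambda e: "answer request",
}


def handler(event, context):
    kind = classify_event(event)
    action = ROUTES.get(kind)
    return action(event) if action else "ignored"


if __name__ == "__main__":
    upload = {"Records": [{"eventSource": "aws:s3", "s3": {"object": {"key": "cat.png"}}}]}
    print(handler(upload, None))
```
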

Native queue consumer support

Consider what consuming a message queue costs in a conventional system. You run a worker process that polls the queue, manages its own concurrency, handles its own failures, and never stops running, which means you are back to paying for an always-on component whose job is mostly to wait.

Serverless inverts the arrangement. With Amazon SQS (Simple Queue Service) as the queue and Lambda as the consumer, the platform polls the queue for you and invokes your function with batches of messages as they arrive. The connection between the two is an event source mapping, a managed piece of plumbing you configure rather than build. You write the message handler. You do not write the consumer loop, the poller, or the process supervisor that keeps it alive.

What you get with that arrangement is concrete: batching of messages, automatic retries on failure, failed messages routed to a dead-letter queue for later inspection, and consumer concurrency that rises and falls with the depth of the queue. What you still own is the message semantics. Ordering guarantees, the handling of a partial batch failure, and duplicate delivery remain your design problems. The platform manages the plumbing; it does not manage the meaning of your messages.
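Partial batch failure is the part of this most often handled in code. The sketch below assumes the event source mapping has partial batch responses enabled (the `ReportBatchItemFailures` response type), in which case the handler reports only the message IDs that failed and the rest of the batch is treated as done. The `process` function and the `order_id` field are hypothetical.

```python
import json


def process(body):
    """Hypothetical business logic; raises on a message it cannot handle."""
    if "order_id" not in body:
        raise ValueError("missing order_id")
    return body["order_id"]


def handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            process(json.loads(record["body"]))
        except Exception:
            # Report just this message; the platform retries it and, after
            # enough failed attempts, the queue's redrive policy can move it
            # to a dead-letter queue.
            failures.append({"itemIdentifier": record["messageId"]})
    # An empty list tells the platform the whole batch succeeded.
    return {"batchItemFailures": failures}


if __name__ == "__main__":
    batch = {"Records": [
        {"messageId": "m-1", "body": json.dumps({"order_id": 1})},
        {"messageId": "m-2", "body": json.dumps({"note": "no id"})},
    ]}
    print(handler(batch, None))
```

Without this response shape, one bad message fails the whole batch and every message in it is retried. Duplicate delivery still needs its own answer, typically an idempotency check against a durable store, which this sketch leaves out.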

Where it fits, and where it does not

Serverless earns a place in an architecture for specific reasons, and pretending it fits everywhere is the fastest way to lose an engineer’s trust. It is a strong choice for:

– Spiky or unpredictable traffic, where provisioning to a rare peak wastes money every hour the peak is absent.
– Event-driven processing, where work is naturally triggered by discrete events rather than a continuous request stream.
– Glue between AWS services, where a small function reacting to a change is simpler than a standing integration service.
– Low-baseline workloads, where idle cost dominates the bill and per-invocation pricing erases it.
– Teams that want to spend less time on infrastructure, handing host provisioning and scaling to the provider.

It is a weak choice, equally, for:

– Steady, high-volume, latency-sensitive workloads, where always-on compute is both cheaper per request and faster at the tail.
– Long, continuous computation, which fights the execution-duration model rather than fitting it.
– Latency budgets with no tolerance for cold starts, where the initialisation penalty at the tail is unacceptable.
– Workloads that depend on heavy local state or large in-memory caches, which a stateless, short-lived execution environment cannot be relied on to hold between invocations.

The synthesis is unglamorous and correct. Serverless is a strong default for new event-driven systems and a poor default for steady-state, high-throughput services. Most real architectures end up as a mix, and the engineering judgment lies in knowing which workload belongs on which side of the line.

The complication nobody mentions first: observability

The conventional model gives you one host, one log file, and one process you can attach a debugger to. Serverless gives you many short-lived executions spread across an event-driven graph, and that distribution is the genuine operational cost of the model.

The specific difficulties are worth naming so they do not surprise you in production:

– Distributed traces. A single user action may cross API Gateway, a function, a queue, and a second function. Reconstructing that path after the fact is not free and does not happen by itself.
– Ephemeral execution. There is no box to log into. Once a function returns, you cannot count on its environment still existing, and anything you did not deliberately record is effectively gone.
– Cold-start noise. Initialisation latency distorts your metrics, so an average response time hides the problem; you need percentiles to see the tail where cold starts live.
– The cost of logs themselves. Centralised logging at high invocation volume has its own bill, and it is not a small one if you log carelessly.
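The point about averages and percentiles is easy to see with numbers. The sketch below uses synthetic latencies, 97 warm invocations and 3 cold ones, which are made up for illustration rather than measured, and a simple nearest-rank percentile.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: the value below which pct% of samples fall."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic data: 97 warm invocations at 20 ms and 3 cold starts at 900 ms.
latencies_ms = [20] * 97 + [900] * 3

if __name__ == "__main__":
    print("mean:", statistics.mean(latencies_ms))
    print("p50: ", percentile(latencies_ms, 50))
    print("p99: ", percentile(latencies_ms, 99))
```

The mean lands somewhere no request actually experienced, the median looks perfectly healthy, and only the high percentile shows what the unlucky users saw.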

The tooling answers exist. CloudWatch collects logs and metrics, AWS X-Ray reconstructs distributed traces across services, and structured logging with correlation IDs lets you follow one request through the graph. The point to carry away is that serverless moves observability from a default you inherit for free to a design decision you make on purpose. Plan for it at the start, because retrofitting it later is the harder path.
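Structured logging with a correlation ID takes very little code. The sketch below is one possible shape, assuming the ID travels in a hypothetical `correlation_id` field of the event; the field names are illustrative, and anything a Lambda function prints to standard output ends up in its CloudWatch log stream.

```python
import json
import uuid


def log_line(correlation_id, message, **fields):
    """Render one log entry as a single JSON object, easy to filter later."""
    record = {"correlation_id": correlation_id, "message": message, **fields}
    return json.dumps(record, sort_keys=True)


def handler(event, context):
    # Reuse the upstream ID when there is one, otherwise start a new trail.
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())
    print(log_line(correlation_id, "order received", order_id=event.get("order_id")))
    # Hand the ID on so the next function in the graph logs under the same key.
    return {"correlation_id": correlation_id, "order_id": event.get("order_id")}


if __name__ == "__main__":
    handler({"correlation_id": "req-123", "order_id": 42}, None)
```

Searching the logs for one `correlation_id` value then returns every entry for that request, whichever function wrote it.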

How LiteBreeze has used it

– Document ingestion pipeline
– AMPL implementation
– Data Sync – Archive data from RDS to ES
– Serverless Imposition

Closing

Going serverless is not a decision about whether your application can scale. A well-run autoscaling group scales too. The decision is about what you want to be responsible for, and what you are willing to keep paying for during the hours nobody is watching.

The conventional question was always whether the lights would stay on under load. Serverless does not retire that question so much as replace it with a sharper one: when the room is empty, are you still paying for the generator out back?
