Introduction to HashiCorp Nomad

Whilst looking for a platform to host some small projects, I recently took HashiCorp Nomad for a spin in my home lab. This is a brief run-through of my setup and a guide to getting started with a small but flexible Nomad installation. To be clear, this is not an immediately scalable, production-ready setup - if there is such a thing - but it may help lay the foundation for something that works for your project. In the interest of full disclosure, at the time of writing this article I am employed by HashiCorp on a team working on a different product; however, I received no direction during my experimentation or on the content of this write-up.

Nomad is “a simple and flexible workload orchestrator” that is capable of running not just containers, but pretty much any workload you can build. Whilst related solutions like Kubernetes focus on containerized application hosting, Nomad has a broader reach, which can be a great selling point if your environment is more heterogeneous, or if you are looking for something more lightweight with a gentler learning curve. With that said, let’s take a closer look at how Nomad actually works.

Written in Go, Nomad is distributed as a single binary for multiple platforms. Official builds are provided for Windows, Linux and macOS, so you can get started on your own workstation right away. Here I’ve focused on running Nomad on Linux, with a basic configuration to get you to a fully functioning single-node cluster. Nomad works hand-in-hand with HashiCorp Consul, a platform-independent networking solution with built-in service discovery, although we’ll only touch on the service discovery feature set here. In stark contrast to competing solutions, it is ridiculously simple to get started with Nomad.


Installing on Ubuntu

We can install Nomad from the Linux repository provided by HashiCorp. Since we’ll be deploying a containerized application, we will also install Docker at the same time.

Make sure we have the prerequisite tools:

# apt install apt-transport-https ca-certificates curl

Add the following to your apt sources, substituting the arch and release to match your host. See the HashiCorp and Docker repository documentation for supported platforms.

deb [arch=amd64] https://apt.releases.hashicorp.com focal main
deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable

Also trust the GPG signing keys:

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Install Nomad and Docker CE:

apt update
apt install nomad docker-ce docker-ce-cli containerd.io

At this point you should be able to start Nomad by running nomad agent -config /etc/nomad.d and access the web UI at http://localhost:4646/ui/. Right now, you can submit jobs to your Nomad instance and they will run. Hit ctrl-c to terminate the process. That was easy!
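
With the agent running, you can also sanity-check it from a second terminal using the Nomad CLI; these are read-only commands against the default local address:

nomad server members
nomad node status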

The Nomad web UI

Running as a service

Assuming you are using systemd, configuring Nomad to run at boot is straightforward. Place the following unit file at /etc/systemd/system/nomad.service and run systemctl daemon-reload to pick it up.

[Unit]
Description="HashiCorp Nomad"
Documentation=https://nomadproject.io/docs/
Wants=network-online.target
After=network-online.target

[Service]
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d
KillMode=process
KillSignal=SIGINT
LimitNOFILE=infinity
LimitNPROC=infinity
Restart=on-failure
RestartSec=2
StartLimitBurst=3
StartLimitIntervalSec=10
TasksMax=infinity

[Install]
WantedBy=multi-user.target

To run Nomad right away, use systemctl start nomad, and to enable it at boot time, run systemctl enable nomad.

Configuration

So far we haven’t actually configured anything, since the Debian package ships with a default configuration suitable for a single agent deployment. We’ll add some options to personalize our setup. The default configuration is at /etc/nomad.d/nomad.hcl and you can customize this to suit your needs.

datacenter = "lab1"
data_dir   = "/opt/nomad/data"
bind_addr  = "0.0.0.0"

server {
  enabled          = true
  bootstrap_expect = 1
}

client {
  enabled = true
  servers = ["127.0.0.1:4647"]

  host_volume "certs" {
    path      = "/etc/letsencrypt/live"
    read_only = true
  }

  host_volume "registry" {
    path      = "/opt/registry/data"
    read_only = false
  }
}

We’ve specified a datacenter name, which you’ll use in your jobs to describe where they should run, and we’ve added two host volumes - one for injecting our TLS certificates and the other for storing image layers for the container registry service we’ll be deploying.

Nomad will fail to start if any of these directories are missing, so make sure to pre-create them, then apply the new configuration with systemctl restart nomad.
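
For example, assuming the paths above, something like the following will do; /etc/letsencrypt/live should already exist if you have issued certificates with certbot, otherwise create it and place your certificate files there:

mkdir -p /opt/nomad/data /opt/registry/data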

Consul

We are going to use Consul to provide service publishing and discovery services for our Nomad agent. Assuming the apt repository is configured as above, just install it.

apt install consul

Unlike Nomad, Consul doesn’t ship with a working server configuration, so a little setup is required even for a simple single-agent deployment. Place the following in /etc/consul.d/consul.hcl.

datacenter  = "lab1"
data_dir    = "/opt/consul"
client_addr = "0.0.0.0"
ui          = true
server      = true
encrypt     = "..."

bootstrap_expect = 1

For the value of encrypt, you should provide a base64-encoded 32-byte random key, which you can generate using openssl.

openssl rand -base64 32
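
Alternatively, Consul can generate a suitably formatted key for you:

consul keygen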

Let’s try running Consul to verify the configuration. When starting Consul, if more than one private IP address is detected, you are required to specify which one it should bind to. Since we installed Docker earlier, your system will now have multiple networks configured. Determine your local network address and pass it as the -bind argument.
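
If you’re not sure which address to use, listing the globally scoped IPv4 addresses is a quick way to find a candidate (iproute2 ships with Ubuntu):

ip -4 addr show scope global

Then start Consul in the foreground, passing that address to -bind: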

sudo -H -u consul consul agent -bind 1.2.3.4 -config-dir=/etc/consul.d

At this point, you should be able to access the Consul UI at http://127.0.0.1:8500/ui/.

Since Nomad and Consul are running on the same host, Nomad should begin communicating with Consul right away. You can verify this by looking for the nomad and nomad-client service registrations in the Consul UI.
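
You can also check from the command line; the catalog should list the Nomad services alongside Consul itself:

consul catalog services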

Consul Services

We’ll configure systemd to also run Consul at boot. Hit ctrl-c to quit the Consul process, and then save the following systemd unit file at /etc/systemd/system/consul.service.

[Unit]
Description="HashiCorp Consul"
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target

[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -bind 1.2.3.4 -config-dir=/etc/consul.d/
ExecReload=/bin/kill --signal HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Be sure to substitute your network address for the -bind argument. Also, unless you ran Consul earlier as the consul user, you should reset the file permissions in the Consul data directory (chown -R consul:consul /opt/consul).

Run systemctl daemon-reload, systemctl enable consul and systemctl start consul to start Consul and run it at boot time.

Load balancing

Before deploying any services, we are going to need a load balancer to predictably expose our services to the network and perform TLS offloading. Note that whilst it’s perfectly possible to expose static ports which route directly to deployed services, using a load balancer tidies things up nicely and means that we don’t have to manually assign (and remember) port numbers for anything we choose to deploy.

Unlike, say, Kubernetes, Nomad doesn’t expose primitives specifically for load balancing. This makes our setup really simple; all we need to do is deploy a suitable HTTP server as a service and configure it to automagically discover the services we deploy.

We are going to use HAProxy. Whilst it doesn’t have fully native support for service discovery via Consul, we can use some templating magic to build and maintain our configuration file. It needs to be told about each service, but it can then resolve DNS SRV records to decide how to route requests.

Create the following job file and save it as haproxy.nomad. It’s quite long but we’ll break it down.

job "haproxy" {
  datacenters = ["lab1"]
  type        = "service"

  group "haproxy" {
    count = 1

    network {
      port "http" {
        static = 80
      }

      port "https" {
        static = 443
      }

      port "haproxy_ui" {
        static = 1936
      }
    }

    service {
      name = "haproxy"
      tags = [
        "http",
      ]

      check {
        name     = "alive"
        type     = "tcp"
        port     = "http"
        interval = "10s"
        timeout  = "2s"
      }

      check {
        name     = "alive"
        type     = "tcp"
        port     = "https"
        interval = "10s"
        timeout  = "5s"
      }
    }

    task "haproxy" {
      driver = "docker"

      config {
        image        = "haproxy:2.0"
        network_mode = "host"

        volumes = [
          "local/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg",
        ]
      }

      template {
        destination = "local/haproxy.cfg"
        change_mode = "restart"
        data        = <<EOF
global
   tune.ssl.default-dh-param 2048

defaults
   mode http
   timeout connect 5000
   timeout check 5000
   timeout client 30000
   timeout server 30000

frontend stats
   bind *:1936
   stats uri /
   stats show-legends
   no log

frontend http_front
   bind *:{{ env "NOMAD_PORT_http" }}
   bind *:{{ env "NOMAD_PORT_https" }} ssl crt /local/tls/example.net/combined.pem
   http-request set-header X-Forwarded-Proto https if { ssl_fc }
   http-request redirect scheme https unless { ssl_fc }
{{ range $tag, $services := services | byTag }}{{ if eq $tag "proxy" }}{{ range $service := $services }}{{ if ne .Name "haproxy" }}
   acl host_{{ .Name }} hdr(host) -i {{ .Name }}.example.net
   use_backend {{ .Name }} if host_{{ .Name }}
{{ end }}{{ end }}{{ end }}{{ end }}

{{ range $tag, $services := services | byTag }}{{ if eq $tag "proxy" }}{{ range $service := $services }}{{ if ne .Name "haproxy" }}
backend {{ .Name }}
    balance roundrobin
    server-template {{ .Name }} 10 _{{ .Name }}._tcp.service.consul resolvers consul resolve-opts allow-dup-ip resolve-prefer ipv4 check
{{ end }}{{ end }}{{ end }}{{ end }}

resolvers consul
   nameserver consul 127.0.0.1:8600
   accepted_payload_size 8192
   hold valid 5s

EOF
      }

      resources {
        cpu    = 200
        memory = 128
      }

      volume_mount {
        volume      = "certs"
        destination = "/local/tls"
        read_only   = true
      }
    }

    volume "certs" {
      type      = "host"
      read_only = true
      source    = "certs"
    }
  }
}

It should hopefully be clear what we’re trying to accomplish with the network {} block. We are exposing three static ports - 80 and 443 for general traffic, and 1936 for viewing HAProxy stats. Port labels are arbitrary, so you can customize these; just be sure to reference the same port names elsewhere in your job spec.

The service {} block is where Consul comes in. This will register a service called haproxy with Consul, along with two health checks. Consul evaluates the health checks to determine whether our haproxy service is working, although we haven’t yet defined any actions for it to take in the event that it is not.
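
Registered services are also resolvable through Consul’s DNS interface, which is what HAProxy’s resolvers block relies on later. As a quick illustration (assuming dig is installed and Consul’s default DNS port of 8600), you can look up the SRV record for the built-in nomad service:

dig @127.0.0.1 -p 8600 nomad.service.consul SRV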

The task {} block is the core of our job. It tells Nomad how to run our job and what resources it should allocate. In the config {} block, we’re using the haproxy:2.0 image (unqualified image references are pulled from Docker Hub), and we have mapped our configuration file. The resources {} block should also be self explanatory - cpu is measured in MHz and memory in MB.

The volume {} block and its related volume_mount {} are used here to mount the directory we configured earlier in Nomad’s main configuration file, injecting the certificates used for TLS offloading. You can acquire a wildcard certificate from Let’s Encrypt or you can supply a self-signed certificate. Importantly, HAProxy requires a “combined” file consisting of the RSA private key, the issued certificate, and any intermediate certificates, concatenated in that order.
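
As a rough sketch, assuming a standard Let’s Encrypt live directory for example.net, you could assemble that combined file like so:

cat /etc/letsencrypt/live/example.net/privkey.pem \
    /etc/letsencrypt/live/example.net/cert.pem \
    /etc/letsencrypt/live/example.net/chain.pem \
    > /etc/letsencrypt/live/example.net/combined.pem

Remember to rebuild this file whenever the certificate is renewed.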

HAProxy configuration template

The template {} block in the above configuration is evaluated by Nomad when creating the allocation for our task, and is automagically updated during the lifetime of the job. The change_mode attribute tells Nomad to restart the task in the event the resulting configuration changes, which is useful because we want our load balancer to detect newly launched services.

Inside the template, we use some useful patterns to help standardize our configuration. Of note is the NOMAD_PORT_x set of environment variables, where x refers to the port name configured in the network {} block. This ensures that if the port changes, so do the contents of the corresponding environment variable, which are then injected into our HAProxy config file, and it helps prevent drift and misconfiguration. This is especially useful when not using static ports, as we’ll discover later when we deploy additional services.

The range and if constructs inside the template evaluate the list of services made available by Consul, and we use this to express the frontend and backend configuration for HAProxy. Note that we have made an assumption here about DNS conventions, specifically that each service will be registered in DNS in the form servicename.example.net, and this convention will be used for routing requests based on the HTTP Host header. Registering these DNS names is out of scope here, but if you’re testing locally you can simply add these to your local hosts file as you go along.
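
For example, if your lab host’s address were 192.168.1.10 (purely illustrative), a hosts file entry for the registry service we’ll deploy shortly could look like this:

192.168.1.10 registry.example.net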

Note the if eq $tag "proxy" condition inside each of these template blocks. This ensures that we only configure HAProxy for services registered with the proxy tag, giving us fine-grained control over which services we want to load balance. Without this, or a similar condition, this template would render configuration for every service in Consul, including Nomad itself, with lacklustre results.

Submitting the job

With the above job spec saved in a file haproxy.nomad, submitting the job to Nomad is a single command.

nomad job run haproxy.nomad

You should see something similar to:

==> Monitoring evaluation "f677b146"
    Evaluation triggered by job "haproxy"
==> Monitoring evaluation "f677b146"
    Evaluation within deployment: "34b4f287"
    Allocation "94f580bf" created: node "377c1f84", group "haproxy"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "f677b146" finished with status "complete"

and if you look in the Nomad web UI, you should see your job now running. Click on the job to inspect it - you can see the current allocation, which should have a status of “running”, and if you click on the allocation, you should see live graphs of resource utilization and a list of the assigned ports.

Allocation for haproxy

You should also see the corresponding service for this job in the Consul UI.

So far so good, but we really want to run some services! You can go ahead and deploy any job you wish, but if you want a completely self-contained setup let’s first deploy a container registry.

Container Registry

As we will be deploying containerized applications to our single-node cluster, we are going to need somewhere to store container images. You could use Docker Hub, GitHub, or one of the many cloud registries, but we’ll stay local and install our own registry service. Docker publishes a registry application that works great for our purposes, so we’ll use that. We are going to keep things simple and install it from Docker Hub, just as we did with HAProxy.

Create the following job configuration and save it as registry.nomad.

job "registry" {
  datacenters = ["lab1"]
  type        = "service"

  group "registry" {
    count = 1

    network {
      port "http" {
        to = 5000
      }
    }

    service {
      name = "registry"
      port = "http"
      tags = [
        "http",
        "proxy",
      ]

      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "registry" {
      driver = "docker"

      config {
        image = "registry:2"
        ports = ["http"]
      }

      resources {
        cpu    = 500
        memory = 512
      }

      volume_mount {
        volume      = "registry"
        destination = "/var/lib/registry"
        read_only   = false
      }
    }

    volume "registry" {
      type      = "host"
      read_only = false
      source    = "registry"
    }
  }
}

Compared to our HAProxy job spec, this is simpler, and uses some of the same patterns. We have a single volume mount for storing image layers, which gives us persistence in the event our task is stopped or restarted. Our port definition is different - here we are not using a static port on the host; instead, Nomad assigns a dynamic port and maps it to port 5000 inside the container for convenience. We also register a service with Consul, and this time we are adding the proxy tag so that the registry service is itself exposed by our load balancer. Although we only have a single instance, this gives us ready-made TLS and a predictable hostname we can address.

Run this job using:

nomad job run registry.nomad

Just as with HAProxy, the allocation ID should be returned and you will see the job running in Nomad, and the service registration in Consul. The service registration should have the specified tags http and proxy. Inspecting the HAProxy configuration should now show the newly launched registry service configured with a backend and frontend. To view the configuration, first find the current allocation ID for the haproxy task, then locate the generated file in the Nomad data directory.

cat /opt/nomad/data/alloc/e12c507a-7a3d-d140-1705-e38fa1355c1b/haproxy/local/haproxy.cfg
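
Alternatively, the Nomad CLI can fetch the rendered template directly from the allocation; nomad job status haproxy will show the allocation ID to substitute below:

nomad alloc fs <alloc-id> haproxy/local/haproxy.cfg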

It should look something like this:

...

frontend http_front
   bind *:80
   bind *:443 ssl crt /local/tls/example.net/combined.pem
   http-request set-header X-Forwarded-Proto https if { ssl_fc }
   http-request redirect scheme https unless { ssl_fc }

   acl host_registry hdr(host) -i registry.example.net
   use_backend registry if host_registry


backend registry
    balance roundrobin
    server-template registry 10 _registry._tcp.service.consul resolvers consul resolve-opts allow-dup-ip resolve-prefer ipv4 check

...

At this point, you’ll want to either create a DNS record or add an entry to your hosts file. Go ahead and test it by hitting it with curl.

curl -ik -H 'Host: registry.example.net' https://localhost/

If you get a 200 OK response, your registry and load balancer are working!
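
You can also poke the registry API itself; a freshly deployed registry should return an empty repository list ({"repositories":[]}):

curl -ik -H 'Host: registry.example.net' https://localhost/v2/_catalog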

Deploying a service

Now, we have everything we need to build and deploy our own services. You can pick any application you like; you may even have a project ready and waiting. However, to keep things simple, we can deploy an HTTP echo service provided by HashiCorp. It’s written in Go, and it’s trivial to compile in a container. We’ll clone the repository, build a container image, push it to our registry, and deploy it as a job with Nomad!

Clone the github.com/hashicorp/http-echo repository and save the following Dockerfile to the same directory.

FROM golang:1.15
RUN mkdir /app
ADD . /app
WORKDIR /app
RUN test -f go.mod || go mod init github.com/hashicorp/http-echo
RUN go build -o http-echo .

FROM alpine:latest
RUN apk --no-cache add libc6-compat
COPY --from=0 /app/http-echo /bin/http-echo
ENTRYPOINT ["/bin/http-echo"]

This is a two-stage container build, using the golang image to compile the application and the alpine image as the base of the final, smaller image. Build the container, tag it and push it to your registry.

docker build . -t http-echo
docker tag http-echo registry.example.net/http-echo
docker push registry.example.net/http-echo

All being well, you should see the following as your image is sent to the registry service.

Using default tag: latest
The push refers to repository [registry.example.net/http-echo]
a1f80ad402a6: Pushed
a9d9a72b590d: Pushed
1119ff37d4a9: Pushed
latest: digest: sha256:f8cfedcd96d4403d097927ea2acb5eeff229d856d4ad50711845e710bdacd01a size: 947

The container image is now saved in your registry and you can deploy a job to Nomad. Use the following job spec, saved in echo.nomad.

job "echo" {
  datacenters = ["lab1"]
  type        = "service"

  group "echo" {
    count = 1

    network {
      port "http" {}
    }

    service {
      name = "echo"
      port = "http"
      tags = [
        "http",
        "proxy",
      ]

      check {
        type     = "http"
        path     = "/"
        interval = "10s"
        timeout  = "2s"
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "registry.example.net/http-echo:latest"
        args  = [
          "-listen", ":${NOMAD_PORT_http}",
          "-text", "Echo service listening on port ${NOMAD_PORT_http}",
        ]
        ports = ["http"]
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}

Now submit the job to Nomad.

nomad job run echo.nomad

Assuming the job was accepted, and the task was allocated, you’ll see it appear in the Nomad UI. Go ahead and create a DNS record for this service now, or add it to your hosts file.

In the Nomad UI, click into the job, and again into the allocation, to see the assigned ports. You’ll notice that since we didn’t specify a static port, a random available port was assigned by Nomad and populated in the NOMAD_PORT_http environment variable, which we used in our task block to configure the service. This port was also used when publishing the service to Consul.
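
The same details are available from the CLI; the allocation status output includes the assigned addresses and ports (substitute your allocation ID):

nomad alloc status <alloc-id>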

Task port allocation

Meanwhile, the HAProxy config template consumed the service registration details from Consul and configured our load balancer with the correct port for this service. This workflow removes the need to track port numbers for services, so we can simply deploy new services and almost immediately access them via our load balancer using a friendly hostname. Try it!

curl -i https://echo.example.net/
HTTP/1.1 200 OK
x-app-name:
x-app-version: 0.2.4
date: Sun, 07 Feb 2021 21:18:48 GMT
content-length: 37
content-type: text/plain; charset=utf-8

Echo service listening on port 31192

Our echo service is running and load balanced. We can deploy as many services as we have resources for, and enjoy the transparent orchestration provided by Nomad.

Nomad isn’t just for web services; it’s capable of running nearly any workload you care to throw at it, from scheduled jobs (there’s a periodic batch job type for that) to JVM applications and even KVM virtual machines. Despite being lightweight, the combination of Nomad and Consul is extremely powerful, and we have only looked at a small set of features. If you choose to take this setup to production, I strongly suggest you consider important measures such as adding more agents, running clients and servers on separate nodes, access control using ACLs, firewall and network security (e.g. using a backplane network for cluster communications), shipping your logs, and monitoring & alerting.


References

Whilst learning about Nomad, I found these great resources, for which I am thankful, and I recommend you check them out.