AWS Account Switching with Ansible

I recently worked on a project involving multiple AWS accounts, with different projects and environments spread across those accounts in different combinations. Having opted to use Ansible for driving deployments, I looked at its built-in capabilities for account switching. It turns out you can easily inject credentials to authenticate as another IAM user, but this can only be done on a per-task (or perhaps per-block?) level. This might seem flexible at first glance, but once you consider that you have to duplicate tasks, and therefore roles, and even playbooks, whenever you need to use different accounts, it quickly becomes unwieldy. That’s not even considering the insane amount of boilerplate you get when forced to specify credentials for each and every task. Perhaps the biggest blocker is that Ansible has no support for assuming IAM roles, a problem amplified by the fact that most of the core AWS modules still rely on boto2, which has patchy support for this at best and is unlikely to improve.

I spent some time digging in the boto2 and boto3 docs to find commonalities in authentication support, and eventually figured out that I should be able to inject temporary credentials via environment variables. Thankfully, even the session token issued with temporary credentials (such as when assuming a role) is supported in boto2, albeit under a different environment variable (AWS_SECURITY_TOKEN rather than boto3’s AWS_SESSION_TOKEN). Now I just needed a way to obtain the credentials and set them before playbook execution.

My first pass was a wrapper script, making use of AWS CLI calls to STS and parsing out the required bits with jq. This worked and proved the concept, but it lacked finesse and intelligence, as you’d still need to decide up front which role to assume before running a playbook.
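
As a rough illustration of what such a wrapper has to do, here is a sketch in Python rather than shell and jq. It assumes the JSON shape printed by `aws sts assume-role`; the helper name credentials_to_env and the sample values are made up for the example and are not part of the original script.

```python
import json


def credentials_to_env(sts_json):
    """Map the Credentials block of an `aws sts assume-role` response
    to the environment variables that boto2 and boto3 look for."""
    creds = json.loads(sts_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        # boto2 expects the token in AWS_SECURITY_TOKEN, boto3 in
        # AWS_SESSION_TOKEN, so the same value is exported under both names.
        "AWS_SECURITY_TOKEN": creds["SessionToken"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Placeholder response; real values would come from STS.
SAMPLE = json.dumps({
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
        "Expiration": "2018-01-01T00:00:00Z",
    }
})

env = credentials_to_env(SAMPLE)
for name in sorted(env):
    # Lines a wrapper could eval before calling ansible-playbook
    print("export {0}={1}".format(name, env[name]))
```

The role still has to be chosen by whoever produces that JSON, which is exactly the limitation described above.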

What I really wanted was a way to automatically figure out which AWS account should be operated on, based on the project and/or environment being managed. Since I already have a fairly consistent approach to writing playbooks, where the environment and project are almost always provided as extra vars, this should be easy!

I’ve previously made use of Ansible vars plugins; this is a very underdocumented feature of Ansible that whilst primarily designed for injecting group/host vars from alternative sources, actually provides a really flexible entrypoint into a running Ansible process in which you can do whatever you want. The outputs of a vars plugin are host variables, but with a little cheekiness you can manipulate the environment – which happens to be where Boto and Boto3 look for credentials!

Vars plugins, however cool, are just plugins. There are inputs and outputs, but those do not include a way to inspect existing variables (either global or per-host) from within the plugin itself. Personally I find this a major shortcoming of this particular plugin architecture. However, since the required information is always passed as extra vars, I decided to parse the CLI arguments manually in the plugin to extract them, rather than relying on Ansible to do it.

Here’s how I went about it. Starting in the vars_plugins directory (relative to the playbooks), here is a skeleton plugin that runs but does not yet do anything useful.

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

DOCUMENTATION = '''
    vars: aws
    version_added: "2.5"
    short_description: Nothing useful yet
    description:
        - Is run by Ansible
        - Runs without error
        - Does nothing, returns nothing
    notes:
        - Nothing to note
'''

from ansible.plugins.vars import BaseVarsPlugin

class VarsModule(BaseVarsPlugin):
    def __init__(self, *args):
        super(VarsModule, self).__init__(*args)

    def get_vars(self, loader, path, entities, cache=True):
        super(VarsModule, self).get_vars(loader, path, entities)
        return {}


# vim: set ft=python ts=4 sts=4 sw=4 et:

We can extend this to parse the CLI arguments with argparse, making sure to use parse_known_args() so that we don’t have to duplicate the entire set of Ansible arguments.

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

DOCUMENTATION = '''
    vars: aws
    version_added: "2.5"
    short_description: Nothing useful yet
    description:
        - Is run by Ansible
        - Runs without error
        - Does nothing, returns nothing
    notes:
        - Nothing to note
'''

import argparse

from ansible.plugins.vars import BaseVarsPlugin

def parse_cli_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-e', '--extra-vars', action='append')
    opts, unknown = parser.parse_known_args()
    args = dict()
    if opts.extra_vars:
        # Split on the first '=' only, so values may themselves contain '='
        args['extra_vars'] = dict(e.split('=', 1) for e in opts.extra_vars if '=' in e)
    return args


class VarsModule(BaseVarsPlugin):
    def __init__(self, *args):
        super(VarsModule, self).__init__(*args)
        cli_args = parse_cli_args()
        self.extra_vars = cli_args.get('extra_vars', dict())

    def get_vars(self, loader, path, entities, cache=True):
        super(VarsModule, self).get_vars(loader, path, entities)
        return {}


# vim: set ft=python ts=4 sts=4 sw=4 et:

Now we have made available any extra vars in dictionary form, making it easy to figure out which environment and project we’re working on. We’ll run playbooks like this:

ansible-playbook do-a-thing.yml -e env=staging -e project=canon
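
To make the effect of parse_known_args() concrete, here is a small standalone sketch of the same parsing applied to an explicit argument list instead of sys.argv. The function name parse_extra_vars and the sample arguments are only for illustration.

```python
import argparse


def parse_extra_vars(argv):
    """Collect key=value pairs passed with -e/--extra-vars, ignoring all else."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('-e', '--extra-vars', action='append')
    # parse_known_args() hands back unrecognised arguments rather than
    # exiting with an error, so the other Ansible options are left alone.
    opts, _unknown = parser.parse_known_args(argv)
    pairs = opts.extra_vars or []
    # Split on the first '=' only, so values may contain '=' themselves
    return dict(p.split('=', 1) for p in pairs if '=' in p)


argv = ['do-a-thing.yml', '-e', 'env=staging', '-e', 'project=canon', '--check']
extra_vars = parse_extra_vars(argv)
print(extra_vars)
```

Extra vars given as JSON or as an @file reference are not handled by this sketch; only simple key=value pairs are picked up.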

Next, we’ll build up a configuration to specify which account should be used for different projects/environments. In my situation, the makeup was complex due to some projects having all environments in a single account and some accounts having more than one project, so I needed to model this in a reusable manner. This is the structure I came up with. aws_profiles is a dictionary where the keys are names of AWS CLI/SDK profiles (as configured in ~/.aws), and the values are dictionaries of extra vars to match on.

---
aws_profiles:
  canon-staging:
    env:
      - stable
      - staging
    project: canon
  canon-production:
    env: production
    project: canon
  ops:
    env: ops

# vim: set ft=yaml ts=2 sts=2 sw=2 et:

Parsing this took a bit of thought, and some rubber ducking on zatech, but I eventually figured it out. This could probably be leaner but it balances well in my opinion. We store this configuration in vars_plugins/aws.yml, where the plugin can easily read it.

from __future__ import (absolute_import, division, print_function)
__metaclass__ = type

DOCUMENTATION = '''
    vars: aws
    version_added: "2.5"
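    short_description: Export AWS credentials selected by extra vars
    description:
        - Reads profile matching rules from aws.yml
        - Matches extra vars against those rules to select an AWS profile
        - Exports credentials for the profiles as environment variables
    notes:
        - Requires boto3 on the control host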
'''

import argparse, os, re, yaml
from ansible.errors import AnsibleParserError
from ansible.plugins.vars import BaseVarsPlugin
try:
    import boto3
    import botocore.exceptions
    HAS_BOTO3 = True
except ImportError:
    HAS_BOTO3 = False


def parse_cli_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-e', '--extra-vars', action='append')
    opts, unknown = parser.parse_known_args()
    args = dict()
    if opts.extra_vars:
        args['extra_vars'] = dict(e.split('=', 1) for e in opts.extra_vars if '=' in e)
    return args


def load_config():
    ''' Test for configuration file and return configuration dictionary '''

    DIR = os.path.dirname(os.path.realpath(__file__))
    with open(os.path.join(DIR, 'aws.yml'), 'r') as stream:
        try:
            config = yaml.safe_load(stream)
            return config
        except yaml.YAMLError as e:
            raise AnsibleParserError('Failed to read aws.yml: {0}'.format(e))


class VarsModule(BaseVarsPlugin):
    def __init__(self, *args):
        super(VarsModule, self).__init__(*args)
        cli_args = parse_cli_args()
        self.extra_vars = cli_args.get('extra_vars', dict())
        self.config = load_config()
        self._connect_profiles()
        self._export_credentials()


    def _connect_profiles(self):
        for profile in self.config.get('aws_profiles') or {}:
            self._init_session(profile)


    def _init_session(self, profile):
        if not hasattr(self, 'sessions'):
            self.sessions = dict()
        self.sessions[profile] = boto3.Session(profile_name=profile)


    def _credentials(self, profile):
        return self.sessions[profile].get_credentials().get_frozen_credentials()


    def _export_credentials(self):
        self.aws_profile = None
        profiles = self.config.get('aws_profiles') or {}

        if isinstance(profiles, dict):
            profiles_list = profiles.keys()
        else:
            profiles_list = profiles

        credentials = {profile: self._credentials(profile) for profile in profiles_list}

        profile_override = os.environ.get('ANSIBLE_AWS_PROFILE')
        default_profile = None
        if profile_override:
            if profile_override in profiles:
                default_profile = profile_override
        elif isinstance(profiles, dict) and self.extra_vars:
            for profile, rules in profiles.items():
                if isinstance(rules, dict):
                    rule_matches = {var: False for var in rules.keys()}
                    for var, vals in rules.items():
                        if not isinstance(vals, list):
                            vals = [vals]
                        if var in self.extra_vars and self.extra_vars[var] in vals:
                            rule_matches[var] = True
                    if all(rule_matches.values()):
                        default_profile = profile
                        break

        if default_profile:
            self.aws_profile = default_profile
            os.environ['AWS_ACCESS_KEY_ID'] = credentials[default_profile].access_key
            os.environ['AWS_SECRET_ACCESS_KEY'] = credentials[default_profile].secret_key
            os.environ['AWS_SECURITY_TOKEN'] = credentials[default_profile].token
            os.environ['AWS_SESSION_TOKEN'] = credentials[default_profile].token

        cleaner = re.compile('[^a-zA-Z0-9_]')
        for profile, creds in credentials.items():
            profile_clean = cleaner.sub('_', profile).upper()
            os.environ['{}_AWS_ACCESS_KEY_ID'.format(profile_clean)] = creds.access_key
            os.environ['{}_AWS_SECRET_ACCESS_KEY'.format(profile_clean)] = creds.secret_key
            os.environ['{}_AWS_SECURITY_TOKEN'.format(profile_clean)] = creds.token
            os.environ['{}_AWS_SESSION_TOKEN'.format(profile_clean)] = creds.token


    def get_vars(self, loader, path, entities, cache=True):
        super(VarsModule, self).get_vars(loader, path, entities)
        return {}


# vim: set ft=python ts=4 sts=4 sw=4 et:

This got busy real quick, so let’s break it down a little.

In __init__, load_config() reads the configuration file and we store the resulting config dictionary.

Next, _connect_profiles() loops through each configured profile and instantiates a boto3 session for each one, saving the session objects as attributes on our module class.

Finally, in _export_credentials(), the magic happens. First we retrieve temporary credentials for each of the connected profiles (the credentials dictionary); these include the usual access key and secret access key plus a session token. We then check for an environment variable, ANSIBLE_AWS_PROFILE, a play on the usual AWS_PROFILE, which allows us to override the account selection when invoking Ansible. Should this not be specified, we iterate over the profile specifications to determine whether the specified extra vars match any profile. If they do, default_profile is populated and we export the earlier acquired credentials using the usual AWS_* environment variables. Lastly, credentials for all profiles are exported with prefixed environment variable names to allow us to override them on a per-task basis.
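
The matching step is the easiest part to get lost in, so here it is isolated as a pure function. This is a simplified sketch and not the plugin code itself: the name match_profile is made up, and the profiles dictionary is simply the aws.yml example from earlier written out as a Python literal.

```python
def match_profile(profiles, extra_vars):
    """Return the first profile whose rules are all satisfied by extra_vars,
    or None when nothing matches."""
    for profile, rules in profiles.items():
        if not isinstance(rules, dict):
            continue
        matched = True
        for var, allowed in rules.items():
            # A rule value may be a single string or a list of strings
            if not isinstance(allowed, list):
                allowed = [allowed]
            if extra_vars.get(var) not in allowed:
                matched = False
                break
        if matched:
            return profile
    return None


PROFILES = {
    'canon-staging': {'env': ['stable', 'staging'], 'project': 'canon'},
    'canon-production': {'env': 'production', 'project': 'canon'},
    'ops': {'env': 'ops'},
}

print(match_profile(PROFILES, {'env': 'staging', 'project': 'canon'}))
print(match_profile(PROFILES, {'env': 'ops'}))
print(match_profile(PROFILES, {'env': 'dev', 'project': 'canon'}))
```

Every variable named in a profile’s rules has to match, and the first matching profile in iteration order wins, so overlapping rules are best avoided.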

This approach takes advantage of the fact that environment variables set here do propagate process-wide, and all Ansible modules running on the control host are able to see them, and will automatically use them to authenticate with AWS.
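
The inheritance itself is ordinary process behaviour and easy to demonstrate outside Ansible. The sketch below only shows that a child process started after os.environ is modified sees the new value; how Ansible actually launches modules is more involved, and the token value is a placeholder.

```python
import os
import subprocess
import sys

# Set in the parent process, much as the plugin does inside ansible-playbook
os.environ['AWS_SESSION_TOKEN'] = 'example-token'

# A child process started afterwards inherits the parent's environment
output = subprocess.check_output(
    [sys.executable, '-c', 'import os; print(os.environ["AWS_SESSION_TOKEN"])']
)
seen_by_child = output.decode().strip()
print(seen_by_child)
```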

For specific tasks where you know you’ll always run that task for one specific account, you can reference the corresponding prefixed environment variables to specify credentials for the module. For example:

---

- hosts: localhost
  connection: local
  
  pre_tasks:
  
    - name: Validate extra vars
      assert:
        that:
          - env is defined
          - project is defined
          - name is defined
  
  tasks:
  
    - name: Launch EC2 instance
      ec2:
        assign_public_ip: yes
        group: external
        image: ami-aabbccde
        instance_tags:
          Name: "{{ name }}"
          env: "{{ env }}"
          project: "{{ project }}"
        instance_type: t2.medium
        keypair: ansible
        vpc_subnet_id: subnet-22334456
        wait: yes
      register: result_ec2

    - name: Create DNS record
      route53:
        aws_access_key: "{{ lookup('env', 'OPS_AWS_ACCESS_KEY_ID') | default(omit, true) }}"
        aws_secret_key: "{{ lookup('env', 'OPS_AWS_SECRET_ACCESS_KEY') | default(omit, true) }}"
        security_token: "{{ lookup('env', 'OPS_AWS_SECURITY_TOKEN') | default(omit, true) }}"
        command: create
        overwrite: yes
        record: "{{ name }}.example.net"
        value: "{{ item.public_ip }}"
        zone: example.net
      with_flattened:
        - "{{ result_ec2.instances }}"
        - "{{ result_ec2.tagged_instances }}"

# vim: set ft=ansible ts=2 sts=2 sw=2 et:

In this playbook, the ec2 task launches an instance in the account that was matched based on the env and project variables provided at runtime. The route53 task, however, always creates a corresponding DNS record for the instance using the ops AWS profile.
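
The OPS_ prefix used in the lookups comes from the profile name, upper-cased and with anything that isn’t a letter, digit or underscore replaced by an underscore. Here is a short sketch of that naming rule; the helper name prefixed_names is only for illustration.

```python
import re

# Same character class the plugin uses to sanitise profile names
CLEANER = re.compile('[^a-zA-Z0-9_]')

SUFFIXES = (
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'AWS_SECURITY_TOKEN',
    'AWS_SESSION_TOKEN',
)


def prefixed_names(profile):
    """Return the environment variable names exported for one profile."""
    prefix = CLEANER.sub('_', profile).upper()
    return ['{0}_{1}'.format(prefix, suffix) for suffix in SUFFIXES]


print(prefixed_names('ops'))
print(prefixed_names('canon-staging'))
```

A profile called canon-staging therefore ends up under names starting with CANON_STAGING_, which is what you would pass to lookup('env', ...) in a task.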

Wrapping up, I added all of this functionality and more to my Ansible AWS Vars Plugin, which you can grab from GitHub and use or modify however you find useful.