• AWS Account Switching with Ansible

    I recently worked on a project involving multiple AWS accounts, with different projects and environments spread across those accounts in different combinations. Having opted to use Ansible for driving deployments, I looked at its built-in capabilities for account switching. It turns out you can easily inject credentials to authenticate as another IAM user, but this can only be done at a per-task (or perhaps per-block) level. This might seem flexible at first glance, but when you consider that you have to duplicate tasks, and therefore roles, and even playbooks, whenever you need to use different accounts, it quickly becomes unwieldy. That’s not even considering the sheer amount of boilerplate you get when forced to specify credentials for each and every task. Perhaps the biggest blocker is that Ansible has no support for assuming IAM roles, which is amplified by the fact that most of the core AWS modules still rely on boto2, which has patchy support for this at best and won’t be improving any time soon.
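    
    To illustrate the boilerplate, here is a rough sketch of what per-task credential injection looks like (the variable names holding the second account’s credentials are just placeholders):
    
    - name: Launch an instance in another account
      ec2:
        aws_access_key: "{{ other_account_access_key }}"
        aws_secret_key: "{{ other_account_secret_key }}"
        image: ami-aabbccde
        instance_type: t2.medium
        wait: yes
    
    Every task that touches the second account needs the same extra parameters, which is exactly the duplication described above.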

    I spent some time digging through the boto2 and boto3 docs to find commonalities in authentication support, and eventually figured out that I should be able to inject temporary credentials via environment variables. Thankfully even the session token issued with temporary credentials (such as when assuming a role) is supported in boto2, albeit via a different environment variable (AWS_SECURITY_TOKEN rather than AWS_SESSION_TOKEN). Now I just needed a way to obtain the credentials and set them before playbook execution.

    My first pass was a wrapper script, making use of AWS CLI calls to STS and parsing out the required bits with jq. This worked, proving the concept, but lacked finesse and intelligence as you’d still need to purposely decide which role to assume before running a playbook.
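    
    In essence it looked something like this (a minimal sketch; the role ARN and session name are placeholders):
    
    #!/usr/bin/env bash
    # Assume an IAM role via STS, export the temporary credentials, then run Ansible.
    set -euo pipefail
    
    ROLE_ARN="arn:aws:iam::123456789012:role/deploy"
    CREDS=$(aws sts assume-role --role-arn "$ROLE_ARN" --role-session-name ansible)
    
    export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r '.Credentials.AccessKeyId')
    export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r '.Credentials.SecretAccessKey')
    export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r '.Credentials.SessionToken')
    export AWS_SECURITY_TOKEN="$AWS_SESSION_TOKEN"   # boto2 reads this variant
    
    exec ansible-playbook "$@"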

    What I really wanted was a way to automatically figure out which AWS account should be operated on, based on the project and/or environment being managed. Since I already have a fairly consistent approach to writing playbooks, where the environment and project are almost always provided as extra vars, this should be easy!

    I’ve previously made use of Ansible vars plugins. This is a very underdocumented feature of Ansible that, whilst primarily designed for injecting group/host vars from alternative sources, actually provides a really flexible entry point into a running Ansible process in which you can do whatever you want. The outputs of a vars plugin are host variables, but with a little cheekiness you can manipulate the environment – which happens to be where Boto and Boto3 look for credentials!

    Vars plugins, however cool, are just plugins. There are inputs and outputs, but those do not include a way to inspect existing variables (either global or per-host) from within the plugin itself. Personally I find this a major shortcoming of this particular plugin architecture. However, since the required information is always passed as extra vars, I decided to parse the CLI arguments manually and extract them in the plugin, rather than relying on Ansible to do it.

    Here’s how I went about it. Starting in the vars_plugins directory (relative to the playbooks), here is a skeleton plugin that runs but doesn’t yet do anything useful.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    DOCUMENTATION = '''
        vars: aws
        version_added: "2.5"
        short_description: Nothing useful yet
        description:
            - Is run by Ansible
            - Runs without error
            - Does nothing, returns nothing
        notes:
            - Nothing to note
    '''
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    
    class VarsModule(BaseVarsPlugin):
        def __init__(self, *args):
            super(VarsModule, self).__init__(*args)
    
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            return {}
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    We can extend this to parse the CLI arguments with argparse, making sure to use parse_known_args() so that we don’t have to duplicate the entire set of Ansible arguments.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    DOCUMENTATION = '''
        vars: aws
        version_added: "2.5"
        short_description: Nothing useful yet
        description:
            - Is run by Ansible
            - Runs without error
            - Does nothing, returns nothing
        notes:
            - Nothing to note
    '''
    
    import argparse
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    def parse_cli_args():
        parser = argparse.ArgumentParser()
        parser.add_argument('-e', '--extra-vars', action='append')
        opts, unknown = parser.parse_known_args()
        args = dict()
        if opts.extra_vars:
            # split on the first '=' only, in case values themselves contain '='
            args['extra_vars'] = dict(e.split('=', 1) for e in opts.extra_vars if '=' in e)
        return args
    
    
    class VarsModule(BaseVarsPlugin):
        def __init__(self, *args):
            super(VarsModule, self).__init__(*args)
            cli_args = parse_cli_args()
            self.extra_vars = cli_args.get('extra_vars', dict())
    
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            return {}
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    Now any extra vars are available in dictionary form, making it easy to figure out which environment and project we’re working on. We’ll run playbooks like this:

    ansible-playbook do-a-thing.yml -e env=staging -e project=canon

    Next, we’ll build up a configuration to specify which account should be used for different projects/environments. In my situation, the makeup was complex due to some projects having all environments in a single account and some accounts having more than one project, so I needed to model this in a reusable manner. This is the structure I came up with. aws_profiles is a dictionary where the keys are names of AWS CLI/SDK profiles (as configured in ~/.aws), and the values are dictionaries of extra vars to match on.

    ---
    aws_profiles:
      canon-staging:
        env:
          - stable
          - staging
        project: canon
      canon-production:
        env: production
        project: canon
      ops:
        env: ops
    
    # vim: set ft=yaml ts=2 sts=2 sw=2 et:

    Parsing this took a bit of thought, and some rubber ducking on zatech, but I eventually figured it out. This could probably be leaner, but it strikes a reasonable balance in my opinion. We store this configuration in vars_plugins/aws.yml, where the plugin can easily read it.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    DOCUMENTATION = '''
        vars: aws
        version_added: "2.5"
        short_description: Export AWS credentials based on extra vars
        description:
            - Reads profile matching rules from aws.yml
            - Initialises a boto3 session for each configured profile
            - Exports credentials as environment variables for AWS modules to use
        notes:
            - Requires boto3 and AWS CLI/SDK profiles configured in ~/.aws
    '''
    
    import argparse
    import os
    import re
    
    import yaml
    
    from ansible.errors import AnsibleParserError
    from ansible.module_utils.six import string_types
    from ansible.plugins.vars import BaseVarsPlugin
    
    try:
        import boto3
        import botocore.exceptions
        HAS_BOTO3 = True
    except ImportError:
        HAS_BOTO3 = False
    
    
    def parse_cli_args():
        parser = argparse.ArgumentParser()
        parser.add_argument('-e', '--extra-vars', action='append')
        opts, unknown = parser.parse_known_args()
        args = dict()
        if opts.extra_vars:
            # split on the first '=' only, in case values themselves contain '='
            args['extra_vars'] = dict(e.split('=', 1) for e in opts.extra_vars if '=' in e)
        return args
    
    
    def load_config():
        ''' Test for configuration file and return configuration dictionary '''
    
        DIR = os.path.dirname(os.path.realpath(__file__))
        with open(os.path.join(DIR, 'aws.yml'), 'r') as stream:
            try:
                config = yaml.safe_load(stream)
                return config
            except yaml.YAMLError as e:
                raise AnsibleParserError('Failed to read aws.yml: {0}'.format(e))
    
    
    class VarsModule(BaseVarsPlugin):
        def __init__(self, *args):
            super(VarsModule, self).__init__(*args)
            cli_args = parse_cli_args()
            self.extra_vars = cli_args.get('extra_vars', dict())
            self.config = load_config()
            self._connect_profiles()
            self._export_credentials()
    
    
        def _connect_profiles(self):
            for profile in self._profiles():
                self._init_session(profile)
    
    
        def _init_session(self, profile):
            if not hasattr(self, 'sessions'):
                self.sessions = dict()
            self.sessions[profile] = boto3.Session(profile_name=profile)
    
    
        def _credentials(self, profile):
            return self.sessions[profile].get_credentials().get_frozen_credentials()
    
    
        def _export_credentials(self):
            self.aws_profile = None
            profiles = self.config.get('aws_profiles', None)
    
            if isinstance(profiles, dict):
                profiles_list = profiles.keys()
            else:
                profiles_list = profiles
    
            # Nothing to do if no profiles are configured
            if not profiles_list:
                return
    
            credentials = {profile: self._credentials(profile) for profile in profiles_list}
    
            profile_override = os.environ.get('ANSIBLE_AWS_PROFILE')
            default_profile = None
            if profile_override:
                if profile_override in profiles:
                    default_profile = profile_override
            elif isinstance(profiles, dict) and self.extra_vars:
                for profile, rules in profiles.items():
                    if isinstance(rules, dict):
                        rule_matches = {var: False for var in rules.keys()}
                        for var, vals in rules.items():
                            if isinstance(vals, string_types):
                                vals = [vals]
                            if var in self.extra_vars and self.extra_vars[var] in vals:
                                rule_matches[var] = True
                        if all(rule_matches.values()):
                            default_profile = profile
                            break
    
            if default_profile:
                self.aws_profile = default_profile
                os.environ['AWS_ACCESS_KEY_ID'] = credentials[default_profile].access_key
                os.environ['AWS_SECRET_ACCESS_KEY'] = credentials[default_profile].secret_key
                # Static (non-STS) credentials have no token, so only export one when present
                if credentials[default_profile].token:
                    os.environ['AWS_SECURITY_TOKEN'] = credentials[default_profile].token
                    os.environ['AWS_SESSION_TOKEN'] = credentials[default_profile].token
    
            cleaner = re.compile('[^a-zA-Z0-9_]')
            for profile, creds in credentials.items():
                profile_clean = cleaner.sub('_', profile).upper()
                os.environ['{}_AWS_ACCESS_KEY_ID'.format(profile_clean)] = creds.access_key
                os.environ['{}_AWS_SECRET_ACCESS_KEY'.format(profile_clean)] = creds.secret_key
                if creds.token:
                    os.environ['{}_AWS_SECURITY_TOKEN'.format(profile_clean)] = creds.token
                    os.environ['{}_AWS_SESSION_TOKEN'.format(profile_clean)] = creds.token
    
    
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            return {}
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    This got busy real quick, so let’s break it down a little.

    In __init__(), we parse the extra vars, read the configuration file into a config dictionary, then loop through each configured profile in _connect_profiles() and instantiate a boto3 session for each one, saving the session objects as attributes on our plugin class.
    
    The magic happens in _export_credentials(). First we retrieve credentials for each of the connected profiles – for assumed roles this includes the usual access key and secret key plus a session token. We then check for an environment variable ANSIBLE_AWS_PROFILE, a play on the usual AWS_PROFILE, which allows us to override the account selection when invoking Ansible. Should this not be specified, we iterate over the profile specifications to determine whether the supplied extra vars match any profile. If they do, default_profile is populated and the earlier acquired credentials are exported using the usual AWS_* environment variables. Finally, credentials for all profiles are exported under prefixed environment variable names, which allows us to override them on a per-task basis.

    This approach takes advantage of the fact that environment variables set here propagate process-wide, so all AWS modules running on the control host can see them and will automatically use them to authenticate with AWS.

    For specific tasks where you know you’ll always run that task for one specific account, you can reference the corresponding prefixed environment variables to specify credentials for the module. For example:

    ---
    
    - hosts: localhost
      connection: local
      
      pre_tasks:
      
        - name: Validate extra vars
          assert:
            that:
              - env is defined
              - project is defined
              - name is defined
      
      tasks:
      
        - name: Launch EC2 instance
          ec2:
            assign_public_ip: yes
            group: external
            image: ami-aabbccde
            instance_tags:
              Name: "{{ name }}"
              env: "{{ env }}"
              project: "{{ project }}"
            instance_type: t2.medium
            keypair: ansible
            vpc_subnet_id: subnet-22334456
            wait: yes
          register: result_ec2
    
        - name: Create DNS record
          route53:
            aws_access_key: "{{ lookup('env', 'OPS_AWS_ACCESS_KEY_ID') | default(omit) }}"
            aws_secret_key: "{{ lookup('env', 'OPS_AWS_SECRET_ACCESS_KEY') | default(omit) }}"
            security_token: "{{ lookup('env', 'OPS_AWS_SECURITY_TOKEN') | default(omit) }}"
            command: create
            overwrite: yes
            record: "{{ name }}.example.net"
            value: "{{ item.public_ip }}"
            zone: example.net
          with_flattened:
            - "{{ result_ec2.results | map(attribute='instances') | list }}"
            - "{{ result_ec2.results | map(attribute='tagged_instances') | list }}"
    
    # vim: set ft=ansible ts=2 sts=2 sw=2 et:

    In this playbook, the ec2 task launches an instance in the account that was matched based on the env and project variables provided at runtime. The route53 task, however, always creates a corresponding DNS record for the instance using the ops AWS profile.

    Wrapping up, I added all of this functionality and more to my Ansible AWS Vars Plugin, which you can grab from GitHub and use or modify as you find useful.

  • Beyond Facts: Retrieving AWS Resource IDs

    I’m a huge fan of Ansible and I’ve made use of it in several projects to orchestrate AWS services. Ansible is designed to be simple, with most functionality contained in modules which are callable via tasks in playbooks. This has huge benefits, but also comes with the major drawback of significant boilerplate when you need to retrieve data from external sources.

    From the beginning, Ansible has had a dynamic inventory facility to allow host data to be dynamically imported from sources like AWS, but although this is undergoing great improvements in Ansible 2.5, it does not yet provide a way to retrieve the arbitrary data that is commonly needed when interacting with AWS services. Specifically, if you want to retrieve VPC, subnet or security group details, you are forced to do this with facts modules. When you have a large number of these resources, this quickly becomes unwieldy and repetitive.
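    
    For example, just looking up a single subnet ID with the facts-module approach means carrying something like this around in every playbook that needs it (the tag value here is illustrative):
    
    - name: Look up the private subnet
      ec2_vpc_subnet_facts:
        filters:
          "tag:Name": acme-private-subnet
      register: result_subnet_facts
    
    - name: Launch EC2 instance
      ec2:
        vpc_subnet_id: "{{ result_subnet_facts.subnets[0].id }}"
        # ... other parameters omitted ...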

    I was determined to find a better way. I experimented with the various plugin types available in Ansible, but most of these are designed to manipulate existing data or to perform one-off lookups; nothing really lends itself to mass lookup of large data sets. I also wanted to do this automatically – since almost every playbook interacts with one or more AWS services, this would have a massive payoff. Enter the widely underused (and underdocumented) Vars Plugins.

    Creating a vars plugin turns out to be surprisingly straightforward. First you need to configure a directory (or directories) for vars plugins. This can be done in ansible.cfg, but since there might be an existing default here, I wanted to supplement whatever might already be configured. Like many Ansible configuration values, a vars plugins directory can also be specified via an environment variable: ANSIBLE_VARS_PLUGINS. You can export this in your wrapper script(s), set it in your shell rc, CI tool, or wherever/however you use Ansible.
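    
    For example, in a wrapper script or your shell profile (the path is just an example):
    
    export ANSIBLE_VARS_PLUGINS="$(pwd)/vars_plugins"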

    I prefer to keep everything together for a project, so I set this to the vars_plugins/ directory, relative to the playbooks for a project. Once this is set, you need only place Python scripts in this directory, with each one describing a VarsModule class. Here’s what an empty vars module looks like.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            return {}
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    Ramping this up pretty quickly, here’s how you’d retrieve VPC IDs and populate a global dictionary variable keyed by VPC name.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    try:
        import boto3
        import botocore.exceptions
        HAS_BOTO3 = True
    except ImportError:
        HAS_BOTO3 = False
    
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            self._get_vpc_ids()
            return dict(vpc_ids=self.vpc_ids)
    
    
        def _get_vpc_ids(self):
            ''' Retrieve all VPC details from AWS API '''
    
            self.vpc_ids = dict()
            client = boto3.client('ec2')
            vpcs_result = client.describe_vpcs()
            if vpcs_result and 'Vpcs' in vpcs_result and len(vpcs_result['Vpcs']):
                for vpc in vpcs_result['Vpcs']:
                    if 'Tags' in vpc:
                        tags = dict((t['Key'], t['Value']) for t in vpc['Tags'])
                        if 'Name' in tags:
                            self.vpc_ids[tags['Name']] = vpc['VpcId']
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    This uses Boto3 to connect to EC2 and retrieve a list of VPCs, then parses the list and builds a dictionary indexed by Name tag values. In the get_vars() method, we return a dictionary of variables to set, and Ansible kindly obliges and makes these dictionary values available as variables on all hosts.
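    
    Any task can then reference a VPC ID directly, for example (the VPC name is hypothetical):
    
    - name: Show the ID of the acme VPC
      debug:
        msg: "{{ vpc_ids['acme-vpc'] }}"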

    We can extend this to also retrieve subnet IDs.

    from __future__ import (absolute_import, division, print_function)
    __metaclass__ = type
    
    try:
        import boto3
        import botocore.exceptions
        HAS_BOTO3 = True
    except ImportError:
        HAS_BOTO3 = False
    
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            self._get_vpc_ids()
            self._get_subnet_ids()
            return dict(
                vpc_ids=self.vpc_ids,
                subnet_ids=self.subnet_ids,
            )
    
            
        def _get_vpc_ids(self):
            ''' Retrieve all VPC details from AWS API '''
    
            self.vpc_ids = dict()
            client = boto3.client('ec2')
            vpcs_result = client.describe_vpcs()
            if vpcs_result and 'Vpcs' in vpcs_result and len(vpcs_result['Vpcs']):
                for vpc in vpcs_result['Vpcs']:
                    if 'Tags' in vpc:
                        tags = dict((t['Key'], t['Value']) for t in vpc['Tags'])
                        if 'Name' in tags:
                            self.vpc_ids[tags['Name']] = vpc['VpcId']
    
    
        def _get_subnet_ids(self):
            ''' Retrieve all subnet details from AWS API '''
    
            self.subnet_ids = dict()
            client = boto3.client('ec2')
            subnets_result = client.describe_subnets()
            if subnets_result and 'Subnets' in subnets_result and len(subnets_result['Subnets']):
                for subnet in subnets_result['Subnets']:
                    if 'Tags' in subnet:
                        tags = dict((t['Key'], t['Value']) for t in subnet['Tags'])
                        if 'Name' in tags:
                            self.subnet_ids[tags['Name']] = subnet['SubnetId']
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    Just as with VPCs, we’re now retrieving subnet details and making a variable subnet_ids available to playbooks for all hosts. We can make use of these super-handy variables with something like this:

    ---
    
    - hosts: localhost
      connection: local
    
      tasks:
      
        - name: Launch EC2 instance
          ec2:
            assign_public_ip: yes
            group: external
            image: ami-aabbccde
            instance_tags:
              Name: "{{ name }}"
              env: "{{ env }}"
              project: "{{ project }}"
            instance_type: t2.medium
            keypair: ansible
            vpc_subnet_id: "{{ subnet_ids['acme-private-subnet'] }}"
            wait: yes

    Note how we’re specifying the subnet in which to launch this instance by looking up the subnet ID in our global subnet_ids variable. We don’t have to use ec2_vpc_subnet_facts to explicitly look up the subnet; it’s done in the background by the plugin.

    You can see the potential for vastly simplifying playbooks. Wherever we need an AWS resource ID, we can add the ability for our plugin to look it up transparently, and totally avoid a facts module call each time.
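    
    As a sketch of how such an extension might look (this method isn’t part of the plugin above), security group IDs could be gathered in exactly the same way and returned from get_vars() alongside the others:
    
        def _get_security_group_ids(self):
            ''' Retrieve all security group details from AWS API '''
    
            self.security_group_ids = dict()
            client = boto3.client('ec2')
            result = client.describe_security_groups()
            for group in result.get('SecurityGroups', []):
                # GroupName is always present, so index the IDs by it
                self.security_group_ids[group['GroupName']] = group['GroupId']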

    I took this concept further and built a more comprehensive plugin, which is published on GitHub. It adds some nice features like caching lookup results, reading a configuration file for multi-region support, and parsing arbitrary tag values to build a nested dictionary for each type of AWS resource it supports.

  • Ansible Recommended Patterns

    It can be tricky to figure out how to structure new projects. You might set out to make things as comprehensive as possible, to accommodate future expansion, but this raises the barrier to entry and can leave you in a quandary about where things should go. Or you might opt for the lean approach, making things super simple and extending as you go, but you don’t want to set yourself up for big refactoring sessions later on.

    These are a collection of recommendations that I have for new Ansible projects. They can equally be applied to existing projects, depending on your needs and the effort involved. At the end I’ve also linked to a skeleton repository that you’re free to clone and make your own.

    Use wrapper scripts

    It’s good to use tools in the way they were intended, and in an ideal world we wouldn’t need wrapper scripts; but with orchestration tools like Ansible you want to standardize the way it is run across your project or organization. Your Ansible project might also need to run in multiple places, like your CI server, and wrapper scripts help to encapsulate the runtime environment and settings.
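    
    A wrapper can be as small as this (a sketch; the paths and settings are just examples):
    
    #!/usr/bin/env bash
    # Run Ansible with consistent settings regardless of where it's invoked from.
    set -euo pipefail
    cd "$(dirname "$0")"
    
    export ANSIBLE_CONFIG="./ansible.cfg"
    export ANSIBLE_VARS_PLUGINS="./vars_plugins"
    
    exec ansible-playbook -i inventory/ "$@"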

    Everything in one repository

    I’m a big proponent of infrastructure and orchestration monorepos. In terms of Ansible, this means your playbooks, variables, scripts, roles, plugins, inventory scripts and configuration all reside together and are version controlled in the same repository. This makes it possible to deploy your code anywhere, on any platform, with root access or not, and be confident that everything you need to execute is in one place.

    Lean playbooks

    Playbooks are your point of entry when running Ansible repeatedly, and they need to be easily readable and understandable without having to wade through hundreds of lines of role invocations and arbitrary tasks. They should be short and to the point, and the roles called in your playbooks should represent the broad actions the playbook is taking. Don’t repeat stuff in each playbook; rather, make composable, includable playbooks and chain them together using include: (Ansible <2.5) or import_playbook: (Ansible 2.5+).
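    
    For example, a top-level playbook composed from smaller ones might look like this (the playbook names are placeholders):
    
    ---
    # site.yml
    - import_playbook: provision-infrastructure.yml
    - import_playbook: configure-app-servers.yml
    - import_playbook: deploy-application.yml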

    Roles represent business logic

    The real reason I encourage writing your own roles is that your roles should represent your own business logic and your own operational requirements. If you are installing nginx, what is it for? If it’s for a reverse proxy layer, then build a role for deploying your nginx reverse proxies. Don’t be afraid of having multiple roles install the same packages if they are for different reasons, and don’t be dazzled by the complicated, does-everything role on Ansible Galaxy. It probably requires you to provide a huge amount of input just to get the feature you want, when you’d be much better served by baking that feature into your own role. Use role dependencies wisely, and remember that you can inject conditionals and parameters just about everywhere. But most importantly, codify your own business logic.

    Namespace your group variables

    One issue with group variables is that you can’t declare a set of variables that apply to an intersection of groups. This is fine to begin with, but before long you’ll have groups based on project, server role, location, etc., and you’ll want finer-grained control over your group variables. My preferred approach is to namespace group vars by project and to enable dictionary merging. The setting to use in ansible.cfg is hash_behaviour = merge. With this in place, you can set variables like this…

    # group_vars/tag_project_atom/main.yml
    
    atom:
      repo: git@github.com:acme/atom.git
      user: projatom
    
    # group_vars/tag_role_app/atom.yml
    
    atom:
      packages:
        - nginx
    
    # group_vars/tag_env_production/atom.yml
    
    atom:
      autoscaling:
        instance_count: 5
        instance_type: m4.large

    … and they will all be merged into a single atom dictionary, from which you can reference the relevant attributes for hosts in the respective groups.

    If you ever need to address the atom dictionary itself dynamically, you can use the syntax hostvars[inventory_hostname][project], where project is a supplied extra variable or similar.
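    
    For example (assuming project is supplied as an extra var):
    
    - name: Show the repo for the current project
      debug:
        msg: "{{ hostvars[inventory_hostname][project].repo }}"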

    Whilst on the topic of group variables, I thoroughly recommend putting your group var files into subdirectories, e.g. group_vars/all/main.yml. This helps to prune your variables and keep them in check, and avoids the hell that is a 1000-line configuration file.

    Don’t be afraid to write plugins

    Especially if you don’t code much Python, the prospect of writing a plugin for Ansible sounds rather daunting. But most Ansible plugins are super simple, almost one-liner scripts. If you find yourself chaining several Jinja filters together to manipulate something into the format you want, then you might consider putting that functionality into a plugin.

    To make a filter plugin, copy this boilerplate into filter_plugins/my-plugin.py (relative to your playbook).

    def noop(val):
        return val
    
    class FilterModule(object):
        def filters(self):
            return {
                'noop': noop,
            }

    This noop plugin literally does nothing to the value you provide. Use it like this:

    - name: Print thing
      debug:
        msg: "{{ 'foo' | noop }}"

    You can find some other useful filter plugins on my GitHub profile.

    Skeleton Project

    When starting a new Ansible project, I usually start with my skeleton project, and as I pick up new patterns, or improve my approach, I go back and update my skeleton project so that newer projects can benefit. It’s available on GitHub and I recommend using it specifically if you’re starting out with Ansible on AWS.