• Beyond Facts: Retrieving AWS Resource IDs

    I’m a huge fan of Ansible and have used it in several projects to orchestrate AWS services. Ansible is designed to be simple, with most functionality contained in modules that are called from tasks in playbooks. This has huge benefits, but it also carries a major drawback: significant boilerplate whenever you need to retrieve data from external sources.

    From the beginning, Ansible has had a dynamic inventory facility that allows host data to be imported from sources like AWS. Although this is seeing great improvements in Ansible 2.5, it does not yet provide a way to retrieve the arbitrary data commonly needed when interacting with AWS services. Specifically, if you want to retrieve VPC, subnet or security group details, you are forced to do this with facts modules. When you have a large number of these resources, this quickly becomes unwieldy and repetitive.

    I was determined to find a better way. I experimented with the various plugin types available in Ansible, but most of these are designed to manipulate existing data or to perform one-off lookups. Nothing really lends itself to mass lookup of large data sets. I also wanted this to happen automatically: since almost every playbook interacts with one or more AWS services, this would have a massive payoff. Enter the widely underused (and underdocumented) Vars Plugins.

    Creating a vars plugin turns out to be surprisingly straightforward. First you need to configure a directory (or directories) for vars plugins. This can be done in ansible.cfg, but since there might be an existing default here, I wanted to supplement whatever might already be configured. Like many Ansible configuration values, a vars plugins directory can also be specified via an environment variable: ANSIBLE_VARS_PLUGINS. You can export this in your wrapper script(s), set it in your shell rc, CI tool, or wherever/however you use Ansible.

    I prefer to keep everything together for a project, so I set this to the vars_plugins/ directory, relative to the playbooks for a project. Once this is set, you need only place Python scripts in this directory, with each one describing a VarsModule class. Here’s what an empty vars module looks like.

    from __future__ import (absolute_import, division, print_function)
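    __metaclass__ = type
    
    from ansible.plugins.vars import BaseVarsPlugin
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            # Let the base class do its setup, then return the variables to set.
            super(VarsModule, self).get_vars(loader, path, entities)
            return {}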
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    Ramping this up pretty quickly, here’s how you’d retrieve VPC IDs and populate a global dictionary variable keyed by VPC name.

    from __future__ import (absolute_import, division, print_function)
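    __metaclass__ = type
    
    from ansible.errors import AnsibleError
    from ansible.plugins.vars import BaseVarsPlugin
    
    try:
        import boto3
        import botocore.exceptions
        HAS_BOTO3 = True
    except ImportError:
        HAS_BOTO3 = False
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            # Fail with a clear message instead of a NameError when boto3 is missing.
            if not HAS_BOTO3:
                raise AnsibleError('The boto3 library is required for this vars plugin')
            self._get_vpc_ids()
            return dict(vpc_ids=self.vpc_ids)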
    
    
        def _get_vpc_ids(self):
            ''' Retrieve all VPC details from AWS API '''
    
            self.vpc_ids = dict()
            client = boto3.client('ec2')
            vpcs_result = client.describe_vpcs()
            if vpcs_result and 'Vpcs' in vpcs_result and len(vpcs_result['Vpcs']):
                for vpc in vpcs_result['Vpcs']:
                    if 'Tags' in vpc:
                        tags = dict((t['Key'], t['Value']) for t in vpc['Tags'])
                        if 'Name' in tags:
                            self.vpc_ids[tags['Name']] = vpc['VpcId']
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    This uses Boto3 to connect to EC2 and retrieve a list of VPCs, then parses the list and builds a dictionary indexed by Name tag values. In the get_vars() method, we return a dictionary of variables to set, and Ansible kindly obliges and makes these dictionary values available as variables on all hosts.
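
    The tag-parsing step is the part most worth getting right, so here is a minimal sketch of it pulled out into a standalone function. The name index_by_name_tag and the sample IDs are my own for illustration, not part of Ansible or Boto3; the input is only assumed to be shaped like the 'Vpcs' list that describe_vpcs() returns.

```python
def index_by_name_tag(resources, id_key):
    """Map each resource's Name tag to the ID stored under id_key.

    Resources without tags, or without a Name tag, are skipped.
    """
    ids = {}
    for resource in resources:
        # Flatten the [{'Key': ..., 'Value': ...}] tag list into a plain dict.
        tags = {t['Key']: t['Value'] for t in resource.get('Tags', [])}
        if 'Name' in tags:
            ids[tags['Name']] = resource[id_key]
    return ids


# Sample data shaped like the 'Vpcs' list in a describe_vpcs() response.
# The IDs and names here are made up for illustration.
sample_vpcs = [
    {'VpcId': 'vpc-11111111', 'Tags': [{'Key': 'Name', 'Value': 'acme-prod'}]},
    {'VpcId': 'vpc-22222222', 'Tags': [{'Key': 'env', 'Value': 'dev'}]},
    {'VpcId': 'vpc-33333333'},
]

vpc_ids = index_by_name_tag(sample_vpcs, 'VpcId')
print(vpc_ids)
```

    Only the first sample VPC ends up in the result, since the other two have no Name tag. The same helper should work for any resource list whose items carry a Tags list, just by passing a different id_key.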

    We can extend this to also retrieve subnet IDs.

    from __future__ import (absolute_import, division, print_function)
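    __metaclass__ = type
    
    from ansible.errors import AnsibleError
    from ansible.plugins.vars import BaseVarsPlugin
    
    try:
        import boto3
        import botocore.exceptions
        HAS_BOTO3 = True
    except ImportError:
        HAS_BOTO3 = False
    
    
    class VarsModule(BaseVarsPlugin):
        def get_vars(self, loader, path, entities, cache=True):
            super(VarsModule, self).get_vars(loader, path, entities)
            # Fail with a clear message instead of a NameError when boto3 is missing.
            if not HAS_BOTO3:
                raise AnsibleError('The boto3 library is required for this vars plugin')
            self._get_vpc_ids()
            self._get_subnet_ids()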
            return dict(
                vpc_ids=self.vpc_ids,
                subnet_ids=self.subnet_ids,
            )
    
            
        def _get_vpc_ids(self):
            ''' Retrieve all VPC details from AWS API '''
    
            self.vpc_ids = dict()
            client = boto3.client('ec2')
            vpcs_result = client.describe_vpcs()
            if vpcs_result and 'Vpcs' in vpcs_result and len(vpcs_result['Vpcs']):
                for vpc in vpcs_result['Vpcs']:
                    if 'Tags' in vpc:
                        tags = dict((t['Key'], t['Value']) for t in vpc['Tags'])
                        if 'Name' in tags:
                            self.vpc_ids[tags['Name']] = vpc['VpcId']
    
    
        def _get_subnet_ids(self):
            ''' Retrieve all subnet details from AWS API '''
    
            self.subnet_ids = dict()
            client = boto3.client('ec2')
            subnets_result = client.describe_subnets()
            if subnets_result and 'Subnets' in subnets_result and len(subnets_result['Subnets']):
                for subnet in subnets_result['Subnets']:
                    if 'Tags' in subnet:
                        tags = dict((t['Key'], t['Value']) for t in subnet['Tags'])
                        if 'Name' in tags:
                            self.subnet_ids[tags['Name']] = subnet['SubnetId']
    
    
    # vim: set ft=python ts=4 sts=4 sw=4 et:

    Just as with VPCs, we’re now retrieving subnet details and making a subnet_ids variable available to playbooks for all hosts. We can make use of these super-handy variables with something like this:

    ---
    
    - hosts: localhost
      connection: local
    
      tasks:
      
        - name: Launch EC2 instance
          ec2:
            assign_public_ip: yes
            group: external
            image: ami-aabbccde
            instance_tags:
              Name: "{{ name }}"
              env: "{{ env }}"
              project: "{{ project }}"
            instance_type: t2.medium
            keypair: ansible
            vpc_subnet_id: "{{ subnet_ids['acme-private-subnet'] }}"
            wait: yes

    Note how we’re specifying the subnet in which to launch this instance by looking up the subnet ID in our global subnet_ids variable. We don’t have to use ec2_vpc_subnet_facts to explicitly look up the subnet; the plugin does it in the background.

    You can see the potential for vastly simplifying playbooks. Wherever we need an AWS resource ID, we can add the ability for our plugin to look it up transparently, and totally avoid a facts module call each time.
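
    As an illustration of adding another resource type, here is a rough sketch for security groups. It assumes you want them keyed by group name rather than by Name tag, which is my choice for the example; get_security_group_ids and FakeEC2Client are hypothetical names, with the fake client standing in for boto3.client('ec2') so the sketch runs without AWS access.

```python
def get_security_group_ids(client):
    """Map security group names to IDs using an EC2 client."""
    result = client.describe_security_groups()
    return {
        group['GroupName']: group['GroupId']
        for group in result.get('SecurityGroups', [])
    }


class FakeEC2Client(object):
    """Stand-in for boto3.client('ec2'); returns made-up sample data."""

    def describe_security_groups(self):
        return {
            'SecurityGroups': [
                {'GroupName': 'external', 'GroupId': 'sg-11111111'},
                {'GroupName': 'internal', 'GroupId': 'sg-22222222'},
            ]
        }


security_group_ids = get_security_group_ids(FakeEC2Client())
print(security_group_ids['external'])
```

    Bear in mind that group names are only unique within a VPC, so with several VPCs you would probably want to key on something more specific. This sketch also reads a single response and does not handle paginated results.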

    I took this concept further and built a more comprehensive plugin, which is published on GitHub. It adds some nice features: it caches lookup results, reads a configuration file for multi-region support, and can parse any tag values to build a nested dictionary for each type of AWS resource it supports.
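
    To give a feel for why caching matters here (the plugin runs on every playbook invocation, so each run would otherwise hit the AWS API), this is one simple way lookup results could be cached. It is a sketch under my own assumptions, not the code from the published plugin: cached_lookup, the JSON cache file and the max_age parameter are all hypothetical.

```python
import json
import os
import tempfile
import time


def cached_lookup(cache_path, max_age, fetch):
    """Return data from cache_path if it is fresh, else call fetch() and store it."""
    try:
        age = time.time() - os.path.getmtime(cache_path)
        if age < max_age:
            with open(cache_path) as handle:
                return json.load(handle)
    except (OSError, ValueError):
        # Missing or unreadable cache: fall through and fetch fresh data.
        pass

    data = fetch()
    with open(cache_path, 'w') as handle:
        json.dump(data, handle)
    return data


# Usage with a stand-in for the real AWS lookup; the counter records
# how many times the fetch function was actually called.
calls = []


def fake_fetch():
    calls.append(1)
    return {'acme-private-subnet': 'subnet-11111111'}


cache_file = os.path.join(tempfile.mkdtemp(), 'subnet_ids.json')
first = cached_lookup(cache_file, 300, fake_fetch)
second = cached_lookup(cache_file, 300, fake_fetch)
print(first == second, len(calls))
```

    The second call is served from the cache file, so the fetch function runs only once. A file on disk is used because each ansible-playbook run is a separate process, so an in-memory cache would not survive between runs.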