Controlling your Azure Automation runbooks with Git and Azure DevOps (My PowerShell Journey through Azure DevOps – Part 1.5)

Part 1: My PowerShell Journey through Azure DevOps – Part 1

As I said in Part 1, I started down the path of pipelines by copying some people who were using AppVeyor. AppVeyor does appear to have a set of tasks for building, testing, releasing, etc., though for better or worse, the examples I found put most of the work into a PowerShell script and just used AppVeyor to execute that script, instead of parceling out build, test, and deploy tasks directly.

With that as my influence, while I was still working on deployment for our modules, I also started looking at testing and deploying our Azure Automation runbooks through Azure DevOps. The old way was an Azure Automation runbook that would deploy any file that got committed to master of our runbook repository in GitHub. The new way uses Azure DevOps to run tests, do some kind of actual building, deploy to a test environment, and finally deploy to production. None of this was possible with our old system.

If you’ve never used Azure Automation this might not all make sense. Since this isn’t strictly about PowerShell modules I’ve labelled this Part 1.5, maybe think of it as bonus content. I would someday like to write more about how we’re using Azure Automation, and hopefully that future post can put this one in a little bit of context.

I’ve also made my code available on GitHub for anyone to look at. I’ll explain the repo a bit at the end but first here’s 2500 words explaining myself.

The problem statement

Pester testing my functions

If you’re going to put in all the work to build a pipeline, it should solve a problem. My immediate problem was that I wanted to write Pester tests against my runbooks. Pester testing a module is easy: loading the module doesn’t actually execute anything, and you can then send sample data through your functions to run your tests. A runbook is a bit harder, as it’s essentially a script. Pester loves functions, so I knew I’d need to break as much of my runbook logic as possible into functions that I could then load and test. But I couldn’t just declare those functions within my runbook, because to load them I would have to execute the entire runbook, which is exactly what I’m trying to avoid. I needed to pull them out of the runbook itself somehow. I could create a module, but these were task-specific functions that would have no purpose outside of the particular runbook, so that seemed like overkill.

What I settled on was, for each runbook with functions, I would create a corresponding Functions file. For example, if I have a runbook file named Runbook.ps1, I would have a corresponding Runbook-Functions.ps1. I could then write tests in a Runbook-Functions.Tests.ps1 file. My initial plan was to then dot source my Functions runbook from within the calling runbook, because doing that works in Azure Automation. After talking with my team, however, we decided it would be a little strange to have a bunch of runbooks sitting around that would never actually be executed directly.

My next idea, which is what I ended up going with, was to build each runbook by combining the runbook file and the functions file. Again using the example of Runbook.ps1, I would have a special string within that file (I used ##<runbook name>-Functions.ps1_goes_here##) that would be replaced by all the functions. This allows a file of just functions that can be loaded and Pester tested, while still only deploying a single runbook.

If there isn’t a corresponding function file for the runbook, or if that special string doesn’t exist in the runbook, nothing is done, and the runbook gets deployed as written.
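To make that concrete, here’s a minimal sketch of the splice step. This isn’t the exact code from my psake file, and the paths and output folder are just examples, but it shows the placeholder-replacement idea:

```powershell
# Minimal sketch of the build step; paths and folder names are illustrative.
param(
    [string]$RunbookPath  = '.\Example-Runbook.ps1',
    [string]$OutputFolder = '.\Deploy'
)

$runbookName   = [System.IO.Path]::GetFileNameWithoutExtension($RunbookPath)
$functionsFile = Join-Path (Split-Path -Path $RunbookPath -Parent) "$runbookName-Functions.ps1"
$placeholder   = "##$runbookName-Functions.ps1_goes_here##"

$content = Get-Content -Path $RunbookPath -Raw

# Only splice the functions in if both the file and the placeholder exist;
# otherwise the runbook is deployed exactly as written.
if ((Test-Path -Path $functionsFile) -and $content.Contains($placeholder)) {
    $functions = Get-Content -Path $functionsFile -Raw
    $content   = $content.Replace($placeholder, $functions)
}

Set-Content -Path (Join-Path $OutputFolder "$runbookName.ps1") -Value $content
```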

Syntax check the code

This goes along with Pester, but I want to ensure that nothing gets committed with invalid PowerShell syntax, like a missing bracket or quote. I also wanted to check for bad Unicode characters from copying and pasting code, as I’ve described earlier. PowerShell is smart enough to work just fine with those Unicode quotes and dashes, but I wanted to detect them anyway.
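To give a flavor of what those checks look like, here’s a rough Pester sketch. My real test file is organized differently, but the parser call and the character-class regex are the heart of it:

```powershell
# Rough sketch of the syntax and Unicode checks; the repo layout here is just an example.
Describe 'PowerShell validity' {
    $scripts = Get-ChildItem -Path "$PSScriptRoot\.." -Filter *.ps1 -Recurse

    foreach ($script in $scripts) {
        It "$($script.Name) parses as valid PowerShell" {
            $parseErrors = $null
            $null = [System.Management.Automation.Language.Parser]::ParseFile(
                $script.FullName, [ref]$null, [ref]$parseErrors)
            $parseErrors.Count | Should -Be 0
        }

        It "$($script.Name) has no smart quotes or fancy dashes" {
            # Characters that sneak in when code is pasted from Word, Teams, etc.
            Get-Content -Path $script.FullName -Raw |
                Should -Not -Match '[\u2018\u2019\u201C\u201D\u2013\u2014]'
        }
    }
}
```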

Have a test environment

Previously, we were doing code reviews on all changes to master, and once a commit was made to master, the updated file was deployed directly to production. Even though we had code reviews, we’re sysadmins, not software engineers (and we’re all human). A broken runbook would sometimes get deployed that would break a business process, forcing us to scramble to revert the commit, figure out what went wrong, and push a new version. With some of our more critical functions we had a dev version in our Automation Account, but using that required manually keeping the prod version in sync with the dev version. Cue the “there has to be a better way” dot gif.

I created a separate test Azure Automation account for us. When someone commits to any branch with the string “!testdeploy” in the message, or a pull request has been opened, the runbook is deployed to the test environment. The manual testing there depends on what the runbook is. Sometimes all that’s needed is to execute the runbook to see if it does what you expect. Sometimes the runbook will need a webhook that something else will call. I tried to make testing runbooks as easy and as flexible as possible. Do whatever you want with it once it’s in the test environment, including nothing!

Finally, to keep Test in sync with Prod, whenever a commit to master is made, the changed runbooks are deployed to both production and test, so the test runbooks will never be behind master.

My reasoning here is a little complex. We use Azure Automation to build new servers by passing a JSON payload into a webhook. If that payload includes the value “testOS” set to True, the server will be built with test things instead of production things. We also have our SCCM task sequence call another Finalize runbook at the end to do some cleanup and notification. We’ve overloaded test to mean many things, so this could mean a test SCCM package, a test OS image, or a call to the test runbook instead of production at the end. So it’s possible that our monthly process to update Server OS images, which builds with testOS set to True, will end up calling the test FinalizeVM runbook even though that runbook isn’t in development. That’s why I wanted to make sure that everything in the test space would always be at least up to date with production.

Ensure the right Azure Automation variables exist

Azure Automation has what they call shared resources, but I call variables. There are multiple types: Credentials, Certificates, Connections, and Variables (yes, that means in my nomenclature there is a variable of type variable). We use these for credentials, service URIs, shared secrets, etc. After developing a runbook in the test environment, it got published to production, but the first run failed because I hadn’t put the correct variables into the production space. There has to be a better way dot gif.

Now when a “test” build is done (when committing with !testdeploy in the message, opening a PR or pushing to a branch with a PR open, or when a commit to master is made), the runbooks being deployed are queried (using a regex; did I mention I learned a lot of regex through this?) to find all those shared resources, and the test environment is queried to make sure it has those variables.

When a deployment is done to production, all runbooks are queried for their shared resources and compared to what exists in the production environment.
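Boiled down, the query looks something like this. The real regex covers more of the shared resource types and calling patterns, and $resourceGroup / $prodAccount stand in for my pipeline variables:

```powershell
# Simplified sketch: find the Automation variables each runbook asks for,
# then compare against what actually exists in the target account.
$pattern = 'Get-AutomationVariable\s+-Name\s+[''"]([^''"]+)[''"]'

$wanted = foreach ($runbook in Get-ChildItem -Path '.\Deploy' -Filter *.ps1) {
    [regex]::Matches((Get-Content -Path $runbook.FullName -Raw), $pattern) |
        ForEach-Object { $_.Groups[1].Value }
}
$wanted = $wanted | Sort-Object -Unique

$existing = (Get-AzAutomationVariable -ResourceGroupName $resourceGroup `
    -AutomationAccountName $prodAccount).Name

# Anything in $missing needs to be created before the runbook will work.
$missing = $wanted | Where-Object { $_ -notin $existing }
```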

I thought about just testing all variables in both environments in every deploy, but the first time I ran that test it came back with over 100 variables that didn’t exist in Prod. I wasn’t going to actually create all those variables, and I wouldn’t expect anyone else to. So I decided we’d create them incrementally as we needed them, instead of all at once. I figured that if anyone was required to create all of them upfront, they may be created with dummy data, which would defeat the purpose. This way hopefully only a few would be needed at a time, which is an amount I could expect my coworkers to faithfully implement into the test environment.

The values of the variables aren’t checked through this process, because there’d essentially be no way to do that automatically. I could compare whether the values in the test environment match those in production, but I specifically wanted the ability to have different values in test and production. For example, I built a runbook that read data from a Google Sheet and did things with it. I used a variable for the URL of the spreadsheet, so in the test environment I could test my runbook against a dummy copy of the data, and then when it moves to production it will operate against the production copy of the spreadsheet. This is another example of the flexibility in the test environment. Any way I could envision to test a runbook, I built the system to work for that case.

Failure notification should spam the team as little as possible

All of our runbooks have the same pattern, where the entire script is wrapped in a try block, with a catch that will send an email if there is a failure (with other try-catch blocks inside the try block as appropriate). By default we want these failures to open an incident in our queue, so the email goes to the ticketing system.
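Stripped down, the pattern looks like this (the addresses and mail settings here are placeholders, not our real values):

```powershell
# Roughly the shape of every runbook; addresses and SMTP details are placeholders.
$errorEmail = 'incident-queue@example.com'   # the build rewrites this line per deployment

try {
    # ... the actual work of the runbook, with its own inner try/catch
    # blocks where appropriate ...
}
catch {
    $mailParams = @{
        To         = $errorEmail
        From       = 'automation@example.com'
        Subject    = "Runbook failed: $($_.Exception.Message)"
        Body       = ($_ | Out-String)
        SmtpServer = 'smtp.example.com'
    }
    Send-MailMessage @mailParams
    throw    # re-throw so the Azure Automation job still shows as failed
}
```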

When developing a runbook, there can be a lot of failures, and closing an incident in our system takes a lot of clicks. My solution was to define a variable for the error email (creatively called $errorEmail) in the runbook, and have the build script find that definition and replace it based on the deployment being done. The code you commit to git can define $errorEmail to be any value, but if the build is being done because of a commit to master, the value will be the address that opens an incident. If the build is being done because of a pull request, the email will go directly to our team address. This way, if someone else is evaluating the pull request by running the test runbook and there’s a failure, we know the reviewer will get notified (since there’s no way to know who on our team is reviewing). If the build is being done by specifying !testdeploy, the email is set to the person who made the commit, or if Azure DevOps doesn’t know their email, it will default to our group email.
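The rewrite itself is conceptually simple. This sketch uses BuildHelpers’ BHBranchName variable and Azure Pipelines’ Build.RequestedForEmail; the addresses and the $isPullRequest flag are placeholders:

```powershell
# Sketch of the build-time rewrite; addresses and $isPullRequest are placeholders.
if ($env:BHBranchName -eq 'master') {
    $address = 'incident-queue@example.com'
}
elseif ($isPullRequest) {
    $address = 'team@example.com'
}
elseif ($env:BUILD_REQUESTEDFOREMAIL) {
    # Azure Pipelines exposes the committer's email as Build.RequestedForEmail
    $address = $env:BUILD_REQUESTEDFOREMAIL
}
else {
    $address = 'team@example.com'
}

# Swap out whatever $errorEmail value was committed for the one this
# deployment should use.
$content = $content -replace '\$errorEmail\s*=\s*.+', "`$errorEmail = '$address'"
```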

Logically we’re threading the needle here. As I said earlier, when a commit is made to master, the runbook is deployed to both test and production, and in that case both the test and production runbooks will have the incident email as their error email; even though the test copy comes from a “test” deploy, it still gets the prod version of that variable.

It should be fast

My first iteration of this process would run tests against the entire repository on every commit, including code coverage. We have over 10,000 lines of code, with nearly 4,500 of them “coverable” according to Pester. That Pester run took about 20 minutes, even though I knew the majority of the code didn’t have any Pester tests (nothing like spending over 20 minutes to get 0.2% code coverage). I made the testing logic a lot smarter, so it only runs tests against files that have tests. This has gotten build times down to just a couple of minutes.
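The gist of the smarter logic is to map each changed file to a corresponding test file and only hand those to Pester. This sketch uses BuildHelpers’ Get-GitChangedFile (my fork adds more filtering) and an assumed folder layout:

```powershell
# Sketch of "only test what changed"; the Tests folder layout is an assumption.
$changed = Get-GitChangedFile | Where-Object { $_ -like '*.ps1' }

$testFiles = foreach ($file in $changed) {
    $name      = [System.IO.Path]::GetFileNameWithoutExtension($file)
    $candidate = Join-Path -Path '.\Tests' -ChildPath "$name.Tests.ps1"
    if (Test-Path -Path $candidate) { $candidate }
}

if ($testFiles) {
    Invoke-Pester -Script $testFiles
}
```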

A pretty chart that lays it all out but may just be more confusing than prose

This is a table I put in our readme that explains everything the build script does, and when it does it.

|  | Commit | Test Deploy | Pull Request | Commit to Master |
| --- | --- | --- | --- | --- |
| Test Production Variables |  |  |  | ✔ |
| Test Test Variables |  | ✔ | ✔ | ✔ |
| Run Code Tests | Files changed in this commit | Files changed in this commit | All changes in PR | All tests |
| Build |  | ✔ | ✔ | ✔ |
| Deploy to Test Env |  | ✔ | ✔ | ✔ |
| Deploy to Prod Env |  |  |  | ✔ |
| Error Email |  | committer or team | team | incident queue |

The tools I used

My Azure Pipeline has a few tasks. I have a task to download a package from Azure Artifacts (more on that later), then a task that runs a PowerShell script to call a massive psake file, then finally tasks to publish test results, generate a code coverage report, and publish that report. Calling a build.ps1 file that in turn executes a psake file seems to be pretty common, but I’m fairly certain I stole a copy of build.ps1 and psake.ps1 from one of Warren Frame’s (PSCookieMonster/RamblingCookieMonster) projects and built on that.
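If you haven’t seen the pattern before, it boils down to something like this (heavily simplified; my real psake file has many more tasks and parameters):

```powershell
# --- build.ps1: bootstrap modules, then hand off to psake ---
Install-Module -Name Psake, Pester -Scope CurrentUser -Force
Import-Module -Name Psake
Invoke-psake -buildFile "$PSScriptRoot\psake.ps1" -taskList Test, Build, Deploy

# --- psake.ps1: the tasks themselves ---
task Test {
    Invoke-Pester -Script "$PSScriptRoot\..\Tests"
}
task Build -depends Test {
    <# combine runbooks with their function files into .\Deploy #>
}
task Deploy -depends Build {
    <# push the built runbooks to Azure Automation #>
}
```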

The modules I use are Psake, Pester, BuildHelpers, and, from the Az module, Az.Accounts and Az.Automation. BuildHelpers is a project by Warren that normalizes some environment variables across build systems; I started using it because he uses it and I was copying him (and he wrote it). The closer I tied this project to Azure Pipelines, the less valuable a cross-CI branch name variable became, but it does have a few helper functions that are useful, particularly Get-GitChangedFile. However, that function was a bit rudimentary and would only show you changes in the specific commit you were running it in (unless that commit was a merge, in which case it wouldn’t show anything). I needed a lot more flexibility: the ability to see the changes in all the commits since master, not just the most recent commit, and also to filter the types of changes. I ended up expanding this function quite a bit to meet all my needs, but as I write this, my PR for that hasn’t been accepted yet. I also added a normalized variable to use to determine whether the build is the result of a pull request, which also hasn’t been merged yet. Because of this I’ve been maintaining my own fork of the project where I’ve merged in my changes until Warren merges my requests (if you’re reading this Warren, I’d love to be a co-maintainer of BuildHelpers if you need help with it!). This fork is the package I have hosted in Azure Artifacts that my pipeline downloads and imports.

I’m also using secrets from Azure Key Vault to authenticate to Azure (using Az.Accounts) and update the runbooks (using Az.Automation). I created a service principal specifically for this and granted it the necessary permissions. I’ve got six variables: the username for my service principal, the password for my service principal, the tenant ID of the service principal, the resource group my automation accounts are in (both test and prod are in the same resource group), and the names of my test and prod Automation Accounts.
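The deploy itself is conceptually just a sign-in and an import. Here’s a simplified sketch; the variable names stand in for the Key Vault-backed pipeline variables:

```powershell
# Sketch of the sign-in and deploy; variable names are placeholders for pipeline secrets.
$secure = ConvertTo-SecureString -String $servicePrincipalPassword -AsPlainText -Force
$cred   = New-Object System.Management.Automation.PSCredential ($servicePrincipalId, $secure)

Connect-AzAccount -ServicePrincipal -Credential $cred -Tenant $tenantId | Out-Null

$runbookParams = @{
    ResourceGroupName     = $resourceGroup
    AutomationAccountName = $testAutomationAccount
    Name                  = 'Example-Runbook'
    Path                  = '.\Deploy\Example-Runbook.ps1'
    Type                  = 'PowerShell'
    Published             = $true
    Force                 = $true
}
Import-AzAutomationRunbook @runbookParams
```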

Things I’d do differently: aka the future

This is a learning experience for me, and now that I’ve got this pushed into production I have ideas to go from here. The first was converting it from a Classic Build pipeline to a functionally equivalent YAML pipeline, which I’ve already done.

The next is to break the job into more distinct tasks, possibly using already existing tasks in the marketplace. With this I could better split up the jobs using a multi-stage pipeline. I may also make different pipelines entirely for each of my scenarios, possibly using YAML templates to share tasks between branches.

Azure DevOps itself has an Azure service principal, and I can use that and the Azure PowerShell task to run my deploys as the Azure DevOps service principal, removing the need to pull credentials out of Azure Key Vault entirely.

Azure Automation supports Python-based runbooks, and I somewhat support Python in my existing process, but I didn’t take it to the finish line to actually deploy Python. Until I have a need to test and “build” Python runbooks I probably won’t put much effort into them, but at the very least I should deploy them.

In a module project I’m working on (which I’ll hopefully cover in the next installment) I ported my Unicode Pester test into a custom PSScriptAnalyzer rule, so I’d like to do that here as well.

Sample code

I can’t share our internal repository, so I’ve copied the necessary files into a public repo, which is hopefully self-explanatory, but I’m going to explain it anyway.

azure-pipelines.yml

This is the yml file that defines the tasks. It includes a step to download the BuildHelpers artifact from an internal Azure Artifacts repo (that you’ll need to create), as well as a disabled PowerShell step that I’ve used for debugging to figure out what was going on in the environment.

buildhelpers.2.0.0.nupkg

This is the BuildHelpers package I compiled. I created the Azure Artifacts feed, connected to it from my desktop, and pushed that package into the feed, making it available to pull in my pipeline.

Build

This contains the build.ps1 script which installs the needed modules, then calls the psake.ps1 script where the bulk of the work is done.

Deploy

This is a folder where built runbooks are placed so they can be deployed. Since this runs on a Microsoft-hosted agent, everything placed here will be wiped out at the end of the job. I also have a .gitignore to ignore any .ps1 or .py files in case you’re running builds locally.

Tests

This folder is where tests go, obviously. I have a test for AutomationVariables to ensure they exist, and the Powershell.Tests file checks the syntax and looks for Unicode characters.

Example-Runbook.ps1

This, along with Example-Runbook-Functions.ps1 and Example-Runbook-Functions.Tests.ps1 (in the Tests folder), shows generally how the pieces fit together. The runbook uses a function that exists in the Functions file, and the tests run against that Functions file. At build time the deployed runbook will include both the functions and the code from the runbook file.

Conclusion (yes I’m finally done)

Now that I’ve gotten this out of the way, next time I’ll be talking about what I’ve been doing with Multi-stage pipelines and templates.
