I am beginning to learn these tools and I noticed that Ansible is used to configure servers and Terraform does something similar, but I can't figure out what makes Ansible a poor choice for provisioning.
NB: I am still learning, so please bear with my poor use of technical terms.
Terraform is for building infrastructure, you know foundations, skyscrapers, streets. Build an empty restaurant with a giant yellow excavator (Terraform)
Ansible, Chef are for building configuration, you know menus, staff schedules, grocery lists. Ensure a restaurant is configured correctly to serve customers with its wait staff (Ansible)
You can use an excavator to configure the stuff inside the restaurant. People do it. It's just not generally the most efficient way to do it. And you could have the wait staff at a restaurant pouring concrete for their second place a town over. You could do that too, but a lot of people would use the excavator.
So what really makes these tools effective is when you start using them at scale. They start to become helpful once you realize how much you can do with how little, and they each have this same strength solving different levels, and their strengths become weaknesses at the other end.
There are five broad categories of IaC (Infrastructure as Code) tools:
a) Ad hoc scripts
The most straightforward approach to automating anything is to write an ad hoc script.
You take whatever task you were doing manually, break it down into discrete steps, use your favorite scripting language (e.g., Bash, Ruby, Python) to define each of those steps in code, and execute that script on your server.
b) Configuration management tools
Chef, Puppet, Ansible, and SaltStack are all configuration management tools, which means that they are designed to install and manage software on existing servers.
c) Server templating tools
An alternative to configuration management that has been growing in popularity recently is server templating tools such as Docker, Packer, and Vagrant. Instead of launching a bunch of servers and configuring them by running the same code on each one, the idea behind server templating tools is to create an image of a server that captures a fully self-contained “snapshot” of the operating system (OS), the software, the files, and all other relevant details.
d) Orchestration tools
Server templating tools are great for creating VMs and containers, but how do you actually manage them?
Handling these tasks is the realm of orchestration tools such as Kubernetes, Marathon/Mesos, Amazon Elastic Container Service (Amazon ECS), Docker Swarm, and Nomad.
e) Provisioning tools
Whereas configuration management, server templating, and orchestration tools define the code that runs on each server, provisioning tools such as Terraform, CloudFormation, and OpenStack Heat are responsible for creating the servers themselves. In fact, you can use provisioning tools to not only create servers, but also databases, caches, load balancers, queues, monitoring, subnet configurations, firewall settings, routing rules, Secure Sockets Layer (SSL) certificates, and almost every other aspect of your infrastructure.
My advice would be to avoid Ansible for that sort of thing like the plague. All the nice things like idempotency and having the same script for setting things up and tearing them down no longer function when using those modules.
Essentially you need to (1) do the checks to ensure your playbook won't just create a new set of VMs every time it's run if they already exist, (2) maintain a teardown playbook alongside your setup one, because Ansible is entirely procedural and the steps would be reversed in that case, and (3) do queries first to determine what actually exists in the cloud and do lots of Jinja manipulations to work on the right things. Did you know EC2 has default subnets and routing tables? Did you know that the Ansible module will error out if you try to delete those objects?
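The first of those chores, checking before creating, could be sketched roughly like this in Python. `FakeCloud` and `ensure_instances` are made-up names for illustration, not Ansible or AWS APIs. The point is only that the caller has to look up what already exists (here by a name tag) before creating anything.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API (not a real SDK)."""
    instances: dict = field(default_factory=dict)  # instance id -> name tag
    _ids: count = field(default_factory=lambda: count(1))

    def find_by_name(self, name):
        return [i for i, tag in self.instances.items() if tag == name]

    def create(self, name):
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = name
        return instance_id


def ensure_instances(cloud, name, wanted):
    """Create only the instances that are missing; return the new ids."""
    existing = cloud.find_by_name(name)
    missing = max(0, wanted - len(existing))
    return [cloud.create(name) for _ in range(missing)]
```

Calling `ensure_instances(cloud, "web", 3)` twice creates three instances the first time and none the second, which is the behaviour you have to build by hand when the tool keeps no record of earlier runs.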
If only there was a thing like Terraform that could just rely on a single description of the setup you'd like. Seriously. There's an Ansible module that will run a Terraform .tf file, and there's a provider for terraform that will run Ansible on the servers it provisions.
I use Ansible to provision dev environments on each PR, and it works fine. Every time a deployment occurs, Ansible provisions the environment if required; otherwise it proceeds with the deployment.
Destroying the environment is done separately, triggered by a webhook once the PR is closed.
He didn't say that. He presumably uses one playbook that creates vms and provisions the software for every PR. With ansible this can be idempotent and creating vms is by default idempotent.
Yep it does indeed, Ansible also can talk to AWS, GCP etc, but that is not the point ;-) (And Terraform can SSH in to machines and execute commands just fine)
If someone is new to this, having a strong boundary makes it much easier to process how those systems are intended to be used and how they are built. There are certain assumptions made in both that make them work differently, i.e. being command centric vs. resource centric. That's interesting once someone is at the point where they can apply what they know in a more general or generic sense, because then the systems themselves matter less.
TLDW; You can do resource management (e.g. creating EC2 instances in AWS) and deployment (e.g. installing packages on an instance) through both Terraform and Ansible. Terraform is best used for resource management - the documentation states using the "provisioning"/deployment function is a last resort. Ansible is great at deploying packages but less so at resource management for the reasons you'll see in the other comments. Either use them together for what they're good at, or use Terraform to do resource management and other techniques (such as prebuilt images) for deployment:
Ansible connects to remote servers to configure them, while Terraform calls cloud provider APIs to provision resources.
For example, you can use Terraform to provision virtual machines, database instances, or Kubernetes clusters on AWS. Terraform does this via the AWS API.
In my opinion, Terraform is better for provisioning because of the way it manages its own state. Terraform remembers what resources it created the last time it ran, and can edit or delete them according to any change in your Terraform code.
I like Ansible, but not for managing cloud resources. Ansible has no memory. For example, if I ran a playbook that installs MySQL, Ansible has no built-in way to undo this change and bring me back to my previous state.
Ansible has full integration with cloud providers' APIs. It's actually better for managing instances and highly dynamic resources because it has much better state management than Terraform.
If you (re)create some EC2 instances with Terraform, it saves their IDs the first time they are created (in a state file that needs to be shared and kept in sync). It goes mental the next time it runs if any of the instances are not found, or the state file is missing, or some of the instances were modified or died.
Ansible always looks up what's actually running (instances with the intended name/tags) and matches that against what's expected. It skips what's already there, it's much less accidentally destructive, and it never runs out of sync.
If you're using terraform with multiple people and are having problems keeping the state in sync I would suggest looking at remote terraform backends and state locking.
For example, on AWS we use the S3 backend and a DynamoDB table for locking. This way, when Terraform runs, it first acquires the lock and then accesses the state on S3, so everyone is working on the same state.
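The lock-then-read-state idea can be sketched in a few lines of Python. This is not how the Terraform backend is implemented; `LockTable`, `locked_state`, and the plain dict used as the state store are all hypothetical stand-ins, with the lock modelled as a "put only if absent" write.

```python
from contextlib import contextmanager


class LockTable:
    """Hypothetical stand-in for a lock table with conditional writes."""

    def __init__(self):
        self._holders = {}

    def acquire(self, key, owner):
        # Succeeds only if nobody holds the lock (a "put if absent").
        if key in self._holders:
            raise RuntimeError(f"{key} is locked by {self._holders[key]}")
        self._holders[key] = owner

    def release(self, key, owner):
        if self._holders.get(key) == owner:
            del self._holders[key]


@contextmanager
def locked_state(locks, store, key, owner):
    """Acquire the lock, hand out the shared state, save it, then release."""
    locks.acquire(key, owner)
    try:
        state = dict(store.get(key, {}))
        yield state
        store[key] = state  # only written back if the block did not raise
    finally:
        locks.release(key, owner)
```

While one person is inside `with locked_state(...)`, a second `acquire` on the same key raises instead of silently working on a stale copy.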
Ansible doesn't handle the delete case. It has no way to "notice" that a resource is no longer in the playbook and therefore should be removed. This is why Terraform keeps its state file, so it can do that sort of operation.
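A toy Python illustration of why remembered state matters for deletes. `plan_changes` is a made-up function and `recorded` plays the role of a state file; neither is an API of either tool.

```python
def plan_changes(config, recorded=None):
    """Compare wanted resource names against names recorded from earlier runs.

    `recorded` plays the role of a state file. When it is None (no memory of
    previous runs), there is nothing to compare against for deletions.
    """
    wanted = set(config)
    known = set(recorded or [])
    return {
        "create": sorted(wanted - known),
        "delete": sorted(known - wanted),
    }
```

With `recorded=["web", "db", "cache"]` and a config of `["web", "db"]`, the function reports `cache` for deletion. With no record at all, the delete list is always empty, and the create list can only be trimmed by querying the cloud, which is the lookup work described elsewhere in this thread.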
Ansible handles deletion just fine. Provisioning instances for example takes a number of instances, set to 0 to delete them. The more stable resources have a separate command to delete like ec2_vpc vs ec2_vpc_delete.
IMO the way Terraform automatically/accidentally deletes stuff is a major design flaw, not a feature to emulate. It's madness that it tries to auto nuke potentially a whole company just because it lost track of one resource identifier.
I'm not sure why you've had problems with Terraform trying to nuke things - I'd say the planning capability was one of its strong points. A quick glance at the plan will tell you what it needs to remove to put an environment in the expected state (and it's called out again in the destroy count summary at the end of the plan). Terraform doesn't "accidentally" delete things - it's doing it because you've told it they're not needed anymore.
>> Terraform doesn't "accidentally" delete things - it's doing it because you've told it they're not needed anymore.
That's putting it backward to say the least. One never tells terraform that something is not needed anymore. One declares what is needed and terraform will find a way to get there by altering/creating/deleting stuff.
There is a review phase of course and it's very important because it might do anything. Anybody who's had to use terraform can attest that it is scary to run. Any slight error in configuration or state can be tremendously destructive.
>> There is a review phase of course and it's very important because it might do anything. Anybody who's had to use terraform can attest that it is scary to run.
This is no worse than Ansible - if for a set of EC2 instances the user "set to 0 to delete them" then Ansible will blindly do as requested and be just as destructive. On the other hand:
* Terraform does its best to enforce the recommended plan/apply workflow - the plan is always presented before any changes are made, and auto-approval is strongly discouraged.
* There are multiple options for review - do it there and then, or store the plan as an artefact and share with others for review.
* It doesn't matter when you run a stored plan - the plan is the set of changes that will be applied regardless of current state.
* The summary makes very clear if anything is going to be destroyed in bright red text.
Ansible offers some visibility of what it will do with dry runs, although it's not as complete - there's no way to guarantee it will do the same thing next time if changes have been made in the interim.
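To make the plan-then-apply workflow concrete, here is a small Python sketch. It is only a model of the idea: `Plan`, `make_plan`, and `apply_plan` are invented for this example, and the summary line is loosely modelled on the one Terraform prints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    add: tuple
    change: tuple
    destroy: tuple

    def summary(self):
        return (f"Plan: {len(self.add)} to add, {len(self.change)} to change, "
                f"{len(self.destroy)} to destroy.")


def make_plan(desired, actual):
    """Diff two {name: settings} mappings into a reviewable plan."""
    add = tuple(sorted(desired.keys() - actual.keys()))
    destroy = tuple(sorted(actual.keys() - desired.keys()))
    change = tuple(sorted(n for n in desired.keys() & actual.keys()
                          if desired[n] != actual[n]))
    return Plan(add, change, destroy)


def apply_plan(plan, desired, actual, approved):
    """Apply a previously computed plan, but only after explicit approval."""
    if not approved:
        return actual
    result = {n: s for n, s in actual.items() if n not in plan.destroy}
    for name in plan.add + plan.change:
        result[name] = desired[name]
    return result
```

The plan is plain data, so it can be printed, stored, or passed to someone else for review before anything is changed.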
How does the coverage of APIs compare? Just AWS is a gigantic set of APIs. I see most of what I'd need in the Ansible Module Index but it doesn't seem like it covers all that is available.
Ansible has everything that's needed to automate instances, security groups, ELB, S3, RDS and few more things. I automated all the infra for a startup mainly with ansible (tens of services and a hundred hosts).
Terraform has better support for some static things, mostly VPC, routing tables, gateways. I've had infra retrofitted in Terraform, but honestly it's more for show and as documentation. The low-level stuff only needs to be set up once, and it was always done manually ages ago.
If you were working around 2014-2017, both tools and many AWS services were new. There were significant gaps in support as well as a few bugs. Had to run from the beta build regularly. It is much better nowadays.
Unfortunately this is true - the Terraform AWS provider has thousands of PRs closed (and hundreds still open) as proof. Nevertheless, things seem to get support quicker in Terraform than in CloudFormation.
I know I'm late to comment, so this will probably get buried, but I think a key to understanding Terraform and why it is different is to understand that it's an implementation of the Reconciler Pattern. This is a more useful distinction than the usual declarative vs imperative contrast that is usually brought up.
The Reconciler Pattern basically means:
* there is some notion of "expected" state, which is what you define (declaratively) in the configuration
* there is some "actual" state, which is basically what is running at whatever cloud service, etc. you are dealing with.
* the reconciler's job is to query the actual state, compare it to the expected state, calculate the difference (usually in terms of a graph), then make whatever changes it needs to to bring "actual" in line with "expected".
Kubernetes, SaltStack, and others implement the same pattern (just on different levels of resources) and it's becoming increasingly common and important to understand if you're working with cloud stuff.
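The three steps above can be sketched as a single reconciliation pass in Python. This is a generic illustration of the pattern, not code from Terraform, Kubernetes, or SaltStack; `reconcile` and its callback parameters are names chosen for this example, and the graph ordering that real tools add is left out.

```python
def reconcile(expected, read_actual, create, update, delete):
    """One reconciliation pass; returns the number of changes made."""
    actual = read_actual()                      # 1. query the actual state
    changes = 0
    for name, spec in expected.items():         # 2. compare with expected
        if name not in actual:
            create(name, spec)                  # 3. act on each difference
            changes += 1
        elif actual[name] != spec:
            update(name, spec)
            changes += 1
    for name in actual.keys() - expected.keys():
        delete(name)
        changes += 1
    return changes
```

For a quick experiment, `world = {"old": 1, "web": 1}` can stand in for the cloud, with `lambda: dict(world)`, `world.__setitem__`, and `world.__delitem__` as the callbacks. A second pass over an already reconciled world makes zero changes.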
Ansible is really SSH on steroids across multiple hosts, with extra commands that bash never added. It can configure servers and services. It can also configure cloud products and it's a better choice than Terraform for many things because it's more flexible.
Terraform can only provision cloud resources on AWS/GCP/Azure/other. Usually it gets support first for new products they release. Terraform is very static (see issues with sharing the state file) so it's better suited to configuring very static stuff, like networking and subnets.
This is not a bad high level description.
I asked myself almost the exact same question as the OP a year ago, though I had prior experience with Ansible, so knew what I was getting into.
My advice to the OP, as it's all new to you, is to learn Ansible. It will require more work than Terraform, but Ansible can be made to perform the same functions as Terraform, and a heck of a lot more stuff that will prove useful to you, if you're looking into how to provision cloud instances.
That makes Ansible sound like it's hard work, but it's actually quite the opposite.
It's surprisingly easy to do something useful with it.
Ansible is probably best described as a scripting environment / DSL combination to help you control multiple machines remotely for build and provisioning purposes.
You have 2 separate remote machines, and want to install Postgres on both of them? Use Ansible.
Want to install GIT and then pull your project repo onto two machines? Use Ansible.
Perhaps you have 3 machines, want git on all 3, but Postgres on only one of them?
Ansible again.
Where there is an overlap with Terraform, is that Ansible can also be used as an interface to control AWS/Google cloud/Whatever services, which is Terraform's sole purpose. Terraform provides a cloud platform agnostic interface to allow you to spin up new cloud instances, and perform some provisioning tasks, but it will only skim the surface of what you can do with dedicated Ansible scripts.
You write the state you would like resources to be in. When you run terraform plan, it tells you how the state might end up. You run terraform apply and then get to find out what actually happens.
Will it create the resources how I expect? What will the resource's properties be? Will it fail half way into the changes and stop in a broken state? Will it blow away changes without asking/showing me first? Will it refuse to do anything? Who knows.
It's a guessing game. The only way to be sure of what it will or won't do is to write procedural code and tests, so at least you know what decisions it will and won't make.
'Declarative' is just us fooling ourselves that we can make complexity easy to deal with.
To define something as declarative or imperative, it is important to compare the definition model to the execution model.
So I would rather say that Ansible is much less declarative than Terraform, because Ansible tasks (the different steps of an Ansible Playbook) are executed sequentially.
The tasks of Ansible are its statements, so yeah we would say that each Ansible task is declarative. And still, a requirement for that would be for the task to use a module/role which is idempotent, right?
As further evidence, Ansible natively offers loops, blocks, and conditionals to control the execution flow throughout its tasks.
(This is not a criticism of Ansible. I am happy to use it as is, as a high-level scripting mechanism.)
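The difference between the two execution models can be shown with a short Python sketch. It illustrates the idea only and says nothing about how either tool is implemented; the task and resource names are invented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Sequential model: the order of execution is the order written.
tasks = ["create network", "create server", "install packages"]
sequential_order = list(tasks)

# Graph model: each resource lists what it depends on, in any order.
resources = {
    "server": {"subnet"},
    "subnet": {"network"},
    "network": set(),
}
graph_order = list(TopologicalSorter(resources).static_order())
```

In the first model, reordering the list changes what happens. In the second, the definition order is irrelevant: the execution order (`network`, then `subnet`, then `server`) is derived from the declared dependencies.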
There are so many answers here that are incorrect about what ansible is/does.
Thank you for providing one that is correct.
If I could add to your answer:
Not only is ansible declarative and idempotent but it also includes countless cloud provisioning modules to bring idempotency to cloud environments.
Terraform may be easier for basic provisioning of cloud resources but I always switch back to ansible cloud modules when I need to do anything complex. Ansible also has the added benefit of easily context switching over to configuring the compute resources after they have launched.
There are a lot of answers but none are geared to the beginner. I read the question as asking for an answer like the below-
To a large degree expressing something is a "poor choice" is an opinion, maybe expert, about optimizations, not about capabilities.
When one is learning, adopting the value judgements of experts is a form of premature optimization that actually prevents learning.
The only way to build your own opinions is through your own experience. You will need to have your own problems, and solve them using a variety of tools, to build your own opinions.
Try both tools in real problems, and the mental model that accrues in your experience will start to guide your opinions about ways to optimize your work.
Also- everybody is just making it up. And all tools suck.
Terraform is very good when it comes to declaring topologies: “there should be N items of this type, in this network, with these characteristics”. It remembers state; as you add or remove stuff to your topology, it will take care of doing all the necessary work to go from topology A to topology B, and detect any inconsistency.
I don’t know Ansible much, but I believe it’s more of a procedure-oriented system, where you declare the steps necessary to reach A, then again to go from A to B. This can be an issue if any item is actually not in the state you expected.
Terraform tracks and provisions cloud provider state.
With Ansible, you need to pass Ansible output around and parse it, which can take considerable time.
Terraform describes what your infrastructure should look like.
Ansible describes what software should be on your infrastructure/servers.
I tend to use Terraform to describe what the underlying cloud infrastructure should look like.
I use Ansible to describe and configure what software should be running on those servers.
Usage cases:
Simply put, Terraform: cloud infrastructure provisioning.
Ansible: server software and configuration file provisioning.
Technically Terraform tracks and provisions its own state. The cloud provider's state at any given time may be different, and Terraform may find it impossible to resolve the difference, leaving you to manually fix it.
Ansible (mostly) does not refuse to do anything just because the state changed. If you need to make sure something happens, you can be more confident Ansible will do it, because it doesn't care what the state was before now.
I like both of them. One thing interesting in Terraform is the ability to say I want to go from X to Y and see what will be the impact without actually doing the steps.
You can use Ansible to provision servers, it works, but if you do that a lot it's better to use Terraform. With Ansible you are a bit at a lower level and you need to manage the state of your system yourself. It's fine for 4 permanent VMs but not for more complicated infrastructures.
I'm not very familiar with Ansible but consider it to be somewhat interchangeable with Puppet (which I use extensively at work). You can certainly use Puppet to manage thousands of hosts but it entirely depends on other practices and technologies (an external node classifier in Puppet's case) to keep things manageable. I assume the same is true for Ansible.
Ansible is agentless, so the management of endpoints is entirely based on your server doing the scripting. It can use a static inventory file (text) or a dynamic one which can be served from anywhere (eg, sql query). Whenever you write playbooks you target groups of hosts based on tagging that’s done through inventory.
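A dynamic inventory can be as small as a script that prints JSON. The sketch below assumes the usual script-inventory shape (group names mapping to `hosts` lists, plus `_meta.hostvars`), which Ansible requests by calling the script with `--list`. The host records and tag names are hypothetical, and argument handling is left out.

```python
import json

# Hypothetical host records; in practice these might come from a SQL query
# or a cloud API.
HOSTS = [
    {"name": "web1.example.com", "tags": ["web"], "vars": {"http_port": 80}},
    {"name": "web2.example.com", "tags": ["web"], "vars": {"http_port": 8080}},
    {"name": "db1.example.com", "tags": ["db"], "vars": {}},
]


def build_inventory(hosts):
    """Group hosts by tag in the JSON shape used by inventory scripts."""
    inventory = {"_meta": {"hostvars": {}}}
    for host in hosts:
        inventory["_meta"]["hostvars"][host["name"]] = host["vars"]
        for tag in host["tags"]:
            inventory.setdefault(tag, {"hosts": []})["hosts"].append(host["name"])
    return inventory


if __name__ == "__main__":
    print(json.dumps(build_inventory(HOSTS), indent=2))
```

A playbook that targets the `web` group would then run against both web hosts, without either of them being listed in a static file.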
Ansible is primarily used for provisioning resources. Terraform, on the other hand, is used for managing and deploying cloud resources. This differentiation fits fairly well with the concept of immutable infrastructure.
If you're familiar with Packer, then Packer is responsible for creating identical VM images which can be integrated to a CI pipeline and provisioned and baked using Ansible. This baked image is then deployed using Terraform.
Be advised that provisioning in Terraform during VM deployment is not recommended since it increases startup time of the machine. To perform ad hoc configuration management, you use Ansible.
You could very well use Ansible for managing and deploying cloud resources, but that's not what it's meant to do. Moreover, Ansible does not support the concept of state as does Terraform.
They somewhat complement each other; they are not really alternatives to each other.
Usually Ansible is used for declaring the desired state of the individual servers for example you may use it to manage installed packages and configuration files on the servers.
Whereas with Terraform you declare the desired state of cloud resources for example you may ask Terraform to give you 5 EC2 instances, 1 RDS instance for DB and 1 S3 bucket for storage.
There's some overlap between them but what I've said is largely accurate.
The fundamental model behind Terraform is declarative. You use the Terraform language to define resources for your target system, e.g. a load balancer in AWS. You then run Terraform and it checks the desired configuration vs the running configuration, and it shows the differences. If the new config is what you want, you apply the changes, and it updates the production system.
Ansible is much more of an imperative system, sort of "executable YAML". You define a series of tasks in a YAML file. There are predefined tasks for standard things that you need to do when configuring a system, e.g. creating a directory or generating a config file by merging Ansible configuration variables with a template. You can and should make these tasks idempotent, but as the system gets more complex, it becomes difficult and runtime can be slow as it compares tasks one by one to the running system.
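What "idempotent task" means for the config-file case can be sketched in Python. This is not Ansible's template module; `ensure_config` is a made-up function that uses the standard library's `string.Template` in place of Jinja.

```python
import tempfile
from pathlib import Path
from string import Template


def ensure_config(path, template_text, variables):
    """Render a template and write it only if the file content would change.

    Returns True when the file was written ("changed"), False otherwise.
    """
    rendered = Template(template_text).substitute(variables)
    target = Path(path)
    if target.exists() and target.read_text() == rendered:
        return False
    target.parent.mkdir(parents=True, exist_ok=True)  # idempotent mkdir
    target.write_text(rendered)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(ensure_config(Path(d) / "app.conf", "port=$port\n", {"port": 8080}))
```

Running the same call twice writes the file once and reports no change the second time. The compare-before-write step is also where the per-task runtime cost mentioned above comes from.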
Both systems suffer somewhat from difficulty in writing code. The fundamental task is to transform configuration variables and templates into running resources. To do that, you need loops, if/then/else logic, etc. Ansible has some constructs, but it is basically string manipulation, with a backdoor of being able to write modules in python. Terraform has a better syntax to define resources. Logic is generally things like ternary operator and list comprehensions. Terraform 0.12 improved this tremendously, but it is still somewhat weak. Ansible has a bit better management of config variables. Terraform tends to make you serialize things through environment vars, and it's awkward to define structure sometimes. Both would benefit greatly from first class functions and programming logic, even as they are "functional", just transforming data.
I love them both, and I hate them both. Terraform is best for provisioning complex infrastructure. Ansible is great for setting up instances, and it's easy for everyone to understand, dev and ops. Here is a complete example of deploying a complex, full-featured app to AWS using Terraform and Ansible: https://github.com/cogini/multi-env-deploy
I feel like we are suffering through a period where the tools are immature. People are focusing on syntax, but we are missing fundamental parts of the way the system should work. https://www.cogini.com/blog/is-it-time-for-lisp-in-devops/
The exact same thing is going on in the Kubernetes world. Back in the .com days, we would laugh at the "HTML programmers", but now we are "YAML programmers".
There are a couple of fundamental ways of managing the new cloud systems, all of which are better or worse depending on what you are doing. There are declarative systems like Terraform or CloudFormation. There is imperative with tasks, like Ansible. There are things that talk directly to the API like boto. There are tools like Pulumi which take a library approach in a general purpose programming language. Dockerfiles are crying out for higher level solutions, which are being developed. Ultimately I like the approach of a dedicated syntax like Terraform, but with more programming capability, or Pulumi.
Your description of Ansible makes me think of the Apache Ant build system. James Duncan Davidson's post mortem was illuminating. He never intended to create executable XML scripting.
Am noob. Have done a wee bit of CloudFormation, Docker, k8s. And once completed a Terraform howto. I've never touched Ansible, Chef, Puppet, etc.
I'd love a feature comparison matrix. Or maybe a decision flowchart on how to choose which tool for which job.
--
Update: This comparison was linked upthread. It's pretty good.