Part 1: Why should engineers bother about cloud security?
When I say engineers here, I mean anyone who writes code for the cloud. This post specifically discusses the AWS cloud.
The cloud is becoming more relevant with each passing day, and we all know it is going to be a driving force for years to come. Most of us as engineers have already started pivoting our careers in that direction; one way or another, most of us will be dealing with the cloud. Terms such as VPC, Subnets, Security Groups, ECS and many more don’t sound alien anymore. But have we really understood the foundation of this tectonic shift, and given it the attention it requires?
I have been thinking about this for some time, and I have arrived at a few findings that I plan to write up here. These are mostly my own thoughts, an attempt to connect the dots.
First and foremost, until now most of us have been concerned mainly with cost when we thought about the cloud. Cost savings were the primary motive during the first wave of the move to the cloud, and of course the agility built into cloud platforms is what accelerated it. Hence you will notice that many SaaS companies and consultancies that emerged during this first wave focus on cost optimization. Tools that identify idle cloud resources, schedule cloud shutdowns, or fetch the lowest-cost spot instance are prime examples. That is understandable. But there is a more fundamental and equally important aspect that we need to pay attention to: security.
Security is often an afterthought for many of us. We are engineers who love to solve problems, design solutions and quickly deploy them to the cloud for that sense of achievement. And that is exactly where this critical security piece needs to be thought through. With each new cloud service that makes its debut, the cloud becomes a denser place, and we had better understand a few basic aspects of it to make the journey more fun, secure and navigable.
I personally come from a background where, as an engineer, I didn’t have to bother about security beyond the ports we would use for communication among components, or the encryption and decryption of application data. The big guns of security (network security, application security, endpoint security and the like) were taken care of by experts. I would simply write my code, integrate it well to make a whole working machine, and I was done. Whatever happened to the application after that, I would offload to the deployment teams, which had network engineers, database admins, operations engineers and an army of other experts. I seldom interacted with them. In a sense, I never bothered about security in the broader sense.
But the cloud has changed everything, and we all now know about DevOps; I won’t get into that here. The point is that all those security and deployment aspects I never had to bother about are now part of my daily life. And perhaps we engineers don’t understand them very well yet.
Why is that happening now? There are two main reasons. First, the cloud is an abstraction that works through programmable APIs, and we use these APIs to build our applications. Second, the abstraction wraps all the physical aspects of the hardware and instead gives us software constructs such as IAM, Security Groups and Subnets. Many of us do not realize the implications of these two together: we are now the engineers who indirectly control many physical aspects, and by virtue of APIs the speed at which this happens has increased tremendously. In short, the work that experts used to do for us now needs to be taken care of by us.
And that is what the Shared Responsibility Model is all about: AWS secures the underlying infrastructure, while much of the security onus for what we build and configure on top of it is on us. In fact, I’m sure many of you have already encountered this Gartner warning somewhere: “Through 2025, 99% of cloud security failures will be the customer’s fault.”
AWS is expanding its offering aggressively. At the time of writing there are 200+ services on offer. A few of them, such as EC2, RDS, S3, SQS and Lambda, are used heavily, while others are used only marginally. However, there is one service you have to deal with no matter what: IAM (Identity and Access Management). This service forms the backbone of the AWS cloud and, as you may already know, it deals with identity and authorization.
Understanding the IAM Model
It would not be an overstatement to say that IAM is now the cornerstone of cloud security. IAM is the new network, especially in the serverless world. Understanding IAM is crucial for any well-written application and its secure deployment in the cloud.
In a nutshell, IAM is about understanding which identities have which access, and to what degree. An identity could be anything: a user logged in to a console session, or an API caller using temporary credentials from STS. An identity could be compute such as an EC2 instance, or even abstracted compute such as a Lambda function. It is critical to understand that almost anything in the cloud can assume an identity. That means any cloud resource (think of a cloud resource as any entity that provides compute, network or storage) can assume an identity, which has wide-reaching consequences.
In AWS parlance an identity is called a ‘Principal’. Principals can be thought of as the actors in the cloud. The policies that apply to a principal determine which Actions it is allowed to perform.
An Action can be anything that can be initiated through an API. Each cloud resource type has its own set of actions that can be carried out on it. For example, S3 has actions such as CreateBucket, DeleteBucket and ListBucket. The action is the second piece of information that forms the IAM model.
IAM also controls which Resources these actions can be performed on. Simply allowing actions on resources would be too coarse, and that is where Conditions come into the picture. Conditions check whether specific criteria, such as principals or tags, are met before an action is allowed. For example, S3 has condition keys such as TagKeys and prefix. Every action on every resource is denied by default, which ensures that an action cannot be carried out unless someone explicitly allows it. One needs to explicitly Allow actions on a resource.
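The default-deny behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified model and not AWS’s actual evaluation logic: the `is_allowed` function, the flat `Condition` layout (key equals value) and the example bucket name are all assumptions made for illustration. It also reflects one rule the text above does not spell out, namely that an explicit Deny overrides any Allow.

```python
from fnmatch import fnmatchcase


def _matches(patterns, value):
    """Return True if value matches any pattern; '*' acts as a wildcard."""
    if not isinstance(patterns, list):
        patterns = [patterns]
    return any(fnmatchcase(value, p) for p in patterns)


def _conditions_met(statement, context):
    # Simplification (assumption): each condition key must equal the
    # corresponding value in the request context. Real IAM conditions use
    # operators such as StringEquals or StringLike.
    conditions = statement.get("Condition", {})
    return all(context.get(key) == expected for key, expected in conditions.items())


def is_allowed(statements, action, resource, context=None):
    """Evaluate a request against policy statements, starting from deny."""
    context = context or {}
    allowed = False  # implicit deny: nothing is allowed until a statement says so
    for stmt in statements:
        applies = (
            _matches(stmt["Action"], action)
            and _matches(stmt["Resource"], resource)
            and _conditions_met(stmt, context)
        )
        if not applies:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit deny always wins
        allowed = True
    return allowed


# Hypothetical statements for a bucket named "example-bucket"
statements = [
    {
        "Effect": "Allow",
        "Action": "s3:ListBucket",
        "Resource": "arn:aws:s3:::example-bucket",
        "Condition": {"s3:prefix": "public/"},
    },
    {"Effect": "Deny", "Action": "s3:DeleteBucket", "Resource": "*"},
]

bucket = "arn:aws:s3:::example-bucket"
print(is_allowed(statements, "s3:ListBucket", bucket, {"s3:prefix": "public/"}))
print(is_allowed(statements, "s3:ListBucket", bucket))    # condition not met
print(is_allowed(statements, "s3:CreateBucket", bucket))  # never allowed
```

The three calls show the three typical outcomes: a request that matches an Allow and its condition, a request that fails the condition, and a request that no statement mentions and is therefore denied by default.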
When we put these four terms (Principal, Action, Resource and Condition) together in one simple diagram, the IAM model becomes easier to understand:
If you look at the above diagram, it becomes clear that cloud resources can be both Principals and Resources. In practice that means a Lambda function can invoke another Lambda function, an SQS queue can trigger a Lambda function, and so on.
If you pause for a moment and look at this, you will see that resources can now have direct and indirect relationships, some of them obvious and some not so obvious. These relationships can form a chain or a tree, and may have disastrous consequences if they are not understood and dealt with. I will come back to this important observation later and cover it in detail.
The IAM model helps us understand things at a conceptual level. What realizes this model in practice is the IAM Policy, a document that combines all four parts. An IAM Policy can be attached to identities such as users, groups and roles, or to cloud resources, which then acquire the permissions (to perform specific actions within the constraints of the conditions) listed in the policy. In that sense an IAM Policy is like a key, which can change hands easily. Just be careful about whom you hand the keys to, directly or indirectly.
Policies come in two major flavours. The first is the identity-based policy: a policy that takes effect only when it is attached to an IAM entity (user, group or role). When a policy specifying a Resource is attached to a user, that user is the Principal (actor) of the action. Let’s take a look at a simple IAM policy like the one below:
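The policy document itself is not reproduced in this text. As a minimal sketch, assuming it grants the `s3:ListAllMyBuckets` action on all resources, an identity-based policy of this kind could look like the following, written as a Python dictionary and printed as the JSON that IAM expects. The variable name is hypothetical.

```python
import json

# Assumed reconstruction of a simple identity-based policy.
# Note that there is no "Principal" element: the principal is whichever
# user, group or role the policy is attached to.
list_buckets_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        }
    ],
}

print(json.dumps(list_buckets_policy, indent=2))
```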
In essence, this policy allows the entity it is attached to to list all S3 buckets in the account. Thus, if this policy is attached to a Lambda function (through its service role), the function can list all S3 buckets; if it is attached to EC2, the EC2 instance can list all S3 buckets.
Can there be scenarios with no identity? Absolutely! This is the case for anonymous access, or when an AWS service such as API Gateway does not use a service role, or when cross-account access needs to be given, as in the case of Lambda.
When there is no identity, identity-based policies cannot be used, and that is where the other variation of IAM Policy comes into the picture: resource-based policies. A resource-based policy is used when you wish to give a Principal (actor) access to a resource. In this policy we specify who has access to the resource and what actions they can perform on it. Resource-based policies are supported by only some AWS services.
Take a look at the S3 bucket policy below, which is used to give anonymous access to the bucket. In this policy you will notice a new field, Principal; here ‘*’ in Principal means everybody, which allows anonymous access to the S3 objects.
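The bucket policy is not reproduced in this text either. A minimal sketch, assuming the anonymous access in question is read-only `s3:GetObject` on every object in the bucket, might look like this. The bucket name `example-bucket` and the `Sid` are placeholders.

```python
import json

BUCKET_NAME = "example-bucket"  # hypothetical bucket name

# Assumed reconstruction of a resource-based (bucket) policy.
# Unlike an identity-based policy it names a Principal; "*" means anyone,
# including unauthenticated callers.
public_read_bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        }
    ],
}

print(json.dumps(public_read_bucket_policy, indent=2))
```

A policy like this makes every object in the bucket readable by the whole internet, so it only makes sense for content that is genuinely meant to be public.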
Yet another very important class of policy, which is essentially a resource-based policy attached to a role, is the Trust Policy. It determines which services and identities are allowed to assume the role. For example, a cross-account access role can use the trust policy below to allow access from a different account. Many third-party SaaS services use this type of policy to access our AWS accounts.
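The trust policy is likewise missing from this text. A minimal sketch of a cross-account trust policy could look like the following. The account ID and external ID are placeholders, and the `sts:ExternalId` condition is my addition rather than something taken from the original example; it is commonly recommended when the other account belongs to a third party.

```python
import json

EXTERNAL_ACCOUNT_ID = "111122223333"  # placeholder for the trusted account
EXTERNAL_ID = "example-external-id"   # hypothetical value agreed with the third party

# Assumed reconstruction: the role that carries this trust policy can be
# assumed by principals in the external account, provided they pass the
# expected external ID.
cross_account_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{EXTERNAL_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(cross_account_trust_policy, indent=2))
```

The trust policy only decides who may assume the role. What the role can then do is governed by the permission policies attached to it.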
This is the awesome power of IAM policies. It is also the reason why one has to be fully aware of the content of every policy: any misconfiguration here can quickly escalate into a security risk.
This brings us to the IAM best practices that should always be followed. Let’s take a look at a few interesting observations regarding IAM policies.
- Grant least privilege: Each IAM identity should have exactly the permissions required for the task at hand, no less and no more.
While most of us begin by following this principle, over time the temptation to reuse IAM policies becomes hard to resist. This leads to fat IAM policies, which are also vulnerable from a security perspective. As a rule of thumb, there should be one IAM policy per task. Resist the temptation to reuse policies.
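One rough way to spot candidates for such fat policies is to look for wildcards in Allow statements. The sketch below is only a heuristic of my own, not a replacement for a proper policy analyzer; the `find_broad_statements` function and the sample policy are hypothetical.

```python
def _as_list(value):
    """IAM allows a single value or a list in most places; normalize to a list."""
    return value if isinstance(value, list) else [value]


def find_broad_statements(policy):
    """Return (statement index, reason) pairs for Allow statements using wildcards."""
    findings = []
    for index, stmt in enumerate(_as_list(policy.get("Statement", []))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        # "*" or "service:*" grants every action (of that service)
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((index, "wildcard action"))
        if any(r == "*" for r in resources):
            findings.append((index, "wildcard resource"))
    return findings


# Hypothetical policy: the first statement is far too broad, the second is scoped
fat_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["sqs:SendMessage"],
            "Resource": "arn:aws:sqs:us-east-1:111122223333:example-queue",
        },
    ],
}

for index, reason in find_broad_statements(fat_policy):
    print(f"statement {index}: {reason}")
```

A wildcard resource is not always a mistake, since some actions only work with `"Resource": "*"`, so the findings are a starting point for review rather than a verdict.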
Following the rule of one IAM policy per task quickly leads to a large number of IAM policies in any sizeable cloud environment. If you ask anyone which IAM policy is mapped to which cloud resource, or whether any IAM policy is mapped to multiple cloud resources, the answer is mostly guesswork (or a daunting task with the AWS CLI) unless you are using a tool to help you. I would argue that we need a tool that analyzes all this information (IAM policies and cloud resources) and prepares some kind of map that visualizes these relationships.
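To make the idea of such a map concrete, here is a small sketch that inverts a resource-to-policy inventory and reports policies shared by more than one resource. The inventory is hard-coded and entirely hypothetical; in a real environment it would have to be collected from the account first, which is the hard part this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical inventory: resource name -> names of the IAM policies attached to it
attachments = {
    "lambda:resize-images": ["s3-read-uploads", "logs-write"],
    "lambda:send-emails": ["ses-send", "logs-write"],
    "ec2:batch-worker": ["s3-read-uploads"],
}


def policies_to_resources(attachments):
    """Invert the inventory: policy name -> set of resources that use it."""
    mapping = defaultdict(set)
    for resource, policies in attachments.items():
        for policy in policies:
            mapping[policy].add(resource)
    return dict(mapping)


def shared_policies(attachments):
    """Return only the policies that are attached to more than one resource."""
    return {
        policy: sorted(resources)
        for policy, resources in policies_to_resources(attachments).items()
        if len(resources) > 1
    }


for policy, resources in sorted(shared_policies(attachments).items()):
    print(f"{policy}: {', '.join(resources)}")
```

Policies that show up in this report are the reused ones, and therefore the first candidates to check against the one-policy-per-task rule.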
There are already a few interesting open source tools in this area that identify and visualize IAM relationships.
Another problem with the least privilege principle is how to write a suitably restrictive access policy. It is surprisingly hard to write such policies consistently at scale. For each policy you need to consult the AWS documentation for the actions specific to that cloud resource, and looking up allowed actions in the documentation quickly becomes cumbersome in the real world. An alternative is to use attribute-based access control (ABAC) policies.
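As a rough illustration of the attribute-based approach, the sketch below allows starting and stopping EC2 instances only when the instance carries the same `project` tag as the calling principal. The choice of actions and the tag key `project` are assumptions for the example; the point is that a single policy scales with tags instead of listing individual resources.

```python
import json

# Hypothetical ABAC policy: access is granted when the resource's "project"
# tag equals the principal's "project" tag. The ${...} placeholder is an IAM
# policy variable resolved by AWS at request time, not a Python f-string.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
                }
            },
        }
    ],
}

print(json.dumps(abac_policy, indent=2))
```

The trade-off is that security now depends on who is allowed to create and change tags, so tagging permissions need the same care as the policy itself.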
- Be careful with cross-account IAM roles: Cross-account roles open a door into even the most secure cloud environment. This can be dangerous, particularly when access is given to a less secure or unknown account, so we need to be careful with them.
You should always know which cross-account IAM roles exist in your environment. Unfortunately, this is no easy task without the help of an external tool. AWS offers IAM Access Analyzer for this purpose; do give it a try. Personally, I feel it helps to visualize the cross-account roles, to know whether only legitimate AWS accounts are accessing our environment. The relationship between our cloud account and external cloud accounts becomes clear and manageable with a clear visualization.
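To show what such a check boils down to, here is a minimal sketch that reads a role’s trust policy and lists the AWS accounts other than our own that are allowed to assume it. It is not how IAM Access Analyzer works, it only understands the standard `aws` partition, and the function name, account IDs and role name are hypothetical.

```python
import re

# Matches a bare 12-digit account ID or an IAM/STS ARN containing one
ACCOUNT_ID_RE = re.compile(r"^(?:arn:aws:(?:iam|sts)::)?(\d{12})(?::.*)?$")


def _as_list(value):
    return value if isinstance(value, list) else [value]


def external_accounts(trust_policy, own_account_id):
    """Return sorted account IDs (or '*') trusted by the policy, excluding our own."""
    found = set()
    for stmt in _as_list(trust_policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            found.add("*")  # anyone can attempt to assume the role
            continue
        if not isinstance(principal, dict):
            continue
        for value in _as_list(principal.get("AWS", [])):
            if value == "*":
                found.add("*")
                continue
            match = ACCOUNT_ID_RE.match(value)
            if match and match.group(1) != own_account_id:
                found.add(match.group(1))
    return sorted(found)


# Hypothetical trust policy that trusts our own account and one external role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:root",
                    "arn:aws:iam::444455556666:role/example-role",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

print(external_accounts(trust_policy, own_account_id="111122223333"))
```

Running a check like this over every role in an account gives the list of external accounts to compare against the ones you actually expect.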
Here is a list of open source projects that can be useful either for writing secure policies or for identifying over-provisioned, fat IAM policies.
- https://github.com/Netflix/repokid - This tool can help remove access to unused services. It comes from Netflix, which means it is a battle-tested tool. It relies on another Netflix tool that needs to be deployed alongside it, and in that sense it is a little cumbersome to deploy and use. Read this excellent post for details.
Another set of tools uses CloudTrail to analyze IAM policies:
- https://github.com/flosell/trailscraper - A downside of this tool is false positives. It is a heuristic-based tool that attempts to map CloudTrail events to IAM actions, so it sometimes gets the mapping wrong.
- https://github.com/duo-labs/cloudtracker - This tool uses Athena for its queries and may cost you some $$$.
- https://github.com/salesforce/policy_sentry/ - This tool was recently released and can be used to write least privilege policies.
To summarize:
- As engineers, we need to understand the IAM model and master the art of writing secure IAM policies.
- Cloud resources can have direct and indirect relationships, and thus IAM can have far-reaching consequences.
- Visualizing these complex relationships will be very important going forward.
- Writing IAM policies that adhere to best practices is a daunting task, and a tool can really come in handy to make life a bit easier.
Having gone through a basic introduction to the IAM model, which is the backbone of the AWS cloud, I think we can build on it to thoroughly discuss the challenges we face during cloud development and secure production deployment. I intend to cover those topics in the coming posts. Please do suggest topics that you think would be useful for us to discuss and understand.
Thanks for reading!