As most people familiar with OpenStack are already aware, it is made up of many software components that are typically deployed in a distributed manner. The more scalable an OpenStack deployment is, the more distributed the underlying components are as the infrastructure is usually scaled out horizontally on commodity hardware. As a consequence of this distributed architecture, there are many communication channels used between all of the software components. We have users communicating with the services via REST APIs and Dashboard, services communicating with each other via REST APIs and the message queue, services accessing databases, and so on. One only needs to look at the following simplified diagram to get an idea of the number of communication channels that there are.
Knowing about all of this communication taking place in an OpenStack deployment should raise a few questions. What communication channels need to be secured, and how can it be done? The OpenStack Security Guide attempts to address these questions at a high level. The guidance can be summarized as “use SSL/TLS on both public-facing and internal networks”. If you talk to those deploying OpenStack though, you will find that there are many different opinions on where and how SSL/TLS should be used. For example, some deployments will use SSL/TLS on public-facing proxies only, leaving traffic on their internal networks in the clear. I don’t think that anyone really thinks that having unencrypted traffic on internal networks is more secure than encrypting it, but there are some with the opinion that it is unnecessary due to network security being “good enough”. I also think that the technical difficulties in setting up SSL/TLS to protect all of these communication channels are a factor, especially when you start adding in complexities with load balancing and highly available deployments. If actually deploying with SSL/TLS everywhere is too difficult, it makes it easier to accept the compromise of relying on network security alone internally. This is far from ideal.
The first thing one should do when evaluating their OpenStack SSL/TLS needs is to identify the threats. You can divide these threats into external and internal attacker categories, but the lines tend to get blurred since certain components of OpenStack operate on both the public and management networks.
For publicly facing services, the threats are pretty straightforward. Users will be authenticating against Horizon and Keystone with their username and password. Users will also be accessing the API endpoints for other services using their Keystone tokens. If this network traffic is unencrypted, passwords and tokens can be intercepted by an attacker using a man-in-the-middle attack. The attacker can then use these valid credentials to perform malicious operations. All real deployments should be using SSL/TLS to protect publicly facing services.
For services that are deployed on internal networks, the threats aren’t so clear due to the bridging of security domains previously mentioned. There is always the chance that an administrator with access to the management network decides to do something malicious. SSL/TLS isn’t going to help in this situation if the attacker is allowed to access the private key. Not everyone on the management network would be allowed to access the private key of course, so there is still a lot of value in using SSL/TLS to protect yourself from internal attackers. Even if everyone who is allowed to access your management network is 100% trusted, there is still a threat that an unauthorized user gains access to your internal network by exploiting a misconfiguration or software vulnerability. One must keep in mind that you have users running their own code on instances on the OpenStack Compute nodes, which are deployed on the management network. If a vulnerability allows them to break out of the hypervisor, they will have access to your management network. Using SSL/TLS on the management network can minimize the damage that an attacker can cause.
It is generally accepted that it is best to encrypt sensitive data as early as possible and decrypt it as late as possible. Despite this best practice, it seems that it’s common to use an SSL/TLS proxy in front of the OpenStack services and use clear communication afterwards:
Let’s look at some of the reasons for the use of SSL/TLS proxies as pictured above:
- Native SSL/TLS in OpenStack services does not perform/scale as well as SSL proxies (particularly for Python implementations like Eventlet).
- Native SSL/TLS in OpenStack services is not as well scrutinized/audited as more proven solutions.
- Native SSL/TLS configuration is difficult (not well documented, tested, or consistent across services).
- Privilege separation (OpenStack service processes should not have direct access to private keys used for SSL/TLS).
- Traffic inspection needs for load balancing.
All of the above are valid concerns, but none of them prevent SSL/TLS from being used on the management network. Let’s consider the following deployment model:
This is very similar to the previous diagram, but the SSL/TLS proxy is on the same physical system as the API endpoint. The API endpoint would be configured to only listen on the local network interface. All remote communication with the API endpoint would go through the SSL/TLS proxy. With this deployment model, we address a number of the bullet points above. A proven SSL implementation that performs well would be used. The same SSL proxy software would be used for all services, so SSL configuration for the API endpoints would be consistent. The OpenStack service processes would not have direct access to the private keys used for SSL/TLS, as you would run the SSL proxies as a different user and restrict access using permissions (and additionally mandatory access controls using something like SELinux). We would ideally have the API endpoints listen on a Unix socket such that we could restrict access to it using permissions and mandatory access controls as well. Unfortunately, this doesn’t seem to work currently in Eventlet from my testing. It is a good future development goal.
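The two host-level properties this model relies on (an API endpoint bound only to the local interface, and a private key that other users cannot read) are easy to check from a deployment script. Here is a minimal sketch using only the Python standard library. The function names are hypothetical and none of this is part of OpenStack; it only covers file permission bits, not mandatory access controls such as SELinux.

```python
import ipaddress
import os
import stat


def is_loopback_bind(host):
    """Return True if `host` is a loopback IP literal such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname), so it is not provably local.
        return False


def key_is_owner_only(path):
    """Return True if no group or other permission bits are set on the key file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

A bind address of `0.0.0.0` fails the first check, and a key file with mode `0640` fails the second. Ownership of the key by the proxy user rather than the service user would still need to be verified separately.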
What about high availability or load balanced deployments that need to inspect traffic? The previous deployment model wouldn’t allow for deep packet inspection since the traffic is encrypted. If the traffic only needs to be inspected for basic routing purposes, it might not be necessary for the load balancer to have access to the unencrypted traffic. HAProxy has the ability to extract the SSL/TLS session ID during the handshake, which can then be used to achieve session affinity. HAProxy can also use the TLS Server Name Indication (SNI) extension to determine where traffic should be routed to. These features likely cover some of the most common load balancer needs. HAProxy would be able to just pass the HTTPS traffic straight through to the API endpoint systems in this case:
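As an illustration of the kind of routing decision described above, here is a small Python sketch that picks a backend using only two values visible in the TLS handshake: the SNI server name and the session ID. This is not how HAProxy is implemented or configured. The hostnames, addresses, and the hashing scheme are all assumptions made up for the example.

```python
import hashlib

# Hypothetical mapping from SNI hostname to backend API hosts.
ROUTES = {
    "identity.example.com": ["10.0.0.11:5000", "10.0.0.12:5000"],
    "compute.example.com": ["10.0.0.21:8774", "10.0.0.22:8774"],
}


def pick_backend(sni_name, session_id, routes=ROUTES):
    """Choose a backend from fields visible in the TLS handshake.

    `session_id` is the raw session ID as bytes.
    Returns None when the SNI name is not in the routing table.
    """
    pool = routes.get(sni_name.lower())
    if not pool:
        return None
    # Hash the session ID so that a resumed session maps to the same backend.
    digest = hashlib.sha256(session_id).digest()
    index = int.from_bytes(digest[:4], "big") % len(pool)
    return pool[index]
```

The point is that neither input requires decrypting the payload, so the load balancer never needs the private key.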
What if you want cryptographic separation of your external and internal environments? A public cloud provider would likely want their public-facing services (or proxies) to use certificates that are issued by a CA that chains up to a trusted Root CA that is distributed in popular web browser software for SSL/TLS. For the internal services, one might want to instead use their own PKI to issue certificates for SSL/TLS. This cryptographic separation can be accomplished by terminating SSL/TLS at the network boundary, then re-encrypting using the internally issued certificates. The traffic will be unencrypted for a brief period on the public-facing SSL/TLS proxy, but it will never be transmitted over the network in the clear. The same re-encryption approach that is used to achieve cryptographic separation can also be used if deep packet inspection is really needed on a load balancer. Here is what this deployment model would look like:
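To make the two trust domains in the re-encryption model concrete, here is a rough Python sketch of the two TLS contexts a boundary proxy would hold: one presenting the publicly trusted certificate to external clients, and one that trusts only the internal CA when connecting onward. It uses the standard `ssl` module; the function names and file path parameters are hypothetical, and a real deployment would more likely use dedicated proxy software than Python.

```python
import ssl


def build_frontend_context(public_cert, public_key):
    """Context for accepting connections from external clients."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Present the certificate issued by the publicly trusted CA.
    ctx.load_cert_chain(certfile=public_cert, keyfile=public_key)
    return ctx


def build_backend_context(internal_ca=None):
    """Context for re-encrypting traffic to internal services.

    When `internal_ca` is a CA bundle path, only certificates chaining to
    it are trusted. When it is None, the system default trust store is
    used instead, which gives up the cryptographic separation.
    """
    return ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=internal_ca)
```

The backend context verifies the internal service's certificate and hostname by default, so the second hop is authenticated as well as encrypted.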
As with most things, there are trade-offs. The main trade-off is going to be between security and performance. Encryption has a cost, but so does being hacked. The security and performance requirements are going to be different for every deployment, so how SSL/TLS is used will ultimately be an individual decision.
What can be done in the OpenStack community to ensure that a secure deployment is as friendly as possible? After all, many of the deployment models described above don’t even use components of OpenStack to implement SSL/TLS.
On the documentation side of things, we can improve the OpenStack Security Guide to go into more detail about secure reference architectures. There’s currently no coverage of load balancers and highly available deployments with SSL/TLS, which would be a nice topic to cover. Nearly everything in the deployment models described above should work today.
On the development side of things, there are a number of areas where improvements can be made. I’ve focused on the server side SSL/TLS implementation of the API endpoints, but the OpenStack services all have client-side SSL/TLS implementations that are used when communicating with each other. Many of the improvements we can make are on the SSL/TLS client side of things:
- SSL/TLS client support in the OpenStack services isn’t well tested currently, as Devstack doesn’t have the ability to automatically configure the services for SSL/TLS.
- Tempest should perform SSL/TLS testing to ensure that everything remains working for secure deployments.
- The HTTP client implementations and configuration steps for SSL/TLS vary between OpenStack services. We should standardize in these areas for feature parity and ease of configuration.
- OpenStack services should support listening on Unix sockets instead of network interfaces. This would allow them to be locked down more securely when co-located with an SSL/TLS proxy.
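To give an idea of what the Unix socket goal in the last bullet would involve, here is a minimal sketch of a listener whose socket file is restricted by file permissions. It uses plain standard-library sockets rather than Eventlet, and the function name and default mode are assumptions for illustration only.

```python
import os
import socket


def listen_on_unix_socket(path, mode=0o660):
    """Create a listening Unix domain socket with restricted permissions."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask so the socket file is never group- or
    # world-accessible between bind() and chmod().
    old_umask = os.umask(0o177)
    try:
        server.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, mode)
    server.listen(16)
    return server
```

With mode `0660` and a group shared only by the service and the proxy user, other local users would not be able to connect to the API endpoint at all.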
It would be great if we can get some cross-project coordination on working towards these development goals in the Juno cycle, as I really think that we would have a more polished security story around the API endpoints. I’m hoping to get a chance to discuss this with other interested Stackers at the Summit in Atlanta.