Monthly Archives: April 2014

Secure messaging with Kite

OpenStack uses message queues for communication between services.  The messages sent through the message queues are used for command and control operations as well as notifications.  For example, nova-scheduler uses the message queues to control instances on nova-compute nodes.  This image from Nova’s developer documentation gives a high-level idea of how message queues are used:


It’s obvious that the messages being sent are critical to the functioning of an OpenStack deployment.  Actions are taken in response to the messages that are received, which means that the contents of each message need to be trusted.  This calls for secure messaging.  Before discussing how messages can be secured, we need to define what makes a message secure.

A secure message has integrity and confidentiality.  Message integrity means that the sender has been authenticated by the recipient and that the message has not been tampered with.  Think of this like an imprinted wax seal on an envelope.  The imprint is used to identify the sender and ensure that it is authentic.  An unbroken seal indicates that the contents have not been tampered with after the sender sealed it.  This is usually accomplished by computing a digital signature over the contents of the message.  Message confidentiality means that the message is only readable by its intended recipient.  This is usually accomplished by using encryption to protect the message contents.
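To make the integrity half of this concrete, here is a minimal sketch using an HMAC from the Python standard library.  An HMAC is a shared-key construction rather than a public-key digital signature, and the key and message shown are made-up values for illustration only.  Confidentiality is not shown, since that would need an encryption library outside the standard library.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Return an HMAC-SHA256 tag computed over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, message), tag)


# Hypothetical values, for illustration only.
key = b"example-shared-key"
message = b'{"method": "example_call"}'
tag = sign(key, message)
```

A recipient holding the same key accepts the message only if `verify` returns `True`, so a change to either the message or the key causes verification to fail.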

Messages are not well protected in OpenStack today.  Once a message is on the queue, no further authorization checks of that message are performed.  In many OpenStack deployments, the only thing used to protect who can put messages on the queue is basic network isolation.  If one has network access to the message broker, messages can be placed on a queue.

It is possible to configure the message broker for authentication.  This authenticates the sender of the message to the broker itself.  Rules can then be defined to restrict who can put messages on specific queues.  While this authentication is a good thing, it leaves a lot to be desired:

  • The sender doesn’t know who it is really talking to, as the broker is not authenticated to the sender.
  • Messages are not protected from tampering or eavesdropping.
  • The recipient is unable to authenticate the sender, allowing one sender to impersonate another.

SSL/TLS can be enabled on the message broker to protect the transport.  This buys us a few things over the authentication described above.  The broker is now authenticated to the sender by virtue of certificate trust and validation.  Messages are also protected from eavesdropping and tampering between the sender and the broker as well as between the recipient and the broker.  This still leaves us with a few security gaps:

  • Messages are not protected from tampering or eavesdropping by the broker itself.
  • The recipient is still unable to authenticate the sender.
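As a rough sketch of what "the broker is authenticated to the sender" means on the client side, the following builds a TLS context with Python's `ssl` module that requires a valid broker certificate and a matching hostname.  The `broker_client_context` helper and its `ca_file` parameter are names I made up for illustration, and how a given messaging library accepts such a context is not covered here.

```python
import ssl


def broker_client_context(ca_file=None):
    """Build a client-side TLS context that validates the broker's certificate.

    ca_file is an optional path to a CA bundle; when it is None the
    system default trust store is used.
    """
    context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # create_default_context() already enables both of these checks;
    # they are restated here to make the intent explicit.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


context = broker_client_context()
```

Note that this only protects each hop to the broker, which is exactly the gap described in the list above.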

Utilizing the message broker security described above looks like this:


You can see that what is protected here is the transport between each communicating party and the broker itself.  There is no guarantee that the message going into the broker is the same when it comes out of the broker.

Kite is designed to improve upon the existing message broker security by establishing a trust relationship between the sender and recipient that can be used to protect the messages themselves.  This trust is established by knowledge of shared keys.  Kite is responsible for the generation and secure distribution of signing and encryption keys to communicating parties.  The keys used to secure messages are only known by the sender, recipient, and Kite itself.  Once these keys are distributed to the communicating parties, they can be used to ensure message integrity and confidentiality.  Sending a message would look like this:


You can see that the message itself is what is protected, and doesn’t rely on protection of the transport.  The broker is unable to view or manipulate the contents of the message since it does not have the keys used to protect it.

In order for Kite to securely distribute keys to the communicating parties, a long-term shared secret needs to be established between Kite and each individual communicating party.  The long-term shared secret allows Kite and an individual party to trust each other by proving knowledge of the shared secret.  Once a long-term shared secret is established, it is never sent over the network.

In the diagram below, we can see that two parties each have a unique long-term shared secret that is only known by themselves and Kite, which is depicted as the Key Distribution Service (KDS):


When one party wants to send a message to another party, it requests a ticket from Kite.  A ticket request is signed by the requestor using its long-term shared secret.  This allows Kite to validate the request by checking the signature using the long-term shared secret of the requesting party.  This signature serves to authenticate the requestor to Kite and ensure that the request has not been tampered with.  Conceptually, this looks like this:


When Kite receives a valid ticket request, it generates a new set of signing and encryption keys.  These keys are only for use between two specific parties, and only for messages being sent in one direction.  The actual contents of the ticket request are shown below:

{
 "metadata": <Base64 encoded metadata object>,
 "signature": <HMAC signature over metadata>
}

{
 "source": "",
 "destination": "",
 "timestamp": "2012-03-26T10:01:01.720000",
 "nonce": 1234567890
}

The contents of the ticket request are used by Kite to generate the signing and encryption keys.  The timestamp and nonce are present to allow Kite to check for replay attacks.
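The request format and replay check can be sketched roughly as follows.  This is not Kite's actual code: the function names, the choice of HMAC-SHA256, the five-minute freshness window, and the in-memory set of seen nonces are all assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import os
from datetime import datetime, timezone


def build_ticket_request(source, destination, source_key):
    """Build a signed ticket request shaped like the structure above."""
    metadata = {
        "source": source,
        "destination": destination,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "nonce": int.from_bytes(os.urandom(8), "big"),
    }
    encoded = base64.b64encode(json.dumps(metadata).encode("utf-8"))
    signature = hmac.new(source_key, encoded, hashlib.sha256).digest()
    return {
        "metadata": encoded.decode("ascii"),
        "signature": base64.b64encode(signature).decode("ascii"),
    }


def verify_ticket_request(request, source_key, seen_nonces, max_age_seconds=300):
    """Check the signature first, then the freshness and nonce of the request."""
    encoded = request["metadata"].encode("ascii")
    expected = hmac.new(source_key, encoded, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, base64.b64decode(request["signature"])):
        raise ValueError("signature does not match")
    metadata = json.loads(base64.b64decode(encoded))
    sent = datetime.fromisoformat(metadata["timestamp"])
    age = (datetime.now(timezone.utc) - sent).total_seconds()
    if abs(age) > max_age_seconds:
        raise ValueError("timestamp outside the accepted window")
    replay_key = (metadata["source"], metadata["nonce"])
    if replay_key in seen_nonces:
        raise ValueError("nonce already seen (possible replay)")
    seen_nonces.add(replay_key)
    return metadata
```

The signature is checked before anything in the metadata is trusted, and presenting the same request twice is rejected because its nonce has already been recorded.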

For signing and encryption key generation, Kite uses the HMAC-based Key Derivation Function (HKDF) as described in RFC 5869.  The first thing that Kite does is to generate an intermediate key.  This intermediate key is generated by using the sender’s long-term shared secret and a random salt as inputs to the HKDF Extract function:

intermediate_key = HKDF-Extract(salt, source_key)

The intermediate key, sender and recipient names (as provided in the ticket request),  and a timestamp from Kite are used as inputs into the HKDF Expand function, which outputs the key material that is used as the signing and encryption keys:

keys = HKDF-Expand(intermediate_key, info, key_size)

info = "<source>,<dest>,<timestamp>"
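The two HKDF steps are small enough to write out with just `hmac` and `hashlib`, following RFC 5869.  Treat this as a sketch under assumptions: SHA-256 as the hash, a 32-byte key size, and splitting the output into a signing key followed by an encryption key are my choices for illustration, not something the description above specifies.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, input_key: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, input_key, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): chain HMAC blocks until enough output exists."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length is too large for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_keys(source_key, salt, source, dest, timestamp, key_size=32):
    """Derive a (signing key, encryption key) pair; the split is an assumption."""
    intermediate_key = hkdf_extract(salt, source_key)
    info = "{},{},{}".format(source, dest, timestamp).encode("utf-8")
    material = hkdf_expand(intermediate_key, info, 2 * key_size)
    return material[:key_size], material[key_size:]
```

Because `info` binds the source, destination, and timestamp into the output, changing any of them yields unrelated keys even when the intermediate key is the same.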

Once the signing and encryption keys are generated, they are returned to the ticket requestor as a part of a response that is signed with the requestor’s long-term shared secret:


Since the ticket response is signed using the requestor’s long-term shared secret, the requestor is able to validate that the ticket response truly came from Kite, since nobody else has knowledge of the long-term shared secret.  The contents of the ticket response are shown below:

{
    "metadata": <Base64 encoded metadata object>,
    "ticket": <Ticket object encrypted with source's key>,
    "signature": <HMAC signature over metadata + ticket>
}

{
    "source": "",
    "destination": "",
    "expiration": "2012-03-26T11:01:01.720000"
}

{
    "skey": <Base64 encoded message signing key>,
    "ekey": <Base64 encoded message encryption key>,
    "esek": <Key derivation info encrypted with destination's key>
}

{
    "key": <Base64 encoded intermediate key>,
    "timestamp": <Timestamp from KDS>,
    "ttl": <Time to live for the keys>
}

We can see that the actual key material in ticket is encrypted using the requestor’s long-term shared secret.  Only the requestor will be able to decrypt this portion of the response to extract the keys.  We will discuss the esek portion of the response in more detail when we get to sending messages.  For now, it is important to note that esek is an encrypted payload that Kite created for the destination party.  It is encrypted using the destination party’s long-term shared secret, so it is an encrypted blob as far as the source party is concerned.  A conceptual diagram of the ticket response should make this clear:


When sending a secured message, the sender will use an envelope that contains information the recipient needs to derive the keys, a signature, and a flag indicating if the message is encrypted.  The envelope looks like this:

{
    _METADATA_KEY: MetaData,
    _MESSAGE_KEY: Message,
    _SIGNATURE_KEY: Signature
}

{
    'source': <sender>,
    'destination': <receiver>,
    'timestamp': <timestamp>,
    'nonce': <64bit unsigned number>,
    'esek': <Key derivation info encrypted with destination's key>,
    'encryption': <true | false>
}

The following diagram shows how the envelope is used when sending a message:


Upon receipt of a secured message, the recipient can decrypt esek using its long-term shared secret.  It can trust that the contents of esek are from Kite since nobody else has knowledge of the shared secret.  The contents of esek along with the source and destination from the envelope contain all of the information that is needed to derive the signing and encryption keys.

{
    "key": <Base64 encoded intermediate key>,
    "timestamp": <Timestamp from KDS>,
    "ttl": <Time to live for the keys>
}

HKDF-Expand(intermediate_key, info, key_size)

info = "<source>,<dest>,<timestamp>"

We perform this derivation step on the recipient to force the recipient to validate the source and destination from the metadata in the message envelope.  If the source and destination were somehow modified, the correct keys could not be derived from esek.  This provides the recipient with a guarantee that Kite generated the signing and encryption keys specifically for the correct source and destination.
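Here is a small sketch of why a modified envelope fails.  It assumes esek has already been decrypted into a dictionary, uses HMAC-SHA256 for both HKDF-Expand and the message signature, and uses made-up party names and key bytes; none of these details come from Kite itself.

```python
import hashlib
import hmac


def hkdf_expand(prk, info, length):
    """HKDF-Expand from RFC 5869, using HMAC-SHA256 (fine for short outputs)."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_from_envelope(esek_contents, metadata, key_size=32):
    """Derive (signing key, encryption key) from esek plus envelope metadata."""
    info = "{},{},{}".format(
        metadata["source"], metadata["destination"], esek_contents["timestamp"]
    ).encode("utf-8")
    material = hkdf_expand(esek_contents["key"], info, 2 * key_size)
    return material[:key_size], material[key_size:]


# Hypothetical values, for illustration only.
esek_contents = {"key": b"\x01" * 32, "timestamp": "2012-03-26T10:01:01.720000"}
genuine = {"source": "sender.example", "destination": "receiver.example"}
tampered = {"source": "attacker.example", "destination": "receiver.example"}

message = b"example payload"
sender_skey, _ = derive_from_envelope(esek_contents, genuine)
signature = hmac.new(sender_skey, message, hashlib.sha256).digest()


def signature_ok(metadata):
    """Verify the message signature using keys derived from the given metadata."""
    skey, _ = derive_from_envelope(esek_contents, metadata)
    expected = hmac.new(skey, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)
```

With the genuine metadata the recipient derives the same signing key as the sender, while the tampered source leads to a different key and the signature check fails.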

Once the signing and encryption keys are derived, the recipient can validate the signature and decrypt the message if necessary.  The end result after deriving the signing and encryption keys looks like this:


Signing and encryption keys are only valid for a limited period of time.  The validity period is a policy determination set by the person deploying Kite.  The validity period represents the amount of exposure you would have if a particular pair of signing and encryption keys were compromised, so it is generally advisable to keep it short.

This validity period is defined by the expiration timestamp in the ticket response for the sender, and the timestamp + ttl in esek for the recipient.  While keys are still valid, a sender can reuse them without needing to contact Kite.  The esek payload for the recipient is still sent with every message, and the recipient derives the signing and encryption keys for every message.  When the keys expire, the sender needs to send a new ticket request to Kite to get a new set of keys.
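The two expiry checks could look something like this.  The function names are mine, and I am assuming that ttl is expressed in seconds and that timestamps use the ISO 8601 form shown in the earlier examples.

```python
from datetime import datetime, timedelta


def sender_keys_valid(expiration: str, now: datetime) -> bool:
    """Sender side: keys are usable until the ticket's expiration timestamp."""
    return now < datetime.fromisoformat(expiration)


def recipient_keys_valid(timestamp: str, ttl_seconds: int, now: datetime) -> bool:
    """Recipient side: keys are usable until timestamp + ttl from esek."""
    issued = datetime.fromisoformat(timestamp)
    return now < issued + timedelta(seconds=ttl_seconds)
```

Both functions take the current time as an argument, which keeps the comparison explicit and makes the checks easy to exercise with fixed times.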

Kite also supports sending secure group messages, though the workflow is slightly different than it is for direct messaging.  Groups can be defined in Kite, but a group does not have a long-term shared secret associated with it.  When a ticket is requested with a group as the destination, Kite will generate a temporary key that is associated with the group if a current key does not already exist.  This group key is used as the destination’s long-term shared secret.  When a recipient receives a message where the destination is a group, it contacts Kite to request the group key.  Kite will deliver the group key to a group member encrypted using the member’s long-term shared secret.  A group member can then use the group key to access the information needed to derive the keys needed to verify and decrypt the secured group message.

The usage of a temporary group key prevents the need to have a long-term shared secret shared amongst all of the group members.  If a group member becomes compromised, they can be removed from the group in Kite to cut off access to any future group keys.  Using a short group key lifetime limits the exposure in this situation, and it also doesn’t require changing the shared secret across all group members since a new shared secret will be generated upon expiration.

There is one flaw in the group messaging case that is important to point out.  All members of a group will have access to the same signing and encryption keys once they have received a message.  This allows a group member to impersonate the original sender who requested the keys.  This means one compromised group member is able to send falsified messages to other members within the same group.  This is a limitation due to the use of symmetric cryptography.  It would be possible to improve upon this by using public-key cryptography for message signing.  There is a session scheduled at the Juno design summit to discuss this.
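The impersonation problem is easy to demonstrate.  In this sketch the key bytes, the sender name, and the way the sender name is bound into the signed payload are all invented for illustration; the point is only that an HMAC cannot tell key holders apart.

```python
import hashlib
import hmac

# Every group member ends up holding this same derived signing key.
group_signing_key = b"signing-key-known-to-all-members"


def sign_as(claimed_sender: str, body: bytes, key: bytes) -> bytes:
    """Sign a message while claiming to be the given sender."""
    payload = claimed_sender.encode("utf-8") + b"|" + body
    return hmac.new(key, payload, hashlib.sha256).digest()


def accept(claimed_sender: str, body: bytes, tag: bytes, key: bytes) -> bool:
    """What any other group member does when verifying a message."""
    return hmac.compare_digest(sign_as(claimed_sender, body, key), tag)


# A compromised member forges a message in the original sender's name.
forged_tag = sign_as("original-sender", b"forged request", group_signing_key)
forgery_accepted = accept(
    "original-sender", b"forged request", forged_tag, group_signing_key
)
```

The forged message verifies because verification needs exactly the same key that signing does, which is why a public-key signature would help here.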

There are a number of areas where Kite could be improved in the future.  Improving the group messaging solution as mentioned above is an obvious area to investigate.  It would also be a good idea to look into using Barbican to store the long-term shared secrets.  There have been brief discussions around adding policies to Kite to be able to restrict which parties are allowed to request tickets for certain recipients.

Kite is currently being implemented as a standalone service within the Key Management (Barbican) project.  Patches are landing, and hopefully we get to something initially usable in the Juno timeframe.  To utilize Kite, changes will also be needed in oslo.messaging.  An ideal way of dealing with this would be to allow for “message security” plug-ins to Oslo.  A Kite plug-in would allow services to use Kite for secure messages, but other plug-ins can be developed if alternate solutions come along (such as a public-key based solution as mentioned above).  This would allow Oslo to remain relatively static in this area as capabilities around secure messaging change.  There is a lot that can be done around secure messaging, and I think that Kite looks like a great step forward.

Security auditing of OpenStack releases

I was recently asked some high-level security related questions about OpenStack.  This included questions such as:

  • What cryptographic algorithms are used, and are the algorithms user configurable?
  • What implementations are used for cryptographic functions?
  • How is sensitive data handled?

These are common questions for those evaluating and deploying OpenStack, as they want to see if it meets their security requirements and know what security related areas they need to watch out for when configuring everything.

Unfortunately, I have no good answer to these questions, as this information isn’t really collected anywhere (unless you want to go code diving).  OpenStack is also much too large for any single person to provide easy answers due to the number of projects involved (we’re up to 12 integrated projects not counting Devstack as of the Icehouse release by my count).  That’s a lot of code to review to come up with accurate answers.

The answers to these security questions also change from release to release, as the development teams are always marching forward improving existing features and adding new ones.  If one were to conduct their own audit of all of the integrated projects for a particular OpenStack release, it would quickly be time to start over again for the next release due to the 6-month release cycle.

I feel that the answers to these questions are also invaluable for developers, not just evaluators and deployers.  If we don’t know where our weak points are from a security perspective, how can we hope to improve or eliminate them?  Many projects are also solving the same security related issues, but not necessarily in a consistent manner.  If we have a comprehensive security overview of all OpenStack projects, we can identify areas of inconsistency and duplication.  This can serve to identify areas where we can improve things.

What form would this information take to be easily consumable for deployers and developers both?  For starters, we would want to see the following information collected in a single place for each project:

  • Implemented crypto – any cryptography directly implemented in OpenStack code (not used via an external library).
  • Used crypto – any libraries that are used to provide cryptographic functionality.
  • Hashing algorithms – What hashing algorithms are used, and for what purpose?  Is the algorithm configurable or optional to use?
  • Encryption algorithms – What encryption algorithms are used, and for what purpose?  Is the algorithm configurable or optional to use?
  • Sensitive data – What sensitive data is handled?  How is it protected by default, and are there optional features that can be configured to protect it further?
  • Potential improvements – What are potential areas that things can be improved from a security perspective?
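If this information were ever kept in a machine-readable form as well, one record per project could look something like the sketch below.  The class, its field names, and the roll-up helper are purely hypothetical, and the sample entries are placeholders rather than findings about any real project.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectSecurityAudit:
    """One record per project and release, mirroring the list above."""
    project: str
    release: str
    implemented_crypto: list = field(default_factory=list)
    used_crypto_libraries: list = field(default_factory=list)
    hashing_algorithms: dict = field(default_factory=dict)     # algorithm -> purpose
    encryption_algorithms: dict = field(default_factory=dict)  # algorithm -> purpose
    sensitive_data: dict = field(default_factory=dict)         # data item -> protection
    potential_improvements: list = field(default_factory=list)


def release_summary(audits):
    """Collect every algorithm and library named across the project records."""
    algorithms, libraries = set(), set()
    for audit in audits:
        algorithms.update(audit.hashing_algorithms)
        algorithms.update(audit.encryption_algorithms)
        libraries.update(audit.used_crypto_libraries)
    return {"algorithms": sorted(algorithms), "libraries": sorted(libraries)}


# Placeholder entries only; these say nothing about real projects.
audits = [
    ProjectSecurityAudit(
        "project-a", "example-release",
        used_crypto_libraries=["lib-x"],
        hashing_algorithms={"hash-1": "example purpose"},
    ),
    ProjectSecurityAudit(
        "project-b", "example-release",
        used_crypto_libraries=["lib-x", "lib-y"],
        encryption_algorithms={"cipher-1": "example purpose"},
    ),
]
summary = release_summary(audits)
```

A structure like this would make it straightforward to spot inconsistency and duplication across projects, though plain wiki pages serve the same purpose.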

So with that said, I went code diving and took a pass at collecting this security information for Keystone.  Keystone seemed like an obvious place to start given its role within the OpenStack infrastructure.  Here is what I put together:

This information would be collected for each project for a specific OpenStack release.  A top-level release page would collect links to the individual project pages.  This could even contain a high-level summary such as listing all crypto algorithms and libraries used across all projects.  Here’s an example that I put together for the upcoming Icehouse release:

My hope is that there is interest in collecting (and maintaining) this security related information from all of the development teams for the integrated projects.  The Keystone page I created can be used to discuss the most useful format, which we can then use as an example for the rest of the projects.  Once an initial pass is done for one OpenStack release, keeping this information up to date as things change with new development should not be a very big task.  We would simply need to be vigilant during code reviews to identify when code changes are made that require changes to the wiki pages.  It would also be fairly easy to look over the bug fixes and blueprints when a milestone is reached to double-check if any security related functionality was changed.

If we get through a successful first pass at collecting this information for all projects, it would probably make sense to have a cross-project discussion or even an in-person security hackfest to go over the results together to work on consistency issues and removing duplication (moving some security related things into Oslo maybe?).  It would be great to get a group of security interested developers from each project together to discuss this at the Atlanta Summit.

SSL/TLS Everywhere – visions of a secure OpenStack

As most people familiar with OpenStack are already aware, it is made up of many software components that are typically deployed in a distributed manner.  The more scalable an OpenStack deployment is, the more distributed the underlying components are as the infrastructure is usually scaled out horizontally on commodity hardware.  As a consequence of this distributed architecture, there are many communication channels used between all of the software components.  We have users communicating with the services via REST APIs and Dashboard, services communicating with each other via REST APIs and the message queue, services accessing databases, and so on.  One only needs to look at the following simplified diagram to get an idea of the number of communication channels that there are.


Knowing about all of this communication taking place in an OpenStack deployment should raise a few questions.  What communication channels need to be secured, and how can it be done?  The OpenStack Security Guide attempts to address these questions at a high level.  The guidance can be summarized as “use SSL/TLS on both public facing and internal networks”.  If you talk to those deploying OpenStack though, you will find that there are many different opinions on where and how SSL/TLS should be used.  For example, some deployments will use SSL/TLS on public facing proxies only, leaving traffic on their internal networks in the clear.  I don’t think that anyone really thinks that having unencrypted traffic on internal networks is more secure than encrypting it, but there are some with the opinion that it is unnecessary due to network security being “good enough”.  I also think that the technical difficulties in setting up SSL/TLS to protect all of these communication channels are a factor, especially when you start adding in complexities with load balancing and highly-available deployments.  If actually deploying with SSL/TLS everywhere is too difficult, it makes it easier to accept the compromise of relying on network security alone internally.  This is far from ideal.

The first thing one should do when evaluating their OpenStack SSL/TLS needs is to identify the threats.  You can divide these threats into external and internal attacker categories, but the lines tend to get blurred since certain components of OpenStack operate on both the public and management networks.

For publicly facing services, the threats are pretty straightforward.  Users will be authenticating against Horizon and Keystone with their username and password.  Users will also be accessing the API endpoints for other services using their Keystone tokens.  If this network traffic is unencrypted, passwords and tokens can be intercepted by an attacker using a man-in-the-middle attack.  The attacker can then use these valid credentials to perform malicious operations.  All real deployments should be using SSL/TLS to protect publicly facing services.

For services that are deployed on internal networks, the threats aren’t so clear due to the bridging of security domains previously mentioned.  There is always the chance that an administrator with access to the management network decides to do something malicious.  SSL/TLS isn’t going to help in this situation if the attacker is allowed to access the private key.  Not everyone on the management network would be allowed to access the private key of course, so there is still a lot of value in using SSL/TLS to protect yourself from internal attackers.  Even if everyone that is allowed to access your management network is 100% trusted, there is still a threat that an unauthorized user gains access to your internal network by exploiting a misconfiguration or software vulnerability.  One must keep in mind that you have users running their own code on instances in the OpenStack Compute nodes, which are deployed on the management network.  If a vulnerability allows them to break out of the hypervisor, they will have access to your management network.  Using SSL/TLS on the management network can minimize the damage that an attacker can cause.

It is generally accepted that it is best to encrypt sensitive data as early as possible and decrypt it as late as possible.  Despite this best practice, it seems that it’s common to use an SSL/TLS proxy in front of the OpenStack services and use clear communication afterwards:


Let’s look at some of the reasons for the use of SSL/TLS proxies as pictured above:

  • Native SSL/TLS in OpenStack services does not perform/scale as well as SSL proxies (particularly for Python implementations like Eventlet).
  • Native SSL/TLS in OpenStack services is not as well scrutinized/audited as more proven solutions.
  • Native SSL/TLS configuration is difficult (not well documented, tested, or consistent across services).
  • Privilege separation (OpenStack service processes should not have direct access to private keys used for SSL/TLS).
  • Traffic inspection needs for load balancing.

All of the above are valid concerns, but none of them prevent SSL/TLS from being used on the management network.  Let’s consider the following deployment model:


This is very similar to the previous diagram, but the SSL/TLS proxy is on the same physical system as the API endpoint.  The API endpoint would be configured to only listen on the local network interface.  All remote communication with the API endpoint would go through the SSL/TLS proxy.  With this deployment model, we address a number of the bullet points above.  A proven SSL implementation that performs well would be used.  The same SSL proxy software would be used for all services, so SSL configuration for the API endpoints would be consistent.  The OpenStack service processes would not have direct access to the private keys used for SSL/TLS, as you would run the SSL proxies as a different user and restrict access using permissions (and additionally mandatory access controls using something like SELinux).  We would ideally have the API endpoints listen on a Unix socket such that we could restrict access to it using permissions and mandatory access controls as well.  Unfortunately, this doesn’t seem to work currently in Eventlet from my testing.  It is a good future development goal.
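The "only listen on the local network interface" part amounts to binding the service to the loopback address.  This is a bare-socket sketch of the idea rather than how any particular OpenStack service is configured, and the helper name is mine.

```python
import socket


def listen_local_only(port=0):
    """Open a TCP listener that is reachable only via the loopback interface.

    Port 0 asks the operating system to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("127.0.0.1", port))  # loopback only, never 0.0.0.0
    sock.listen(5)
    return sock


server = listen_local_only()
host, port = server.getsockname()
server.close()
```

Remote clients cannot connect to a listener bound this way, so the co-located SSL/TLS proxy becomes the only path to the API endpoint from the network.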

What about high availability or load balanced deployments that need to inspect traffic?  The previous deployment model wouldn’t allow for deep packet inspection since the traffic is encrypted.  If the traffic only needs to be inspected for basic routing purposes, it might not be necessary for the load balancer to have access to the unencrypted traffic.  HAProxy has the ability to extract the SSL/TLS session ID during the handshake, which can then be used to achieve session affinity.  HAProxy can also use the TLS Server Name Indication (SNI) extension to determine where traffic should be routed to.  These features likely cover some of the most common load balancer needs.  HAProxy would be able to just pass the HTTPS traffic straight through to the API endpoint systems in this case:


What if you want cryptographic separation of your external and internal environments?  A public cloud provider would likely want their public facing services (or proxies) to use certificates that are issued by a CA that chains up to a trusted Root CA that is distributed in popular web browser software for SSL/TLS.  For the internal services, one might want to instead use their own PKI to issue certificates for SSL/TLS.  This cryptographic separation can be accomplished by terminating SSL at the network boundary, then re-encrypting using the internally issued certificates.  The traffic will be unencrypted for a brief period on the public facing SSL/TLS proxy, but it will never be transmitted over the network in the clear.  The same re-encryption approach that is used to achieve cryptographic separation can also be used if deep packet inspection is really needed on a load balancer.  Here is what this deployment model would look like:


As with most things, there are trade-offs.  The main trade-off is going to be between security and performance.  Encryption has a cost, but so does being hacked.  The security and performance requirements are going to be different for every deployment, so how SSL/TLS is used will ultimately be an individual decision.

What can be done in the OpenStack community to ensure that a secure deployment is as friendly as possible?  After all, many of the deployment models described above don’t even use components of OpenStack to implement SSL/TLS.

On the documentation side of things, we can improve the OpenStack Security Guide to go into more detail about secure reference architectures.  There’s no coverage on load balancers and highly available deployments with SSL/TLS, which would be a nice topic to cover.  Nearly everything in the deployment models described above should work today.

On the development side of things, there are a number of areas where improvements can be made.  I’ve focused on the server side SSL/TLS implementation of the API endpoints, but the OpenStack services all have client-side SSL/TLS implementations that are used when communicating with each other.  Many of the improvements we can make are on the SSL/TLS client side of things:

  • SSL/TLS client support in the OpenStack services isn’t well tested currently, as Devstack doesn’t have the ability to automatically configure the services for SSL/TLS.
  • Tempest should perform SSL/TLS testing to ensure that everything remains working for secure deployments.
  • The HTTP client implementations and configuration steps for SSL/TLS vary between OpenStack services.  We should standardize in these areas for feature parity and ease of configuration.
  • OpenStack services should support listening on Unix sockets instead of network interfaces.  This would allow them to be locked down more securely when co-located with an SSL/TLS proxy.
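To illustrate the last item, here is a sketch of a listener on a Unix socket whose file permissions restrict who can connect.  It uses plain Python sockets on a Unix-like system, not Eventlet or any OpenStack service code, and the helper name, socket file name, and owner-only mode are assumptions for the example.

```python
import os
import socket
import stat
import tempfile


def listen_unix(path, mode=0o600):
    """Listen on a Unix socket that only the owning user may connect to."""
    if os.path.exists(path):
        os.unlink(path)
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # Tighten the umask during bind() so the socket file is never
    # briefly accessible to other users.
    old_umask = os.umask(0o177)
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)
    os.chmod(path, mode)
    sock.listen(5)
    return sock


socket_dir = tempfile.mkdtemp()
socket_path = os.path.join(socket_dir, "api.sock")
server = listen_unix(socket_path)
socket_mode = stat.S_IMODE(os.stat(socket_path).st_mode)

# Clean up the example socket and its directory.
server.close()
os.unlink(socket_path)
os.rmdir(socket_dir)
```

In a real deployment the proxy would need to run as a user or group that is granted access to the socket, with mandatory access controls layered on top where available.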

It would be great if we can get some cross-project coordination on working towards these development goals in the Juno cycle, as I really think that we would have a more polished security story around the API endpoints.  I’m hoping to get a chance to discuss this with other interested Stackers at the Summit in Atlanta.