Category Archives: OpenStack

Leveraging Linux Platform for Identity Management in Enterprise Web Applications

I gave a presentation this past weekend at Linuxfest Northwest on using a collection of Apache HTTPD modules and SSSD to provide identity management for web applications. This approach is currently used by OpenStack, ManageIQ, and Foreman to let the Apache HTTPD web server handle all of the authentication and retrieval of user identity data, exposing it to the web applications.

This is a nice approach that can remove the complexity of interfacing directly with a centralized identity source from the web application itself, with all of the great advantages that SSSD and the underlying Linux platform provide. It’s particularly useful when using FreeIPA, as you can get Kerberos single sign-on and use centralized host-based access control (HBAC) to control access to your web applications. If you’re developing web applications that have to deal with any sort of authentication and authorization (which is pretty much everything), it’s worth taking a look.

The slides from the presentation are available here.

There is also a great write-up about this approach from FreeIPA developer Jan Pazdziora on the FreeIPA wiki here.

Keystone is not an authentication service

In my previous post about security related updates that are coming in Juno, I mentioned that Keystone itself is a poor identity management solution.  I feel that this is a topic that deserves a more thorough discussion.

If you ask people familiar with OpenStack what Keystone’s purpose is, I’m willing to bet many of the answers include the term authentication.  In my mind, Keystone’s main purpose is authorization within an OpenStack deployment.  It achieves this by issuing authorization tokens to authenticated users.  It’s true that Keystone can perform authentication itself, but that doesn’t mean that it should be used for authentication.

Let’s consider what Keystone natively offers from an authentication perspective. At its simplest, Keystone uses a SQL backend for identity. This involves storing a user entry that contains a hash of the user’s password. When a user authenticates to Keystone to obtain a token, they provide their clear-text password, which Keystone hashes and compares with its stored copy. This is a common approach for simple password-based authentication. The big downsides here are that the authentication is single-factor, and the user actually has to send their clear-text password over the network. The transport should of course be protected with SSL/TLS to prevent the password from being intercepted, but it’s still possible for an administrator on the Keystone system to obtain a user’s clear-text password by modifying Keystone or dumping memory contents. If a user uses this same password for other authentication systems outside of OpenStack, they have opened up their exposure to attack from a malicious Keystone administrator. Most password-based authentication services also have additional capabilities that attempt to improve security, such as:

  • password syntax and dictionary checking to protect against weak passwords
  • password expiration to force a regular password change interval
  • password history to prevent password reuse
  • account lockout after a fixed number of failed authentication attempts
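
The hash-and-compare flow described above can be sketched in a few lines of Python (a generic illustration using PBKDF2; this is not Keystone’s actual hashing code):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    # Only the salt and derived hash are stored; the clear-text password is not.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored):
    # Re-derive the hash from the submitted clear-text password and compare
    # in constant time.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, stored)

salt, stored = hash_password("s3cret")
assert verify_password("s3cret", salt, stored)
assert not verify_password("wrong", salt, stored)
```

Note that the verifying side still has to receive the clear-text password in order to run the derivation, which is exactly the exposure discussed above.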

Keystone itself has none of the capabilities mentioned above, and I feel that it’s not worth adding them given that these capabilities are widely available in services that are designed to be centralized authentication services.  The typical answer for someone looking to have things like password policies with Keystone is to use LDAP as an identity backend instead of SQL.

Most people I talk to using OpenStack have their users stored in an existing centralized authentication service.  This is typically an LDAP server, or Active Directory (which offers LDAP capabilities).  Keystone has an LDAP driver for the identity backend to allow it to use LDAP for authentication and storage of users and groups.  Let’s consider what benefits we gain over using the SQL backend from an authentication perspective.

Keystone’s LDAP code currently only supports the LDAP simple bind operation.  This works exactly as Keystone’s SQL authentication does, which is described above.  A user provides Keystone with their clear-text password, and Keystone then sends this clear-text password to the LDAP server, which then hashes the password and compares it to the stored hash to authenticate the user.  This really doesn’t give us any benefit, as the user still gives up their clear-text password which is sent over the network.  LDAP has support for stronger authentication mechanisms via SASL, but Keystone doesn’t have any support for SASL bind operations.  Most LDAP servers have password policy and account lockout capabilities as described above, so this does provide some additional security over what Keystone itself offers.  From an identity management standpoint, using an LDAP server also gets Keystone out of the business of user provisioning and maintenance and allows for a centralized identity source that can be shared with other applications.

Keystone has the ability to allow for stronger forms of authentication than simple password-based authentication via its “external” authentication method. When the “external” authentication method is used, Keystone expects that the web server it runs behind performs user authentication and supplies Keystone with the authenticated username via the “REMOTE_USER” environment variable. This allows one to run Keystone within Apache HTTPD, utilizing the myriad of available Apache authentication modules to allow for strong forms of authentication such as Kerberos or X.509 client certificates. This gets us away from the downsides of password-based authentication, as the user is no longer giving up their secret. A user entry is still stored within Keystone (or in an LDAP server that Keystone utilizes), but a password credential does not need to be stored. If we extract authentication out in this way, Keystone is really just responsible for authorization, which is the way it should be.
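A minimal WSGI sketch illustrates this division of labor (hypothetical application code, not Keystone’s):

```python
def keystone_like_app(environ, start_response):
    # With "external" authentication, the web server (e.g. Apache HTTPD with
    # mod_auth_kerb or mod_ssl) has already authenticated the user and set
    # REMOTE_USER before the request reaches the WSGI application.
    user = environ.get("REMOTE_USER")
    if user is None:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"authentication required"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("issuing token for %s" % user).encode()]

# Simulate a request that the web server has already authenticated:
body = keystone_like_app({"REMOTE_USER": "alice@EXAMPLE.COM"}, lambda s, h: None)
```

The application never sees a password; it only trusts the identity asserted by the front-end server.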

What about situations where you do not want to store all of your users within a single database that is used by Keystone? If we think about it, all that Keystone really needs to know is the identity of the authenticated user and any information that it needs to map OpenStack-specific roles and projects to that user. Determining which roles to apply to a user is often based on group membership, though other attributes could be taken into account. Instead of requiring Keystone to handle the many variations of LDAP schema and LDAP connection management itself, it would be ideal to have the additional user information supplied in a simple form by the web server along with “REMOTE_USER”. This is exactly how Keystone’s federation extension works to provide SAML support. From Keystone’s perspective, it is simply provided with all of the pertinent information about an authenticated user along with the request for a token. Keystone is then able to map the user information it is provided with to the project and roles that apply to the user. This is much simpler for Keystone than having to go out and look that information up itself as it does when LDAP is used. This same model should be usable for any authentication method or identity source, not just SAML federation. A very nice benefit of this approach is that it makes Keystone itself agnostic of the authentication method.
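As a sketch of the mapping idea, assuming the web server passes group names in a colon-separated REMOTE_USER_GROUPS variable (the variable name and the mapping rules here are illustrative assumptions, not Keystone’s federation syntax):

```python
# Hypothetical mapping rules: group name -> (project, role)
MAPPING = {
    "cloud-admins": ("infra", "admin"),
    "developers": ("sandbox", "member"),
}

def map_identity(environ):
    # Keystone only consumes what the web server hands it; no LDAP lookups.
    user = environ["REMOTE_USER"]
    groups = environ.get("REMOTE_USER_GROUPS", "").split(":")
    assignments = [MAPPING[g] for g in groups if g in MAPPING]
    return {"user": user, "assignments": assignments}

result = map_identity({
    "REMOTE_USER": "alice",
    "REMOTE_USER_GROUPS": "developers:wiki-editors",
})
```

Groups with no mapping rule (like "wiki-editors" above) simply confer no role, so identity sources can carry information Keystone does not care about.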

There is a very useful Apache HTTPD module that was written by a colleague of mine called mod_lookup_identity that helps to eliminate the need for identity lookup functionality in Keystone itself. This module utilizes SSSD from the underlying platform to provide user and group information from various identity providers. If you’re not familiar with SSSD, it’s a robust set of client-side daemons designed to use identity providers such as FreeIPA, Active Directory, or LDAP for authentication and identity information. It provides some nice features that are a big step up from Keystone’s LDAP code, such as caching, connection pooling, and the ability to simultaneously provide identities from multiple sources. Using SSSD offloads quite a bit of complexity from Keystone, which leaves Keystone to focus on its main purpose of providing authorization tokens. This should result in a more scalable and performant Keystone. SSSD is available on most common Linux platforms (Fedora, RHEL, Debian, Ubuntu, OpenSUSE, Gentoo, etc.), so using mod_lookup_identity is a nice general-purpose approach for Keystone to use.

A picture is worth a thousand words, so let’s see what this whole approach looks like.

What’s nice about this approach is that Keystone doesn’t know or care what authentication method was used or where the user information comes from. This means that you can simply swap out the Apache modules to support different methods of authentication. You want to use X.509 client certificates for authentication? Configure mod_ssl instead of mod_auth_kerb. You want to use SAML? Configure mod_shib or mod_mellon. As far as Keystone is concerned, you would simply need to update the mappings to apply role assignments based on the environment variables provided by the Apache modules.

A very common use-case for Keystone is to leverage users from Active Directory. With the current LDAP identity driver, this requires putting service users in Active Directory and potentially creating additional OpenStack-specific groups. This is often a non-starter, as many environments do not allow this additional information to be added to Active Directory. The Juno changes I’ve described in my previous post will solve the service user portion of this, but not the groups portion. In addition, authentication is still restricted to LDAP simple bind. A much better solution is to use FreeIPA’s cross-realm Kerberos trust functionality to establish a trust relationship with Active Directory. This approach allows users from trusted Active Directory domains to access Linux/Unix resources that are managed in a FreeIPA realm. An Active Directory user can use the Kerberos TGT obtained when logging into their Windows system to fetch service tickets for Kerberos-enabled services that are set up in FreeIPA, such as Keystone and Horizon. OpenStack-specific groups can also be defined locally in FreeIPA. These groups can contain external users and groups from Active Directory in addition to local users that are defined in FreeIPA.

SSSD has been designed to take advantage of the cross-realm Kerberos trust functionality in FreeIPA.  SSSD is able to extract the group information contained in the PAC structure in the Kerberos tickets, performing the group lookups against FreeIPA.  This allows FreeIPA groups that have external members (such as Active Directory groups) to be resolved, which results in a new PAC being generated that represents the combined group memberships from multiple identity sources.  In effect, this functionality can be used to bridge multiple identity ‘silos’ without resorting to messy solutions that duplicate user data.

The area of cross-realm Kerberos trusts is complex, and likely a bit confusing if you are not already familiar with these areas.  What it boils down to is that you can allow Active Directory users to use Kerberos single-sign on for authentication to Keystone, with the ability to create additional groups for role assignment purposes outside of Active Directory.  In addition, you could create other accounts in FreeIPA which would also be able to authenticate via Kerberos.  You could even establish multiple cross-realm trusts to allow users from multiple Active Directory forests to utilize the same Keystone server.  Here is a revised version of the previous diagram that shows cross-realm Kerberos trusts in use with Keystone:


As you can see, offloading the authentication and identity lookup from Keystone allows for much more complex scenarios that are very useful for integrating existing identity providers. It also provides some very nice security benefits by enabling stronger forms of authentication than Keystone is currently capable of on its own, some of which offer a much nicer user experience than passwords thanks to single sign-on.

Juno Updates – Security

There is a lot of development work going on in Juno in security related areas.  I thought it would be useful to summarize what I consider to be some of the more notable efforts that are under way in the projects I follow.


Nearly everyone I talk with who is using Keystone in anger is integrating it with an existing identity store such as an LDAP server. Using the SQL identity backend is really a poor identity management solution, as it only supports basic password authentication, lacks password policy support, and has fairly limited user management capabilities. Configuring Keystone to use an existing identity store has its challenges, but some of the changes in Juno should make this easier. In Icehouse and earlier, Keystone can only use a single identity backend. This means that all regular users and service users must exist in the same identity backend. In many real-world scenarios, the LDAP server used for users and credentials is considered to be read-only by anything other than the normal user provisioning tools. A common problem is that the OpenStack service users are not wanted in the LDAP server. In Juno, it will be possible to configure Keystone to use multiple identity backends. This will allow a deployment to use an LDAP server for normal users and the SQL backend for service users. In addition, this should allow multiple LDAP servers to be used by a single Keystone instance when using Keystone Domains (which previously only worked with the SQL identity backend).

I mentioned above that Keystone’s SQL identity backend is not ideal. In many ways, Keystone’s LDAP identity backend is also not ideal. Authentication is currently limited to the LDAP simple bind operation, which requires users to send their clear-text password to Keystone, which is then sent on to the LDAP server (hopefully all over SSL/TLS-protected connections). Keystone already allows for stronger authentication via external authentication, but there are some barriers to adoption that should be eliminated in Juno. Using external authentication requires that Keystone is run in Apache httpd. Unfortunately, the size of Keystone’s PKI-formatted tokens can easily get large enough to cause problems with the httpd/mod_wsgi interaction due to the amount of service catalog information contained within the token. In Juno, the default token format is a compressed PKI format called PKIZ. This significantly reduces the size of tokens such that running Keystone in httpd is feasible. This will make it possible for Keystone deployments to leverage a number of httpd modules that allow for strong forms of authentication, such as Kerberos and X.509 client certificates. The Keystone team even switched all of its gate jobs to use httpd recently, as it is considered to be the recommended deployment method going forward.
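To get a feel for why compression helps so much, a PKI token body is dominated by highly repetitive service catalog JSON, which compresses very well. A rough sketch of the size effect using zlib and base64 (not the exact PKIZ wire format):

```python
import base64
import zlib

# Stand-in for a signed token body dominated by a repetitive service catalog.
token_data = b'{"serviceCatalog": [{"endpoints": [...]}]}' * 50

# Compress, then encode for safe transport in HTTP headers.
compressed = base64.urlsafe_b64encode(zlib.compress(token_data))
assert len(compressed) < len(token_data)
```

The payload shrinks dramatically because the catalog entries repeat the same structure for every service and endpoint.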

Keystone has an existing Federation extension that allows one to define mappings to translate SAML assertions into Keystone role assignments. In Juno, this mapping functionality is being made more general purpose to allow it to be used with external authentication via Apache httpd modules. This will allow for some very interesting use-cases, such as having Apache provide all of the user and group information necessary to figure out role assignment without the need for Keystone to maintain its own identity store. For example, one should be able to use httpd with mod_lookup_identity to allow SSSD on the underlying platform to provide Keystone with all of the user and group information from an external backend identity provider (FreeIPA, Active Directory). This offloads all of the LDAP complexity to SSSD, which provides LDAP connection pooling and caching to allow for continued service even if the LDAP server is down. Combined with strong authentication like Kerberos, this offers a performant and secure approach to authentication and identity while leaving Keystone to focus on its main task of authorization within OpenStack.


The Barbican project looks to be progressing nicely through the incubation process. Barbican was initially designed with a plug-in model, with a single hardware security module plug-in. There has been quite a bit of interest in implementing various new plug-ins, which has highlighted the need to re-architect the plug-in interface to allow for multiple plug-in types. This re-architecture has been one of the big focus items of the Juno cycle thus far, as it affects any new functionality that is being implemented as a plug-in.

A plug-in was implemented to allow the Dogtag PKI DRM (data recovery manager) subsystem to be used for key generation and storage.  This should allow for easy integration for those with existing Dogtag installations as well as being an attractive well-proven key archival and recovery solution for new deployments.

Barbican is expanding its functionality to allow for more than its initial use-case of storing (and optionally generating) symmetric keys. The ability to store asymmetric key pairs and certificates is being added in Juno as well, which is of particular interest for things like LBaaS. The ability to act as an interface for handling certificate requests and interacting with a CA (certificate authority) is also being worked on, with plug-ins for Dogtag and Symantec CAs.


The Kite project continues to work its way through implementation, albeit slowly. There has only been one developer working part-time on the implementation, so it’s not likely to be in a usable state until the Kilo release. The good news is that an additional contributor has recently started contributing to this effort, working from the Oslo messaging library side of things. Hopefully this speeds things along so that Kite is available for other projects to start developing against early in the Kilo cycle.


A new Session object has been added to Keystoneclient for use by the client code in other projects. The Session object centralizes the responsibility of authentication and transport handling. From a security standpoint, this is very nice since it centralizes all of the HTTPS client-side code across all projects, as opposed to the current situation of having many different implementations. The configuration of things like certificate validation will also be consistent across all projects once they utilize the Session object; this is an area that is a bit of a mess in Icehouse and earlier releases. Nova has already been converted to use the new Session object, and conversion of the Cinder and Neutron projects is in progress.

There has been some work in Devstack that is worth noting, even though its code doesn’t directly affect actual deployments. Patches have been proposed to allow Devstack to easily be set up with SSL/TLS support for many of the main services. The goal of this work is to allow SSL/TLS to eventually be regularly tested as part of the continuous-integration gate jobs. This is an area that is not currently well tested, even though it is crucial functionality for secure deployments. This work should result in more robust SSL/TLS code, which benefits everyone.

Restricting the abilities of Keystone tokens

Discussions around Keystone during the Juno Design Summit got me thinking about Keystone tokens and all of the places they get sent once they leave the user’s hands. A user usually gets a token from Keystone, then initiates an operation against a service like Nova. This service may then send off the user’s token to another service like Glance or Cinder to perform some portion of the intended operation. A user’s token may be much more powerful than is needed for the intended operation, so users are putting a lot of trust in OpenStack services to not use the token in unintended or unauthorized ways. I feel that there is a lot of improvement that we can make here. Before discussing these improvements, we should review the current behavior around scoped Keystone tokens.

Current behavior

With Keystone, one can currently get tokens that are scoped to a project, or tokens that are unscoped.  The idea of an unscoped token is that it can later be used to authenticate to Keystone to get a project scoped token.  Aside from this ability, there is no use for an unscoped token as it can’t be used to perform any actions on other services since it contains no roles and no project.  The unscoped token simply proves that a user was authenticated by Keystone.

With the current functionality, one is allowed to use an existing token to authenticate to Keystone to get a new token with a different project scope.  The token used to authenticate can be scoped or unscoped.  This allows one to switch from project to project without providing their actual user credentials.  It also allows one to use a project scoped token to get an unscoped token.

Desired behavior

The current behavior has some convenience properties, but it provides no actual security benefit.  The project and roles in a token are designed to limit what the token actually can be used for.  Given that Keystone tokens are bearer tokens, one needs to be concerned about any service that has access to the token performing operations that were not intended by the user.  All services that receive the token must be trusted to not maliciously use the token.

Ideally, one would be able to supply a token that is only authorized to perform the intended task, to minimize the exposure if the token is obtained by a malicious party. Unfortunately, this is not currently possible. If a user has a low set of privileges on a particular project, a token scoped to that project can still be used to obtain a token scoped to another project that might have a higher set of privileges. Essentially, this means that any token for a particular user can indirectly be used to perform any action that user is allowed to perform.

To allow for a user to create restricted tokens, we would need to make some changes to the way Keystone works when authenticating using an existing token.  The number one rule should be that one can only use an existing token to create a more restrictive token.  Said another way, privilege elevation should not be possible when authenticating with an existing Keystone token.

Project scoped tokens

In order to support the desired behavior, one would no longer be able to change the project scope by solely using an existing project scoped token.  If the ability to change project scope without supplying the original user credential is needed, one should use an unscoped token to obtain a project scoped token.

An unscoped token should be analogous to a TGT in Kerberos.  The unscoped token proves that a user authenticated to Keystone, but it can’t actually be used to perform any action other than obtaining a scoped token.  The only way to obtain an unscoped token should be for the user to use their original credentials to authenticate to Keystone.

To describe how this would work, let’s consider the Horizon use-case.  A user would authenticate to Horizon as usual via the login form.  Horizon would then obtain an unscoped token from Keystone on the user’s behalf.  When a project is selected in Horizon, the unscoped token will be used to authenticate to Keystone to obtain a new token that is scoped for the selected project.  Horizon will retain the unscoped token in case a new project is selected before the user logs out or the session expires.

Role-filtered tokens

Project scoped tokens that can’t be used for privilege escalation are a step in the right direction, but it would be better to have the ability to create even more restrictive tokens to perform specific tasks. Ideally, a token would only have the roles required to perform the operations that it will be used for. This would require the ability to request a token that only contains a subset of the roles that a user has on a project. I refer to this concept as role-filtered tokens.

The way that role-filtered tokens would work is that a user can request a new token using an existing token, but they would include a list of roles that they want the new token to contain. Keystone would only honor this request if the original token used for authentication contains the requested roles. Keystone would also check that the requested roles are still valid in the assignment database. This behavior would allow the capabilities of a token to be restricted, and anyone in possession of the restricted token would not be able to use it to obtain a new token with greater capabilities.
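
A sketch of the issuing rule (illustrative pseudologic, not Keystone code): the requested roles must be a subset of the roles in the token used to authenticate, and must still exist in the assignment database.

```python
def filter_token(existing_token, requested_roles, assignment_db):
    # Rule 1: an existing token can only produce a more restrictive token.
    current = set(existing_token["roles"])
    if not requested_roles <= current:
        raise PermissionError("cannot elevate privileges with an existing token")
    # Rule 2: the requested roles must still be assigned to the user.
    valid = assignment_db.get((existing_token["user"], existing_token["project"]), set())
    if not requested_roles <= valid:
        raise PermissionError("requested role is no longer assigned")
    return {**existing_token, "roles": sorted(requested_roles)}

db = {("alice", "proj-a"): {"admin", "member"}}
token = {"user": "alice", "project": "proj-a", "roles": ["admin", "member"]}
restricted = filter_token(token, {"member"}, db)  # usable, but admin is gone
```

Anyone holding the restricted token cannot use it to get the admin role back, since rule 1 only ever narrows the role set.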

To utilize role-filtered tokens, one would need some knowledge of the roles required to perform the operations that the token will be used for, but I think that this is fine in some circumstances. It might eventually be possible for a service to read a central policy to determine what roles are needed to perform a particular operation on another service, so it can restrict a token before sending it off. I think that this would be a significant security improvement over the way that Keystone tokens currently work.

Secure messaging with Kite

OpenStack uses message queues for communication between services.  The messages sent through the message queues are used for command and control operations as well as notifications.  For example, nova-scheduler uses the message queues to control instances on nova-compute nodes.  This image from Nova’s developer documentation gives a high-level idea of how message queues are used:


It’s obvious that the messages being sent are critical to the functioning of an OpenStack deployment. Actions are taken based on the messages that are received, which means that the contents of the messages need to be trusted. This calls for secure messaging. Before discussing how messages can be secured, we need to define what makes a message secure.

A secure message has integrity and confidentiality. Message integrity means that the sender has been authenticated by the recipient and that the message is tamper-proof. Think of this like an imprinted wax seal on an envelope. The imprint is used to identify the sender and ensure that it is authentic. An unbroken seal indicates that the contents have not been tampered with after the sender sealed it. This is usually accomplished by computing a digital signature over the contents of the message. Message confidentiality means that the message is only readable by its intended recipient. This is usually accomplished by using encryption to protect the message contents.
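Integrity via a shared key can be sketched with an HMAC, which is the digital "wax seal" in this analogy (a generic illustration, not Kite’s actual wire format):

```python
import hashlib
import hmac

key = b"shared-signing-key"  # known only to sender and recipient

def sign(message):
    # The "seal": an HMAC computed over the message contents.
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message, signature):
    # Recompute the seal and compare in constant time.
    return hmac.compare_digest(sign(message), signature)

msg = b'{"action": "start_instance", "id": 42}'
sig = sign(msg)
assert verify(msg, sig)             # authentic and untampered
assert not verify(msg + b"x", sig)  # any modification breaks the seal
```

Only a party holding the shared key can produce a valid seal, which is what authenticates the sender to the recipient.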

Messages are not well protected in OpenStack today.  Once a message is on the queue, no further authorization checks of that message are performed.  In many OpenStack deployments, the only thing used to protect who can put messages on the queue is basic network isolation.  If one has network access to the message broker, messages can be placed on a queue.

It is possible to configure the message broker for authentication.  This authenticates the sender of the message to the broker itself.  Rules can then be defined to restrict who can put messages on specific queues.  While this authentication is a good thing, it leaves a lot to be desired:

  • The sender doesn’t know who it is really talking to, as the broker is not authenticated to the sender.
  • Messages are not protected from tampering or eavesdropping.
  • The recipient is unable to authenticate the sender, allowing one sender to impersonate another.

SSL/TLS can be enabled on the message broker to protect the transport.  This buys us a few things over the authentication described above.  The broker is now authenticated to the sender by virtue of certificate trust and validation.  Messages are also protected from eavesdropping and tampering between the sender and the broker as well as between the recipient and the broker.  This still leaves us with a few security gaps:

  • Messages are not protected from tampering or eavesdropping by the broker itself.
  • The recipient is still unable to authenticate the sender.

Utilizing the message broker security described above looks like this:


You can see that what is protected here is the transport between each communicating party and the broker itself.  There is no guarantee that the message going into the broker is the same when it comes out of the broker.

Kite is designed to improve upon the existing message broker security by establishing a trust relationship between the sender and recipient that can be used to protect the messages themselves. This trust is established by knowledge of shared keys. Kite is responsible for the generation and secure distribution of signing and encryption keys to communicating parties. The keys used to secure messages are only known by the sender, the recipient, and Kite itself. Once these keys are distributed to the communicating parties, they can be used to ensure message integrity and confidentiality. Sending a message would look like this:


You can see that the message itself is what is protected, and doesn’t rely on protection of the transport.  The broker is unable to view or manipulate the contents of the message since it does not have the keys used to protect it.

In order for Kite to securely distribute keys to the communicating parties, a long-term shared secret needs to be established between Kite and each individual communicating party.  The long-term shared secret allows Kite and an individual party to trust each other by proving knowledge of the shared secret.  Once a long-term shared secret is established, it is never sent over the network.

In the diagram below, we can see that two parties each have a unique long-term shared secret that is only known by themselves and Kite, which is depicted as the Key Distribution Service (KDS):


When one party wants to send a message to another party, it requests a ticket from Kite. A ticket request is signed by the requestor using its long-term shared secret. This allows Kite to validate the request by checking the signature using the long-term shared secret of the requesting party. The signature serves to authenticate the requestor to Kite and to ensure that the request has not been tampered with. Conceptually, it looks like this:


When Kite receives a valid ticket request, it generates a new set of signing and encryption keys.  These keys are only for use between two specific parties, and only for messages being sent in one direction.  The actual contents of the ticket request are shown below:

 {
     "metadata": <Base64 encoded metadata object>,
     "signature": <HMAC signature over metadata>
 }

where the Base64-decoded metadata object contains:

 {
     "source": "",
     "destination": "",
     "timestamp": "2012-03-26T10:01:01.720000",
     "nonce": 1234567890
 }

The contents of the ticket request are used by Kite to generate the signing and encryption keys.  The timestamp and nonce are present to allow Kite to check for replay attacks.

For signing and encryption key generation, Kite uses the HMAC-based Key Derivation Function (HKDF) as described in RFC 5869.  The first thing that Kite does is to generate an intermediate key.  This intermediate key is generated by using the sender’s long-term shared secret and a random salt as inputs to the HKDF Extract function:

intermediate_key = HKDF-Extract(salt, source_key)

The intermediate key, sender and recipient names (as provided in the ticket request),  and a timestamp from Kite are used as inputs into the HKDF Expand function, which outputs the key material that is used as the signing and encryption keys:

keys = HKDF-Expand(intermediate_key, info, key_size)

info = "<source>,<dest>,<timestamp>"

Once the signing and encryption keys are generated, they are returned to the ticket requestor as a part of a response that is signed with the requestor’s long-term shared secret:


Since the ticket response is signed using the requestor’s long-term shared secret, the requestor can validate that the response truly came from Kite, since nobody else has knowledge of that secret.  The contents of the ticket response are shown below:

    "metadata": <Base64 encoded metadata object>,
    "ticket": <Ticket object encrypted with source's key>,
    "signature": <HMAC signature over metadata + ticket>

    "source": "",
    "destination": "",
    "expiration": "2012-03-26T11:01:01.720000"

    "skey": <Base64 encoded message signing key>,
    "ekey": <Base64 encoded message encryption key>,
    "esek": <Key derivation info encrypted with destination's key>

    "key": <Base64 encoded intermediate key>,
    "timestamp": <Timestamp from KDS>
    "ttl": <Time to live for the keys>

We can see that the actual key material in the ticket is encrypted using the requestor’s long-term shared secret.  Only the requestor will be able to decrypt this portion of the response to extract the keys.  We will discuss the esek portion of the response in more detail when we get to sending messages.  For now, it is important to note that esek is an encrypted payload that Kite created for the destination party.  It is encrypted using the destination party’s long-term shared secret, so it is an opaque encrypted blob as far as the source party is concerned.  A conceptual diagram of the ticket response should make this clear:


When sending a secured message, the sender will use an envelope that contains information the recipient needs to derive the keys, a signature, and a flag indicating if the message is encrypted.  The envelope looks like this:

    {
        _METADATA_KEY: MetaData,
        _MESSAGE_KEY: Message,
        _SIGNATURE_KEY: Signature
    }

The metadata contains:

    {
        'source': <sender>,
        'destination': <receiver>,
        'timestamp': <timestamp>,
        'nonce': <64bit unsigned number>,
        'esek': <Key derivation info encrypted with destination's key>,
        'encryption': <true | false>
    }
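Assembling and signing such an envelope might look like the following sketch.  It is stdlib-only, so the actual encryption of the message body with ekey is elided; the field names follow the layout above, while the timestamp, nonce, and hash choice are assumptions of the sketch:

```python
import base64
import hashlib
import hmac
import json

def build_envelope(source: str, destination: str, message: bytes,
                   skey: bytes, esek: str, encrypted: bool = False) -> dict:
    """Build a signed message envelope.  The signature covers both the
    metadata and the message, so tampering with either is detectable."""
    metadata = base64.b64encode(json.dumps({
        "source": source,
        "destination": destination,
        "timestamp": "2012-03-26T10:05:00.000000",  # illustrative value
        "nonce": 1234567890,  # must be unique per message in practice
        "esek": esek,
        "encryption": encrypted,
    }, sort_keys=True).encode())
    body = base64.b64encode(message)
    sig = hmac.new(skey, metadata + body, hashlib.sha256).hexdigest()
    return {"metadata": metadata.decode(),
            "message": body.decode(),
            "signature": sig}
```

Note that esek travels with every message as an opaque string; the sender never needs to (and cannot) decrypt it.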

The following diagram shows how the envelope is used when sending a message:


Upon receipt of a secured message, the recipient can decrypt esek using its long-term shared secret.  It can trust that the contents of esek came from Kite, since nobody else has knowledge of the shared secret.  The contents of esek, along with the source and destination from the envelope, contain all of the information needed to derive the signing and encryption keys.

    "key": <Base64 encoded intermediate key>,
    "timestamp": <Timestamp from KDS>
    "ttl": <Time to live for the keys>

HKDF-Expand(intermediate_key, info, key_size)

info = "<source>,<dest>,<timestamp>"

The recipient performs this derivation step itself, which forces it to validate the source and destination from the metadata in the message envelope.  If the source and destination were somehow modified, the correct keys could not be derived from esek.  This provides the recipient with a guarantee that Kite generated the signing and encryption keys specifically for the correct source and destination.
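This binding of the keys to the party names can be demonstrated with a small sketch (HKDF Expand as in RFC 5869; the party names, intermediate key value, and key size are made up for illustration):

```python
import hashlib
import hmac

def derive_keys(intermediate_key: bytes, source: str, dest: str,
                timestamp: str, key_size: int = 64) -> bytes:
    # HKDF Expand (RFC 5869) over the info string built from the envelope
    info = f"{source},{dest},{timestamp}".encode()
    okm, block, counter = b"", b"", 1
    while len(okm) < key_size:
        block = hmac.new(intermediate_key, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:key_size]

ik = b"\x00" * 32  # intermediate key from esek (illustrative value)
ts = "2012-03-26T10:01:01.720000"
genuine = derive_keys(ik, "scheduler.host1", "compute.host2", ts)
tampered = derive_keys(ik, "attacker.host9", "compute.host2", ts)
# A modified source name yields different keys, so signature checks fail.
```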

Once the signing and encryption keys are derived, the recipient can validate the signature and decrypt the message if necessary.  The end result after deriving the signing and encryption keys looks like this:


Signing and encryption keys are only valid for a limited period of time.  The validity period is a policy decision made by the person deploying Kite, and it represents the amount of exposure you would have if a particular pair of signing and encryption keys were compromised.  In general, a short validity period is advisable.

This validity period is defined by the expiration timestamp in the ticket response for the sender, and the timestamp + ttl in esek for the recipient.  While keys are still valid, a sender can reuse them without needing to contact Kite.  The esek payload for the recipient is still sent with every message, and the recipient derives the signing and encryption keys for every message.  When the keys expire, the sender needs to send a new ticket request to Kite to get a new set of keys.

Kite also supports sending secure group messages, though the workflow is slightly different than it is for direct messaging.  Groups can be defined in Kite, but a group does not have a long-term shared secret associated with it.  When a ticket is requested with a group as the destination, Kite will generate a temporary key that is associated with the group if a current key does not already exist.  This group key is used as the destination’s long-term shared secret.  When a recipient receives a message where the destination is a group, it contacts Kite to request the group key.  Kite will deliver the group key to a group member encrypted using the member’s long-term shared secret.  A group member can then use the group key to access the information needed to derive the keys needed to verify and decrypt the secured group message.

The use of a temporary group key avoids the need for a long-term shared secret shared amongst all of the group members.  If a group member becomes compromised, it can be removed from the group in Kite to cut off access to any future group keys.  Using a short group key lifetime limits the exposure in this situation, and it also doesn’t require changing the shared secret across all group members, since a new shared secret will be generated upon expiration.

There is one flaw in the group messaging case that is important to point out.  All members of a group will have access to the same signing and encryption keys once they have received a message.  This allows a group member to impersonate the original sender who requested the keys.  This means one compromised group member is able to send falsified messages to other members within the same group.  This is a limitation due to the use of symmetric cryptography.  It would be possible to improve upon this by using public-key cryptography for message signing.  There is a session scheduled at the Juno design summit to discuss this.

There are a number of areas where Kite could be improved in the future.  Improving the group messaging solution as mentioned above is an obvious area to investigate.  It would also be a good idea to look into using Barbican to store the long-term shared secrets.  There have also been brief discussions around adding policies to Kite to restrict which parties are allowed to request tickets for certain recipients.

Kite is currently being implemented as a standalone service within the Key Management (Barbican) project.  Patches are landing, and hopefully we get to something initially usable in the Juno timeframe.  To utilize Kite, changes will also be needed in oslo.messaging.  An ideal way of dealing with this would be to allow for “message security” plug-ins to Oslo.  A Kite plug-in would allow services to use Kite for secure messages, but other plug-ins can be developed if alternate solutions come along (such as a public-key based solution as mentioned above).  This would allow Oslo to remain relatively static in this area as capabilities around secure messaging change.  There is a lot that can be done around secure messaging, and I think that Kite looks like a great step forward.

Security auditing of OpenStack releases

I was recently asked some high-level security related questions about OpenStack.  This included questions such as:

  • What cryptographic algorithms are used, and are the algorithms user configurable?
  • What implementations are used for cryptographic functions?
  • How is sensitive data handled?

These are common questions for those evaluating and deploying OpenStack, as they want to see if it meets their security requirements and know what security related areas they need to watch out for when configuring everything.

Unfortunately, I have no good answer to these questions, as this information isn’t really collected anywhere (unless you want to go code diving).  OpenStack is also much too large for any single person to provide easy answers due to the number of projects involved (by my count, we’re up to 12 integrated projects as of the Icehouse release, not counting Devstack).  That’s a lot of code to review to come up with accurate answers.

The answers to these security questions also change from release to release, as the development teams are always marching forward improving existing features and adding new ones.  If one were to conduct their own audit of all of the integrated projects for a particular OpenStack release, it would quickly be time to start over again for the next release due to the 6-month release cycle.

I feel that the answers to these questions are also invaluable for developers, not just evaluators and deployers.  If we don’t know where our weak points are from a security perspective, how can we hope to improve or eliminate them?  Many projects are also solving the same security related issues, but not necessarily in a consistent manner.  If we have a comprehensive security overview of all OpenStack projects, we can identify areas of inconsistency and duplication.  This can serve to identify areas where we can improve things.

What form would this information take to be easily consumable for deployers and developers both?  For starters, we would want to see the following information collected in a single place for each project:

  • Implemented crypto – any cryptography directly implemented in OpenStack code (not used via an external library).
  • Used crypto – any libraries that are used to provide cryptographic functionality.
  • Hashing algorithms – What hashing algorithms are used, and for what purpose?  Is the algorithm configurable or optional to use?
  • Encryption algorithms – What encryption algorithms are used, and for what purpose?  Is the algorithm configurable or optional to use?
  • Sensitive data – What sensitive data is handled?  How is it protected by default, and are there optional features that can be configured to protect it further?
  • Potential improvements – What are potential areas that things can be improved from a security perspective?

So with that said, I went code diving and took a pass at collecting this security information for Keystone.  Keystone seemed like an obvious place to start given its role within the OpenStack infrastructure.  Here is what I put together:

This information would be collected for each project for a specific OpenStack release.  A top-level release page would collect links to the individual project pages.  This could even contain a high-level summary such as listing all crypto algorithms and libraries used across all projects.  Here’s an example that I put together for the upcoming Icehouse release:

My hope is that there is interest in collecting (and maintaining) this security related information from all of the development teams for the integrated projects.  The Keystone page I created can be used to discuss the most useful format, which we can then use as an example for the rest of the projects.  Once an initial pass is done for one OpenStack release, keeping this information up to date as development proceeds should not be a very big task.  We would simply need to be vigilant during code reviews to identify when code changes are made that require changes to the wiki pages.  It would also be fairly easy to look over the bug fixes and blueprints when a milestone is reached to double-check whether any security related functionality was changed.

If we get through a successful first pass at collecting this information for all projects, it would probably make sense to have a cross-project discussion or even an in-person security hackfest to go over the results together to work on consistency issues and removing duplication (moving some security related things into Oslo maybe?).  It would be great to get a group of security interested developers from each project together to discuss this at the Atlanta Summit.

SSL/TLS Everywhere – visions of a secure OpenStack

As most people familiar with OpenStack are already aware, it is made up of many software components that are typically deployed in a distributed manner.  The more scalable an OpenStack deployment is, the more distributed the underlying components are as the infrastructure is usually scaled out horizontally on commodity hardware.  As a consequence of this distributed architecture, there are many communication channels used between all of the software components.  We have users communicating with the services via REST APIs and Dashboard, services communicating with each other via REST APIs and the message queue, services accessing databases, and so on.  One only needs to look at the following simplified diagram to get an idea of the number of communication channels that there are.


Knowing about all of this communication taking place in an OpenStack deployment should raise a few questions.  What communication channels need to be secured, and how can it be done?  The OpenStack Security Guide attempts to address these questions at a high level.  The guidance can be summarized as “use SSL/TLS on both public facing and internal networks”.  If you talk to those deploying OpenStack though, you will find that there are many different opinions on where and how SSL/TLS should be used.  For example, some deployments will use SSL/TLS on public facing proxies only, leaving traffic on their internal networks in the clear.  I don’t think that anyone really believes that having unencrypted traffic on internal networks is more secure than encrypting it, but there are some with the opinion that it is unnecessary due to network security being “good enough”.  I also think that technical difficulties in setting up SSL/TLS to protect all of these communication channels are a factor, especially when you start adding in complexities with load balancing and highly-available deployments.  If actually deploying with SSL/TLS everywhere is too difficult, it makes it easier to accept the compromise of relying on network security alone internally.  This is far from ideal.

The first thing one should do when evaluating their OpenStack SSL/TLS needs is to identify the threats.  You can divide these threats into external and internal attacker categories, but the lines tend to get blurred since certain components of OpenStack operate on both the public and management networks.

For publicly facing services, the threats are pretty straight-forward.  Users will be authenticating against Horizon and Keystone with their username and password.  Users will also be accessing the API endpoints for other services using their Keystone tokens.  If this network traffic is unencrypted, passwords and tokens can be intercepted by an attacker using a man-in-the-middle attack.  The attacker can then use these valid credentials to perform malicious operations.  All real deployments should be using SSL/TLS to protect publicly facing services.

For services that are deployed on internal networks, the threats aren’t so clear due to the bridging of security domains previously mentioned.  There is always the chance that an administrator with access to the management network decides to do something malicious.  SSL/TLS isn’t going to help in this situation if the attacker is allowed to access the private key.  Not everyone on the management network would be allowed to access the private key of course, so there is still a lot of value in using SSL/TLS to protect yourself from internal attackers.  Even if everyone that is allowed to access your management network is 100% trusted, there is still a threat that an unauthorized user gains access to your internal network by exploiting a misconfiguration or software vulnerability.  One must keep in mind that you have users running their own code on instances in the OpenStack Compute nodes, which are deployed on the management network.  If a vulnerability allows them to break out of the hypervisor, they will have access to your management network.  Using SSL/TLS on the management network can minimize the damage that an attacker can cause.

It is generally accepted that it is best to encrypt sensitive data as early as possible and decrypt it as late as possible.  Despite this best practice, it seems that it’s common to use a SSL/TLS proxy in front of the OpenStack services and use clear communication afterwards:


Let’s look at some of the reasons for the use of SSL/TLS proxies as pictured above:

  • Native SSL/TLS in OpenStack services does not perform/scale as well as SSL proxies (particularly for Python implementations like Eventlet).
  • Native SSL/TLS in OpenStack services not as well scrutinized/audited as more proven solutions.
  • Native SSL/TLS configuration is difficult (not well documented, tested, or consistent across services).
  • Privilege separation (OpenStack service processes should not have direct access to private keys used for SSL/TLS).
  • Traffic inspection needs for load balancing.

All of the above are valid concerns, but none of them prevent SSL/TLS from being used on the management network.  Let’s consider the following deployment model:


This is very similar to the previous diagram, but the SSL/TLS proxy is on the same physical system as the API endpoint.  The API endpoint would be configured to only listen on the local network interface.  All remote communication with the API endpoint would go through the SSL/TLS proxy.  With this deployment model, we address a number of the bullet points above.  A proven SSL implementation that performs well would be used.  The same SSL proxy software would be used for all services, so SSL configuration for the API endpoints would be consistent.  The OpenStack service processes would not have direct access to the private keys used for SSL/TLS, as you would run the SSL proxies as a different user and restrict access using permissions (and additionally mandatory access controls using something like SELinux).  We would ideally have the API endpoints listen on a Unix socket such that we could restrict access to it using permissions and mandatory access controls as well.  Unfortunately, this doesn’t seem to work currently in Eventlet from my testing.  It is a good future development goal.
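As a sketch of this model, an stunnel configuration along these lines would terminate TLS on the management network address and hand clear traffic to an API endpoint bound only to loopback.  The addresses, certificate paths, and the choice of stunnel itself are hypothetical here; any proven SSL/TLS proxy could fill this role:

```
; Terminate TLS for Keystone; the API endpoint itself listens on loopback only
[keystone]
accept  = 10.0.0.5:5000
connect = 127.0.0.1:5000
cert    = /etc/stunnel/keystone-cert.pem
key     = /etc/stunnel/keystone-key.pem
```

Running the proxy as a dedicated user that alone can read the key files gives the privilege separation described above.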

What about high availability or load balanced deployments that need to inspect traffic?  The previous deployment model wouldn’t allow for deep packet inspection since the traffic is encrypted.  If the traffic only needs to be inspected for basic routing purposes, it might not be necessary for the load balancer to have access to the unencrypted traffic.  HAProxy has the ability to extract the SSL/TLS session ID during the handshake, which can then be used to achieve session affinity.  HAProxy can also use the TLS Server Name Indication (SNI) extension to determine where traffic should be routed to.  These features likely cover some of the most common load balancer needs.  HAProxy would be able to just pass the HTTPS traffic straight through to the API endpoint systems in this case:
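A minimal HAProxy sketch of this pass-through model would route on SNI without ever decrypting the traffic.  The hostnames, backend names, and addresses are hypothetical; `req.ssl_sni` requires TCP-mode inspection of the TLS ClientHello:

```
frontend tls_passthrough
    mode tcp
    bind :443
    tcp-request inspect-delay 5s
    tcp-request content accept if { req.ssl_hello_type 1 }
    use_backend keystone_api if { req.ssl_sni -i identity.example.com }
    use_backend nova_api     if { req.ssl_sni -i compute.example.com }

backend keystone_api
    mode tcp
    server keystone1 10.0.0.5:443 check

backend nova_api
    mode tcp
    server nova1 10.0.0.6:443 check
```

Since the frontend and backends are in TCP mode, the encrypted stream is forwarded untouched to the API endpoint systems, which hold the only copies of the private keys.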


What if you want cryptographic separation of your external and internal environments?  A public cloud provider would likely want their public facing services (or proxies) to use certificates that are issued by a CA that chains up to a trusted Root CA that is distributed in popular web browser software for SSL/TLS.   For the internal services,  one might want to instead use their own PKI to issue certificates for SSL/TLS.  This cryptographic separation can be accomplished by terminating SSL at the network boundary, then re-encrypting using the internally issued certificates.  The traffic will be unencrypted for a brief period on the public facing SSL/TLS proxy, but it will never be transmitted over the network in the clear.  The same re-encryption approach that is used to achieve cryptographic separation can also be used if deep packet inspection is really needed on a load balancer.  Here is what this deployment model would look like:


As with most things, there are trade-offs.  The main trade-off is going to be between security and performance.  Encryption has a cost, but so does being hacked.  The security and performance requirements are going to be different for every deployment, so how SSL/TLS is used will ultimately be an individual decision.

What can be done in the OpenStack community to ensure that a secure deployment is as friendly as possible?  After all, many of the deployment models described above don’t even use components of OpenStack to implement SSL/TLS.

On the documentation side of things, we can improve the OpenStack Security Guide to go into more detail about secure reference architectures.  There’s no coverage on load balancers and highly available deployments with SSL/TLS, which would be a nice topic to cover.  Nearly everything in the deployment models described above should work today.

On the development side of things, there are a number of areas where improvements can be made.  I’ve focused on the server side SSL/TLS implementation of the API endpoints, but the OpenStack services all have client-side SSL/TLS implementations that are used when communicating with each other.  Many of the improvements we can make are on the SSL/TLS client side of things:

  • SSL/TLS client support in the OpenStack services isn’t well tested currently, as Devstack doesn’t have the ability to automatically configure the services for SSL/TLS.
  • Tempest should perform SSL/TLS testing to ensure that everything remains working for secure deployments.
  • The HTTP client implementations and configuration steps for SSL/TLS varies between OpenStack services.  We should standardize in these areas for feature parity and ease of configuration.
  • OpenStack services should support listening on Unix sockets instead of network interfaces.  This would allow them to be locked down more securely when co-located with a SSL/TLS proxy.

It would be great if we could get some cross-project coordination on working towards these development goals in the Juno cycle, as I really think it would give us a more polished security story around the API endpoints.  I’m hoping to get a chance to discuss this with other interested Stackers at the Summit in Atlanta.