Thu Jul 4, 2013
Last month, CNET announced that Facebook is working on implementing a property called forward secrecy in its encryption of user data. The article is pretty long, but the gist of it is:
- Forward secrecy is good news, at least theoretically. Right now, when you send data to Facebook’s servers, your data gets encrypted so that someone who intercepts your data can’t read it unless they have Facebook’s secret key. However, if an eavesdropper is recording your messages now and somehow gets the secret key in the future, they can go back and decrypt all your encrypted communications. Forward secrecy in an encryption protocol, by definition, means that if the secret key is compromised, your communications are still safe from decryption.
- Google is the only major web service that has forward secrecy in its encryption protocol enabled by default. Most services don’t do this because it’s computationally expensive.
- The leaked slides about the NSA surveillance program, PRISM, suggest that the NSA is tapping into Internet connections and collecting data in transit between your computer and Facebook’s servers, or at least that this is possible.
- Once your data is collected by the NSA, it can probably just sit around forever. That gives plenty of time for Facebook’s secret keys to be brute-forced or obtained by the NSA through a court order. Without perfect secrecy, all your data can be decrypted once this happens.
So in other words, Facebook is implementing an extra security measure to protect data in transit from users to their servers, and this announcement comes at an opportune time in light of the PRISM revelations in early July.
But data in transit isn’t the only kind of data that needs to be protected! What about data at rest?
To clarify what I mean here, let’s think about the flow of data from you to your friends when you make a FB post. This is simplified, but it goes something like:
- You type a message in a text box in your web browser and hit send.
- Your browser encrypts that message before it leaves your computer and goes on its way to a Facebook server.
- Your message travels over the Internet in encrypted form.
- Your message reaches a Facebook server, which decrypts the message and stores it in a database so that it can be retrieved later to be shown on your profile, in your friends’ news feeds, in searches, and so forth.
In Steps 3 and 4, your data goes from in transit to at rest. Here’s a handy diagram from Wikipedia that distinguishes the three states of data, which I’m simplifying into two by merging “data at rest” and “data in use”:
Nowadays, it’s expected for major web services to encrypt all data in transit by default using HTTPS, which uses the SSL/TLS cryptographic protocols. This is generally done by using a persistent private server key for both authentication (verifying the server’s identity) and encryption (encoding the data so that only the server can read it). This does not provide forward secrecy, and all encrypted messages are compromised once the persistent server key is found.
Facebook’s announcement, which follows one by Google in 2011, reflects a recent shift toward supporting forward secrecy by generating ephemeral Diffie Hellman keys for encryption during each session. A persistent RSA key is still used to authenticate the server. The ephemeral keys used for encryption, however, are not stored beyond a session. Thus, even if the persistent key is compromised, data that has been obtained through eavesdropping on an Internet connection is still safe from decryption.
However, recall that in Step 4 above, Facebook decrypts your data and stores it in a database, presumably one that is password-protected or has some other security measure that lets a trusted set of people access it (database admins, for instance). At this point, your data is just sitting around in decrypted form, protected by whatever-FB-does-to-protect-its-servers. It would be impractical to do otherwise because performing encrypted query operations efficiently is hard maths, and basically the definition of a web app is something that stores data and performs interesting/useful queries.
With Facebook’s announcement of support for forward secrecy in TLS/SSL, we’ve been assured of increased attention to security for data in transit, but this of course says nothing about data at rest. Indeed, it seems that the NSA surveillance leaks have sparked relatively little discussion of policies at companies like Facebook for securing data at rest. That’s surprising to me, because the oft-spoken phrase “NSA back door” vividly suggests that the NSA is in the trusted set of people who have access to the decrypted data in servers at FB, Google, and so forth.
To be clear, forward secrecy is effective against a particular adversary model, namely wiretapping. Although the CNET article points out that the NSA is probably doing a bunch of wiretapping these days and has agreements with Internet service providers to facilitate such, I assume it’s still easy for the NSA to walk up to someone at Facebook and demand access to the database. In fact, regardless of how secure data is while in transit, it basically always needs to get decrypted on the server side in order to be useful for a web service. As long as it’s easy for the web service to access the decrypted data, it’s easy for the NSA to do so as well*.
*Barring policy changes that would legally prevent surveillance. Even so …
In short, there probably exists no NSA-proof cryptographic protocol for securing data at rest so long as companies agree to comply with gov-authorized surveillance programs. Forward secrecy doesn’t seem like much of a deterrent to PRISM in that case.
But in terms of protecting our privacy from attackers who aren’t government spying agencies, the security of data at rest matters as much as, if not more than, security for data in transit. Unfortunately, whereas TLS/SSL is an established and widely-used standard for encrypting data in transit (albeit flawed in practice), procedures for securing data at rest seem to vary widely between companies (see Appendix A). These procedures are often described in vague terms if at all. For instance, one service simply states that it “provides multiple layers of security around your information, from access protected data centers, through network and application level security … Sensitive information, such as your email, messages and passwords, is always encrypted.” The methods of encryption, whether they are applied in transit or at rest, and other important details related to the security of the service are left out.
It’s hard to blame the product description writers for omitting these crucial details, because most users simply don’t care as long as they feel reasonably secure. It’s easy to feel secure with assurances like the ones quoted above, and news articles about implementations of forward secrecy in the works at FB. And then you get things like Twitter sending passwords in unencrypted form or Apple failing to properly authenticate SSL certificates or a whole Tumblr blog of websites that store and send passwords in plaintext, oops. There’s a difference between feeling secure and actually being secure on the web, clearly, but paranoia is tiring and you should probably go check on the 12 Facebook notifications that you got while reading this blog post.
Appendix A: A Brief, Non-Representative Survey of Company Policies for Securing Data at Rest
In the course of researching this blog post, I made a post on a certain social media site asking my friends in tech about how their companies secured non-sensitive data at rest, where non-sensitive data includes user-to-user communications and metadata (but not passwords, SSNs, and financial data). The few answers I got were more sophisticated than, “In plaintext, on a password-protected database,” but I suspect there’s some selection bias here. Anonymized responses below.
I’ve worked with so many apps, and it’s almost always a database behind a firewall. I’ve occasionally built systems where financial data was stored in some more secure way: one, where financial data and transaction processing happened on a separate box with no services and a very minimal API (to limit exposure), and where it was impossible to retrieve the sensitive information via the API.
The other one actually encrypted data using a key that was secured with the user’s password. It was re-encrypted using the user’s session on login, so the cleartext was only available when the user was logged in, during a request from that specific client. In this way, sensitive information was protected from our staff, and from an attacker, except that anyone who was actively using the system would be exposed (to a reasonably sophisticated attacker).
None of these ideas provide protection from a government, though. Client-side encryption doesn’t really, either. Just look at what happened to Hushmail. I really want there to be some trustable encryption API in the browser, so that client-side encryption for web apps could be a real thing.
We actually encrypt (with a separately stored per-user key) all of our user message content. It makes me way way more comfortable debugging problems that come up in production knowing I’m not going to accidentally read a message that was meant for a friend or co-worker. It’s pretty low-effort and low-resource-intensity for us to do this so it seems silly not to.
I guess the tradeoff is that for somebody like Facebook they’d lose the ability to do queries in aggregate to make advertising or similar decisions based on content.
… the keys [for encrypting messages] are stored with symmetric encryption. It defends us against outside attackers getting use out of dumps of our database but in theory with access to the layer that gets the messages and has the keys the encryption doesn’t matter. The practical limit when we’ve looked at doing more is one where as long as our site can display your messages then there’s some set of our infrastructure an attacker could use to read those.
We use client-side encryption with client-generated keys for user history/bookmarks/passwords, etc. So adding a device to access said data involves JPAKE (key exchange) via our servers (which are treated as an untrusted 3rd party).
This means syncing is hard, but we associate each record with a record ID and sync them if the hash changes.
For telemetry, we use anonymized data – uploads are linked to a UUID that only the client stores locally.