2 December 2014 10:01 PM (programming | algorithms | security | timing attacks | guile)
Imagine you have a web site that people can access via a password. No user name, just a password. There are a number of valid passwords for your service. Determining whether a password is in that set is security-sensitive: if a user has a valid password then they get access to some secret information; otherwise the site emits a 404. How do you determine whether a password is valid?
The go-to solution for this kind of problem for most programmers is a hash table. A hash table is a set of key-value associations, and its nice property is that looking up a value for a key is quick, because it doesn't have to check against each mapping in the set.
Hash tables are commonly implemented as an array of buckets, where each bucket holds a chain. If the bucket array is 32 elements long, for example, then keys whose hash is H are looked for in bucket H mod 32. The chain contains the key-value pairs in a linked list. Looking up a key traverses the list to find the first pair whose key equals the given key; if no pair matches, then the lookup fails.
Unfortunately, storing passwords in a normal hash table is not a great idea. The problem isn't so much in the hash function (the hash in H = hash(K)) as in the equality function; usually the equality function doesn't run in constant time. Attackers can detect differences in response times according to when the "not-equal" decision is made, and use that to break your passwords.
Edit: Some people are getting confused by my use of the term "password". Really I meant something more like "secret token", for example a session identifier in a cookie. I thought using the word "password" would be a useful simplification but it also adds historical baggage of password quality, key derivation functions, value of passwords as an attack target for reuse on other sites, etc. Mea culpa.
So let's say you ensure that your hash table uses a constant-time string comparator, to protect against the hackers. You're safe! Or not! Because not all chains have the same length, "interested parties" can use lookup timings to distinguish chain lookups that take 2 comparisons compared to 1, for example. In general they will be able to determine the percentage of buckets for each chain length, and given the granularity will probably be able to determine the number of buckets as well (if that's not a secret).
Well, as we all know, small timing differences still leak sensitive information and can lead to complete compromise. So we look for a data structure that takes the same number of algorithmic steps to look up a value. For example, bisection over a sorted array of size SIZE will take ceil(log2(SIZE)) steps to get find the value, independent of what the key is and also independent of what is in the set. At each step, we compare the key and a "mid-point" value to see which is bigger, and recurse on one of the halves.
One problem is, I don't know of a nice constant-time comparison algorithm for (say) 160-bit values. (The "passwords" I am thinking of are randomly generated by the server, and can be as long as I want them to be.) I would appreciate any pointers to such a constant-time less-than algorithm. However a bigger problem is that the time it takes to access memory is not constant; accessing element 0 of the sorted array might take more or less time than accessing element 10. In algorithms we typically model access on a more abstract level, but in hardware there's a complicated parallel and concurrent protocol of low-level memory that takes a non-deterministic time for any given access. "Hot" (more recently accessed) memory is faster to read than "cold" memory.
Non-deterministic memory access leaks timing information, and in the case of binary search the result is disaster: the attacker can literally bisect the actual values of all of the passwords in your set, by observing timing differences. The worst!
You could get around this by ordering "passwords" not by their actual values but by their cryptographic hashes (e.g. by their SHA256 values). This would force the attacker to bisect not over the space of password values but of the space of hash values, which would protect actual password values from the attacker. You still leak some timing information about which paths are "hot" and which are "cold", but you don't expose actual passwords.
It turns out that, as far as I am aware, it is impossible to design a key-value map on common hardware that runs in constant time and is sublinear in the number of entries in the map. As Zooko put it, running in constant time means that the best case and the worst case run in the same amount of time. Of course this is false for bucket-and-chain hash tables, but it's false for binary search as well, as "hot" memory access is faster than "cold" access. The only plausible constant-time operation on a data structure would visit each element of the set in the same order each time. All constant-time operations on data structures are linear in the size of the data structure. Thems the breaks! All you can do is account for the leak in your models, as we did above when ordering values by their hash and not their normal sort order.
Once you have resigned yourself to leaking some bits of the password via timing, you would be fine using normal hash tables as well -- just use a cryptographic hashing function and a constant-time equality function and you're good. No constant-time less-than operator need be invented. You leak something on the order of log2(COUNT) bits via timing, where COUNT is the number of passwords, but since that's behind a hash you can't use it to bisect on actual key values. Of course, you have to ensure that the hash table isn't storing values in sorted order and short-cutting early. This sort of detail isn't usually part of the contract of stock hash table implementations, so you probably still need to build your own.
Edit: People keep mentioning Cuckoo hashing for some reason, despite the fact that it's not a good open-hashing technique in general (Robin Hood hashes with linear probing are better). Thing is, any operation on a data structure that does not touch all of the memory in the data structure in exactly the same order regardless of input leaks cache timing information. That's the whole point of this article!
An alternative is to encode your data structure differently, for example for the "key" to itself contain the value, signed by some private key only known to the server. But this approach is limited by network capacity and the appropriateness of copying for the data in question. It's not appropriate for photos, for example, as they are just too big.
Edit: Forcing constant-time on the data structure via sleep() or similar calls is not a good mitigation. This either massively slows down your throughput, or leaks information via side channels. Remote attackers can measure throughput instead of latency to determine how long an operation takes.
Corrections appreciated from my knowledgeable readers :) I was quite disappointed when I realized that there were no good constant-time data structures and would be happy to be proven wrong. Thanks to Darius Bacon, Zooko Wilcox-O'Hearn, Jan Lehnardt, and Paul Khuong on Twitter for their insights; all mistakes are mine.
2 December 2014 11:02 PM
Here is the non-programmers question... could you defuse timing attacks like you are asking by using the worst case time as the standard time. So if you know that you have a worst case of 100 ms then you set a timer that waits until 100ms before returning no matter how fast stuff got done internally. Or does that have a different problem set?
2 December 2014 11:05 PM
Unfortunately, it is even worst than that. The row and column decoders in your RAM are really binary trees, with O(log(n)) levels of logic gates, so memory access can't get better than that. But even that is optimistic. With a finite speed of light and discrete bits that encompase finite volume, you can't get better than the cube-root of the size of your data structure.
3 December 2014 0:11 AM
Nice to see you're looking into capabilities, Wingo.
Why not use the output of a symmetric cipher as the capability? You could choose the plaintext to be easy to validate and leave only a few tens of bits for actual content. Then, checking the validity of the request does not require searching through what data you actually have stored.
3 December 2014 0:21 AM
Um, since these are capabilities and you'll probably hand out quite a few it would be smart to choose the structure of the plaintext to be resistant to offline cryptanalysis. Maybe modern symmetric block cyphers are good enough on their own? To do: research.
3 December 2014 0:21 AM
I may be missing something, but would adding a random wait time (in an appropriate range) to your computation solve the leak problem? Something like "jamming the signal".
3 December 2014 0:28 AM
Davi: The problem with adding a random delay is that you can just repeat the experiment often enough and eventually the delay will average out to a constant delay, exposing the underlying variation that the attacker is looking for. Of course, this requires the attacker to make many requests, but if someone's running timing attacks on you then they're clearly already prepared to make lots of requests, and the impediment which you've added will be fairly small.
3 December 2014 1:31 AM
Possibly something using bloom filters? Adds and lookups are constant time, and they're extremely space efficient.
You could tell in constant time if an element definitely was not in the set of "passwords", or if it "probably was". You could reduce the probability of a false positive by choosing an optimal number of hash functions.
3 December 2014 1:35 AM
Cuckoo hashing should solve this problem I think - lookup is always a constant 2 reads into the hashtable. (normally you'd skip the 2nd read if the 1st matched, but for this application you'd just do both every time and then compare afterwards.)
3 December 2014 1:39 AM
have you considered cuckoo hashing ?
3 December 2014 1:55 AM
Dude, love the article but your website is TOO THIN.
Make it fatter so I don't have to scroll so much to read please.
Also the closest thing you are looking for is a hash-map imo.
3 December 2014 1:55 AM
The data structure you are looking for is called an FM index.
Constant time search in the size of the corpus.
3 December 2014 2:16 AM
So why not just add little artificial random delay in your public API calls? Modifying data structures seems totally over kill here.
3 December 2014 2:16 AM
Use a GPU; they execute thousands of threads at once, and all threads visit all of their branches. With each one testing against one valid password and all of them executing in the same amount of time, you won't leak anything. Plus they're pretty common hardware.
3 December 2014 2:17 AM
See https://news.ycombinator.com/item?id=8690405 for comments and suggestions.
3 December 2014 2:46 AM
Why not add some random time to the algorithm verify the password? If that, the real consume time will not leak any more. Even the "interesting party" get the time information, they got the fake data.
3 December 2014 3:03 AM
Instead of a hash table, how about using a trie?
3 December 2014 4:03 AM
Rather than adding a random delay, why not add a variable delay such that the time of the lookup plus the time of the delay is constant? I.e. Receive request and record current time, do lookup and then wait until 100ms has elapsed since the request was received. You just need to pick the constant to be larger than the longest lookup.
3 December 2014 6:38 AM
Other popular solution is to make attacker not aware of what the hash is (I assume that's similar problem to DOS Hash attack). If you use HMAC or even simply 'hash(key | data)' then the attacker won't know which keys collide as long as the key is secret (and hash have avalanche-like property) so the timing information won't get him anything. Using scrypt or something similar as hash function would make the hash collision attacks infeasible - especially in combination with HMAC and secret key.
3 December 2014 6:39 AM
Don't add a random delay, it doesn't help. It just increases the number of probes needed by a small constant factor. The easiest answer is to pad your access time out to your worst-case access time. If you don't know your worst-case access time you can limit the leakage to a logarithmic number of bits IIRC by exponentially increasing the estimate of the delay. Andrew Myers and Danfeng Zhang have published work on limiting information flow through timing channels.
3 December 2014 7:15 AM
Instead of calculating worst-case access time and padding all other access times to have a constant time on each lookup, could you (on insert) create and save a new random delay time for inserted entry and delay response of every access using this delay? Could the attacker still get the real access times without this introduced delay?
3 December 2014 10:22 AM
Stephen Smoogen: Aside from being inefficient, I don’t see why your solution should not work. It would have to be some kind of busy-wait, though: Set the timer, do the calculation, then add busy-waiting to ensure that the performance of other threads running on the same system doesn’t leak information that you’re only waiting, not calculating.
Cuckoo Hashing sounds very interesting, though.
Note also that insertion performance isn’t that important: You don’t need to protect the system against users who set new passwords (they can only do that if they already have access, and in that case, you can detect the access pattern and throw them out before they do damage).
3 December 2014 10:26 AM
Cuckoo-Hashing is rather well explained and has quite a few nice references on Wikipedia: http://en.wikipedia.org/wiki/Cuckoo_hashing
3 December 2014 11:56 AM
For folks that think that this timing side channel is lost in the noise, check Adam Langley's writeup of the Lucky 13 attack:
In this case I don't think there is a constant-time solution. The remaining option is to accept some degree of timing leakage and ensure that there's no path from that side channel to actual "password" values, either directly (as in the bisection case) or indirectly (cryptographic hashing prevents this, I believe -- you can distinguish buckets and chain links but there's no path to the whole hash or the whole key).
3 December 2014 12:30 PM
Here’s another article on Cuckoo Hashing which might apply to your usecase:
- Optimistic Cuckoo (multiple readers):
3 December 2014 1:47 PM
I think you're conflating a few things.
1.) Constant algorithmic time (number of operations)
2.) Constant-time access to the underlying data
3.) No data flow from secret to public
The important thing to note is this:
So long as (3) is satisfied, _1 and 2 are purely performance considerations_.
Depending on the definition of "secret", blinding alone may be enough.
This further emphasizes that *it is critical to define your model*
For instance, if disclosing bits of the password is unacceptable, but a successful login disclosing that another recent login of the same token occurred is okay, blinding is entirely sufficient for satisfying (3).
A trivial way of doing that is, prior to inserting an entry in the table, encrypt it with a good block cipher (and mode) and a per-site key.
At that point, ordering no longer leaks any data about the plaintext, meaning that (2) is no longer an obstacle to using a binary tree.
tl;dr: Define your security model, _then_ worry about fulfilling it.
(Also, perfect hashing with blinding is both consistent-time (best-case-equals-worst-case) and constant-time (O(1)) aside from (2), _and_ satisfies (3). In return, updating the DB is hilariously expensive by comparison.)
3 December 2014 1:55 PM
To clarify, my objection is that the article is "$solution doesn't exist", when the problem is ill-defined (and, depending on definition, has _many_ other solutions)
3 December 2014 2:10 PM
Alex, you are exactly right. My point was simply that it is impossible to construct a perfect solution that does not leak any information at all. This was surprising to me. Once I realized that, then you're totally right it's all about defining security models and understanding data flow, and how to understand the impact of the leaks that a particular solution has.
I'm not sure why you mention perfect hashing with blinding being O(1) except for the way in which it's not O(1), though :)
3 December 2014 4:49 PM
That was partially lack-of-sleep, and partially to further highlight that you're using algorithmic complexity _terminology_, but your intent is actual _execution time_.
It is well and truly O(1) algorithmic complexity - which, when you say "constant-time data structures," is what comes to mind.
The issues of paging, caches, etc are not only machine-specific, but can be obviated by various means of forcing data either into (for small data) or out of (for large data) caches before operating on it.
Once you do that, the constraint loosens significantly - you no longer need to touch all data in the same order, but rather _the same number of accesses partitioned the same way into the same number of machine-level divisions_.
So instead of all your data, if you know that you'll need to access three cachelines of a directory page and one cacheline of a leaf page, that's enough to enforce truly constant-time execution.
3 December 2014 5:01 PM
Perhaps more concisely:
For hardware-dependent sources of variation in execution time, you're going to need hardware-dependent countermeasures.
On a simplistic machine with neither caches nor paging - i.e. O(1) access to memory - blinded algorithmic O(1) is sufficient.
If your machine does not provide O(1) access to memory, then you need to abstract it such that it provides O(1) access to the _relevant_ memory, by evicting from and/or loading into caches in a controlled manner.
3 December 2014 5:33 PM
I don't think you understand what "constant time" actually means, or you've perverted the term for your own purposes. If we're talking asymptotic complexity, then there are many "good" constant-time data structures. Hashtables are but one of many.
If you really mean "constant time" to mean "takes a constant amount of time", and your motivation for that is "security", then you're seriously misguided, not only about the precise nature of a timing attack, but about what sort of data structures would be most vulnerable to such a thing.
If you'd like to mitigate a timing attack on a conventional hashtable, one which provides best-case O(1) access and worst-case O(N) access, but which properly blinds the hashing function (so that O(N) performance cannot be chosen by the attacker and is extremely unlikely), you can look up not only the key requested by the user, but another random key (throwing away the result of that lookup, of course). Or look up 10 random keys. Or 100. (For any constant, it's still constant time.)
I challenge the attacker to make use of a timing attack on their supplied key in this situation. Will the query take a "constant" amount of absolute time? Of course not, such a thing is a ridiculous request on a non-realtime preemptive operating system with an incredibly complex, layered caching strategy.
But is it still a classical constant-time operation, as in, the cost of the operation does not grow asymptotically with the number of items hashed? Yes, clearly it is.
Will the performance decline as more and more items are added? Yes, clearly, the machine has limited resources, and as you exceed those resources you'll notice that performance falls. Importantly, though, 10x more items does not mean 10x more costly lookups. In fact, for any number of items below a certain number (say, the number that can fit into L3 cache), lookups will take about the same amount of time. Once you exceed that capacity, you'll note that lookups now take a longer amount of time, but roughly the same amount of time each even as capacity grows.
But then again, as you increase the rate of requests to the server, you'll also notice that performance may fall. Is this really something we'd attribute to the data structure? Is this a failure of the data structure to be "constant time"?
3 December 2014 6:09 PM
Why are you arguing against the term "constant-time" in a security context, Kat? It has a well-accepted meaning; see e.g. Coda Hale's article that I linked, or Adam Langley's https://www.imperialviolet.org/2013/02/04/luckythirteen.html for examples of uses. In my article it is clear that I am interested in constant-time operations.
The intent of this article was simply to take a look in the algorithmic toolbox to see if, on modern hardware, there are any data structures like maps or sets whose access time is sublinear in their size and which do not leak any timing information. There are not. I find that interesting. That's all :)
3 December 2014 6:41 PM
But that's the thing; by conflating hardware-independent timing leaks (algorithmic) and hardware-dependent timing leaks, you're losing precision _and_ correctness.
And the title is "constant-time data structures" which contextualizes it to algorithms and data structures, _not_ cryptography. When someone arrives at an article with a preconception the title gave them, it colors the entire rest of the article.
One can always flush caches, mlock() into RAM, and by doing so avoid the hardware-dependent timing leaks. This does cost performance, yes. Not nearly as much as a sleep, however - and on smaller data, or data with a small core that's _always_ used early (root node of tree, etc), you can do one better and preload or pin that into the cache.
3 December 2014 11:59 PM
Instead of just dismissing "sleep() kills performance", consider how you actually serve your clients. If you have a simple thread-per-client server, then sleep() just schedules the other threads while the one serving that one client waits.
If you have a more reasonable event loop system, just use timer events to handle authentication. Like, 1) user sends password, 2) start a timeout event after N ms, 3) check password with off-the-shelf algorithm, 4) return to event loop, 5) timeout occurs, push response to client.
Here N should be based on the worst-case performance of password checking algorithm. You definitely take a hit in performance, but since you can handle hundreds of clients simultaneously, you don't pay insane costs on authenticating just one. Since N should be a constant, you can use O(1) dequeue for timeout events (no need for actual priority queues).
4 December 2014 0:08 AM
This really is a case of trying to solve the wrong problem. The author shouldn't be using secrets as lookup keys at all (for the reasons mentioned).
Why not do what do many other systems do and give each secret a unique public id (e.g., username, app id, etc.) and key you data structure with the public id (and let the value). This way the lookup time is completely uncorrelated with the secret. (You'll still need a constant-time check to ensure the secret is correct.)
4 December 2014 12:27 PM
@One Idea: I think your idea is a detailed version of what Stephen Smoogen proposed in the first comment here: Ensure a constant time by only answering after a constant wait.
6 December 2014 6:40 PM
I don't have much experience with preventing timing attacks, but I've always thought that if I was going to write my own data structure with "constant time" lookup (there isn't really any such thing, as somebody else in this thread rightly pointed out, but we can typically pretend within the constraints of a particular pieces of hardware) to prevent timing attacks, it would be some sort of trie. The idea would be that all paths in the trie are the same length, and you try to make it such that dereferencing any node will hit main memory, not a cache (off the top of my head, I don't even know how to do this, if it's possible at all, but I haven't worked at such a low level in a long time). When a key is not a member of the data structure, you can just move on to the appropriate level of some shared tail, which ultimately results in a null result once you reach its end.
7 December 2014 11:01 AM
This should really be addressed at hardware level: without an instruction which tells CPU not to use cache and access memory directly, I don't think it's possible to have a constant-time hash in a multitasking environment.
7 December 2014 8:56 PM
What Salusa Secundus suggests can be done without changing the API, by letting the password (which can be extended in size freely) be in two parts, of which the first part is guaranteed unique (and is used for lookup) and the rest is the true secret. The hash table then maps the first part to the second
11 December 2014 7:04 AM
If I put a thread to sleep, it does not consume cpu, it's not in a runnable state.
When I get my first request, I check the system time, noting it as time A.
Now I do my mystery calculation that I do not want to leak out in any way how long it took. This takes time X.
Now I check the system time, noting it as time B.
Now I have a constant response time, say 3 seconds, wall clock time that I want all responses to come back in. C will be that constant length of wall clock time, and I will choose a C that is significantly longer than mystery calc time X, so X will reliably finish well before C has elapsed in wall clock time. If I add time A plus C, I will get a time D that I want to wake up.
So the next thing I do after noting time B above (after my mystery calculation is done) is subtract time B from time D to arrive at my sleep time E.
I set an alarm for E length of time.
My thread will wake up very close to time D, at which time I deliver my response to the client, yes or no on successfull authentication, or whatever the mystery calculation did.
You have a relative timeing situation that looks like this these scenarios (etc.):
D = A + C
E = D - B
While the X time can vary all over the place, it is completely unrelated to when D occurs, which is preset and relatively very much the same all the time.
The E time is spent simply with the thread sleeping, which is the standard way to minimize cpu resource usage when you have nothing for a thread to do.
Many of the responses essentially came up with this answer, but it does not appear that it's quite fully understood.
If you have the time D wakeup not happening on schedule - and thus telling the attacker that the system is under heavy load. If it the mystery calculations causing the slowdown - wow, your calculation takes more than 3 cpu seconds !!!!! What are you doing ? If it is not, then that is an entirely different issue - your server is severely burdened by something else.
Most of the wall clock time should be spent with the thread sleeping. No significant system slowdown, nothing revealed by response time - it just takes 3 seconds (or two or even 1 if say your mystery calc only takes ~ .0001 seconds, etc.) for logins to happen. Small price to pay for completey getting rid of the response time issue as a security issue.
13 December 2014 1:42 PM
Constant time comparison for fixed length strings isn't too hard; given strings A, B, pad with 0's to a integer number of words, XOR each word of A with the matching word of B, and AND the resulting words together. Compare to 0 at the end. Make sure the compiler doesn't "optimise" by early-outing.
You can similarly use the machine < operation on words to get masks for where A exceeds or is less than B, and bit-twiddle to get two (shorter) numbers to compare. For fixed size numbers this is constant time.
18 December 2014 1:18 PM
You can compare 2 fixed-length strings (a,b) by checking the sign of (2^N-a)+b, which will be in the carry flag if you perform a ripple-carry addition.
30 March 2015 5:16 PM
Last time I checked, less than 10% of all the CPUs sold in the world each year were 32-bit or more. I don't know any 8-bit CPUs still in production that have a cache, so apparently most CPUs sold each year don't have a cache.
A few assembly-language programmers talk a lot about zero-jitter loops that always execute in exactly the same number of cycles every time through the loop. The tricky bit is tweaking both sides of each if-then-else branch to use the same number of cycles; caches don't add any jitter because Atmel AVR and Microchip PIC chips don't have a cache. "isochronous assembly code" http://www.paleotechnologist.net/?p=2436 ; search http://www.piclist.com/techref/postbot.asp for "isochronous" ; etc.
On processors without a cache -- in other words, most processors -- dynamic perfect hashing and cuckoo hashing can be implemented by a clever assembly language programmer in a way that a lookup always uses exactly the same number of cycles for every key.
Can Robin Hood Hashing be implemented in a way that always uses the same number of cycles for every key?