Mastering Redis
上QQ阅读APP看书,第一时间看更新

Reviewing the time complexity of Redis data structures

With this understanding of the computing big notation, we'll next briefly review Redis's basic data structures, paying attention to the time complexity implications of using the data structure with the current commands supported by Redis.

Strings

The most basic data structure for Redis values is a string, the same data type as a Redis key. Using Redis at its simplest is as a string-to-string key-value storage. Note that Redis has similar performance characteristics to other key-value data storage solutions such as Memecached3.

In Redis, a string does not merely contain alphanumeric characters as strings are normally understood to be in higher-level programming languages, but contain serialized characters in C, the principal programming language used in Redis. The most basic GET and SET commands for Redis strings are O(1) operations, making Redis extremely fast as a simple key-value store. The speed and ease of using GET and SET should not be overlooked when thinking through your Redis solution. In my own experience with Redis, I'll often jump prematurely to using one of Redis's more complex data structures such as a sorted set, when a simple Redis string may be a faster and less complicated approach to solving the problem in front of me.

For most Redis string operations, both access and ingestion commands are either O(1) or O(n) in time complexity with the O(n) string commands being mostly bulk commands such as GETRANGE, MSET, and MGET. The GETRANGE command is an O(n) operation with n being the length of the return string. This makes intuitive sense if we think of the operation as a series of small GET commands (although GET does not return a subrange of a string value stored in the key). We can illustrate this with Redis CLI, SET, and GETRANGE:

127.0.0.1:6379> SET organization:1 "The British Library"
127.0.0.1:6379> GETRANGE organization:1 4 10 "British"

Therefore, in this example, the big O notation for SET is +1 and the GETRANGE is +6, the equivalent of retrieving six characters by issuing individual pseudo GET commands.

Because Redis stores all data as strings, the type information for particular strings is also maintained to support the INCR/DECR and bitstring commands. For the INCR and DECR commands, the value stored is a base-10 64-bit signed integer string, which if modified by the other Redis string commands such as APPEND, may corrupt the value; therefore, further integer-related Redis commands will fail if applied to the same key. We can easily replicate this situation from Redis CLI with the following INCR, GET, and DUMP commands on the new:counter key:

127.0.0.1:6379> INCR new:counter(integer) 1 127.0.0.1:6379> GET new:counter "1" 127.0.0.1:6379> DUMP new:counter "\x00\xc0\x01\x06\x00\xb0\x95\x8f6$T-o" 127.0.0.1:6379> APPEND new:counter "a" (integer) 2 127.0.0.1:6379> INCR new:counter (error) ERR value is not an integer or out of range 127.0.0.1:6379> GET new:counter "1a" 127.0.0.1:6379> DUMP new:counter "\x00\x021a\x06\x00\x8br\x9a\x98-9\x9a\xa6"

Hashes

Hashes, otherwise known as dictionaries or associative arrays in other programming languages, are data structures that map one or more fields to the corresponding value pairs. In Redis, all hash values must be Redis strings with unique field names. The field's values are simple Redis strings that are returned by calling the Redis HGET or HMGET commands with the appropriate Redis key and one or more field parameters. For many uses, Redis hashes provide excellent O(1) performance for the HSET and HGET commands. Similar to the string bulk commands, the hash HGETALL, HMSET, HMGET, HKEYS, and HVALS commands are all O(n) cases. If your hashes are small, there may not be any appreciable difference between returning all of the hash's keys and values with the HGETALL and HMGET commands. As your hash size increases in terms of the number of fields and values, the difference between the two can make a difference in your application. Take a hash with 1000 fields; if your application only regularly uses 300, the time complexity of a call to Redis with the HGETALL or HVALS command is O(1000), while with HMGET, the time complexity is only O(300) because although both HGETALL and HMGET are both O(n) cases, the HMGET upper bound is only the total number of fields being requested and not the entire hash. Finding and replacing the HGETALL commands with HMGET is one way to increase your application's Redis performance if the overall size of the hash is small. For large hash sizes, an HMGET command returning a large number of values can significantly impact the overall performance of Redis for other clients who are blocked from receiving any values until the HMGET command finishes execution within the Redis server. In this case, targeted HGET would be a better choice.

While Redis hash values cannot contain hashes, lists, or other data collection structures, Redis does offer the HINCRBY and HINCRBYFLOAT commands, which allow you to treat the string value stored in a field as an integer or a float, respectively. Redis returns an error if you try to update a field's value with the wrong data type as seen in the following example from the Redis command line:

127.0.0.1:6379> HMSET weather:2 temperature 46 moisture .001 127.0.0.1:6379> HINCRBY weather:2 temperature -1 127.0.0.1:6379> HGET weather:2 temperature "45" 127.0.0.1:6379> HINCRBY weather:2 moisture 1 (error) ERR hash value is not an integer 127.0.0.1:6379> HINCRBYFLOAT weather:2 moisture 1 127.0.0.1:6379> HGET weather:2 moisture "1.001"

You'll notice that Redis treats the setting of the value of 1 as either an integer or a float depending on the command.

Lists

Lists in Redis are ordered collections of strings that allow for duplicate string values. A list in Redis is more accurately labeled and implemented as a linked list. Because Redis lists are implemented as linked lists, adding an item to the front of a list with LPUSH or to the end of a list with RPUSH is a relatively inexpensive operation performed at a constant time complexity of O(1). For the LINSERT and LSET commands, the time complexity is linear, O(n), but with some significant differences. For the LSET command, where you can set a list value as an index value, the time complexity for the n variable is the length of the list while setting either the first or the last value in the list with LSET is O(1) because the lists are linked lists. For the LINSERT command, which allows you to insert a value before or after a reference value, the abovementioned time complexity is O(n), with n being the number of list elements that the command must go through before reaching the reference value, with the worst case inserting a value at the end of the list. Remember that with the big O notation, we're interested in the worst case scenario, so the time complexity of the LINSERT command is considered to be O(n), even if the reference value is the first list element making the time complexity of the LINSERT command O(1) in this special case.

Lists

For the LRANGE command, the official Redis documentation gives the complexity class for this command as O(s+n), with s being the number of elements to offset either from the head of the list or from the end of the list depending on the size of the list. The n variable for LRANGE is the total number of elements to be returned. If you want to return the entire list, the common list LRANGE pattern is LRANGE mylist 0-1, which results in a time complexity of O(10+10) for a list of length 10. LTRIM typically has a big O notation of O(n), where n is the number of elements to be returned to the calling client. As mentioned in the official documentation for the LTRIM command 4, using LTRIM with either RPUSH or LPUSH is a common way to only store a fixed-length collection of elements. For example, if you only want to keep the last seven days' worth of average temperature data, use the following Redis commands:

127.0.0.1:6379> LPUSH temp:last-seven-days 30 45 50 52 49 55 51
127.0.0.1:6379> LPUSH temp:last-seven-days 56
127.0.0.1:6379> LTRIM temp:last-seven-days 0 6
127.0.0.1:6379> LRANGE temp:last-seven-days 0 -1
1) "56"
2) "51"
3) "55"
4) "49"
5) "52"
6) "50"
7) "45"

As we see, this pattern allows us to store the last seven days' average temperature and when used in this way, the time complexity of our LTRIM command now approaches O(1) as long as only one value is pushed on to the list at a time.

Sets

Sets in Redis are a type of collection primitive where the uniqueness of string values is guaranteed but the ordering of these values is not. Redis sets also implement union, intersection, and difference set semantics along with the ability to store the results of these set operations as a new Redis set in the Redis instance. With the current implementation of the Redis cluster, the union, intersection, and difference set semantics are more limited and can only be used in a limited fashion. The SADD command, which adds one or more values to a set, is O(n), where n is the number of members to be added to the set. The important SISMEMBER command evaluates whether a value is a member of the set or not and is an O(1) operation, while the SMEMBERS command that returns all of the elements in the list is an O(N) operation. Sets may have a similar performance to that of the other data structures in Redis or in certain cases have much better memory usage when storing integers over hashes.

Where sets are extremely useful in Redis is its support for the union, intersection, and difference set operations, all of which have different time complexities but again are of much limited use with a Redis cluster.

Sets

The SUNION and SUNIONSTORE commands allow you to return all the members of one or more sets to either the call client or a new set stored in Redis. Both these commands are O(n), where n is the total number of members in all the sets. The SINTER and SINTERSTORE commands returns the intersection or stores the set when using the SINTERSTORE command. in Redis, the common members of one or more sets result in a time complexity of O(n*m), where n is the size of the smallest set and m is the total number of sets. Finally, the SDIFF and SDIFFSTORE commands either return or store in Redis the difference between the first set and zero or more subsequent sets. As for the SUNION command, the time complexity of SDIFF and SDIFFSTORE is O(n), where n is the total number of members of all the sets. The utility of the Redis sets is in these set operations that allow interaction at a very basic Boolean logic level with your data. However, as your set size increases, the amount of processing time increases at a minimum of O(n) time.