Redis源码阅读(九) – db
Redis是kv内存数据库,它的db是dict实现的,结构如下:
/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;
db不只用来存储键值对,还有一些其他的功能:
- dict: 存放键值对
- expires: 存放键和过期时间的映射
- blocking_keys: 存放- block操作的- key与- client的映射
- ready_keys: 接收到- push操作的- blocked key
- watched_keys:- WATCHED keys
object
Redis使用object封装底层的各种数据结构,结构如下:
typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits decreas time). */
    int refcount;
    void *ptr;
} robj;
object实现了简单的引用计数refcount,不同的地方可能会使用相同的object,当refcount为0时会自动释放。这样做既能节省内存,也简化了生命周期管理:
void decrRefCount(robj *o) {
    if (o->refcount == 1) {
        switch(o->type) {
        case OBJ_STRING: freeStringObject(o); break;
        case OBJ_LIST: freeListObject(o); break;
        case OBJ_SET: freeSetObject(o); break;
        case OBJ_ZSET: freeZsetObject(o); break;
        case OBJ_HASH: freeHashObject(o); break;
        case OBJ_MODULE: freeModuleObject(o); break;
        default: serverPanic("Unknown object type"); break;
        }
        zfree(o);
    } else {
        if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0");
        if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;
    }
}
同一type的object会有不同的底层结构encoding,在t_*.c文件中,为每种类型封装了一套api。具体类型如下:
| type | encoding | limit | 
|---|---|---|
| string | embedded string | len <= 44 | 
| string | sds | - | 
| list | quicklist | - | 
| set | intset | all members are numbers | 
| set | hash table | - | 
| zset | ziplist | length <= zset-max-ziplist-entries && element size <= zset-max-ziplist-value | 
| zset | skiplist | - | 
| hash | ziplist | number of entries <= hash-max-ziplist-entries && value size <= hash-max-ziplist-value | 
| hash | hash table | - | 
Redis对string做了一些优化,包括使用embedded string:
/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 39 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}
embedded string将obj header和sds存放在连续的内存,从注释中可以看出,embedded string为了fit进jemalloc 64byte arena,只有当len <= 44时,才会使用:
减去robj头部和sdshdr8头部还有sds结尾的\0: 64 - 16 - 3 - 1 = 44。
db
db的dictType如下:
/* Db->dict, keys are sds strings, vals are Redis objects. */
dictType dbDictType = {
    dictSdsHash,                /* hash function */
    NULL,                       /* key dup */
    NULL,                       /* val dup */
    dictSdsKeyCompare,          /* key compare */
    dictSdsDestructor,          /* key destructor */
    dictObjectDestructor   /* val destructor */
};
Redis的键以sds保存,值为object。键不用object的原因主要是键的类型固定,一定是string,而且键唯一,所以指向相同对象的情况很少,使用object会浪费空间。
命令的调用过程很简单:
- 查表lookupCommand()
- 调用命令
- 查找db中有无该键,判断类型
- 执行相关的操作
expire
Redis支持键过期操作,db->expires中存放键和过期时间的映射,主要有2种方式处理过期:
- 在查找key的相关命令中会调用expireIfNeeded(),处理相关key是否过期
- 因为有些key一直不被访问,所以会定时进行key过期扫描:- beforeSleep()中调用- activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST)
- databaseCron()中调用- activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW)
 
为了避免处理过期key阻塞太久,activeExpireCycle()有几个限制条件,当不满足后会退出:
- 每次会随机选择ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP 20个key,查看是否过期。每16次循环会判断是否超时:- FAST的时间限制为- ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1ms
- SlOW的时间限制为25%CPU时间
 
- 每次随机扫描到的已过期key数量,低于ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4会退出
这些都是为了将耗时长的操作分摊到多次操作中,避免阻塞服务器。
evict
Redis是内存数据库,maxmemory限制了能够使用的内存空间,在每次调用命令之前都会调用freeMemoryIfNeeded()检查内存够不够用,不够会进行evict,当无法释放足够的内存会返回错误,拒绝写操作。
Redis支持下列策略:
- volatile-lru -> Evict using approximated LRU among the keys with an expire set.
- allkeys-lru -> Evict any key using approximated LRU.
- volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
- allkeys-lfu -> Evict any key using approximated LFU.
- volatile-random -> Remove a random key among the ones with an expire set.
- allkeys-random -> Remove a random key, any key.
- volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
- noeviction -> Don’t evict anything, just return an error on write operations.
Redis首先创建一个pool,默认大小为16,存放待evict的候选元素:
#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};
每次需要evict时,Redis从每个db都随机挑选maxmemory-sample个key加入到evictionPool中。evictionPool中的元素按照idle升序排列,最右的为待evict的key。
idle按照策略的不同,有不同的计算方式:
- TTL:- idle = ULLONG_MAX - TTL
- LRU:- object->lru保存了最后一次访问该- key的时间,- idle计算方式为- currentTime - object->lru
- LFU:- object->lru格式为:- 16 bits 8 bits +----------------+--------+ + Last decr time | LOG_C | +----------------+--------+
LOG_C为对数的访问频率,每次访问该key时有几率增加,随着count值变大,增加的概率也变小,代码如下:
/* Logarithmically increment a counter. The greater is the current counter value
 * the less likely is that it gets really implemented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}
Last decr time保存着上次降低频率的时间,以分钟计。该值用于降低频率,因为随着时间的增加,频率需要降低,代码如下:
/* If the object decrement time is reached, decrement the LFU counter and
 * update the decrement time field. Return the object frequency counter.
 *
 * This function is used in order to scan the dataset for the best object
 * to fit: as we check for the candidate, we incrementally decrement the
 * counter of the scanned objects if needed. */
#define LFU_DECR_INTERVAL 1
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    if (LFUTimeElapsed(ldt) >= server.lfu_decay_time && counter) {
        if (counter > LFU_INIT_VAL*2) {
            counter /= 2;
            if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
        } else {
            counter--;
        }
        o->lru = (LFUGetTimeInMinutes()<<8) | counter;
    }
    return counter;
}
最后的idle就是255 - LFUDecrAndReturn()。
会按照idle从大到小的顺序进行淘汰,直到内存释放到 maxmemory 以下。evictionPool中的key可能不会全部释放,这会带来问题:
- 之前的 key被添加到evictionPool但没有被淘汰
- 这个 key被访问到了,idle变小,但是不会从evictionPool中移除
- 这个 key在之后被淘汰了
不过本身淘汰也是随机选择 key,影响也不大,最多导致命中率下降。
bug?
看代码感觉在freeMemoryIfNeeded()最后有点问题,代码如下:
cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
        if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
            break;
        usleep(1000);
    }
    return C_ERR;
这是当释放不了需要的内存空间时,等待lasyfree完成。这个代码主要有2个问题:
- mem_reported值为开始- evict前分配的内存,- mem_reported - zmalloc_used_memory()值为总共释放的空间,再加上- mem_freed不知道代表着什么
- 永远返回C_ERR,为什么还要等待lazyfree完成呢?
lazyfree
Redis4.0新增了lazyfree功能,因为Redis是单线程的,某些操作会阻塞住服务,现在可以放到另一个线程去执行,
目前只有3种操作:
- BIO_CLOSE_FILE
- BIO_AOF_FSYNC
- BIO_LAZY_FREE
在Redis启动时会调用bioInit()初始化相关锁和条件变量,并为每种操作分别创建一个线程。bio的实现很简单,
传统的单生产者单消费者模型,用list作为队列,互斥锁提供互斥,条件变量来通知变化。
留下评论