Redis源码阅读(九) – db

Redis是kv内存数据库,它的dbdict实现的,结构如下:

/* Redis database representation. There are multiple databases identified
 * by integers from 0 (the default database) up to the max configured
 * database. The database number is the 'id' field in the structure. */
typedef struct redisDb {
    dict *dict;                 /* The keyspace for this DB */
    dict *expires;              /* Timeout of keys with a timeout set */
    dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
    dict *ready_keys;           /* Blocked keys that received a PUSH */
    dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
    int id;                     /* Database ID */
    long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;

db不只用来存储键值对,还有一些其他的功能:

  • dict: 存放键值对
  • expires: 存放键和过期时间的映射
  • blocking_keys: 存放block操作的keyclient的映射
  • ready_keys: 接收到push操作的blocked key
  • watched_keys: WATCHED keys

object

Redis使用object封装底层的各种数据结构,结构如下:

typedef struct redisObject {
    unsigned type:4;
    unsigned encoding:4;
    unsigned lru:LRU_BITS; /* LRU time (relative to global lru_clock) or
                            * LFU data (least significant 8 bits frequency
                            * and most significant 16 bits decreas time). */
    int refcount;
    void *ptr;
} robj;

object实现了简单的引用计数refcount,不同的地方可能会使用相同的object,当refcount为0时会自动释放。这样做既能节省内存,也简化了生命周期管理:

void decrRefCount(robj *o) {
    if (o->refcount == 1) {
        switch(o->type) {
        case OBJ_STRING: freeStringObject(o); break;
        case OBJ_LIST: freeListObject(o); break;
        case OBJ_SET: freeSetObject(o); break;
        case OBJ_ZSET: freeZsetObject(o); break;
        case OBJ_HASH: freeHashObject(o); break;
        case OBJ_MODULE: freeModuleObject(o); break;
        default: serverPanic("Unknown object type"); break;
        }
        zfree(o);
    } else {
        if (o->refcount <= 0) serverPanic("decrRefCount against refcount <= 0");
        if (o->refcount != OBJ_SHARED_REFCOUNT) o->refcount--;
    }
}

同一typeobject会有不同的底层结构encoding,在t_*.c文件中,为每种类型封装了一套api。具体类型如下:

type encoding limit
string embedded string len <= 44
string sds -
list quicklist -
set intset all members are numbers
set hash table -
zset ziplist length <= zset-max-ziplist-entries && element size <= zset-max-ziplist-value
zset skiplist -
hash ziplist number of entries <= hash-max-ziplist-entries && value size <= hash-max-ziplist-value
hash hash table -

Redisstring做了一些优化,包括使用embedded string

/* Create a string object with EMBSTR encoding if it is smaller than
 * OBJ_ENCODING_EMBSTR_SIZE_LIMIT, otherwise the RAW encoding is
 * used.
 *
 * The current limit of 39 is chosen so that the biggest string object
 * we allocate as EMBSTR will still fit into the 64 byte arena of jemalloc. */
#define OBJ_ENCODING_EMBSTR_SIZE_LIMIT 44
robj *createStringObject(const char *ptr, size_t len) {
    if (len <= OBJ_ENCODING_EMBSTR_SIZE_LIMIT)
        return createEmbeddedStringObject(ptr,len);
    else
        return createRawStringObject(ptr,len);
}

embedded stringobj headersds存放在连续的内存,从注释中可以看出,embedded string为了fitjemalloc 64byte arena,只有当len <= 44时,才会使用: 减去robj头部和sdshdr8头部还有sds结尾的\0: 64 - 16 - 3 - 1 = 44

db

dbdictType如下:

/* Db->dict, keys are sds strings, vals are Redis objects. */
dictType dbDictType = {
    dictSdsHash,                /* hash function */
    NULL,                       /* key dup */
    NULL,                       /* val dup */
    dictSdsKeyCompare,          /* key compare */
    dictSdsDestructor,          /* key destructor */
    dictObjectDestructor   /* val destructor */
};

Redis的键以sds保存,值为object。键不用object的原因主要是键的类型固定,一定是string,而且键唯一,所以指向相同对象的情况很少,使用object会浪费空间。 命令的调用过程很简单:

  1. 查表lookupCommand()
  2. 调用命令
  3. 查找db中有无该键,判断类型
  4. 执行相关的操作

expire

Redis支持键过期操作,db->expires中存放键和过期时间的映射,主要有2种方式处理过期:

  1. 在查找key的相关命令中会调用expireIfNeeded(),处理相关key是否过期
  2. 因为有些key一直不被访问,所以会定时进行key过期扫描:
    • beforeSleep()中调用activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST)
    • databaseCron()中调用activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW)

为了避免处理过期key阻塞太久,activeExpireCycle()有几个限制条件,当不满足后会退出:

  1. 每次会随机选择ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP 20key,查看是否过期。每16次循环会判断是否超时:
    • FAST的时间限制为ACTIVE_EXPIRE_CYCLE_FAST_DURATION 1ms
    • SlOW的时间限制为25%CPU时间
  2. 每次随机扫描到的已过期key数量,低于ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP/4会退出

这些都是为了将耗时长的操作分摊到多次操作中,避免阻塞服务器。

evict

Redis是内存数据库,maxmemory限制了能够使用的内存空间,在每次调用命令之前都会调用freeMemoryIfNeeded()检查内存够不够用,不够会进行evict,当无法释放足够的内存会返回错误,拒绝写操作。 Redis支持下列策略:

  • volatile-lru -> Evict using approximated LRU among the keys with an expire set.
  • allkeys-lru -> Evict any key using approximated LRU.
  • volatile-lfu -> Evict using approximated LFU among the keys with an expire set.
  • allkeys-lfu -> Evict any key using approximated LFU.
  • volatile-random -> Remove a random key among the ones with an expire set.
  • allkeys-random -> Remove a random key, any key.
  • volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
  • noeviction -> Don’t evict anything, just return an error on write operations.

Redis首先创建一个pool,默认大小为16,存放待evict的候选元素:

#define EVPOOL_SIZE 16
#define EVPOOL_CACHED_SDS_SIZE 255
struct evictionPoolEntry {
    unsigned long long idle;    /* Object idle time (inverse frequency for LFU) */
    sds key;                    /* Key name. */
    sds cached;                 /* Cached SDS object for key name. */
    int dbid;                   /* Key DB number. */
};

每次需要evict时,Redis从每个db都随机挑选maxmemory-sample个key加入到evictionPool中。evictionPool中的元素按照idle升序排列,最右的为待evict的key。 idle按照策略的不同,有不同的计算方式:

  • TTL: idle = ULLONG_MAX - TTL
  • LRU: object->lru保存了最后一次访问该key的时间,idle计算方式为currentTime - object->lru
  • LFU: object->lru格式为:
             16 bits      8 bits
        +----------------+--------+
        + Last decr time | LOG_C  |
        +----------------+--------+
    

LOG_C为对数的访问频率,每次访问该key时有几率增加,随着count值变大,增加的概率也变小,代码如下:

/* Logarithmically increment a counter. The greater is the current counter value
 * the less likely is that it gets really implemented. Saturate it at 255. */
uint8_t LFULogIncr(uint8_t counter) {
    if (counter == 255) return 255;
    double r = (double)rand()/RAND_MAX;
    double baseval = counter - LFU_INIT_VAL;
    if (baseval < 0) baseval = 0;
    double p = 1.0/(baseval*server.lfu_log_factor+1);
    if (r < p) counter++;
    return counter;
}

Last decr time保存着上次降低频率的时间,以分钟计。该值用于降低频率,因为随着时间的增加,频率需要降低,代码如下:

/* If the object decrement time is reached, decrement the LFU counter and
 * update the decrement time field. Return the object frequency counter.
 *
 * This function is used in order to scan the dataset for the best object
 * to fit: as we check for the candidate, we incrementally decrement the
 * counter of the scanned objects if needed. */
#define LFU_DECR_INTERVAL 1
unsigned long LFUDecrAndReturn(robj *o) {
    unsigned long ldt = o->lru >> 8;
    unsigned long counter = o->lru & 255;
    if (LFUTimeElapsed(ldt) >= server.lfu_decay_time && counter) {
        if (counter > LFU_INIT_VAL*2) {
            counter /= 2;
            if (counter < LFU_INIT_VAL*2) counter = LFU_INIT_VAL*2;
        } else {
            counter--;
        }
        o->lru = (LFUGetTimeInMinutes()<<8) | counter;
    }
    return counter;
}

最后的idle就是255 - LFUDecrAndReturn()

会按照idle从大到小的顺序进行淘汰,直到内存释放到 maxmemory 以下。evictionPool中的key可能不会全部释放,这会带来问题:

  1. 之前的 key 被添加到 evictionPool 但没有被淘汰
  2. 这个 key 被访问到了,idle 变小,但是不会从 evictionPool 中移除
  3. 这个 key 在之后被淘汰了

不过本身淘汰也是随机选择 key,影响也不大,最多导致命中率下降。

bug?

看代码感觉在freeMemoryIfNeeded()最后有点问题,代码如下:

cant_free:
    /* We are here if we are not able to reclaim memory. There is only one
     * last thing we can try: check if the lazyfree thread has jobs in queue
     * and wait... */
    while(bioPendingJobsOfType(BIO_LAZY_FREE)) {
        if (((mem_reported - zmalloc_used_memory()) + mem_freed) >= mem_tofree)
            break;
        usleep(1000);
    }
    return C_ERR;

这是当释放不了需要的内存空间时,等待lasyfree完成。这个代码主要有2个问题:

  1. mem_reported值为开始evict前分配的内存,mem_reported - zmalloc_used_memory()值为总共释放的空间,再加上mem_freed不知道代表着什么
  2. 永远返回C_ERR,为什么还要等待lazyfree完成呢?

lazyfree

Redis4.0新增了lazyfree功能,因为Redis是单线程的,某些操作会阻塞住服务,现在可以放到另一个线程去执行, 目前只有3种操作:

  • BIO_CLOSE_FILE
  • BIO_AOF_FSYNC
  • BIO_LAZY_FREE

Redis启动时会调用bioInit()初始化相关锁和条件变量,并为每种操作分别创建一个线程。bio的实现很简单, 传统的单生产者单消费者模型,用list作为队列,互斥锁提供互斥,条件变量来通知变化。

分类:

更新时间: