Tuesday, 10 April 2018

SPO600 Project - Stage 2

In Stage 2, I am going to study Redis, which I selected in Stage 1, in more depth. Redis (Remote Dictionary Server) is an in-memory key-value store. Because it processes data directly in memory, it is faster than disk-based databases. While other key-value stores provide only primitive types, Redis provides data types such as String, Set, Hash, and List, along with basic operations such as searching, adding, and deleting data.
To test the Redis database, I first chose a Redis client library: Hiredis, a C client library that provides the APIs needed to work with Redis. Using Hiredis, I will benchmark the server by sending it 1,000,000 commands. For this, I took a C benchmark program from GitHub and modified it. The GitHub URLs are below.

Hiredis Client Library Git: https://github.com/redis/hiredis
Benchmark Git: https://github.com/stefanwille/redis-client-benchmarks

Below is the benchmark source.

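#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <hiredis/hiredis.h>

const int N = 1000000;

int main(void) {
    printf("Connecting...\n");
    redisContext *redis = redisConnect("localhost", 6379);

    clock_t start, end;
    float ftime;

    if (redis == NULL) {
        /* redisConnect returns NULL only if the context cannot be allocated. */
        puts("Cannot allocate redis context");
        return 1;
    }
    if (redis->err) {
        puts(redis->errstr);
        /* Dump the error message byte by byte in hex. */
        char *p = redis->errstr;
        while (*p) {
            printf("%x ", (int)(*p++));
        }
        printf("\n");
        redisFree(redis);
        return 1;
    }

    /* Note: clock() measures this process's CPU time, not wall-clock time. */
    start = clock();

    char *cmd;
    int len;

    /* Pipelining, step 1: queue all N commands in the output buffer. */
    for (int i = 0; i < N; i++) {
        /* HSET takes key, field and value. "__rand_int__" is expanded only by
           redis-benchmark; Hiredis sends it literally, so every command sets
           the same field. */
        len = redisFormatCommand(&cmd, "HSET myset:__rand_int__ element:__rand_int__ value");
        if (len < 0) {
            puts("Failed to format command");
            redisFree(redis);
            return 1;
        }
        redisAppendFormattedCommand(redis, cmd, len);
        free(cmd);  /* Hiredis copied the command into its output buffer. */
    }

    /* Pipelining, step 2: read exactly one reply per command.
       (redisGetReply also flushes the queued commands to the server.) */
    for (int i = 0; i < N; i++) {
        redisReply *reply;
        if (redisGetReply(redis, (void **)&reply) != REDIS_OK) {
            printf("Error: %s\n", redis->errstr);
            break;
        }
        freeReplyObject(reply);
    }

    end = clock();
    ftime = (float)(end - start) / CLOCKS_PER_SEC;
    printf("Running Time: %f sec.\n", ftime);

    redisFree(redis);
    return 0;
}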


I analyzed the Hiredis source to find a function to optimize. Unfortunately, it was not easy to find a place to optimize, so I decided to apply an optimization technique I learned in this course. The benchmark source above calls the redisFormatCommand function of the Hiredis library, and redisFormatCommand in turn calls redisvFormatCommand. So I looked for a part of redisvFormatCommand that I could optimize and implemented the optimization there. Of course, this may not be the right way to optimize. However, I wanted to check whether performance can be improved by optimizing small parts.

Below is the modified part of the Hiredis source, inside redisvFormatCommand.

    while(*c != '\0') {
        if (*c != '%' || c[1] == '\0') {
            if (*c == ' ') {
                if (touched) {
                    // Changed source for optimization (AArch64 inline assembly):
                    // result = argc + 1, using the 32-bit w registers
                    // because both operands are int.
                    int result;
                    __asm__("add %w0, %w1, %w2" : "=r" (result) : "r" (argc), "r" (1));
                    newargv = realloc(curargv, sizeof(char*)*(result));

                    // Original source:
                    // newargv = realloc(curargv,sizeof(char*)*(argc+1));
                    if (newargv == NULL) goto memory_err;
                    curargv = newargv;
                    curargv[argc++] = curarg;
                    totlen += bulklen(sdslen(curarg));

                    /* curarg is put in argv so it can be overwritten. */
                    curarg = sdsempty();
                    if (curarg == NULL) goto memory_err;
                    touched = 0;
                }
                /* ... rest of redisvFormatCommand unchanged ... */

Now let's benchmark using the Hiredis library. The first picture shows the result before optimization, and the second shows the result after optimization. The results did not change dramatically because I modified only a very small part. However, you can see that performance improved slightly, since the modified statement sits inside the while loop that parses every one of the 1,000,000 commands. I benchmarked the HSET command (the Hash data type), compiled it with different compiler options, and compared the speed. The -O3 build gave the best result, but there was not much difference between the options.
  • The result of the original source
  • The result after optimization

I modified the source code to run on the x86 platform.

    while(*c != '\0') {
        if (*c != '%' || c[1] == '\0') {
            if (*c == ' ') {
                if (touched) {
                    // Changed source for optimization (x86 inline assembly):
                    // result = argc + 1, computed in EAX with 1 held in EBX.
                    int result;
                    __asm__("addl %%ebx, %%eax;" : "=a" (result) : "a" (argc), "b" (1));
                    newargv = realloc(curargv, sizeof(char*)*(result));

                    // Original source:
                    // newargv = realloc(curargv,sizeof(char*)*(argc+1));
                    if (newargv == NULL) goto memory_err;
                    curargv = newargv;
                    curargv[argc++] = curarg;
                    totlen += bulklen(sdslen(curarg));

                    /* curarg is put in argv so it can be overwritten. */
                    curarg = sdsempty();
                    if (curarg == NULL) goto memory_err;
                    touched = 0;
                }
                /* ... rest of redisvFormatCommand unchanged ... */

I tried hard to run the Hiredis library and this code on the x86 platform, but I could not find a way. I then tried the Windows platform, but the Redis libraries are not suited to running on Windows. The following picture shows the errors I got when running the Hiredis library on an x86 platform. Even the example sources provided by Redis did not compile.



I learned a few important things in this stage. First, hash-based data is very fast and useful. In addition, the hash data type can be used on any platform, but only if the library is configured correctly. I was also able to understand the process of working together on GitHub, which is very useful and essential for programmers. Best of all, I found that even a small code optimization can improve the performance of the system.
