Ilkyu Song - SPO600: April 2018

Sunday 22 April 2018

SPO600 Project - Stage 3

I chose Redis (Remote Dictionary Server) for my project at stage1. Redis is open source software developed by Salvatore Sanfilippo, a volatile and persistent key-value store. Then, it stores and manages the data in the memory. Let's look at the benefits and data types of Redis.

1. The advantages of The Redis

Advantage	Description
Specialized for processing data in lists and arrays.	• The value supports several data types such as string, list, set, sorted set, hash type. • List type data entry and deletion are about 10 times faster than MySQL.
Redis transaction is also atomic.	Atomic processing provides an Atomic processing function to prevent data mismatch when several processes simultaneously request the same key update.
Persistent data preservation while utilizing memory.	• Do not delete data unless explicitly deleted with the command or set expires. • The snapshot function allows you to save the contents of the memory as *.rdb file and restore it to that point in time.
Multiple server configurations.	Consistent hashing or master-slave configuration

2. Redis provides five data types, and there are many processing instructions for each data type.

Data Type	Description
String	• We cannot just store a string as a string, • Binary data can also be saved (note that Redis does not have integer or real numbers). • The maximum size of data that can be inserted into the key is 512 MB.
List	• We can think of it as an array. • The maximum number of elements in a key is 4,294,967,295. • If the value of the data type is larger than the condition set in the configuration file, it is encoded as a linked list or zip list.
Set	• There is no duplicate data in key with unaligned aggregate type • The amount of time spent adding, removing, and checking existence is constant regardless of the number of elements in the sets • The maximum number of elements that can be in a key is 4,294,967,295
Sorted sets	• Sorted sets are called the most advanced Redis datatype. • Adding, removing, and updating elements are done in a very fast way, which is proportional to the "number of elements in the log". • It can be used in linking systems. • Each element of sets has a real value called score and is sorted in ascending order by score value. • There is no redundant data in the key, but the score value can be duplicated
Hashes	• Similar to lists, consisting of a series of "field names" and "field values". • The maximum number of field-value pairs that can be contained in a key is 4,294,967,295.

I compiled the benchmark file for the Redis benchmark by changing the compile options. And benchmarks were done on aarchie and x86 servers with different command counts. The result below is the number of commands executed per second. The aarchie server is a bit faster than the x86 server, although it did not show much difference from the test in stage1. However, the specifications of the two servers are so different that simple comparison is difficult. Some developers and architectures tend to look at performance only with code without considering hardware specs. However, the first stuff to consider when tuning database or optimizing code is the hardware specification.

1. aarchie

2. x86

Moreover, I ran the benchmark once again in stage2. I chose the Redis library source in stage 2 and benchmarked it. Then I used the ASM inline assembler to optimize the code. However, ASM does not guarantee optimization over C language. It is better to use c language first for optimization. The two figures below show the result of using the original source and ASM. The two results are very similar.

I am performing Stage3 and thinking about code optimization again. Code optimization is a program conversion technique that improves code by consuming fewer resources (ie, CPU, memory), resulting in faster machine code generation. I think I should remind this meaning. I tried to convert only the code to a simple knowledge what I knew for code optimization. I thought that converting only the code would speed up execution, and I thought that changing the compile options would speed up the program. However, in a simple program, the difference is not so different. I have to keep a few things in mind for code optimization. First of all, I need to know exactly the environment of the OS or platform where my program will run. (Actually, the library which I chose on Stage2 did not run on x86.) And I think I should have a knowledge of the specs of the machine on which my program will run. So, I need to provide the user with the minimum recommended specification for my program. Finally, you should benchmark it repeatedly over and over. To make a good program, I have to test it repeatedly many times. If I follow these three things, I will be able to develop a program that is nearest to optimization. As I proceeded with this course project, I was not only knowledgeable about code optimization, but also experienced. I think in programming as well as coding skills, experience is very important to programmers. This experience will be very beneficial to me. And this project taught me how to perform in the upstream. And code optimization and portability are not simply changing the programming code. I have to be knowledgeable about all operating systems, platforms and hardware.

Tuesday 10 April 2018

SPO600 Project - Stage 2

In this Stage 2, I am going to learn deeply the Redis selected in Stage1. Redis (Remote Dictionary Server) is an in-memory-based key-value store. Performance is faster than memory-based databases, as it directly processes the data into memory. In the case of data types that can be stored, the rest of the repository provides only the primitive types, while the rest of the data types are data types such as String, Set, Hash, List, And provides basic functions such as search, add, and delete of data.
First, I chose a Redis client library to test the Redis database. It is Hiredis. Hiredis provides the APIs needed to manipulate Redis as a C client library. And, I will send a 1,000,000 commands to the server to benchmark using Hiredis. In relation to this, I picked c language source in a Github, and I modified this source. The Github URLs are below.

Hiredis Client Library Git: https://github.com/redis/hiredis
Benchmark Git: https://github.com/stefanwille/redis-client-benchmarks

This source is the source for the benchmark.

const int N = 1000000;

int main() {
    printf("Connecting...\n");
    redisContext *redis = redisConnect("localhost", 6379);

    clock_t start, end;
    float ftime;

    if (redis->err) {
        puts(redis->errstr);
        char *p = redis->errstr;
        while (*p) {
            printf("%x ", (int)(*p++));
        }
        printf("\n");
    }
    else {
        start = clock();

        char *cmd;
        int len;

        for (int i = 0; i < N; i++) {
            len = redisFormatCommand(&cmd, "HSET myset:__rand_int__ element:__rand_int__");

            redisAppendFormattedCommand(redis, cmd, len);
        }

        for (int i = 0; i < N; i++) {
            redisReply *reply;
            assert(redisGetReply(redis, (void*)&reply) == REDIS_OK);
            redisGetReply(redis, (void**)&reply);

            freeReplyObject(reply);
        }

        end = clock();

        ftime = (float)(end - start) / CLOCKS_PER_SEC;

        printf("Runing Time: %f sec. \n", ftime);
    }
}

I analyzed the Hiredis source to implement optimization for the function. Unfortunately, it was not easy to find a place to optimize. So I decided to apply the optimization what I learned in this lecture. The source below invokes the redisFormatCommand function of the Hiredis library. So I looked for a part of the function that I could optimize. The redisFormatCommand invokes the redisvFormatCommand function. So I figured out where to optimize the function and implemented the optimization. Of course, this may not be the right way for optimization. However, I wanted to make sure if performance improvements can be made when optimizing in small parts.

The source below is a modified source of Hiredis.

    while(*c != '\0') {
        if (*c != '%' || c[1] == '\0') {
            if (*c == ' ') {
                if (touched) {


                    //Change Source for optimization
                    int result;
                    __asm__ __volatile__("add %0,%1,%2 \n\t":"=r" (result) : "r"(argc), "r"(1));
                    newargv = realloc(curargv, sizeof(char*)*(result));

                    //newargv = realloc(curargv,sizeof(char*)*(argc+1)); - Original Source
                    if (newargv == NULL) goto memory_err;
                    curargv = newargv;
                    curargv[argc++] = curarg;
                    totlen += bulklen(sdslen(curarg));

                    /* curarg is put in argv so it can be overwritten. */
                    curarg = sdsempty();
                    if (curarg == NULL) goto memory_err;
                    touched = 0;

Now let's benchmark using the Hiredis library. The first picture is the one executed prior to optimization. And the second figure shows the results after optimization. I did not change the results dramatically because I modified a very small part. However, you can see that performance has improved a bit since you have executed 1,000,000 commands and executed optimizations in the while statement. I benchmarked the Hset datatype, compiled it with different compile options, and checked the speed. The compile option -03 is the most optimized result. However, there is not much difference depending on the option.

The result of original

The result of optimization.

I modified the source code to run on the x86 platform.

    while(*c != '\0') {
        if (*c != '%' || c[1] == '\0') {
            if (*c == ' ') {
                if (touched) {

                    //Change Source for optimization
                    int result;
                    __asm__("addl %%ebx, %%eax;" : "=a" (result) : "a" (argc), "b" (1));
                    newargv = realloc(curargv,sizeof(char*)*(argc+1));

                    //newargv = realloc(curargv,sizeof(char*)*(argc+1)); - Original Source
                    if (newargv == NULL) goto memory_err;
                    curargv = newargv;
                    curargv[argc++] = curarg;
                    totlen += bulklen(sdslen(curarg));

                    /* curarg is put in argv so it can be overwritten. */
                    curarg = sdsempty();
                    if (curarg == NULL) goto memory_err;
                    touched = 0;

I modified the source code to run on the x86 platform.
I have tried hard to run the Hiredis library and this code on the x86 platform, but I have not found a way. So I tried to run it on the Windows platform, but the Redis libraries were not suitable for running on Windows. The following picture shows errors when running the Hiredis library on an x86 platform. Even the example sources supported by Redis have not been compiled.

I learned a few important things through this stage. First, hash-based data is very fast and useful. In addition, the hash data can be used on any platform. However, it is only possible to configure the library well. In addition, I was able to understand the process of working together on Github. This is a very useful and essential stuff for programmers. Best of all, I found that even a small amount of code optimization can improve the performance of the system.