Wednesday, 28 February 2018

Lab4 - Vectorization Lab

In this lab, I examine SIMD (Single Instruction Multiple Data) vectorization and the auto-vectorization performed by the GCC compiler. First of all, what is SIMD? SIMD is a form of parallel processing in which a single instruction operates on multiple data values at the same time. It is the principle behind vector processors and is widely used in multimedia applications, for example in video game consoles and graphics cards.
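To make the idea concrete, here is a minimal sketch of explicit SIMD using GCC's generic vector extensions (the `vector_size` attribute, also supported by Clang) instead of architecture-specific intrinsics. The type name `v4si` and the function `add_arrays_simd` are hypothetical names for illustration. A single `+` on two `v4si` values adds four 32-bit integers at once; on AArch64 GCC typically emits one NEON `add` for it, and on x86_64 an SSE `paddd`.

```c
#include <stddef.h>
#include <string.h>

/* Four 32-bit ints packed into one 16-byte vector (GCC vector extension).
   The name v4si is only an illustrative choice. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: out[i] = x[i] + y[i], four elements per vector add. */
void add_arrays_simd(const int *x, const int *y, int *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si a, b, c;
        memcpy(&a, x + i, sizeof a);   /* load 4 ints, no alignment assumed */
        memcpy(&b, y + i, sizeof b);
        c = a + b;                     /* one SIMD addition for all 4 lanes */
        memcpy(out + i, &c, sizeof c); /* store the 4 results */
    }
    for (; i < n; i++)                 /* scalar tail for leftover elements */
        out[i] = x[i] + y[i];
}
```

The `memcpy` calls avoid assuming that the arrays are 16-byte aligned; compilers normally turn them into plain vector loads and stores.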

Below is the test source, which I compiled on AArch64.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    int arrRand1[1000];
    int arrRand2[1000];
    int arrSum[1000];
    long int total = 0;          /* must start at zero before accumulating */
    srand(time(NULL));           /* seed the random number generator */

    for (int i = 0; i<1000; i++) {        
        arrRand1[i] = rand() % 2000 - 1000;
        arrRand2[i] = rand() % 2000 - 1000;

        arrSum[i] = arrRand1[i] + arrRand2[i];
        total += arrSum[i];
    }

    printf("%ld\n", total);

    return 0;
}

The disassembly of main() from the compiled program is shown below.
0000000000400560 <main>:
  400560:       d13f83ff        sub     sp, sp, #0xfe0
  400564:       d2800000        mov     x0, #0x0                        // #0
  400568:       a9007bfd        stp     x29, x30, [sp]
  40056c:       910003fd        mov     x29, sp
  400570:       a9025bf5        stp     x21, x22, [sp, #32]
  400574:       5289ba75        mov     w21, #0x4dd3                    // #19923
  400578:       a90153f3        stp     x19, x20, [sp, #16]
  40057c:       913f83b6        add     x22, x29, #0xfe0
  400580:       f9001bf7        str     x23, [sp, #48]
  400584:       72a20c55        movk    w21, #0x1062, lsl #16
  400588:       5280fa14        mov     w20, #0x7d0                     // #2000
  40058c:       910103b7        add     x23, x29, #0x40
  400590:       97ffffd8        bl      4004f0 <time@plt>
  400594:       97ffffeb        bl      400540 <srand@plt>
  400598:       97ffffde        bl      400510 <rand@plt>
  40059c:       97ffffdd        bl      400510 <rand@plt>
  4005a0:       9b357c01        smull   x1, w0, w21
  4005a4:       b84046e2        ldr     w2, [x23], #4
  4005a8:       9367fc21        asr     x1, x1, #39
  4005ac:       eb1602ff        cmp     x23, x22
  4005b0:       4b807c21        sub     w1, w1, w0, asr #31
  4005b4:       1b148020        msub    w0, w1, w20, w0
  4005b8:       510fa000        sub     w0, w0, #0x3e8
  4005bc:       0b020000        add     w0, w0, w2
  4005c0:       8b20c273        add     x19, x19, w0, sxtw
  4005c4:       54fffea1        b.ne    400598 <main+0x38>  // b.any
  4005c8:       aa1303e1        mov     x1, x19
  4005cc:       90000000        adrp    x0, 400000 <_init-0x4b8>
  4005d0:       911ee000        add     x0, x0, #0x7b8
  4005d4:       97ffffdf        bl      400550 <printf@plt>
  4005d8:       a9407bfd        ldp     x29, x30, [sp]
  4005dc:       52800000        mov     w0, #0x0                        // #0
  4005e0:       a94153f3        ldp     x19, x20, [sp, #16]
  4005e4:       a9425bf5        ldp     x21, x22, [sp, #32]
  4005e8:       f9401bf7        ldr     x23, [sp, #48]
  4005ec:       913f83ff        add     sp, sp, #0xfe0
  4005f0:       d65f03c0        ret
  4005f4:       00000000        .inst   0x00000000 ; undefined

Vectorization means performing the same operation on successive data elements at once. It relies on the SIMD (Single Instruction Multiple Data) instructions provided by the architecture, which apply one operation to several adjacent values concurrently. As a result, vectorized code can achieve higher performance than SISD (Single Instruction Single Data) code, in which each instruction processes a single data item.
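In the disassembly above, the loop uses only scalar registers (w0, w2, x19) and no vector registers, so it was not vectorized; each iteration calls rand(), and GCC generally cannot vectorize a loop containing such calls. A minimal sketch of one way to give the auto-vectorizer a chance, assuming the same array size and value range as the test program: generate the random numbers in a separate loop, then do the additions and the summation in a loop with no calls. Whether GCC actually vectorizes the second loop depends on the compiler version and options (for example -O3, or -O2 -ftree-vectorize). The names `fill_random` and `sum_arrays` are hypothetical.

```c
#include <stdlib.h>

#define N 1000

/* Hypothetical helper: fills both arrays with values in [-1000, 999].
   All rand() calls happen here, outside the arithmetic loop. */
void fill_random(int a[N], int b[N])
{
    for (int i = 0; i < N; i++) {
        a[i] = rand() % 2000 - 1000;
        b[i] = rand() % 2000 - 1000;
    }
}

/* Hypothetical helper: element-wise sum plus a running total.
   The loop body contains no function calls, which makes it a
   candidate for auto-vectorization. */
long sum_arrays(const int a[N], const int b[N], int out[N])
{
    long total = 0;                 /* start the accumulator at zero */
    for (int i = 0; i < N; i++) {
        out[i] = a[i] + b[i];
        total += out[i];
    }
    return total;
}
```

Compiling this with -O3 and comparing its disassembly with the listing above is one way to check whether vector instructions (registers named v0, v1, ... on AArch64) appear.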







Tuesday, 27 February 2018

Lab 3 - Loop

In this lab, I implement loops in assembly language on the x86_64 and AArch64 platforms. The loops extend the "Hello world" program, which was written as below.
 .text  
 .globl     _start  
   
 _start:  
      movq     $len,%rdx        /* message length */  
      movq      $msg,%rsi       /* message location */  
      movq     $1,%rdi          /* file descriptor stdout */  
      movq     $1,%rax          /* syscall sys_write */  
      syscall  
   
      movq     $0,%rdi          /* exit status */  
      movq     $60,%rax         /* syscall sys_exit */  
      syscall  
   
 .section .rodata  
   
 msg:     .ascii   "Hello, world!\n"  
      len = . - msg  
The first implementation prints the numbers 0 to 9 on the screen. Below is the code for the x86_64 platform.
 .text  
 .globl  _start  
   
 start = 0                  /* starting value for the loop */  
 max = 10                   /* ending value of loop */  
   
 _start:  
      mov      $start,%r15     /* loop index */  
   
 loop:  
      mov      %r15,%r14    /* copy loop index */  
      add      $48,%r14     /* convert digit to ASCII ('0' = 48) */
      mov      %r14b,msg+6  /* store the digit byte in the message */
   
      movq     $len,%rdx    /* message length */  
      movq     $msg,%rsi    /* message location */  
      movq     $1,%rdi      /* file descriptor stdout */  
      movq     $1,%rax      /* syscall sys_write */  
      syscall  
   
      inc      %r15         /* increment index */  
      cmp      $max,%r15    /* see if we're done */  
      jne      loop         /* loop if we're not */  
   
      movq     $0,%rdi      /* exit status */  
      movq     $60,%rax     /* syscall sys_exit */  
      syscall  
   
 .section .data  
   
 msg:     .ascii   "Loop: !\n"  
      len = . - msg  
The second implementation displays the numbers 0 to 30 on the screen, with a leading zero for 0 to 9 (00, 01, ..., 09). Below is the code for the x86_64 platform.
 .text  
 .globl     _start  
   
 start = 0                  /* starting value for the loop */  
 max = 31                   /* end loop number */  
   
 _start:  
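      mov   $start,%r15     /* loop index */  
   
 loop:  
    mov  $'0',%r14        /* ASCII code of '0' */
    mov  $10,%r13         /* divisor */
    mov  $0,%rdx          /* clear high half of the dividend */
    mov  %r15,%rax        /* dividend = loop index */
    div  %r13             /* rax = index / 10, rdx = index % 10 */
   
    mov  %rax,%r13        /* tens digit */
    add  %r14,%r13        /* convert to ASCII */
    mov  %r13b,msg+6      /* store one byte only, keeping the newline intact */
   
    mov  %rdx,%r12        /* ones digit */
    add  %r14,%r12        /* convert to ASCII */
    mov  %r12b,msg+7      /* store one byte only, keeping the newline intact */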
   
    movq     $len,%rdx    /* message length */  
    movq     $msg,%rsi    /* message location */  
    movq     $1,%rdi      /* file descriptor stdout */  
    movq     $1,%rax      /* syscall sys_write */  
    syscall  
   
    inc      %r15         /* increment index */  
    cmp      $max,%r15    /* see if we're done */  
    jne      loop         /* loop if we're not */  
   
    movq     $0,%rdi      /* exit status */  
    movq     $60,%rax     /* syscall sys_exit */  
    syscall  
   
 .section .data  
   
 msg: .ascii "Loop:   \n"  
      len = . - msg  
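For reference, the digit-splitting logic of the x86_64 loop above can be sketched in C. This is an illustration, not the lab's code, and the function name `format_loop_line` is hypothetical. It mirrors the assembly: divide the index by 10, use the quotient as the tens digit and the remainder as the ones digit, and add the ASCII code of '0' (48) to each.

```c
#include <string.h>

/* Hypothetical helper mirroring the assembly loop: writes the 9 bytes
   "Loop: NN\n" into buf (no terminating NUL, just like msg). */
void format_loop_line(char buf[9], int index)
{
    memcpy(buf, "Loop:   \n", 9);      /* same template as msg */
    buf[6] = (char)('0' + index / 10); /* tens digit (quotient) */
    buf[7] = (char)('0' + index % 10); /* ones digit (remainder) */
}
```

Calling it for index 0 through 30 and passing the buffer to write(1, buf, 9) would reproduce the program's output line by line.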
Finally, I implement the same loop on the AArch64 platform.
 .text  
 .globl  _start  
 start = 0  
 max = 31  
 digit = 10  
   
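 _start:  
     mov   x19, start        /* loop index */
     mov   x22, digit        /* divisor (10) */
 loop:  
     adr   x1, msg           /* message location */
     udiv  x20, x19, x22     /* x20 = index / 10 (tens digit) */
     msub  x21, x22, x20, x19 /* x21 = index - 10 * x20 (ones digit) */
   
     add   x20, x20, 0x30    /* convert tens digit to ASCII */
     strb  w20, [x1,6]
     add   x21, x21, 0x30    /* convert ones digit to ASCII */
     strb  w21, [x1,7]
   
     mov   x0, 1             /* file descriptor stdout */
     mov   x2, len           /* message length */
     mov   x8, 64            /* syscall write */
     svc   0
   
     add   x19, x19, 1       /* increment index */
     cmp   x19, max          /* see if we're done */
     bne   loop              /* loop if we're not */
     mov   x0, 0             /* exit status */
     mov   x8, 93            /* syscall exit */
     svc   0
 .data  
     msg: .ascii "Loop:   \n"  
     len = . - msg  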
The x86_64 and AArch64 platforms use different instructions, registers, and system call numbers (write is syscall 1 on x86_64 but 64 on AArch64, and exit is 60 versus 93). On both, the address of the message must be specified, directly or indirectly, in the instruction. Learning assembly gives a good understanding of the computer system and its structure, as well as a better understanding of memory.

Saturday, 3 February 2018

Lab2 - Compiled C Lab

I have a C source file named hello.c, which I compile as follows.
# gcc -o hello hello.c 
Compilation proceeds in the following order.
Compile Process


1. Source code (.c) - The source code written by the user in C has the extension .c.
#include <stdio.h>
int main() {
    printf("Hello World!\n");
}
Preprocessing
2. Preprocessed source (.i) - The preprocessor expands #include directives and macros, producing a file with the extension .i. This file is the input to the actual compiler.
Compile
3. Assembly source (.s) - The compiler translates the preprocessed C code into assembly language, the human-readable form closest to machine language, in a file with the extension .s.
Assemble
4. Object file (.o) - The assembler produces a binary object file with the extension .o.
Link
5. Executable - The linker combines the object file with the required libraries to create an executable file.

Next, I recompiled the code with each of the following changes.
(1) Add the compiler option -static.  
gcc -g -O0 -fno-builtin -static -o hello hello.c
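- With -static, the C library is linked into the executable itself instead of being loaded as a shared library at run time. The file becomes much larger, but it no longer needs the dynamic linker, which can make startup slightly faster. Note that printf is now called directly (_IO_printf) rather than through the PLT (printf@plt).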
000000000040048c <main>:
  40048c:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
  400490:       910003fd        mov     x29, sp
  400494:       90000320        adrp    x0, 464000 <free_mem+0x88>
  400498:       911f0000        add     x0, x0, #0x7c0
  40049c:       94002d63        bl      40ba28 <_IO_printf>
  4004a0:       52800000        mov     w0, #0x0                        // #0
  4004a4:       a8c17bfd        ldp     x29, x30, [sp], #16
  4004a8:       d65f03c0        ret
  4004ac:       00000000        .inst   0x00000000 ; undefined
  
(2) Remove the compiler option -fno-builtin.
gcc -g -O0 -o hello hello.c
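- By default, GCC treats many standard C library functions as built-in functions and may optimize calls to them; -fno-builtin disables this. Here, the printf() call with a plain string ending in a newline is replaced by a call to puts(), as the bl <puts@plt> instruction shows. When a built-in cannot be optimized in a particular case, a normal library call is emitted.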
0000000000400594 <main>:
  400594:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
  400598:       910003fd        mov     x29, sp
  40059c:       90000000        adrp    x0, 400000 <_init-0x418>
  4005a0:       9119c000        add     x0, x0, #0x670
  4005a4:       97ffffb7        bl      400480 <puts@plt>
  4005a8:       52800000        mov     w0, #0x0                        // #0
  4005ac:       a8c17bfd        ldp     x29, x30, [sp], #16
  4005b0:       d65f03c0        ret
  4005b4:       00000000        .inst   0x00000000 ; undefined

(3) Remove the compiler option -g. 
gcc -O0 -o hello hello.c
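- -g tells the compiler to generate debugging information (symbols and source-line information) and keep it in the executable. It does not change the generated code: the disassembly below is identical to the one for (2).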
0000000000400594 <main>:
  400594:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
  400598:       910003fd        mov     x29, sp
  40059c:       90000000        adrp    x0, 400000 <_init-0x418>
  4005a0:       9119c000        add     x0, x0, #0x670
  4005a4:       97ffffb7        bl      400480 <puts@plt>
  4005a8:       52800000        mov     w0, #0x0                        // #0
  4005ac:       a8c17bfd        ldp     x29, x30, [sp], #16
  4005b0:       d65f03c0        ret
  4005b4:       00000000        .inst   0x00000000 ; undefined

(4) Add additional arguments to the printf() function in your program.
gcc -g -O0 -fno-builtin -o hello hello_arg.c
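- With format warnings enabled (for example -Wall), the compiler warns about arguments that do not match the format string; here the extra arguments have no conversion specifiers. The disassembly shows the format string address in x0 and the arguments 1 to 5 in w1 to w5.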
0000000000400594 <main>:
#include <stdio.h>
void main()
{
  400594:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
  400598:       910003fd        mov     x29, sp
 printf("Hello world",1, 2, 3, 4, 5);
  40059c:       90000000        adrp    x0, 400000 <_init-0x418>
  4005a0:       911a0000        add     x0, x0, #0x680
  4005a4:       528000a5        mov     w5, #0x5                        // #5
  4005a8:       52800084        mov     w4, #0x4                        // #4
  4005ac:       52800063        mov     w3, #0x3                        // #3
  4005b0:       52800042        mov     w2, #0x2                        // #2
  4005b4:       52800021        mov     w1, #0x1                        // #1
  4005b8:       97ffffb2        bl      400480 <printf@plt>
  4005bc:       d503201f        nop
  4005c0:       a8c17bfd        ldp     x29, x30, [sp], #16
  4005c4:       d65f03c0        ret

  
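(5) Move the printf() call to a separate function named output(), and call that function from main(). (In my program, the function is named hello().)
gcc -g -O0 -fno-builtin -o hello hello_fnc.c
- main() now only calls hello() with a bl instruction and then returns 0.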
00000000004005b4 <main>:
int main()
{
  4005b4:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
  4005b8:       910003fd        mov     x29, sp
 hello();
  4005bc:       97fffff6        bl      400594 <hello>
 return 0;
  4005c0:       52800000        mov     w0, #0x0                        // #0
}
  4005c4:       a8c17bfd        ldp     x29, x30, [sp], #16
  4005c8:       d65f03c0        ret
  4005cc:       00000000        .inst   0x00000000 ; undefined
  
(6) Remove -O0 and add -O3 to the gcc options. 
gcc -g -O3 -fno-builtin -o hello hello.c
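- -O3 is GCC's highest standard optimization level. It enables all -O2 optimizations plus more aggressive ones, such as loop vectorization. Here the call to printf() is still emitted as a bl instruction, but the instructions are reordered, so the source lines in the disassembly below appear interleaved and out of order. Because the code is transformed so heavily, -O3 output is harder to map back to the source and should be used with care.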

0000000000400490 <main>:
#include <stdio.h>
int main() {
  400490:       a9bf7bfd        stp     x29, x30, [sp, #-16]!
    printf("Hello World!\n");
  400494:       90000000        adrp    x0, 400000 <_init-0x418>
  400498:       9119c000        add     x0, x0, #0x670
int main() {
  40049c:       910003fd        mov     x29, sp
    printf("Hello World!\n");
  4004a0:       97fffff8        bl      400480 <printf@plt>
  4004a4:       52800000        mov     w0, #0x0                        // #0
  4004a8:       a8c17bfd        ldp     x29, x30, [sp], #16
  4004ac:       d65f03c0        ret





Lab 1 - MySQL vs MariaDB



MariaDB
MariaDB is an open source relational database management system (RDBMS). It is based on the same source code as MySQL and is licensed under GPL v2. It was created in response to the uncertainty surrounding MySQL's licensing under Oracle, and contributors are required to share copyright with Monty Program AB. MariaDB aims for high compatibility with MySQL, closely matching MySQL's APIs and commands so that it can serve as a drop-in replacement with binary-compatible libraries. MariaDB includes a new storage engine, Aria, as well as the XtraDB storage engine, which can replace InnoDB. Aria is intended to serve as both a transactional and a non-transactional engine, and possibly to be included in future versions of MySQL.



MySQL

MySQL is a well-known open source DBMS widely used in general web development, such as home pages and online shopping malls. In 2008, MySQL AB, the company behind MySQL, was acquired by Sun Microsystems for about $1 billion; Oracle then acquired Sun (the deal was announced in 2009 and completed in 2010), so MySQL became Oracle's property.
Since then, members of the MySQL community have criticized Oracle, claiming that it is steering MySQL without regard for the community.
Michael "Monty" Widenius, the original creator of MySQL and a co-founder of MySQL AB, openly raised the criticism that "Oracle is leading MySQL in a closed direction for profit."


Anyone can report a MySQL bug at the site below.
https://bugs.mysql.com/
If we find a bug while using MySQL, we can report it there, on the MySQL Bugs site.
MySQL developers then analyze the bug and let the reporter know how it will be fixed.
When a fix requires a patch, they release a new version of MySQL.
With databases, many bugs and issues are related to configuration or to the development language.
Therefore, dealing with problems in the context of one's own development environment is often more important than simply applying patches.

MariaDB, for its part, can be downloaded by version from the download section of the MariaDB website, and each new version or patch release is published on the download page.
MariaDB also runs a blog on its website, through which we can follow news about all MariaDB-related software. Like MySQL, MariaDB accepts bug reports from the public, through its JIRA issue tracker.



SPO600 Project - Stage 3

I chose Redis (Remote Dictionary Server) for my project in Stage 1. Redis is open source software developed by Salvatore Sanfilippo, a volati...