Inline assembly lets you write assembly language directly inside high-level source code; when the compiler translates the source file, the embedded instructions are assembled into machine code along with the rest of the program. In other words, assembler instructions are used directly in a high-level language. In this lab, we will test the inline assembler. First, run the example source on the aarchie server.
// vol_simd.c :: volume scaling in C using AArch64 SIMD
// Chris Tyler 2017.11.29-2018.02.20
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include "vol.h"
int main() {
    int16_t* in; // input array
    int16_t* limit; // end of input array
    int16_t* out; // output array

    // these variables will be used in our assembler code, so we're going
    // to hand-allocate which register they are placed in
    // Q: what is an alternate approach?
    register int16_t* in_cursor asm("r20"); // input cursor
    register int16_t* out_cursor asm("r21"); // output cursor
    register int16_t vol_int asm("r22"); // volume as int16_t

    int x; // array iterator
    int ttl; // array total

    in=(int16_t*) calloc(SAMPLES, sizeof(int16_t));
    out=(int16_t*) calloc(SAMPLES, sizeof(int16_t));

    srand(-1);
    printf("Generating sample data.\n");
    for (x = 0; x < SAMPLES; x++) {
        in[x] = (rand()%65536)-32768;
    }

    // --------------------------------------------------------------------
    in_cursor = in;
    out_cursor = out;
    limit = in + SAMPLES ;

    // set vol_int to fixed-point representation of 0.75
    // Q: should we use 32767 or 32768 in next line? why?
    vol_int = (int16_t) (0.75 * 32767.0);

    printf("Scaling samples.\n");

    // Q: what does it mean to "duplicate" values in the next line?
    __asm__ ("dup v1.8h,%w0"::"r"(vol_int)); // duplicate vol_int into v1.8h

    while ( in_cursor < limit ) {
        __asm__ (
            "ldr q0, [%[in]],#16 \n\t"
            // load eight samples into q0 (v0.8h)
            // from in_cursor, and post-increment
            // in_cursor by 16 bytes
            "sqdmulh v0.8h, v0.8h, v1.8h \n\t"
            // multiply each lane in v0 by v1*2
            // saturate results
            // store upper 16 bits of results into v0
            "str q0, [%[out]],#16 \n\t"
            // store eight samples to out_cursor
            // post-increment out_cursor by 16 bytes
            // Q: what happens if we remove the following
            // two lines? Why?
            : [in]"+r"(in_cursor)
            : "0"(in_cursor),[out]"r"(out_cursor)
        );
    }

    // --------------------------------------------------------------------
    printf("Summing samples.\n");
    for (x = 0; x < SAMPLES; x++) {
        ttl=(ttl+out[x])%1000;
    }

    // Q: are the results usable? are they correct?
    printf("Result: %d\n", ttl);
    return 0;
}
The image below is the result of running the above code.
Now, let's answer the questions in the code.
Q: what is an alternate approach?
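An alternative approach is to let the compiler choose the registers. Instead of pinning each variable to a specific register with asm("r20") and so on, declare ordinary variables and hand them to the asm block as input and output operands. The compiler then assigns a register to each operand and substitutes it into the assembler template.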
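To make the operand-based approach concrete, here is a minimal sketch. The function add_i32 is a hypothetical helper, not part of vol_simd.c, and the portable fallback branch is only there so the sketch also builds on machines that are not AArch64.

```c
#include <stdint.h>

/* Hypothetical helper: add two integers.
   On AArch64 the addition is written in inline assembly, and the compiler
   picks the registers that replace %w[sum], %w[a] and %w[b]. */
static int32_t add_i32(int32_t a, int32_t b)
{
#if defined(__aarch64__)
    int32_t sum;
    __asm__ ("add %w[sum], %w[a], %w[b]"
             : [sum] "=r" (sum)           /* output operand */
             : [a] "r" (a), [b] "r" (b)   /* input operands */);
    return sum;
#else
    /* Assumption: plain C fallback for non-AArch64 targets. */
    return a + b;
#endif
}
```

Calling add_i32(2, 3) returns 5 on either path. No register name appears anywhere in the source, so the compiler is free to keep the values wherever is convenient.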
Q: should we use 32767 or 32768 in next line? why?
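An int16_t holds values from -32,768 to 32,767, so 32,768 itself cannot be stored. For a factor of 0.75 either scale would fit, but a factor of 1.0 multiplied by 32,768 would overflow. Therefore, we must use the value 32,767.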
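The arithmetic can be checked with a short sketch. Both function names are hypothetical helpers introduced for illustration; the second one keeps its result in 32 bits so that the out-of-range value can be inspected instead of silently wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a volume factor in [0.0, 1.0] to the
   16-bit fixed-point form used for vol_int. 32767 is INT16_MAX, so
   even a factor of 1.0 stays in range. */
static int16_t volume_to_fixed(double factor)
{
    assert(factor >= 0.0 && factor <= 1.0);
    return (int16_t)(factor * 32767.0);
}

/* Same calculation with 32768 as the scale, returned as a 32-bit value
   so the caller can see when it no longer fits in an int16_t. */
static int32_t volume_scaled_by_32768(double factor)
{
    return (int32_t)(factor * 32768.0);
}
```

With these helpers, volume_to_fixed(0.75) gives 24575 and volume_to_fixed(1.0) gives 32767, while volume_scaled_by_32768(1.0) gives 32768, one more than an int16_t can hold.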
Q: what does it mean to "duplicate" values in the next line?
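It means copying the single value of vol_int into every lane of the vector register. v1.8h is the register v1 viewed as eight 16-bit lanes, and after dup each of the eight lanes holds vol_int. One sqdmulh instruction can then scale eight samples at once.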
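As a rough scalar model of what dup does, the sketch below fills an eight-element array that stands in for v1.8h. The name dup_8h and the LANES constant are assumptions made for this illustration; the real instruction does the same thing in a single step inside the register.

```c
#include <stdint.h>

#define LANES 8  /* v1.8h is treated as eight 16-bit lanes */

/* Hypothetical scalar model of "dup v1.8h, w0":
   copy one 16-bit value into every lane. */
static void dup_8h(int16_t lanes[LANES], int16_t value)
{
    for (int i = 0; i < LANES; i++) {
        lanes[i] = value;
    }
}
```

After dup_8h(v1, 24575), every element v1[0] through v1[7] holds 24575.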
Q: what happens if we remove the following two lines? Why?
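These two lines are the output and input operand lists of the asm statement. They bind %[in] and %[out] to in_cursor and out_cursor and tell the compiler that the block changes in_cursor. If these two lines are erased, the template no longer has valid operands to address, and a segmentation fault will occur.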
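The role of a read-write operand can be shown in isolation. This is a sketch under assumptions: load_and_advance is a hypothetical helper, the "memory" clobber is my addition to tell the compiler that the block reads memory, and the plain C branch exists only so the code builds on targets other than AArch64.

```c
#include <stdint.h>

/* Hypothetical helper: read one 32-bit value through *cursor and move
   the cursor to the next element. The "+r" constraint marks p as both
   read and written, which is what a post-increment load needs. */
static int32_t load_and_advance(const int32_t **cursor)
{
    const int32_t *p = *cursor;
    int32_t value;
#if defined(__aarch64__)
    __asm__ ("ldr %w[val], [%[p]], #4"
             : [val] "=r" (value), [p] "+r" (p)   /* outputs */
             :                                    /* no pure inputs */
             : "memory");
#else
    /* Assumption: plain C fallback for non-AArch64 targets. */
    value = *p++;
#endif
    *cursor = p;
    return value;
}
```

Called twice on an array holding 10, 20, 30, the helper returns 10 and then 20, and leaves the cursor pointing at the third element. If the pointer were listed only as an input, the compiler would be entitled to assume it is unchanged after the block.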
Q: are the results usable? are they correct?
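Yes, the results are usable: each output sample is the input sample scaled by about 0.75. sqdmulh doubles each product and keeps the upper 16 bits, which amounts to multiplying by vol_int/32768, so individual samples can differ slightly from an exact 0.75 scaling. The printed total is only a rough check, and because ttl is never set to zero before the summing loop, its exact value should be read with some caution.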
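One way to convince yourself that the scaling is right is a scalar model of a single sqdmulh lane. The function name sqdmulh_lane is hypothetical, and the sketch assumes that right-shifting a negative value is an arithmetic shift, as it is with GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane of sqdmulh:
   double the product, saturate to 32 bits, keep the upper 16 bits. */
static int16_t sqdmulh_lane(int16_t a, int16_t b)
{
    int64_t doubled = 2 * (int64_t)a * (int64_t)b;

    /* Only (-32768) * (-32768) doubled exceeds INT32_MAX; the most
       negative doubled product still fits, so one clamp is enough. */
    if (doubled > INT32_MAX) {
        doubled = INT32_MAX;
    }

    /* Assumes an arithmetic right shift for negative values. */
    return (int16_t)(doubled >> 16);
}
```

For a sample of 1000 and vol_int of 24575, the model returns 749, one below the exact 750, which matches the small error described above. The single saturating case, -32768 times -32768, returns 32767.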
Assembly maps one-to-one onto machine language, whereas a single C statement can stand for several CPU instructions. Inline assembly lets you hand-write just the small part of a program where that level of control is useful.
For this reason, C is often described as a fairly low-level, machine-friendly language. With good optimization, compiled C comes close to what you would have written by hand in assembly, which makes it a good fit for speed-critical programs such as operating systems.