计组

Chapter1: Computer Abstraction and Technology

\(KB = 10^3B, KiB = 2^{10}B\)
Response Time / Execution Time 从程序开始到结束的时间
Throughput / Bandwidth(带宽) 单位时间内完成的任务数量
Performance 可以定义为 \(\frac{1}{Response\ Time}\)
Eight Great Ideas
Design for Moore’s Law(设计紧跟摩尔定律)
Use Abstraction to Simplify Design (采用抽象简化设计)
Make the Common Case Fast (加速大概率事件)
Performance via Parallelism (通过并行提高性能)
Performance via Pipelining (通过流水线提高性能)
Performance via Prediction (通过预测提高性能)
- 例如先当作 if() 条件成立，执行完内部内容，如果后来发现确实成立，那么直接 apply，否则就再重新正常做；
- 这么做就好在，预测成功了就加速了，预测失败了纠正的成本也不高；
Hierarchy of Memories (存储器层次)
- Disk / Tape -> Main Memory(DRAM) -> L2-Cache(SRAM) -> L1-Cache -> Registers
Dependability via Redundancy (通过冗余提高可靠性)
CPU time的计算
\(CPU\ Time = Instruction\ Count \times CPI\times Clock\ Cycle \ Time=\frac{Instruction\ Count \times CPI}{Clock\ Rate}\)
CPI: Cycles per instruction

Chapter2: Instructions: Language of the Computer

Operations of the computer hardware (计算机硬件的操作)

\(add\ a,\ b,\ c\ :\ a\leftarrow b+c\)
c //C code: f = (g + h) - (i + j); //RISC-V code add t0, g, h add t1, i, j sub f, t0, t1

Operands of the computer hardware(计算机硬件的操作数)

RISCV提供了32个数据寄存器x0~x31, 他们都是64bit
一个word为32位(4个字节), doubleword为64位(8个字节)
RISC-V architecture 的地址是 64 位的，地址为字节地址，因此总共可以寻址 \(2^{64}\) 个字节，即 \(2^{61}\) 个 dword

Name	Usage	Preserved On call
x0	The constant value 0	n.a.
x1(ra)	Return address(link register)	yes
x2(sp)	Stack pointer	yes
x3(gp)	Global pointer	yes
x4(tp)	Thread(线程) pointer	yes
x5-x7	Temporaries	no
x8(opti)	frame pointer	yes
x8-x9	Saved	yes
x10-x17	Arguments/results	no
x18-x27	Saved	yes
x28-x31	Temporaries	no

c //C code: f = (g + h) - (i + j); //RISC-V code add t0, g, h add t1, i, j sub f, t0, t1 //Compiled RISC-V code add x5, x20, x21 add x6, x22, x23 sub x19, x5, x6
Load from memory to registers( ld xxx (from) xxx )

Store from register to memory( sd xxx (to) xxx )

Memory Alignment(内存对齐)

c //C code A[12] = h + A[8]; //h in x21, base address of A in x22 //Compiled RISC-V code ld x9, 64(x22) add x9, x21, x9 sd x9, 96(x22)
RISC-V 支持 PC relative 寻址、立即数寻址 ( lui )、间接寻址 ( jalr )、基址寻址 ( 8(sp) )：

Signed and unsigned numbers (有符号和无符号数)

\(x+ \overline x=111...111_2=-1\), 因此\(-x=\overline x+1\), 前导 0 表示正数，前导 1 表示负数。

因此在将不足 64 位的数据载入寄存器时，如果数据是无符号数，只需要使用 0 将寄存器的其他部分填充 (zero extension)；而如果是符号数，则需要用最高位即符号位填充剩余部分，称为符号扩展 (sign extension)。

即，在指令中的 lw , lh , lb 使用 sign extension，而 lwu , lhu , lbu 使用 zero extension

ld/sd : load doubleword / store doubleword

lw/sw : load word / store word

lh/sh : load halfword / store halfword

lb/sb : load byte / store byte

后面加u(lwu, lhu, lbu等)即指unsigned

1's Complement表示按位取反

2's Complement表示按位取反后加一

Representing instructions in the computer(计算机中指令的表示)

其中 I 型指令有两个条目；这是因为立即数移位操作 slli , srli , srai 并不可能对一个 64 位寄存器进行大于 63 位的移位操作，因此 12 位 imm 中只有后 6 位能实际被用到，因此前面 6 位被用来作为一个额外的操作码字段，如上图中第二个 I 条目那样。其他 I 型指令适用第一个 I 条目。

另外，为什么 SB 和 UJ 不存立即数（也就是偏移）的最低位呢？因为，偏移的最后一位一定是 0，即地址一定是 2 字节对齐的，因此没有必要保存。

rd: destination register
rs: source register

R-type

Arithmetic

add x9, x20, x21
//rd = 01001, rs1 = 10100, rx2 = 10101

I-type

load & Immediate

ld x9, 64(x22)
//rd = 9, rs1 = 22, i = 64
addi x1, x2, 1000
//rd = 1, rs1 = 2, i = 1000
sra: shift right arithmetic
srli: shift right left immediate
//逻辑右移就是不考虑符号

S-type

store

sd x9, 64(x22)
//rs1 = 22, rs2 = 9, i = 64

SB-type

条件分支

C code:
    if(i == j) goto L1;
    f = g + h;
L1: f = f - i;

RISC-V:
    beq x22, x23, L1
    add x19, x20, x21
    sub x19, x19, x22
(Assume f~j : x19~x23)

beq/bne 等或不等跳转
slt: set on less than
slt x5, x8, x9 #x5 = 1 if x8 < x9
blt/bge 小于或大于等于, 跳转
后面加u则为无符号比较, 如bltu, bgeu

UJ-type

jal/jalr : jump and link (register)

U-type

lui : load upper immediate

Supporting procedures in computer hardware(计算机对过程的支持)

int fact(int n)
{
    if(n < 1)   return 1;
    else return n * fact(n - 1);
}

RISC-V:

fact:   addi sp, sp, 16     # adjust stack for 2 items
        sd ra, 8(sp)        # save the return address：x1
        sd a0, 0($sp)       # save the argument n: x10
        addi t0, a0, -1     # x5 = n - 1
        bge t0, zero, L1    # if n >= 1, go to L1(else)
        addi a0, zero, 1    # return 1 if n <1
        addi sp, sp, 16     # Recover sp (Why not recover x1and x10 ?)
        jalr zero, 0(ra)    # return to caller

L1:     addi a0, a0, -1     # n >= 1: argument gets ( n - 1 )
        jal ra, fact        # call fact with ( n - 1)
        add t1, a0, zero    # move result of fact(n - 1) to x6(t1)
        ld a0, 0(sp)        # return from jal: restore argument n
        ld ra, 8(sp)        # restore the return address
        add sp, sp, 16      # adjust stack pointer to pop 2 items
        mul a0, a0, t1      # return n*fact ( n - 1 )
        jalr zero, 0(ra)    # return to the caller