OpenTitan Big Number Accelerator (OTBN) Instruction Set Architecture

This document describes the instruction set for OTBN. For more details about the processor itself, see the OTBN Technical Specification. In particular, this document assumes knowledge of the Processor State section of that specification.

The instruction set is split into base and big number subsets. The base subset (described first) is similar to RISC-V’s RV32I instruction set. It also includes a hardware call stack and hardware loop instructions. The big number subset is designed to operate on 256b WDRs. It doesn’t include any control flow instructions, and just supports load/store, logical and arithmetic operations.

In the instruction documentation that follows, each instruction has a syntax example. For example, the SW instruction has syntax:

  SW <grs2>, <offset>(<grs1>)

This means that it takes three operands, called grs2, offset and grs1. These operands are further documented in a table. Immediate operands like offset show their valid range of values.

Below the table of operands is an encoding table. This shows how the 32 bits of the instruction word are filled in. Ranges of bits that map to an operand are named (in capitals) and those names are used in the operand table. For example, the SW instruction’s offset operand is split across two ranges of bits (31:25 and 11:7) called OFF_1 and OFF_0, respectively.
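
As a rough illustration of how a split operand is assembled, the sketch below packs and unpacks the SW fields according to the SW encoding table in this document. The function names encode_sw and decode_sw_offset are hypothetical and are not part of any OTBN tooling.

```python
def encode_sw(grs2: int, offset: int, grs1: int) -> int:
    """Assemble the 32-bit word for SW <grs2>, <offset>(<grs1>)."""
    assert 0 <= grs1 < 32 and 0 <= grs2 < 32
    assert -2048 <= offset <= 2047
    off = offset & 0xfff   # 12-bit two's complement offset
    off_1 = off >> 5       # upper 7 bits go to instruction bits 31:25
    off_0 = off & 0x1f     # lower 5 bits go to instruction bits 11:7
    return ((off_1 << 25) | (grs2 << 20) | (grs1 << 15) |
            (0b010 << 12) | (off_0 << 7) | 0b0100011)


def decode_sw_offset(word: int) -> int:
    """Recover the offset operand, i.e. signed({OFF_1, OFF_0})."""
    off = (((word >> 25) & 0x7f) << 5) | ((word >> 7) & 0x1f)
    return off - (1 << 12) if off & (1 << 11) else off


word = encode_sw(grs2=2, offset=-4, grs1=3)
```

Decoding the word again with decode_sw_offset(word) gives back the original offset, which is a quick way to check that the two bit ranges were joined in the right order.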

Pseudo-code for operation descriptions

Each instruction has an Operation section. This is written in a Python-like pseudo-code, generated from the instruction set simulator (which can be found at hw/ip/otbn/dv/otbnsim). The code is generated from Python, but there are some extra changes made to aid readability.

All instruction operands are considered to be in scope and have integer values. These values come from the encoded bits in the instruction and the operand table for the instruction describes exactly how they are decoded. Some operands are encoded PC-relative. Such an operand has its absolute value (an address) when it appears in the Operation section.

Some state updates are represented as an assignment, but take effect at the end of the instruction. This includes register updates as well as jumps and branches (which update the PC). To denote this, we use the ⇐ symbol, reminiscent of Verilog’s non-blocking assignment.
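
The effect of ⇐ can be pictured with a toy register file whose writes are queued and only applied when the instruction finishes. This is a minimal sketch for illustration only; the class DeferredRegs and its method names are assumptions and do not mirror the simulator's implementation.

```python
class DeferredRegs:
    """Toy register file: writes are queued and applied by commit()."""

    def __init__(self, count: int) -> None:
        self.values = [0] * count
        self.pending = {}  # register index -> value waiting to be committed

    def read(self, idx: int) -> int:
        # Reads always see the state from before the current instruction.
        return self.values[idx]

    def write_later(self, idx: int, value: int) -> None:
        # Models "GPRs[idx] ⇐ value".
        self.pending[idx] = value

    def commit(self) -> None:
        # Called once, at the end of the instruction.
        for idx, value in self.pending.items():
            self.values[idx] = value
        self.pending.clear()


regs = DeferredRegs(32)
regs.values[5] = 7
regs.write_later(5, regs.read(5) + 1)
seen_before_commit = regs.read(5)  # still the old value
regs.commit()
seen_after_commit = regs.read(5)   # the update is now visible
```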

The program counter (PC) is represented as a variable called PC.

Machine registers are accessed with an array syntax. These arrays are:

GPRs: General purpose registers
WDRs: Wide data registers
CSRs: Control and status registers
WSRs: Wide special purpose registers

Accesses to these arrays are as unsigned integers. The instruction descriptions are written to ensure that any value written to a register is representable. For example, a write to GPRs[2] will always have a non-negative value less than 1 << 32.

Memory accesses are represented as function calls. This is because the memory can be accessed on either the narrow or the wide side, which isn’t easy to represent with an array syntax. Memory loads are represented as DMEM.load_u32(addr) and DMEM.load_u256(addr). Memory stores are represented as DMEM.store_u32(addr, value) and DMEM.store_u256(addr, value). In all cases, memory values are interpreted as unsigned integers and, as for register accesses, the instruction descriptions are written to ensure that any value stored to memory is representable.
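
To make the narrow and wide views concrete, here is a toy memory with the four accessors named above. It is only a sketch: the class name ToyDmem, the default size and the little-endian byte order are assumptions made for the example, and integrity checking is not modelled.

```python
class ToyDmem:
    """Byte-addressed memory with 32-bit and 256-bit accessors."""

    def __init__(self, size_bytes: int = 1024) -> None:
        self.data = bytearray(size_bytes)  # size is a hypothetical choice

    def is_valid_32b_addr(self, addr: int) -> bool:
        return 0 <= addr <= len(self.data) - 4 and addr % 4 == 0

    def load_u32(self, addr: int) -> int:
        return int.from_bytes(self.data[addr:addr + 4], 'little')

    def store_u32(self, addr: int, value: int) -> None:
        assert 0 <= value < (1 << 32)
        self.data[addr:addr + 4] = value.to_bytes(4, 'little')

    def load_u256(self, addr: int) -> int:
        return int.from_bytes(self.data[addr:addr + 32], 'little')

    def store_u256(self, addr: int, value: int) -> None:
        assert 0 <= value < (1 << 256)
        self.data[addr:addr + 32] = value.to_bytes(32, 'little')


dmem = ToyDmem()
dmem.store_u256(32, (1 << 224) | 5)  # wide-side store
low_word = dmem.load_u32(32)         # narrow-side view of bits 31:0
high_word = dmem.load_u32(60)        # narrow-side view of bits 255:224
```

Under these assumptions the same bytes are visible through both views, so a 256-bit store can be read back as eight 32-bit words.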

Some instructions can stall for one or more cycles (those instructions that access memory, CSRs or WSRs). To represent this precisely in both the pseudo-code and the simulator reference model, such instructions execute a yield statement to stall the processor for a cycle.
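
The yield convention can be sketched with an ordinary Python generator: each yield marks the end of a cycle in which the instruction is still busy. The names toy_load and run are hypothetical, and the dictionaries stand in for memory and the register file.

```python
from typing import Dict, Iterator


def toy_load(mem: Dict[int, int], regs: Dict[int, int],
             addr: int, grd: int) -> Iterator[None]:
    result = mem[addr]  # first cycle: issue the load
    yield None          # stall for one cycle
    regs[grd] = result  # second cycle: write the result back


def run(insn: Iterator[None]) -> int:
    """Step an instruction to completion and return the cycles it took."""
    cycles = 1
    for _ in insn:
        cycles += 1
    return cycles


regs: Dict[int, int] = {}
cycles = run(toy_load({16: 0xdeadbeef}, regs, addr=16, grd=5))
```

An instruction with a single yield therefore occupies two cycles in this model, which matches the two-cycle description given for LW later in this document.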

There are a few other helper functions, defined here to avoid having to inline their bodies into each instruction.

def from_2s_complement(n: int) -> int:
    '''Interpret the bits of unsigned integer n as a 32-bit signed integer'''
    assert 0 <= n < (1 << 32)
    return n if n < (1 << 31) else n - (1 << 32)


def to_2s_complement(n: int) -> int:
    '''Interpret the bits of signed integer n as a 32-bit unsigned integer'''
    assert -(1 << 31) <= n < (1 << 31)
    return (1 << 32) + n if n < 0 else n

def logical_byte_shift(value: int, shift_type: int, shift_bytes: int) -> int:
    '''Logical shift value by shift_bytes to the left or right.

    value should be an unsigned 256-bit value. shift_type should be 0 (shift
    left) or 1 (shift right), matching the encoding of the big number
    instructions. shift_bytes should be a non-negative number of bytes to shift
    by.

    Returns an unsigned 256-bit value, truncating on an overflowing left shift.

    '''
    mask256 = (1 << 256) - 1
    assert 0 <= value <= mask256
    assert 0 <= shift_type <= 1
    assert 0 <= shift_bytes

    shift_bits = 8 * shift_bytes
    shifted = value << shift_bits if shift_type == 0 else value >> shift_bits
    return shifted & mask256

def extract_quarter_word(value: int, qwsel: int) -> int:
    '''Extract a 64-bit quarter word from a 256-bit value.'''
    assert 0 <= value < (1 << 256)
    assert 0 <= qwsel <= 3
    return (value >> (qwsel * 64)) & ((1 << 64) - 1)
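
The following worked values show how the two 256-bit helpers behave. The definitions are repeated (without their docstrings) only so that the snippet runs on its own; the input values are arbitrary and chosen for the example.

```python
def logical_byte_shift(value: int, shift_type: int, shift_bytes: int) -> int:
    mask256 = (1 << 256) - 1
    assert 0 <= value <= mask256
    assert 0 <= shift_type <= 1
    assert 0 <= shift_bytes
    shift_bits = 8 * shift_bytes
    shifted = value << shift_bits if shift_type == 0 else value >> shift_bits
    return shifted & mask256


def extract_quarter_word(value: int, qwsel: int) -> int:
    assert 0 <= value < (1 << 256)
    assert 0 <= qwsel <= 3
    return (value >> (qwsel * 64)) & ((1 << 64) - 1)


# Quarter words 3, 2, 1, 0 hold the values 3, 2, 1, 0 respectively.
value = (3 << 192) | (2 << 128) | (1 << 64) | 0

quarter = extract_quarter_word(value, 2)         # picks bits 191:128
shifted_right = logical_byte_shift(value, 1, 8)  # right by 8 bytes = 64 bits
overflowed = logical_byte_shift(1 << 255, 0, 1)  # top bit shifted out
```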

Errors

OTBN can detect various errors when it is operating. For details about OTBN’s approach to error handling, see the Errors section of the Technical Specification. The instruction descriptions below describe any software errors that executing the instruction can cause. These errors are listed explicitly and also appear in the pseudo-code description, where the code sets a bit in the ERR_BITS register with a call to state.stop_at_end_of_cycle().

Other errors are possible at runtime. Specifically, any instruction that reads from a GPR or WDR might detect a register integrity error. In this case, OTBN will set the REG_INTG_VIOLATION bit. Similarly, an instruction that loads from memory might detect a DMEM integrity error. In this case, OTBN will set the DMEM_INTG_VIOLATION bit.

TODO: Specify interactions between these fatal errors and any other errors. In particular, how do they interact with instructions that could cause other errors as well?

Base Instruction Subset

The base instruction set of OTBN is a limited 32b instruction set. It is used together with the 32b wide General Purpose Register file. The primary use of the base instruction set is the control flow in applications.

The base instruction set is an extended subset of RISC-V’s RV32I_Zcsr. Refer to the RISC-V Unprivileged Specification for a detailed instruction specification. Not all RV32 instructions are implemented. The implemented subset is shown below.

ADD

Add.

This instruction is defined in the RV32I instruction set.

Errors

ADD might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

ADD <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ADD	0	0	0	0	0	0	0	GRS2					GRS1					0	0	0	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = (val1 + val2) & ((1 << 32) - 1)
GPRs[grd] ⇐ result

ADDI

Add Immediate.

This instruction is defined in the RV32I instruction set.

Errors

ADDI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

ADDI <grd>, <grs1>, <imm>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`imm`	Valid range: `-2048` to `2047`. Decode as `signed(IMM)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ADDI	IMM												GRS1					0	0	0	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = (val1 + imm) & ((1 << 32) - 1)
GPRs[grd] ⇐ result

LUI

Load Upper Immediate.

This instruction is defined in the RV32I instruction set.

Errors

LUI might cause the following software errors:

A CALL_STACK error from using x1 as grd when the call stack is full.

Syntax

LUI <grd>, <imm>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`imm`	Valid range: `0` to `1048575`. Decode as `unsigned(IMM)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
LUI	IMM																				GRD					0	1	1	0	1	1	1

Operation

GPRs[grd] ⇐ imm << 12
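
A common RISC-V idiom, not something specific to this document, is to build an arbitrary 32-bit constant with LUI followed by ADDI. Because the ADDI immediate is signed, the upper part has to be adjusted when bit 11 of the constant is set. The sketch below works through that arithmetic; split_for_lui_addi and lui_then_addi are hypothetical helper names.

```python
from typing import Tuple

MASK32 = (1 << 32) - 1


def split_for_lui_addi(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into a LUI immediate and an ADDI immediate."""
    assert 0 <= value <= MASK32
    lo = value & 0xfff
    if lo >= 0x800:
        lo -= 0x1000  # ADDI sign-extends its 12-bit immediate
    hi = ((value - lo) >> 12) & 0xfffff
    return hi, lo


def lui_then_addi(hi: int, lo: int) -> int:
    """Model LUI grd, hi followed by ADDI grd, grd, lo."""
    reg = (hi << 12) & MASK32   # LUI: GPRs[grd] <= imm << 12
    return (reg + lo) & MASK32  # ADDI: wraps to 32 bits


hi, lo = split_for_lui_addi(0xdeadbeef)
rebuilt = lui_then_addi(hi, lo)
```

For 0xdeadbeef the low 12 bits are 0xeef, which has bit 11 set, so the ADDI immediate becomes negative and the LUI immediate is one higher than the plain upper 20 bits.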

SUB

Subtract.

This instruction is defined in the RV32I instruction set.

Errors

SUB might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

SUB <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SUB	0	1	0	0	0	0	0	GRS2					GRS1					0	0	0	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = (val1 - val2) & ((1 << 32) - 1)
GPRs[grd] ⇐ result
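
The mask in the ADD and SUB pseudo-code is what makes the arithmetic wrap modulo 2^32. A small sketch with hypothetical helper names add32 and sub32, ignoring the call stack check:

```python
MASK32 = (1 << 32) - 1


def add32(val1: int, val2: int) -> int:
    return (val1 + val2) & MASK32


def sub32(val1: int, val2: int) -> int:
    # Python integers are unbounded, so a negative difference is folded
    # back into the unsigned 32-bit range by the mask.
    return (val1 - val2) & MASK32


wrapped_sum = add32(0xffffffff, 1)
wrapped_diff = sub32(0, 1)
```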

SLL

Logical left shift.

This instruction is defined in the RV32I instruction set.

Errors

SLL might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

SLL <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SLL	0	0	0	0	0	0	0	GRS2					GRS1					0	0	1	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2] & 0x1f
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = (val1 << val2) & ((1 << 32) - 1)
GPRs[grd] ⇐ result
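
Only the low five bits of the value in grs2 are used as the shift amount, so a shift by 33 behaves like a shift by 1. A minimal sketch of that behaviour, using a hypothetical function sll and ignoring the call stack check:

```python
MASK32 = (1 << 32) - 1


def sll(val1: int, shift_reg: int) -> int:
    val2 = shift_reg & 0x1f  # shift amounts are taken modulo 32
    return (val1 << val2) & MASK32


by_33 = sll(1, 33)
by_32 = sll(1, 32)
top_bit_lost = sll(0x80000001, 1)
```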

SLLI

Logical left shift with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

SLLI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

SLLI <grd>, <grs1>, <shamt>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`shamt`	Valid range: `0` to `31`. Decode as `unsigned(SHAMT)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SLLI	0	0	0	0	0	0	0	SHAMT					GRS1					0	0	1	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = (val1 << shamt) & ((1 << 32) - 1)
GPRs[grd] ⇐ result

SRL

Logical right shift.

This instruction is defined in the RV32I instruction set.

Errors

SRL might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

SRL <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SRL	0	0	0	0	0	0	0	GRS2					GRS1					1	0	1	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2] & 0x1f
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 >> val2
GPRs[grd] ⇐ result

SRLI

Logical right shift with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

SRLI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

SRLI <grd>, <grs1>, <shamt>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`shamt`	Valid range: `0` to `31`. Decode as `unsigned(SHAMT)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SRLI	0	0	0	0	0	0	0	SHAMT					GRS1					1	0	1	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 >> shamt
GPRs[grd] ⇐ result

SRA

Arithmetic right shift.

This instruction is defined in the RV32I instruction set.

Errors

SRA might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

SRA <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SRA	0	1	0	0	0	0	0	GRS2					GRS1					1	0	1	GRD					0	1	1	0	0	1	1

Operation

val1 = from_2s_complement(GPRs[grs1])
val2 = GPRs[grs2] & 0x1f
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 >> val2
GPRs[grd] ⇐ to_2s_complement(result)
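
The difference between SRA and SRL only shows up when the top bit of the source is set. The sketch below repeats the two's complement helpers so that it runs on its own; the wrapper functions sra and srl are hypothetical and leave out the call stack check.

```python
def from_2s_complement(n: int) -> int:
    assert 0 <= n < (1 << 32)
    return n if n < (1 << 31) else n - (1 << 32)


def to_2s_complement(n: int) -> int:
    assert -(1 << 31) <= n < (1 << 31)
    return (1 << 32) + n if n < 0 else n


def sra(val1: int, shift_reg: int) -> int:
    # Shift the signed interpretation; Python's >> on a negative int
    # fills with copies of the sign bit.
    return to_2s_complement(from_2s_complement(val1) >> (shift_reg & 0x1f))


def srl(val1: int, shift_reg: int) -> int:
    return val1 >> (shift_reg & 0x1f)


arithmetic = sra(0x80000000, 4)
logical = srl(0x80000000, 4)
```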

SRAI

Arithmetic right shift with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

SRAI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

SRAI <grd>, <grs1>, <shamt>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`shamt`	Valid range: `0` to `31`. Decode as `unsigned(SHAMT)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SRAI	0	1	0	0	0	0	0	SHAMT					GRS1					1	0	1	GRD					0	0	1	0	0	1	1

Operation

val1 = from_2s_complement(GPRs[grs1])
val2 = shamt
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 >> val2
GPRs[grd] ⇐ to_2s_complement(result)

AND

Bitwise AND.

This instruction is defined in the RV32I instruction set.

Errors

AND might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

AND <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
AND	0	0	0	0	0	0	0	GRS2					GRS1					1	1	1	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 & val2
GPRs[grd] ⇐ result

ANDI

Bitwise AND with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

ANDI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

ANDI <grd>, <grs1>, <imm>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`imm`	Valid range: `-2048` to `2047`. Decode as `signed(IMM)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ANDI	IMM												GRS1					1	1	1	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = to_2s_complement(imm)
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 & val2
GPRs[grd] ⇐ result
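
Because the immediate is signed and converted with to_2s_complement, a negative immediate supplies ones in all the upper bits. The sketch below shows two consequences: an ANDI with -16 clears the low four bits, and an XORI with -1 inverts every bit. The wrappers andi and xori are hypothetical and omit the call stack check.

```python
def to_2s_complement(n: int) -> int:
    assert -(1 << 31) <= n < (1 << 31)
    return (1 << 32) + n if n < 0 else n


def andi(val1: int, imm: int) -> int:
    assert -2048 <= imm <= 2047
    return val1 & to_2s_complement(imm)


def xori(val1: int, imm: int) -> int:
    assert -2048 <= imm <= 2047
    return val1 ^ to_2s_complement(imm)


aligned = andi(0x12345678, -16)  # -16 becomes 0xfffffff0
inverted = xori(0x12345678, -1)  # -1 becomes 0xffffffff
```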

OR

Bitwise OR.

This instruction is defined in the RV32I instruction set.

Errors

OR might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

OR <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
OR	0	0	0	0	0	0	0	GRS2					GRS1					1	1	0	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 | val2
GPRs[grd] ⇐ result

ORI

Bitwise OR with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

ORI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

ORI <grd>, <grs1>, <imm>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`imm`	Valid range: `-2048` to `2047`. Decode as `signed(IMM)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ORI	IMM												GRS1					1	1	0	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = to_2s_complement(imm)
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 | val2
GPRs[grd] ⇐ result

XOR

Bitwise XOR.

This instruction is defined in the RV32I instruction set.

Errors

XOR might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and neither grs1 nor grs2 is x1.

Syntax

XOR <grd>, <grs1>, <grs2>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
XOR	0	0	0	0	0	0	0	GRS2					GRS1					1	0	0	GRD					0	1	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 ^ val2
GPRs[grd] ⇐ result

XORI

Bitwise XOR with Immediate.

This instruction is defined in the RV32I instruction set.

Errors

XORI might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

XORI <grd>, <grs1>, <imm>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`imm`	Valid range: `-2048` to `2047`. Decode as `signed(IMM)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
XORI	IMM												GRS1					1	0	0	GRD					0	0	1	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = to_2s_complement(imm)
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

result = val1 ^ val2
GPRs[grd] ⇐ result

LW

Load Word. Loads a 32b word from address offset + grs1 in data memory, writing the result to grd. Unaligned loads are not supported. Any address that is unaligned or is above the top of memory will result in an error (setting bit bad_data_addr in ERR_BITS). This instruction takes 2 cycles.

This instruction is defined in the RV32I instruction set.

Errors

LW might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A BAD_DATA_ADDR error if the computed address is not a valid 4-byte aligned DMEM address.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.

Syntax

LW <grd>, <offset>(<grs1>)

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`offset`	Valid range: `-2048` to `2047`. Decode as `signed(OFF)`
`grs1`	Decode as `unsigned(GRS1)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
LW	OFF												GRS1					0	1	0	GRD					0	0	0	0	0	1	1

Operation

# LW executes over two cycles. On the first cycle, we read the base
# address, compute the load address and check it for correctness, then
# perform the load itself, returning the result.
#
# On the second cycle, we write the result to the destination register.

base = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return None

addr = (base + offset) & ((1 << 32) - 1)

if not DMEM.is_valid_32b_addr(addr):
    state.stop_at_end_of_cycle(ErrBits.BAD_DATA_ADDR)
    return None

result = DMEM.load_u32(addr)

# Stall for a single cycle for memory to respond
yield None

if result is None:
    state.stop_at_end_of_cycle(ErrBits.DMEM_INTG_VIOLATION)
    return None

GPRs[grd] ⇐ result
return None

SW

Store Word. Stores a 32b word in grs2 to address offset + grs1 in data memory. Unaligned stores are not supported. Any address that is unaligned or is above the top of memory will result in an error (setting bit bad_data_addr in ERR_BITS).

This instruction is defined in the RV32I instruction set.

Errors

SW might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A BAD_DATA_ADDR error if the computed address is not a valid 4-byte aligned DMEM address.

Syntax

SW <grs2>, <offset>(<grs1>)

Operands

Operand	Description
`grs2`	Decode as `unsigned(GRS2)`
`offset`	Valid range: `-2048` to `2047`. Decode as `signed({OFF_1, OFF_0})`
`grs1`	Decode as `unsigned(GRS1)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
SW	OFF_1							GRS2					GRS1					0	1	0	OFF_0					0	1	0	0	0	1	1

Operation

base = GPRs[grs1]
addr = (base + offset) & ((1 << 32) - 1)
value = GPRs[grs2]

bad_grs1 = state.gprs.call_stack_err and (grs1 == 1)

saw_err = False

if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    saw_err = True

if not DMEM.is_valid_32b_addr(addr) and not bad_grs1:
    state.stop_at_end_of_cycle(ErrBits.BAD_DATA_ADDR)
    saw_err = True

if saw_err:
    return

DMEM.store_u32(addr, value)

BEQ

Branch Equal.

This instruction is defined in the RV32I instruction set.

Errors

BEQ might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A BAD_INSN_ADDR error if the branch is taken and the computed address is not a valid PC.
A LOOP error if this instruction appears as the last instruction of a loop body.

Syntax

BEQ <grs1>, <grs2>, <offset>

Operands

Operand	Description
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`
`offset`	Valid range: `-4096` to `4094` in steps of `2`. This is encoded PC-relative but appears as an absolute value in assembly. To write a raw value in an assembly file, write something in the range `.-4096` to `.+4094`. Decode as `PC + signed({OFF_3, OFF_2, OFF_1, OFF_0, 1'b0})`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BEQ	OFF_3	OFF_1						GRS2					GRS1					0	0	0	OFF_0				OFF_2	1	1	0	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

tgt_pc = offset & ((1 << 32) - 1)
if val1 == val2:
    if not state.is_pc_valid(tgt_pc):
        state.stop_at_end_of_cycle(ErrBits.BAD_INSN_ADDR)
    else:
        PC ⇐ tgt_pc
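
The branch offset is scattered over four fields, so it can help to see the decode written out. The sketch below follows the BEQ encoding table (OFF_3 in bit 31, OFF_1 in bits 30:25, OFF_0 in bits 11:8, OFF_2 in bit 7); the function name decode_branch_target is hypothetical, and the two example words were assembled by hand from that table with GRS1 and GRS2 both zero.

```python
def decode_branch_target(word: int, pc: int) -> int:
    """Return PC + signed({OFF_3, OFF_2, OFF_1, OFF_0, 1'b0})."""
    off_3 = (word >> 31) & 0x1   # bit 31
    off_1 = (word >> 25) & 0x3f  # bits 30:25
    off_0 = (word >> 8) & 0xf    # bits 11:8
    off_2 = (word >> 7) & 0x1    # bit 7
    raw = (off_3 << 12) | (off_2 << 11) | (off_1 << 5) | (off_0 << 1)
    signed = raw - (1 << 13) if raw & (1 << 12) else raw
    return pc + signed


backward = decode_branch_target(0xfe000ee3, pc=0x100)  # offset of -4
forward = decode_branch_target(0x00000463, pc=0x100)   # offset of +8
```

The result is the absolute address that appears as offset in the Operation section, before the 32-bit mask and the PC validity check are applied.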

BNE

Branch Not Equal.

This instruction is defined in the RV32I instruction set.

Errors

BNE might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
A BAD_INSN_ADDR error if the branch is taken and the computed address is not a valid PC.
A LOOP error if this instruction appears as the last instruction of a loop body.

Syntax

BNE <grs1>, <grs2>, <offset>

Operands

Operand	Description
`grs1`	Decode as `unsigned(GRS1)`
`grs2`	Decode as `unsigned(GRS2)`
`offset`	Valid range: `-4096` to `4094` in steps of `2`. This is encoded PC-relative but appears as an absolute value in assembly. To write a raw value in an assembly file, write something in the range `.-4096` to `.+4094`. Decode as `PC + signed({OFF_3, OFF_2, OFF_1, OFF_0, 1'b0})`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BNE	OFF_3	OFF_1						GRS2					GRS1					0	0	1	OFF_0				OFF_2	1	1	0	0	0	1	1

Operation

val1 = GPRs[grs1]
val2 = GPRs[grs2]

if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

tgt_pc = offset & ((1 << 32) - 1)
if val1 != val2:
    if not state.is_pc_valid(tgt_pc):
        state.stop_at_end_of_cycle(ErrBits.BAD_INSN_ADDR)
    else:
        PC ⇐ tgt_pc

JAL

Jump And Link. The JAL instruction has the same behavior as in RV32I, jumping by the given offset and writing PC+4 as a link address to the destination register.

OTBN has a hardware managed call stack, accessed through x1, which should be used when calling subroutines. Do so by using x1 as the link register: jal x1, <offset>.

This instruction is defined in the RV32I instruction set.

Errors

JAL might cause the following software errors:

A CALL_STACK error from using x1 as grd when the call stack is full.
A BAD_INSN_ADDR error if the computed address is not a valid PC.
A LOOP error if this instruction appears as the last instruction of a loop body.

Syntax

JAL <grd>, <offset>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`offset`	Valid range: `-1048576` to `1048574` in steps of `2`. This is encoded PC-relative but appears as an absolute value in assembly. To write a raw value in an assembly file, write something in the range `.-1048576` to `.+1048574`. Decode as `PC + signed({OFF_3, OFF_2, OFF_1, OFF_0, 1'b0})`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
JAL	OFF_3	OFF_0										OFF_1	OFF_2								GRD					1	1	0	1	1	1	1

Operation

mask32 = ((1 << 32) - 1)
link_pc = (state.pc + 4) & mask32
GPRs[grd] ⇐ link_pc

next_pc = offset & mask32
if not state.is_pc_valid(next_pc):
    state.stop_at_end_of_cycle(ErrBits.BAD_INSN_ADDR)
else:
    PC ⇐ next_pc

JALR

Jump And Link Register. The JALR instruction has the same behavior as in RV32I, jumping to the address <grs1> + <offset> and writing PC+4 as a link address to the destination register.

OTBN has a hardware managed call stack, accessed through x1, which should be used when calling and returning from subroutines. To return from a subroutine, use jalr x0, x1, 0. This pops a link address from the call stack and branches to it. To call a subroutine through a function pointer, use jalr x1, <grs1>, 0. This jumps to the address in <grs1> and pushes the link address onto the call stack.
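
The way x1 maps onto the call stack can be sketched as a small model in which reading x1 pops an entry and writing x1 pushes one. This is only an illustration built from the error conditions listed for the instructions in this document: the class name ToyCallStack, the default depth of 8 and the choice to leave the stack untouched on an error are all assumptions, and instructions that read x1 through two source operands are not modelled.

```python
from typing import List, Optional


class ToyCallStack:
    def __init__(self, depth: int = 8) -> None:
        self.depth = depth  # hypothetical capacity
        self.entries: List[int] = []
        self.call_stack_err = False

    def step(self, reads_x1: bool, writes_x1: bool,
             link_value: int = 0) -> Optional[int]:
        """Apply one instruction's x1 accesses; return the value read."""
        if reads_x1 and not self.entries:
            self.call_stack_err = True  # pop from an empty stack
            return None
        full = len(self.entries) == self.depth
        if writes_x1 and full and not reads_x1:
            self.call_stack_err = True  # push onto a full stack
            return None
        popped = self.entries.pop() if reads_x1 else None
        if writes_x1:
            self.entries.append(link_value)
        return popped


stack = ToyCallStack(depth=2)
stack.step(reads_x1=False, writes_x1=True, link_value=0x104)  # like jal x1
stack.step(reads_x1=False, writes_x1=True, link_value=0x204)
# Reading and writing x1 in one instruction is fine even when full.
swapped = stack.step(reads_x1=True, writes_x1=True, link_value=0x304)
# A plain push onto the full stack is an error.
stack.step(reads_x1=False, writes_x1=True, link_value=0x404)
```

In this model a return such as jalr x0, x1, 0 corresponds to step(reads_x1=True, writes_x1=False), whose result is the address to jump to.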

This instruction is defined in the RV32I instruction set.

Errors

JALR might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.
A BAD_INSN_ADDR error if the computed address is not a valid PC.
A LOOP error if this instruction appears as the last instruction of a loop body.

Syntax

JALR <grd>, <grs1>, <offset>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`grs1`	Decode as `unsigned(GRS1)`
`offset`	Valid range: `-2048` to `2047`. Decode as `signed(OFFSET)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
JALR	OFFSET												GRS1					0	0	0	GRD					1	1	0	0	1	1	1

Operation

val1 = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

mask32 = ((1 << 32) - 1)
link_pc = (state.pc + 4) & mask32

GPRs[grd] ⇐ link_pc

next_pc = (val1 + offset) & mask32
if not state.is_pc_valid(next_pc):
    state.stop_at_end_of_cycle(ErrBits.BAD_INSN_ADDR)
else:
    PC ⇐ next_pc

CSRRS

Atomic Read and Set bits in CSR. Reads the value of the CSR csr, and writes it to the destination GPR grd. The initial value in grs1 is treated as a bit mask that specifies bits to be set in the CSR. Any bit that is high in grs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected (though CSRs might have side effects when written).

If csr isn’t the index of a valid CSR, this results in an error (setting bit illegal_insn in ERR_BITS).

This instruction is defined in the RV32I instruction set.

Errors

CSRRS might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
An ILLEGAL_INSN error if csr doesn’t name a valid CSR.

Syntax

CSRRS <grd>, <csr>, <grs1>

Operands

Operand	Description
`grd`	Decode as `unsigned(GRD)`
`csr`	Valid range: `0` to `4095`. Decode as `unsigned(CSR)`
`grs1`	Decode as `unsigned(GRS1)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
CSRRS	CSR												GRS1					0	1	0	GRD					1	1	1	0	0	1	1

Operation

if not state.csrs.check_idx(csr):
    # Invalid CSR index. Stop with an illegal instruction error.
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

bits_to_set = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return None

if csr == 0xfc0:
    # A read from RND. If a RND value is not available, request_value()
    # initiates or continues an EDN request and returns False. If a RND
    # value is available, it returns True.
    while not state.wsrs.RND.request_value():
        # There's a pending EDN request. Stall for a cycle.
        yield None

# At this point, the CSR is ready. Read it, write the old value to grd
# and, unless grs1 is x0, write the updated value back to the CSR.
old_val = CSRs[csr]
new_val = old_val | bits_to_set
GPRs[grd] ⇐ old_val
if grs1 != 0:
    CSRs[csr] ⇐ new_val

return None

CSRRW

Atomic Read/Write CSR. Atomically swaps values in the CSR csr with the value in the GPR grs1. Reads the old value of the CSR, and writes it to the GPR grd. Writes the initial value in grs1 to the CSR csr. If grd == x0 the instruction does not read the CSR or cause any read-related side-effects.

If csr isn’t the index of a valid CSR, this results in an error (setting bit illegal_insn in ERR_BITS).

This instruction is defined in the RV32I instruction set.

Errors

CSRRW might cause the following software errors:

A CALL_STACK error from using x1 as grs1 when the call stack is empty.
A CALL_STACK error from using x1 as grd when the call stack is full and grs1 is not x1.
An ILLEGAL_INSN error if csr doesn’t name a valid CSR.

Syntax

CSRRW <grd>, <csr>, <grs1>

Operands

Operand Description

grd

Decode as unsigned(GRD)

csr

Valid range: 0 to 4095.

Decode as unsigned(CSR)

grs1

Decode as unsigned(GRS1)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
CSRRW	CSR												GRS1					0	0	1	GRD					1	1	1	0	0	1	1

Operation

if not state.csrs.check_idx(csr):
    # Invalid CSR index. Stop with an illegal instruction error.
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

new_val = GPRs[grs1]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return None

if csr == 0xfc0 and grd != 0:
    # A read from RND. If a RND value is not available, request_value()
    # initiates or continues an EDN request and returns False. If a RND
    # value is available, it returns True.
    while not state.wsrs.RND.request_value():
        # There's a pending EDN request. Stall for a cycle.
        yield None

# At this point, the CSR is either ready or unneeded. Read it if
# necessary and write to grd, then overwrite with new_val.

if grd != 0:
    old_val = CSRs[csr]
    GPRs[grd] ⇐ old_val

CSRs[csr] ⇐ new_val
return None

ECALL

Environment Call. Triggers the done interrupt to indicate completion of the operation.

This instruction is defined in the RV32I instruction set.

Errors

ECALL cannot cause any software errors.

Syntax

ECALL

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
ECALL	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	1	1	1	0	0	1	1

Operation

# Set INTR_STATE.done and STATUS, reflecting the fact we've stopped.
state.stop_at_end_of_cycle(err_bits=0)

LOOP

Loop (indirect). Repeats a sequence of code multiple times. The number of iterations is read from grs, treated as an unsigned value. The number of instructions in the loop is given in the bodysize immediate.

The LOOP instruction doesn’t support a zero iteration count. If the value in grs is zero, OTBN stops, setting bit loop in ERR_BITS. Starting a loop pushes an entry on to the loop stack. If the stack is already full, OTBN stops, setting bit loop in ERR_BITS.

LOOP, LOOPI, jump and branch instructions are all permitted inside a loop, but none of them may appear as the last instruction of a loop body. If one does, OTBN stops on that instruction, setting bit loop in ERR_BITS.

For more information on how to correctly use LOOP see loop nesting.

Errors

LOOP might cause the following software errors:

A CALL_STACK error from using x1 as grs when the call stack is empty.
A LOOP error if the value in grs is zero.
A LOOP error if this instruction appears as the last instruction of a loop body.
A LOOP error if the loop stack is already full.

Syntax

LOOP <grs>, <bodysize>

Operands

Operand Description

grs

Name of the GPR containing the number of iterations

Decode as unsigned(GRS)

bodysize

Number of instructions in the loop body

Valid range: 1 to 4096.

Decode as unsigned(SZ) + 1

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
LOOP	SZ												GRS					0	0	0						1	1	1	1	0	1	1

Operation

num_iters = GPRs[grs]
if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    return

if num_iters == 0:
    state.stop_at_end_of_cycle(ErrBits.LOOP)
else:
    state.loop_start(num_iters, bodysize)

LOOPI

Loop Immediate. Repeats a sequence of code multiple times. The number of iterations is given in the iterations immediate. The number of instructions in the loop is given in the bodysize immediate.

The LOOPI instruction doesn’t support a zero iteration count. If the value of iterations is zero, OTBN stops, setting bit loop in ERR_BITS. Starting a loop pushes an entry on to the loop stack. If the stack is already full, OTBN stops, setting bit loop in ERR_BITS.

For more information on how to correctly use LOOPI see loop nesting.

Errors

LOOPI might cause the following software errors:

A LOOP error if iterations is zero.
A LOOP error if this instruction appears as the last instruction of a loop body.
A LOOP error if the loop stack is already full.

Syntax

LOOPI <iterations>, <bodysize>

Operands

Operand Description

iterations

Number of iterations

Valid range: 0 to 1023.

Decode as unsigned({ITERATIONS_1, ITERATIONS_0})

bodysize

Number of instructions in the loop body

Valid range: 1 to 4096.

Decode as unsigned(SZ) + 1

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
LOOPI	SZ												ITERATIONS_1					0	0	1	ITERATIONS_0					1	1	1	1	0	1	1

Operation

if iterations == 0:
    state.stop_at_end_of_cycle(ErrBits.LOOP)
else:
    state.loop_start(iterations, bodysize)
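
To illustrate how the encoding table and the operand decode rules fit together, the following Python sketch packs and unpacks a LOOPI instruction word. The function names are hypothetical and are not part of otbnsim; the field positions are read off the encoding table above (SZ in bits 31:20, ITERATIONS_1 in 19:15, ITERATIONS_0 in 11:7).

```python
from typing import Tuple

LOOPI_OPCODE = 0b1111011  # bits 6:0 of the encoding table
LOOPI_FUNCT3 = 0b001      # bits 14:12 of the encoding table


def encode_loopi(iterations: int, bodysize: int) -> int:
    """Pack LOOPI operands into a 32-bit instruction word (illustrative)."""
    assert 0 <= iterations <= 1023
    assert 1 <= bodysize <= 4096
    sz = bodysize - 1                  # SZ stores bodysize - 1
    iter_1 = (iterations >> 5) & 0x1F  # upper five bits -> ITERATIONS_1
    iter_0 = iterations & 0x1F         # lower five bits -> ITERATIONS_0
    return ((sz << 20) | (iter_1 << 15) | (LOOPI_FUNCT3 << 12)
            | (iter_0 << 7) | LOOPI_OPCODE)


def decode_loopi(word: int) -> Tuple[int, int]:
    """Recover (iterations, bodysize) from a LOOPI instruction word."""
    sz = (word >> 20) & 0xFFF
    iter_1 = (word >> 15) & 0x1F
    iter_0 = (word >> 7) & 0x1F
    iterations = (iter_1 << 5) | iter_0  # unsigned({ITERATIONS_1, ITERATIONS_0})
    bodysize = sz + 1                    # unsigned(SZ) + 1
    return iterations, bodysize
```

Note that an iterations value of zero can be encoded and decoded, but executing such an instruction stops OTBN with a LOOP error as described above.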

NOP

No Operation. A pseudo-operation that has no effect.

This instruction is defined in the RV32I instruction set.

Syntax

NOP

This instruction is a pseudo-operation and expands to the following instruction sequence:

ADDI x0, x0, 0

LI

Load Immediate. Loads a 32b signed immediate value into a GPR. This uses ADDI and LUI, expanding to one or two instructions, depending on the immediate (small non-negative immediates or immediates with all lower bits zero can be loaded with just ADDI or LUI, respectively; general immediates need a LUI followed by an ADDI).

This instruction is defined in the RV32I instruction set.

Syntax

LI <grd>, <imm>
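
The general two-instruction case can be sketched in Python as follows. This is an illustration only: it assumes the usual RV32I behaviour (LUI places a 20-bit immediate in bits 31:12, ADDI adds a sign-extended 12-bit immediate), the function names are hypothetical, and the assembler's exact rules for choosing a single-instruction form may differ.

```python
def split_li_immediate(imm: int):
    """Split a 32-bit immediate into (lui_imm20, addi_imm12) parts."""
    imm32 = imm & 0xFFFFFFFF
    lo = imm32 & 0xFFF
    if lo >= 0x800:
        # ADDI sign-extends its immediate, so the low part becomes negative
        # and the LUI part is rounded up to compensate.
        lo -= 0x1000
    hi = ((imm32 - lo) >> 12) & 0xFFFFF
    return hi, lo


def li_result(hi: int, lo: int) -> int:
    """Value left in the GPR after 'LUI hi' followed by 'ADDI lo'."""
    return ((hi << 12) + lo) & 0xFFFFFFFF
```

When the returned hi part is zero, the ADDI alone produces the value; when the lo part is zero, the LUI alone does.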

LA

Load absolute address. Loads an address given by a symbol into a GPR. This is represented as a LUI and an ADDI.

This instruction is defined in the RV32I instruction set.

Syntax

LA <grd>, <imm>

RET

Return from subroutine.

This instruction is defined in the RV32I instruction set.

Syntax

RET

This instruction is a pseudo-operation and expands to the following instruction sequence:

JALR x0, x1, 0

UNIMP

Illegal instruction. Triggers an illegal instruction error and aborts the program execution. Commonly used in code which is meant to be unreachable.

This instruction is defined in the RV32I instruction set.

Syntax

UNIMP

This instruction is a pseudo-operation and expands to the following instruction sequence:

CSRRW x0, 0xC00, x0

Big Number Instruction Subset

All Big Number (BN) instructions operate on the Wide Data Registers (WDRs).

BN.ADD

Add. Adds two WDR values, writes the result to the destination WDR and updates flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

Errors

BN.ADD cannot cause any software errors.

Syntax

BN.ADD <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.ADD	FG	ST	SB					WRS2					WRS1					0	0	0	WRD					0	1	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

full_result = a + b_shifted
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags
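
The listing above relies on helpers from the simulator. The following standalone Python sketch shows one way to model them. It rests on assumptions not spelled out in this section: that logical_byte_shift performs a logical shift truncated to 256 bits, that shift_bytes equals shift_bits divided by 8, and that the M, L and Z flags are the MSB, the LSB and the zero indicator of the result.

```python
MASK256 = (1 << 256) - 1


def logical_byte_shift(value: int, shift_type: int, shift_bytes: int) -> int:
    """Assumed behaviour: logical shift by whole bytes, truncated to 256 bits."""
    shift_bits = 8 * shift_bytes
    if shift_type == 0:                      # "<<"
        return (value << shift_bits) & MASK256
    return value >> shift_bits               # ">>"


def bn_add(a: int, b: int, shift_type: int = 0, shift_bits: int = 0):
    """Return (result, flags) following the BN.ADD listing (illustrative)."""
    b_shifted = logical_byte_shift(b, shift_type, shift_bits // 8)
    full = a + b_shifted
    result = full & MASK256
    flags = {
        "C": bool((full >> 256) & 1),        # carry out of bit 255
        "M": bool((result >> 255) & 1),      # most significant bit
        "L": bool(result & 1),               # least significant bit
        "Z": result == 0,                    # result is zero
    }
    return result, flags
```

For example, adding 1 to the all-ones value wraps to zero and sets both C and Z.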

BN.ADDC

Add with Carry. Adds two WDR values and the Carry flag value, writes the result to the destination WDR, and updates the flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

Errors

BN.ADDC cannot cause any software errors.

Syntax

BN.ADDC <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.ADDC	FG	ST	SB					WRS2					WRS1					0	1	0	WRD					0	1	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

carry = int(FLAGs[flag_group].C)
full_result = a + b_shifted + carry
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags
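
The point of BN.ADDC is to chain additions across several WDRs. As a minimal sketch (plain Python integers standing in for registers, function name hypothetical), a 512-bit addition can be built from one BN.ADD on the low halves followed by one BN.ADDC on the high halves:

```python
MASK256 = (1 << 256) - 1


def add512(a: int, b: int):
    """Add two 512-bit values limb by limb; returns (sum, carry_out)."""
    a_lo, a_hi = a & MASK256, a >> 256
    b_lo, b_hi = b & MASK256, b >> 256

    lo_full = a_lo + b_lo              # models BN.ADD on the low limbs
    carry = (lo_full >> 256) & 1       # the C flag it leaves behind

    hi_full = a_hi + b_hi + carry      # models BN.ADDC on the high limbs
    carry_out = (hi_full >> 256) & 1

    total = ((hi_full & MASK256) << 256) | (lo_full & MASK256)
    return total, carry_out
```

Both instructions must use the same flag group so that BN.ADDC sees the carry produced by BN.ADD.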

BN.ADDI

Add Immediate. Adds a zero-extended unsigned immediate to the value of a WDR, writes the result to the destination WDR, and updates the flags.

Errors

BN.ADDI cannot cause any software errors.

Syntax

BN.ADDI <wrd>, <wrs>, <imm>[, FG<flag_group>]

Operands

Operand	Description
`wrd`	Name of the destination WDR Decode as `unsigned(WRD)`
`wrs`	Name of the source WDR Decode as `unsigned(WRS)`
`imm`	Immediate value Valid range: `0` to `1023`. Decode as `unsigned(IMM)`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.ADDI	FG	0	IMM										WRS					1	0	0	WRD					0	1	0	1	0	1	1

Operation

a = WDRs[wrs]
b = imm

full_result = a + b
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags

BN.ADDM

Pseudo-Modulo Add. Add two WDR values, modulo the MOD WSR.

The values in <wrs1> and <wrs2> are summed to get an intermediate result (of width WLEN + 1). If this result is greater than or equal to MOD then MOD is subtracted from it. The result is then truncated to 256 bits and stored in <wrd>.

This operation correctly implements addition modulo MOD, provided that the intermediate result is less than 2 * MOD. The intermediate result is small enough if both inputs are less than MOD.

Flags are not used or saved.

Errors

BN.ADDM cannot cause any software errors.

Syntax

BN.ADDM <wrd>, <wrs1>, <wrs2>

Operands

Operand	Description
`wrd`	Decode as `unsigned(WRD)`
`wrs1`	Decode as `unsigned(WRS1)`
`wrs2`	Decode as `unsigned(WRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.ADDM		0						WRS2					WRS1					1	0	1	WRD					0	1	0	1	0	1	1

Operation

a = WDRs[wrs1]
b = WDRs[wrs2]
result = a + b

mod_val = MOD
if result >= mod_val:
    result -= mod_val

result = result & ((1 << 256) - 1)
WDRs[wrd] ⇐ result
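
The following Python sketch mirrors the listing above with plain integers, to show where the result matches true modular addition and where it does not. The function name is hypothetical.

```python
MASK256 = (1 << 256) - 1


def bn_addm(a: int, b: int, mod: int) -> int:
    """Pseudo-modulo add: at most one subtraction of mod is performed."""
    result = a + b
    if result >= mod:
        result -= mod
    return result & MASK256


# Both inputs below MOD: the result equals (a + b) % mod.
in_range = bn_addm(7, 9, 13)

# Inputs not reduced beforehand: one subtraction is not enough.
out_of_range = bn_addm(20, 20, 13)
```

Here in_range is 3, which equals (7 + 9) % 13, whereas out_of_range is 27 rather than 40 % 13 = 1.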

BN.MULQACC

Quarter-word Multiply and Accumulate. Multiplies two quarter-words (WLEN/4 bits each) selected from the source WDRs, shifts the product left by acc_shift_imm bits, and adds the result to the accumulator.

For versions of the instruction with writeback, see BN.MULQACC.WO and BN.MULQACC.SO.

Errors

BN.MULQACC cannot cause any software errors.

Syntax

BN.MULQACC[<zero_acc>] <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>

Operands

Operand	Description
`zero_acc`	Zero the accumulator before accumulating the multiply result. To specify, use the literal syntax `.z` Decode as `unsigned(ZA)`
`wrs1`	First source WDR Decode as `unsigned(WRS1)`
`wrs1_qwsel`	Quarter-word select for `<wrs1>`. Valid values: `0`: Select `wrs1[WLEN/4-1:0]` (least significant quarter-word); `1`: Select `wrs1[WLEN/2-1:WLEN/4]`; `2`: Select `wrs1[WLEN/4*3-1:WLEN/2]`; `3`: Select `wrs1[WLEN-1:WLEN/4*3]` (most significant quarter-word). Valid range: `0` to `3`. Decode as `unsigned(Q1)`
`wrs2`	Second source WDR Decode as `unsigned(WRS2)`
`wrs2_qwsel`	Quarter-word select for `<wrs2>`. Valid values: `0`: Select `wrs2[WLEN/4-1:0]` (least significant quarter-word); `1`: Select `wrs2[WLEN/2-1:WLEN/4]`; `2`: Select `wrs2[WLEN/4*3-1:WLEN/2]`; `3`: Select `wrs2[WLEN-1:WLEN/4*3]` (most significant quarter-word). Valid range: `0` to `3`. Decode as `unsigned(Q2)`
`acc_shift_imm`	The number of bits to shift the `WLEN/2`-bit multiply result before accumulating. Valid range: `0` to `192` in steps of `64`. Decode as `unsigned({SHIFT, 6'b0})`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.MULQACC		0	0	Q2		Q1		WRS2					WRS1					SHIFT		ZA						0	1	1	1	0	1	1

Operation

In the listing below, operand zero_acc is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]

a_qw = extract_quarter_word(a, wrs1_qwsel)
b_qw = extract_quarter_word(b, wrs2_qwsel)

mul_res = a_qw * b_qw

acc = ACC
if zero_acc:
    acc = 0

acc += (mul_res << acc_shift_imm)

truncated = acc & ((1 << 256) - 1)
ACC ⇐ truncated
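
As an illustration of how the quarter-word selects and acc_shift_imm combine, the sketch below multiplies two 128-bit values with four BN.MULQACC steps. It is a Python model with hypothetical function names; extract_quarter_word is assumed to return the 64-bit slice described in the operand table (WLEN = 256).

```python
MASK256 = (1 << 256) - 1
QW_MASK = (1 << 64) - 1


def extract_quarter_word(value: int, qwsel: int) -> int:
    """Assumed helper: 64-bit quarter-word number qwsel of a 256-bit value."""
    return (value >> (64 * qwsel)) & QW_MASK


def bn_mulqacc(acc, a, a_qwsel, b, b_qwsel, acc_shift_imm, zero_acc=False):
    """Return the new accumulator value after one BN.MULQACC."""
    if zero_acc:
        acc = 0
    product = extract_quarter_word(a, a_qwsel) * extract_quarter_word(b, b_qwsel)
    return (acc + (product << acc_shift_imm)) & MASK256


def mul128(a: int, b: int) -> int:
    """Schoolbook 128x128-bit multiply; a and b occupy quarter-words 0 and 1."""
    acc = bn_mulqacc(0, a, 0, b, 0, 0, zero_acc=True)  # a0*b0
    acc = bn_mulqacc(acc, a, 0, b, 1, 64)              # a0*b1 << 64
    acc = bn_mulqacc(acc, a, 1, b, 0, 64)              # a1*b0 << 64
    acc = bn_mulqacc(acc, a, 1, b, 1, 128)             # a1*b1 << 128
    return acc
```

Because the full product of two 128-bit values fits in 256 bits, the truncation in the last step of each BN.MULQACC never discards anything in this sequence.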

BN.MULQACC.WO

Quarter-word Multiply and Accumulate with full-word writeback. Multiplies two quarter-words (WLEN/4 bits each) selected from the source WDRs, shifts the product left by acc_shift_imm bits, and adds the result to the accumulator. Writes the resulting accumulator to wrd.

Errors

BN.MULQACC.WO cannot cause any software errors.

Syntax

BN.MULQACC.WO[<zero_acc>] <wrd>, <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>[, FG<flag_group>]

Operands

Operand	Description
`zero_acc`	Zero the accumulator before accumulating the multiply result. To specify, use the literal syntax `.z` Decode as `unsigned(ZA)`
`wrd`	Destination WDR. Decode as `unsigned(WRD)`
`wrs1`	First source WDR Decode as `unsigned(WRS1)`
`wrs1_qwsel`	Quarter-word select for `<wrs1>`. Valid values: `0`: Select `wrs1[WLEN/4-1:0]` (least significant quarter-word); `1`: Select `wrs1[WLEN/2-1:WLEN/4]`; `2`: Select `wrs1[WLEN/4*3-1:WLEN/2]`; `3`: Select `wrs1[WLEN-1:WLEN/4*3]` (most significant quarter-word). Valid range: `0` to `3`. Decode as `unsigned(Q1)`
`wrs2`	Second source WDR Decode as `unsigned(WRS2)`
`wrs2_qwsel`	Quarter-word select for `<wrs2>`. Valid values: `0`: Select `wrs2[WLEN/4-1:0]` (least significant quarter-word); `1`: Select `wrs2[WLEN/2-1:WLEN/4]`; `2`: Select `wrs2[WLEN/4*3-1:WLEN/2]`; `3`: Select `wrs2[WLEN-1:WLEN/4*3]` (most significant quarter-word). Valid range: `0` to `3`. Decode as `unsigned(Q2)`
`acc_shift_imm`	The number of bits to shift the `WLEN/2`-bit multiply result before accumulating. Valid range: `0` to `192` in steps of `64`. Decode as `unsigned({SHIFT, 6'b0})`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.MULQACC.WO	FG	0	1	Q2		Q1		WRS2					WRS1					SHIFT		ZA	WRD					0	1	1	1	0	1	1

Operation

In the listing below, operand zero_acc is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]

a_qw = extract_quarter_word(a, wrs1_qwsel)
b_qw = extract_quarter_word(b, wrs2_qwsel)

mul_res = a_qw * b_qw

acc = ACC
if zero_acc:
    acc = 0

acc += (mul_res << acc_shift_imm)

truncated = acc & ((1 << 256) - 1)
WDRs[wrd] ⇐ truncated
ACC ⇐ truncated
state.set_mlz_flags(flag_group, truncated)

BN.MULQACC.SO

Quarter-word Multiply and Accumulate with half-word writeback. Multiplies two quarter-words (WLEN/4 bits each) selected from the source WDRs, shifts the product left by acc_shift_imm bits and adds the result to the accumulator. Next, shifts the resulting accumulator right by half a word (128 bits). The bits that are shifted out are written to a half-word of wrd, selected with wrd_hwsel.

This instruction never changes the C flag. If wrd_hwsel is zero (so the instruction is updating the lower half-word of wrd), it updates the L and Z flags and leaves M unchanged. The L flag is set iff the bottom bit of the shifted-out result is set. The Z flag is set iff the shifted-out result is zero.

If wrd_hwsel is one (so the instruction is updating the upper half-word of wrd), it updates the M and Z flags and leaves L unchanged. The M flag is set iff the top bit of the shifted-out result is set. The Z flag is left unchanged if the shifted-out result is zero and cleared if not.

Errors

BN.MULQACC.SO cannot cause any software errors.

Syntax

BN.MULQACC.SO[<zero_acc>] <wrd>.<wrd_hwsel>, <wrs1>.<wrs1_qwsel>, <wrs2>.<wrs2_qwsel>, <acc_shift_imm>[, FG<flag_group>]

Operands

Operand Description

zero_acc

Zero the accumulator before accumulating the multiply result.

To specify, use the literal syntax .z

Decode as unsigned(ZA)

wrd

Updated WDR.

Decode as unsigned(WRD)

wrd_hwsel

Half-word select for <wrd>. A value of L means the less significant half-word; U means the more significant half-word.

Assembly Syntax	Value
`l`	`0`
`u`	`1`

Decode as unsigned(DH)

wrs1

First source WDR

Decode as unsigned(WRS1)

wrs1_qwsel

Quarter-word select for <wrs1>.

Valid values:

0: Select wrs1[WLEN/4-1:0] (least significant quarter-word)
1: Select wrs1[WLEN/2-1:WLEN/4]
2: Select wrs1[WLEN/4*3-1:WLEN/2]
3: Select wrs1[WLEN-1:WLEN/4*3] (most significant quarter-word)

Valid range: 0 to 3.

Decode as unsigned(Q1)

wrs2

Second source WDR

Decode as unsigned(WRS2)

wrs2_qwsel

Quarter-word select for <wrs2>.

Valid values:

0: Select wrs2[WLEN/4-1:0] (least significant quarter-word)
1: Select wrs2[WLEN/2-1:WLEN/4]
2: Select wrs2[WLEN/4*3-1:WLEN/2]
3: Select wrs2[WLEN-1:WLEN/4*3] (most significant quarter-word)

Valid range: 0 to 3.

Decode as unsigned(Q2)

acc_shift_imm

The number of bits to shift the WLEN/2-bit multiply result before accumulating.

Valid range: 0 to 192 in steps of 64.

Decode as unsigned({SHIFT, 6'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.MULQACC.SO	FG	1	DH	Q2		Q1		WRS2					WRS1					SHIFT		ZA	WRD					0	1	1	1	0	1	1

Operation

In the listing below, operands zero_acc and wrd_hwsel are referred to by their integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]

a_qw = extract_quarter_word(a, wrs1_qwsel)
b_qw = extract_quarter_word(b, wrs2_qwsel)

mul_res = a_qw * b_qw

acc = ACC
if zero_acc:
    acc = 0

acc += (mul_res << acc_shift_imm)
truncated = acc & ((1 << 256) - 1)

# Split the result into low and high parts
lo_part = truncated & ((1 << 128) - 1)
hi_part = truncated >> 128

# Shift out the low part of the result
hw_shift = 128 * wrd_hwsel
hw_mask = ((1 << 128) - 1) << hw_shift
old_wrd = WDRs[wrd]
new_wrd = (old_wrd & ~hw_mask) | (lo_part << hw_shift)
WDRs[wrd] ⇐ new_wrd

# Write back the high part of the result
ACC ⇐ hi_part

old_flags = FLAGs[flag_group]
if wrd_hwsel:
    new_flags = FlagReg(C=old_flags.C,
                        M=bool((lo_part >> 127) & 1),
                        L=old_flags.L,
                        Z=old_flags.Z and lo_part == 0)
else:
    new_flags = FlagReg(C=old_flags.C,
                        M=old_flags.M,
                        L=bool(lo_part & 1),
                        Z=lo_part == 0)
FLAGs[flag_group] ⇐ new_flags
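
The half-word writeback is easiest to follow in a complete sequence. The Python sketch below models a 128x128-bit multiply whose last two steps use BN.MULQACC.SO to fill the lower and then the upper half of the destination. All function names are hypothetical, flags are modelled as a plain dict, and the quarter-word extraction is an assumed 64-bit slice.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1
QW_MASK = (1 << 64) - 1


def qw(value, sel):
    """Assumed quarter-word extraction: 64-bit slice number sel."""
    return (value >> (64 * sel)) & QW_MASK


def mulqacc(acc, a, qa, b, qb, shift, zero_acc=False):
    """Plain BN.MULQACC: accumulate without writeback."""
    if zero_acc:
        acc = 0
    return (acc + ((qw(a, qa) * qw(b, qb)) << shift)) & MASK256


def mulqacc_so(acc, wrd, hwsel, a, qa, b, qb, shift, flags):
    """BN.MULQACC.SO model: returns (new_wrd, new_acc, new_flags)."""
    acc = mulqacc(acc, a, qa, b, qb, shift)
    lo_part, hi_part = acc & MASK128, acc >> 128
    hw_shift = 128 * hwsel
    new_wrd = (wrd & ~(MASK128 << hw_shift) & MASK256) | (lo_part << hw_shift)
    new_flags = dict(flags)                      # C is never changed
    if hwsel:
        new_flags["M"] = bool((lo_part >> 127) & 1)
        new_flags["Z"] = flags["Z"] and lo_part == 0
    else:
        new_flags["L"] = bool(lo_part & 1)
        new_flags["Z"] = lo_part == 0
    return new_wrd, hi_part, new_flags


def mul128_so(a, b):
    """128x128-bit multiply that writes the result half-word by half-word."""
    flags = {"C": False, "M": False, "L": False, "Z": False}
    acc = mulqacc(0, a, 0, b, 0, 0, zero_acc=True)                    # a0*b0
    acc = mulqacc(acc, a, 0, b, 1, 64)                                # a0*b1
    wrd, acc, flags = mulqacc_so(acc, 0, 0, a, 1, b, 0, 64, flags)    # wrd.l
    wrd, acc, flags = mulqacc_so(acc, wrd, 1, a, 1, b, 1, 0, flags)   # wrd.u
    return wrd, acc, flags
```

After the two writeback steps the M, L and Z flags describe the complete 256-bit value in the destination, which is why the upper-half step can only clear Z and never set it.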

BN.SUB

Subtraction. Subtracts the second WDR value from the first one, writes the result to the destination WDR and updates flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

Errors

BN.SUB cannot cause any software errors.

Syntax

BN.SUB <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SUB	FG	ST	SB					WRS2					WRS1					0	0	1	WRD					0	1	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

full_result = a - b_shifted
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags

BN.SUBB

Subtract with borrow. Subtracts the second WDR value and the Carry from the first one, writes the result to the destination WDR, and updates the flags. The content of the second source WDR can be shifted by an unsigned immediate before it is consumed by the operation.

Errors

BN.SUBB cannot cause any software errors.

Syntax

BN.SUBB <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SUBB	FG	ST	SB					WRS2					WRS1					0	1	1	WRD					0	1	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)
borrow = int(FLAGs[flag_group].C)

full_result = a - b_shifted - borrow
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags
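
In the listings for BN.SUB and BN.SUBB, full_result can be negative. Python's right shift on a negative integer is arithmetic, so bit 256 reads as 1 exactly when the subtraction underflowed, which makes the C flag act as a borrow. As a minimal sketch (plain integers standing in for registers, function name hypothetical), a 512-bit subtraction chains the two instructions like this:

```python
MASK256 = (1 << 256) - 1


def sub512(a: int, b: int):
    """Subtract two 512-bit values limb by limb; returns (diff, borrow_out)."""
    a_lo, a_hi = a & MASK256, a >> 256
    b_lo, b_hi = b & MASK256, b >> 256

    lo_full = a_lo - b_lo              # models BN.SUB on the low limbs
    borrow = (lo_full >> 256) & 1      # 1 iff lo_full is negative

    hi_full = a_hi - b_hi - borrow     # models BN.SUBB on the high limbs
    borrow_out = (hi_full >> 256) & 1

    # Masking a negative Python int yields its two's-complement bit pattern.
    diff = ((hi_full & MASK256) << 256) | (lo_full & MASK256)
    return diff, borrow_out
```

As with the add-with-carry chain, both instructions must name the same flag group.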

BN.SUBI

Subtract Immediate. Subtracts a zero-extended unsigned immediate from the value of a WDR, writes the result to the destination WDR, and updates the flags.

Errors

BN.SUBI cannot cause any software errors.

Syntax

BN.SUBI <wrd>, <wrs>, <imm>[, FG<flag_group>]

Operands

Operand	Description
`wrd`	Name of the destination WDR Decode as `unsigned(WRD)`
`wrs`	Name of the source WDR Decode as `unsigned(WRS)`
`imm`	Immediate value Valid range: `0` to `1023`. Decode as `unsigned(IMM)`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SUBI	FG	1	IMM										WRS					1	0	0	WRD					0	1	0	1	0	1	1

Operation

a = WDRs[wrs]
b = imm

full_result = a - b
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

WDRs[wrd] ⇐ masked_result
FLAGs[flag_group] ⇐ flags

BN.SUBM

Pseudo-modulo subtraction. Subtract <wrs2> from <wrs1>, modulo the MOD WSR.

The intermediate result is treated as a signed number (of width WLEN + 1). If it is negative, MOD is added to it. The 2’s-complement result is then truncated to 256 bits and stored in <wrd>.

This operation correctly implements subtraction modulo MOD, provided that the intermediate result is at least -MOD and at most MOD - 1. This is guaranteed if both inputs are less than MOD.

Flags are not used or saved.

Errors

BN.SUBM cannot cause any software errors.

Syntax

BN.SUBM <wrd>, <wrs1>, <wrs2>

Operands

Operand	Description
`wrd`	Decode as `unsigned(WRD)`
`wrs1`	Decode as `unsigned(WRS1)`
`wrs2`	Decode as `unsigned(WRS2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SUBM		1						WRS2					WRS1					1	0	1	WRD					0	1	0	1	0	1	1

Operation

a = WDRs[wrs1]
b = WDRs[wrs2]

mod_val = MOD

diff = a - b
if diff < 0:
    diff += mod_val

result = diff & ((1 << 256) - 1)
WDRs[wrd] ⇐ result
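
The following Python sketch mirrors the listing above with plain integers, to show the range condition in action. The function name is hypothetical.

```python
MASK256 = (1 << 256) - 1


def bn_subm(a: int, b: int, mod: int) -> int:
    """Pseudo-modulo subtract: at most one addition of mod is performed."""
    diff = a - b
    if diff < 0:
        diff += mod
    return diff & MASK256


# Both inputs below MOD: the result equals (a - b) % mod.
in_range = bn_subm(3, 9, 13)

# Difference below -MOD: one addition is not enough, and the negative
# value wraps to a large two's-complement number when truncated.
out_of_range = bn_subm(0, 30, 13)
```

Here in_range is 7, which equals (3 - 9) % 13, whereas out_of_range is 2**256 - 17 rather than (0 - 30) % 13 = 9.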

BN.AND

Bitwise AND. Performs a bitwise and operation. Takes the values stored in registers referenced by wrs1 and wrs2 and stores the result in the register referenced by wrd. The content of the second source register can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in flag group flag_group are updated with the result of the operation.

Errors

BN.AND cannot cause any software errors.

Syntax

BN.AND <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.AND	FG	ST	SB					WRS2					WRS1					0	1	0	WRD					1	1	1	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

result = a & b_shifted
WDRs[wrd] ⇐ result
state.set_mlz_flags(flag_group, result)

BN.OR

Bitwise OR. Performs a bitwise or operation. Takes the values stored in WDRs referenced by wrs1 and wrs2 and stores the result in the WDR referenced by wrd. The content of the second source WDR can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in flag group flag_group are updated with the result of the operation.

Errors

BN.OR cannot cause any software errors.

Syntax

BN.OR <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.OR	FG	ST	SB					WRS2					WRS1					1	0	0	WRD					1	1	1	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

result = a | b_shifted
WDRs[wrd] ⇐ result
state.set_mlz_flags(flag_group, result)

BN.NOT

Bitwise NOT. Inverts every bit of the value in the WDR referenced by wrs and stores the result in the WDR referenced by wrd. The source value can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in flag group flag_group are updated with the result of the operation.

Errors

BN.NOT cannot cause any software errors.

Syntax

BN.NOT <wrd>, <wrs>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs

Name of the source WDR

Decode as unsigned(WRS)

shift_type

The direction of an optional shift applied to <wrs>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.NOT	FG	ST	SB					WRS										1	0	1	WRD					1	1	1	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs]
a_shifted = logical_byte_shift(a, shift_type, shift_bytes)

result = a_shifted ^ ((1 << 256) - 1)
WDRs[wrd] ⇐ result
state.set_mlz_flags(flag_group, result)

BN.XOR

Bitwise XOR. Performs a bitwise xor operation. Takes the values stored in WDRs referenced by wrs1 and wrs2 and stores the result in the WDR referenced by wrd. The content of the second source WDR can be shifted by an immediate before it is consumed by the operation. The M, L and Z flags in flag group flag_group are updated with the result of the operation.

Errors

BN.XOR cannot cause any software errors.

Syntax

BN.XOR <wrd>, <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand Description

wrd

Name of the destination WDR

Decode as unsigned(WRD)

wrs1

Name of the first source WDR

Decode as unsigned(WRS1)

wrs2

Name of the second source WDR

Decode as unsigned(WRS2)

shift_type

The direction of an optional shift applied to <wrs2>.

Assembly Syntax	Value
`<<`	`0`
`>>`	`1`

Decode as unsigned(ST)

shift_bits

Number of bits by which to shift <wrs2>. Defaults to 0.

Valid range: 0 to 248 in steps of 8.

Decode as unsigned({SB, 3'b0})

flag_group

Flag group to use. Defaults to 0.

Valid range: 0 to 1.

Decode as unsigned(FG)

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.XOR	FG	ST	SB					WRS2					WRS1					1	1	0	WRD					1	1	1	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

result = a ^ b_shifted
WDRs[wrd] ⇐ result
state.set_mlz_flags(flag_group, result)

BN.RSHI

Concatenate and right shift immediate. Concatenates the content of the WDRs referenced by wrs1 and wrs2 (wrs1 forms the upper part), shifts the concatenated value right by an immediate value and truncates the result to WLEN bits. The result is stored in the WDR referenced by wrd.

Errors

BN.RSHI cannot cause any software errors.

Syntax

BN.RSHI <wrd>, <wrs1>, <wrs2> >> <imm>

Operands

Operand	Description
`wrd`	Name of the destination WDR Decode as `unsigned(WRD)`
`wrs1`	Name of the first source WDR Decode as `unsigned(WRS1)`
`wrs2`	Name of the second source WDR Decode as `unsigned(WRS2)`
`imm`	Number of bits by which to shift the concatenated value right. Valid range: `0` to `255` (that is, `0` to `WLEN-1`). Decode as `unsigned({IMM_1, IMM_0})`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.RSHI	IMM_1							WRS2					WRS1					IMM_0	1	1	WRD					1	1	1	1	0	1	1

Operation

a = WDRs[wrs1]
b = WDRs[wrs2]

result = (((a << 256) | b) >> imm) & ((1 << 256) - 1)
WDRs[wrd] ⇐ result
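
The formula above is ordinary integer arithmetic, so it can be run directly in Python. The sketch below is illustrative only. The function names are hypothetical, and the rotation helper simply follows from passing the same value as both sources.

```python
WLEN = 256
MASK256 = (1 << WLEN) - 1


def bn_rshi(a: int, b: int, imm: int) -> int:
    """Model of BN.RSHI: a is the upper half, b the lower half."""
    assert 0 <= imm <= 255
    return (((a << WLEN) | b) >> imm) & MASK256


def rotate_right(x: int, imm: int) -> int:
    """Passing the same value as both sources rotates it right by imm bits."""
    return bn_rshi(x, x, imm)


# Bit 0 of the upper half moves down 4 places into bit 252 of the result.
shifted = bn_rshi(0x1, 0x0, 4)

# With imm = 0 the result is just the lower half (wrs2).
unchanged = bn_rshi(0xAAAA, 0x5555, 0)

# Rotating 1 right by one bit wraps it around to bit 255.
rotated = rotate_right(0x1, 1)
```

When the upper half is zero, the same formula reduces to a plain logical right shift of the lower half.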

BN.SEL

Flag Select. Copies the value of the first source WDR to the destination WDR if the chosen flag in the selected flag group is set; otherwise copies the value of the second source WDR.

Errors

BN.SEL cannot cause any software errors.

Syntax

BN.SEL <wrd>, <wrs1>, <wrs2>, [FG<flag_group>.]<flag>

Operands

Operand	Description
`wrd`	Name of the destination WDR. Decode as `unsigned(WRD)`
`wrs1`	Name of the first source WDR. Decode as `unsigned(WRS1)`
`wrs2`	Name of the second source WDR. Decode as `unsigned(WRS2)`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`
`flag`	Flag to check. Valid values: C (Carry flag), M (MSB flag), L (LSB flag), Z (Zero flag). Assembly syntax `c` has value `0`, `m` has value `1`, `l` has value `2` and `z` has value `3`. Decode as `unsigned(FLAG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SEL	FG					FLAG		WRS2					WRS1					0	0	0	WRD					0	0	0	1	0	1	1

Operation

In the listing below, operand flag is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

flag_is_set = FLAGs[flag_group].get_by_idx(flag)
wrs = wrs1 if flag_is_set else wrs2
value = WDRs[wrs]
WDRs[wrd] ⇐ value
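
As a rough illustration of the selection logic, the sketch below models the WDRs as a Python list and each flag group as a list of four booleans in the order given by the flag values in the operand table. The data layout and the `bn_sel` function are assumptions made for this example and are not how the simulator stores its state.

```python
# Flag indices taken from the operand table above.
FLAG_INDEX = {"c": 0, "m": 1, "l": 2, "z": 3}


def bn_sel(wdrs, flag_groups, wrd, wrs1, wrs2, flag, flag_group=0):
    """Copy wdrs[wrs1] to wdrs[wrd] if the flag is set, else wdrs[wrs2]."""
    flag_is_set = flag_groups[flag_group][FLAG_INDEX[flag]]
    wrs = wrs1 if flag_is_set else wrs2
    wdrs[wrd] = wdrs[wrs]


wdrs = [0] * 32
wdrs[1], wdrs[2] = 111, 222

# Each flag group is modelled as [C, M, L, Z]. Only C in group 0 is set.
flag_groups = [[True, False, False, False], [False, False, False, False]]

# Models "BN.SEL w3, w1, w2, FG0.c": C is set in group 0, so w1 is chosen.
bn_sel(wdrs, flag_groups, wrd=3, wrs1=1, wrs2=2, flag="c")

# Models "BN.SEL w4, w1, w2, FG1.c": C is clear in group 1, so w2 is chosen.
bn_sel(wdrs, flag_groups, wrd=4, wrs1=1, wrs2=2, flag="c", flag_group=1)
```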

BN.CMP

Compare. Subtracts the second WDR value from the first one and updates flags. This instruction is identical to BN.SUB, except that no result register is written.

Errors

BN.CMP cannot cause any software errors.

Syntax

BN.CMP <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand	Description
`wrs1`	Name of the first source WDR. Decode as `unsigned(WRS1)`
`wrs2`	Name of the second source WDR. Decode as `unsigned(WRS2)`
`shift_type`	The direction of an optional shift applied to `<wrs2>`. Assembly syntax `<<` has value `0`; `>>` has value `1`. Decode as `unsigned(ST)`
`shift_bits`	Number of bits by which to shift `<wrs2>`. Defaults to 0. Valid range: `0` to `248` in steps of `8`. Decode as `unsigned({SB, 3'b0})`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.CMP	FG	ST	SB					WRS2					WRS1					0	0	1						0	0	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)

full_result = a - b_shifted
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

FLAGs[flag_group] ⇐ flags
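
The flag computation above relies on the subtraction being done on unbounded integers: when the second operand is larger, the result is negative and bit 256 of its two's-complement representation is 1. The sketch below reproduces this in plain Python for operands that are already in the range 0 to 2^256 - 1. The `bn_cmp_flags` helper is hypothetical, and the definitions of M (bit 255) and L (bit 0) are assumptions based on the flag names.

```python
WLEN = 256
MASK256 = (1 << WLEN) - 1


def bn_cmp_flags(a: int, b_shifted: int) -> dict:
    """Return the C, M, L and Z flags that the listing above would produce."""
    full_result = a - b_shifted
    masked_result = full_result & MASK256
    return {
        # Python's >> on a negative int is arithmetic, so bit 256 is 1
        # exactly when the subtraction borrowed (a < b_shifted).
        "C": bool((full_result >> WLEN) & 1),
        "M": bool((masked_result >> (WLEN - 1)) & 1),  # assumed: bit 255
        "L": bool(masked_result & 1),                  # assumed: bit 0
        "Z": masked_result == 0,
    }


equal = bn_cmp_flags(5, 5)     # Z set, C clear
smaller = bn_cmp_flags(3, 5)   # C set because 3 < 5
larger = bn_cmp_flags(5, 3)    # neither C nor Z set
```

Read this way, C indicates "first operand is less than the (shifted) second operand" and Z indicates equality.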

BN.CMPB

Compare with Borrow. Subtracts the second WDR value and the borrow (the C flag of the selected flag group) from the first WDR value and updates flags. This instruction is identical to BN.SUBB, except that no result register is written.

Errors

BN.CMPB cannot cause any software errors.

Syntax

BN.CMPB <wrs1>, <wrs2>[ <shift_type> <shift_bits>][, FG<flag_group>]

Operands

Operand	Description
`wrs1`	Name of the first source WDR. Decode as `unsigned(WRS1)`
`wrs2`	Name of the second source WDR. Decode as `unsigned(WRS2)`
`shift_type`	The direction of an optional shift applied to `<wrs2>`. Assembly syntax `<<` has value `0`; `>>` has value `1`. Decode as `unsigned(ST)`
`shift_bits`	Number of bits by which to shift `<wrs2>`. Defaults to 0. Valid range: `0` to `248` in steps of `8`. Decode as `unsigned({SB, 3'b0})`
`flag_group`	Flag group to use. Defaults to 0. Valid range: `0` to `1`. Decode as `unsigned(FG)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.CMPB	FG	ST	SB					WRS2					WRS1					0	1	1						0	0	0	1	0	1	1

Operation

In the listing below, operand shift_type is referred to by its integer value. The operand table above shows how this corresponds to assembly syntax.

a = WDRs[wrs1]
b = WDRs[wrs2]
b_shifted = logical_byte_shift(b, shift_type, shift_bytes)
borrow = int(FLAGs[flag_group].C)

full_result = a - b_shifted - borrow
mask256 = (1 << 256) - 1
masked_result = full_result & mask256
carry_flag = bool((full_result >> 256) & 1)
flags = FlagReg.mlz_for_result(carry_flag, masked_result)

FLAGs[flag_group] ⇐ flags
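
Because the borrow is taken from the C flag, a BN.CMP on the least significant words followed by BN.CMPB on each more significant word compares numbers wider than 256 bits. The sketch below models that chaining in Python, assuming unshifted operands and limbs stored least significant first. The function name and the limb layout are hypothetical choices made for this example.

```python
WLEN = 256


def multi_limb_less_than(x_limbs, y_limbs) -> bool:
    """Return True if x < y, where both are lists of 256-bit limbs (LSW first).

    The first iteration corresponds to BN.CMP (no incoming borrow) and
    every later iteration corresponds to BN.CMPB.
    """
    assert len(x_limbs) == len(y_limbs)
    carry = False
    for a, b in zip(x_limbs, y_limbs):
        full_result = a - b - int(carry)
        carry = bool((full_result >> WLEN) & 1)
    # After the most significant limb, C is set exactly when x < y.
    return carry


# x = 2**256 (limbs [0, 1]) and y = 1 (limbs [1, 0]), so x > y.
x_greater = multi_limb_less_than([0, 1], [1, 0])

# x = 5 and y = 2**256 + 3, so x < y.
x_smaller = multi_limb_less_than([5, 0], [3, 1])
```

Note that this chain only yields the final carry. The Z flag after the last BN.CMPB reflects only the most significant limb, so it is not an equality test for the whole number.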

BN.LID

Load Word (indirect source, indirect destination). Load a WLEN-bit little-endian value from data memory.

The load address is offset plus the value in the GPR grs1. The loaded value is stored into the WDR given by the bottom 5 bits of the GPR grd.

After the operation, either the value in the GPR grs1 or the value in the GPR grd can optionally be incremented. Specifying both grd_inc and grs1_inc results in an error (setting bit illegal_insn in ERR_BITS).

If grd_inc is set, grd is updated to be *grd + 1.
If grs1_inc is set, the value in grs1 is incremented by WLEN/8 (one word).

The memory address must be aligned to WLEN bits. Any address that is unaligned or is above the top of memory results in an error (setting bit bad_data_addr in ERR_BITS). If *grd is greater than 31 before the instruction executes, an error results (setting bit illegal_insn in ERR_BITS) and neither the load nor the optional increment occurs. This instruction takes 2 cycles.

Errors

BN.LID might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grd when the call stack is empty.
An ILLEGAL_INSN error if both grd_inc and grs1_inc are set.
An ILLEGAL_INSN error if the value in GPR grd is greater than 31.
A BAD_DATA_ADDR error if the computed address is not a valid DMEM address aligned to WLEN bits.

Syntax

BN.LID <grd>[<grd_inc>], <offset>(<grs1>[<grs1_inc>])

Operands

Operand	Description
`grd`	Name of the GPR referencing the destination WDR Decode as `unsigned(GRD)`
`grs1`	Name of the GPR containing the memory byte address. The value contained in the referenced GPR must be WLEN-aligned. Decode as `unsigned(GRS1)`
`offset`	Offset value. Must be WLEN-aligned. Valid range: `-16384` to `16352` in steps of `32`. Decode as `signed({OFF_1, OFF_0, 5'b0})`
`grs1_inc`	Increment the value in `<grs1>` by WLEN/8 (one word). Cannot be specified together with `grd_inc`. To specify, use the literal syntax `++` Decode as `unsigned(INC1)`
`grd_inc`	Increment the value in `<grd>` by one. Cannot be specified together with `grs1_inc`. To specify, use the literal syntax `++` Decode as `unsigned(INCD)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.LID	OFF_0							GRD					GRS1					1	0	0	OFF_1			INC1	INCD	0	0	0	1	0	1	1
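
To make the split offset field concrete, the sketch below packs and unpacks a BN.LID instruction word in Python. The bit positions are read off the encoding table above (OFF_0 in bits 31:25, GRD in 24:20, GRS1 in 19:15, OFF_1 in 11:9, INC1 in bit 8, INCD in bit 7). The function names are hypothetical and this is not the assembler's implementation.

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def encode_bn_lid(grd, grs1, offset, grs1_inc=0, grd_inc=0) -> int:
    assert offset % 32 == 0 and -16384 <= offset <= 16352
    off = (offset >> 5) & 0x3FF          # 10-bit two's-complement field
    off_1, off_0 = off >> 7, off & 0x7F  # upper 3 bits, lower 7 bits
    return ((off_0 << 25) | (grd << 20) | (grs1 << 15) | (0b100 << 12)
            | (off_1 << 9) | (grs1_inc << 8) | (grd_inc << 7) | 0b0001011)


def decode_bn_lid(word: int) -> dict:
    off_0 = (word >> 25) & 0x7F
    off_1 = (word >> 9) & 0x7
    # offset = signed({OFF_1, OFF_0, 5'b0}): 15 bits in total.
    offset = sign_extend(((off_1 << 7) | off_0) << 5, 15)
    return {
        "grd": (word >> 20) & 0x1F,
        "grs1": (word >> 15) & 0x1F,
        "offset": offset,
        "grs1_inc": (word >> 8) & 1,
        "grd_inc": (word >> 7) & 1,
    }


# Models "BN.LID x8, -32(x9++)".
word = encode_bn_lid(grd=8, grs1=9, offset=-32, grs1_inc=1)
fields = decode_bn_lid(word)
```

The five implicit zero bits are why the offset can only be a multiple of 32 bytes, which matches the WLEN alignment requirement.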

Operation

In the listing below, operands grs1_inc and grd_inc are referred to by their integer value. The operand table above shows how this corresponds to assembly syntax.

# BN.LID executes over two cycles. On the first cycle, we read the base
# address, compute the load address and check it for correctness,
# increment any GPRs, then perform the load itself. On the second
# cycle, update the WDR with the result.

if grs1_inc and grd_inc:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

grs1_val = GPRs[grs1]
addr = (grs1_val + offset) & ((1 << 32) - 1)
grd_val = GPRs[grd]

bad_grs1 = state.gprs.call_stack_err and (grs1 == 1)
bad_grd = state.gprs.call_stack_err and (grd == 1)

saw_err = False

if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    saw_err = True

if grd_val > 31 and not bad_grd:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    saw_err = True

if not DMEM.is_valid_256b_addr(addr) and not bad_grs1:
    state.stop_at_end_of_cycle(ErrBits.BAD_DATA_ADDR)
    saw_err = True

if saw_err:
    return None

wrd = grd_val & 0x1f
value = DMEM.load_u256(addr)

if grd_inc:
    new_grd_val = grd_val + 1
    GPRs[grd] ⇐ new_grd_val

if grs1_inc:
    new_grs1_val = (grs1_val + 32) & ((1 << 32) - 1)
    GPRs[grs1] ⇐ new_grs1_val

# Stall for a single cycle for memory to respond
yield None

if value is None:
    state.stop_at_end_of_cycle(ErrBits.DMEM_INTG_VIOLATION)
    return None

WDRs[wrd] ⇐ value
return None
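
The following simplified model may help when reasoning about the checks and the optional increments. It is a sketch under several assumptions: DMEM is a Python `bytearray` whose length stands in for the top of memory, the address check is a hand-written substitute for `DMEM.is_valid_256b_addr`, and the call stack error, the two-cycle timing and the integrity check are left out. The `bn_lid` function and its return value (a set of error names) are hypothetical.

```python
MASK32 = (1 << 32) - 1
WLEN_BYTES = 32  # WLEN / 8


def bn_lid(gprs, wdrs, dmem, grd, grs1, offset, grd_inc=False, grs1_inc=False):
    """Simplified BN.LID. Returns a set of error names (empty on success)."""
    if grd_inc and grs1_inc:
        return {"ILLEGAL_INSN"}

    grs1_val = gprs[grs1]
    grd_val = gprs[grd]
    addr = (grs1_val + offset) & MASK32

    errors = set()
    if grd_val > 31:
        errors.add("ILLEGAL_INSN")
    # Assumed validity rule: 32-byte aligned and fully inside the memory.
    if addr % WLEN_BYTES != 0 or addr + WLEN_BYTES > len(dmem):
        errors.add("BAD_DATA_ADDR")
    if errors:
        return errors  # no load and no increment on error

    wdrs[grd_val & 0x1F] = int.from_bytes(dmem[addr:addr + WLEN_BYTES], "little")
    if grd_inc:
        gprs[grd] = grd_val + 1
    if grs1_inc:
        gprs[grs1] = (grs1_val + WLEN_BYTES) & MASK32
    return errors


dmem = bytearray(64)
dmem[32] = 0x2A              # least significant byte of the word at address 32
gprs = [0] * 32
wdrs = [0] * 32
gprs[2], gprs[3] = 5, 32     # x2 selects w5, x3 holds the byte address

# Models "BN.LID x2, 0(x3++)": loads w5 and then advances x3 by 32 bytes.
first = bn_lid(gprs, wdrs, dmem, grd=2, grs1=3, offset=0, grs1_inc=True)

# x3 is now 64, which is past the end of this 64-byte memory.
second = bn_lid(gprs, wdrs, dmem, grd=2, grs1=3, offset=0, grs1_inc=True)
```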

BN.SID

Store Word (indirect source, indirect destination). Store a WDR to memory as a WLEN-bit little-endian value.

The store address is offset plus the value in the GPR grs1. The value to store is taken from the WDR given by the bottom 5 bits of the GPR grs2.

After the operation, either the value in the GPR grs1 or the value in the GPR grs2 can optionally be incremented. Specifying both grs1_inc and grs2_inc results in an error (setting bit illegal_insn in ERR_BITS).

If grs1_inc is set, the value in grs1 is incremented by the value WLEN/8 (one word).
If grs2_inc is set, the value in grs2 is updated to be *grs2 + 1.

The memory address must be aligned to WLEN bits. Any address that is unaligned or is above the top of memory results in an error (setting bit bad_data_addr in ERR_BITS). If *grs2 is greater than 31 before the instruction executes, an error results (setting bit illegal_insn in ERR_BITS) and neither the store nor the optional increment occurs.

Errors

BN.SID might cause the following software errors:

A CALL_STACK error from using x1 as grs1 or grs2 when the call stack is empty.
An ILLEGAL_INSN error if both grs1_inc and grs2_inc are set.
An ILLEGAL_INSN error if the value in GPR grs2 is greater than 31.
A BAD_DATA_ADDR error if the computed address is not a valid DMEM address aligned to WLEN bits.

Syntax

BN.SID <grs2>[<grs2_inc>], <offset>(<grs1>[<grs1_inc>])

Operands

Operand	Description
`grs1`	Name of the GPR containing the memory byte address. The value contained in the referenced GPR must be WLEN-aligned. Decode as `unsigned(GRS1)`
`grs2`	Name of the GPR referencing the source WDR. Decode as `unsigned(GRS2)`
`offset`	Offset value. Must be WLEN-aligned. Valid range: `-16384` to `16352` in steps of `32`. Decode as `signed({OFF_1, OFF_0, 5'b0})`
`grs1_inc`	Increment the value in `<grs1>` by WLEN/8 (one word). Cannot be specified together with `grs2_inc`. To specify, use the literal syntax `++` Decode as `unsigned(INC1)`
`grs2_inc`	Increment the value in `<grs2>` by one. Cannot be specified together with `grs1_inc`. To specify, use the literal syntax `++` Decode as `unsigned(INC2)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.SID	OFF_0							GRS2					GRS1					1	0	1	OFF_1			INC1	INC2	0	0	0	1	0	1	1

Operation

In the listing below, operands grs1_inc and grs2_inc are referred to by their integer value. The operand table above shows how this corresponds to assembly syntax.

if grs1_inc and grs2_inc:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

grs1_val = GPRs[grs1]
addr = (grs1_val + offset) & ((1 << 32) - 1)
grs2_val = GPRs[grs2]

bad_grs1 = state.gprs.call_stack_err and (grs1 == 1)
bad_grs2 = state.gprs.call_stack_err and (grs2 == 1)

saw_err = False

if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    saw_err = True

if grs2_val > 31 and not bad_grs2:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    saw_err = True

if not DMEM.is_valid_256b_addr(addr) and not bad_grs1:
    state.stop_at_end_of_cycle(ErrBits.BAD_DATA_ADDR)
    saw_err = True

if saw_err:
    return None

if grs1_inc:
    new_grs1_val = (grs1_val + 32) & ((1 << 32) - 1)
    GPRs[grs1] ⇐ new_grs1_val

if grs2_inc:
    new_grs2_val = grs2_val + 1
    GPRs[grs2] ⇐ new_grs2_val

yield None

wrs = grs2_val & 0x1f
wrs_val = WDRs[wrs]

DMEM.store_u256(addr, wrs_val)
return None
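
The little-endian layout mentioned in the description means the least significant byte of the WDR lands at the lowest address. The sketch below shows this with a `bytearray` standing in for DMEM. The two helpers borrow the names `store_u256` and `load_u256` from the listings, but their bodies are assumptions written for this example.

```python
WLEN_BYTES = 32  # WLEN / 8


def store_u256(dmem: bytearray, addr: int, value: int) -> None:
    """Write a 256-bit value at addr, least significant byte first."""
    dmem[addr:addr + WLEN_BYTES] = value.to_bytes(WLEN_BYTES, "little")


def load_u256(dmem: bytearray, addr: int) -> int:
    """Read a 256-bit little-endian value back from addr."""
    return int.from_bytes(dmem[addr:addr + WLEN_BYTES], "little")


dmem = bytearray(64)
store_u256(dmem, 32, 0x0102)

low_byte = dmem[32]    # 0x02, the least significant byte
next_byte = dmem[33]   # 0x01
round_trip = load_u256(dmem, 32)
```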

BN.MOV

Copy content between WDRs (direct addressing).

Errors

BN.MOV cannot cause any software errors.

Syntax

BN.MOV <wrd>, <wrs>

Operands

Operand	Description
`wrd`	Name of the destination WDR. Decode as `unsigned(WRD)`
`wrs`	Name of the source WDR. Decode as `unsigned(WRS)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.MOV	0												WRS					1	1	0	WRD					0	0	0	1	0	1	1

Operation

value = WDRs[wrs]
WDRs[wrd] ⇐ value

BN.MOVR

Copy content between WDRs (register-indirect addressing). Copies the content of the WDR whose index is held in the GPR grs to the WDR whose index is held in the GPR grd.

After the operation, either the value in the GPR grd or the value in the GPR grs can optionally be incremented. Specifying both grd_inc and grs_inc results in an error (setting bit illegal_insn in ERR_BITS).

If grd_inc is set, grd is updated to be *grd + 1.
If grs_inc is set, grs is updated to be *grs + 1.

Any *grd or *grs value greater than 31 results in an error (setting bit illegal_insn in ERR_BITS).

Errors

BN.MOVR might cause the following software errors:

A CALL_STACK error from using x1 as grs or grd when the call stack is empty.
An ILLEGAL_INSN error if either the value in GPR grd or the value in GPR grs is greater than 31.
An ILLEGAL_INSN error if both grs_inc and grd_inc are set.

Syntax

BN.MOVR <grd>[<grd_inc>], <grs>[<grs_inc>]

Operands

Operand	Description
`grd`	Name of the GPR referencing the destination WDR. Decode as `unsigned(GRD)`
`grs`	Name of the GPR referencing the source WDR. Decode as `unsigned(GRS)`
`grd_inc`	Increment the value in `<grd>` by one. Cannot be specified together with `grs_inc`. To specify, use the literal syntax `++` Decode as `unsigned(GRD_INC)`
`grs_inc`	Increment the value in `<grs>` by one. Cannot be specified together with `grd_inc`. To specify, use the literal syntax `++` Decode as `unsigned(GRS_INC)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.MOVR	1							GRD					GRS					1	1	0			GRS_INC		GRD_INC	0	0	0	1	0	1	1

Operation

In the listing below, operands grd_inc and grs_inc are referred to by their integer value. The operand table above shows how this corresponds to assembly syntax.

if grs_inc and grd_inc:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

grd_val = GPRs[grd]
grs_val = GPRs[grs]

bad_grs = state.gprs.call_stack_err and (grs == 1)
bad_grd = state.gprs.call_stack_err and (grd == 1)

saw_err = False

if state.gprs.call_stack_err:
    state.stop_at_end_of_cycle(ErrBits.CALL_STACK)
    saw_err = True

if grd_val > 31 and not bad_grd:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    saw_err = True

if grs_val > 31 and not bad_grs:
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    saw_err = True

if saw_err:
    return None

wrd = grd_val & 0x1f
wrs = grs_val & 0x1f

if grd_inc:
    new_grd_val = grd_val + 1
    GPRs[grd] ⇐ new_grd_val

if grs_inc:
    new_grs_val = grs_val + 1
    GPRs[grs] ⇐ new_grs_val

yield None

value = WDRs[wrs]
WDRs[wrd] ⇐ value
return None
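
One way to picture the indirect addressing is a loop that copies one WDR into several consecutive destinations by post-incrementing grd. The sketch below models this with plain Python lists. It is a simplified illustration that omits the call stack error and the cycle timing, and the `bn_movr` function with its boolean return value is hypothetical.

```python
def bn_movr(gprs, wdrs, grd, grs, grd_inc=False, grs_inc=False) -> bool:
    """Simplified BN.MOVR. Returns False, changing nothing, on ILLEGAL_INSN."""
    if grd_inc and grs_inc:
        return False
    grd_val, grs_val = gprs[grd], gprs[grs]
    if grd_val > 31 or grs_val > 31:
        return False
    wdrs[grd_val & 0x1F] = wdrs[grs_val & 0x1F]
    if grd_inc:
        gprs[grd] = grd_val + 1
    if grs_inc:
        gprs[grs] = grs_val + 1
    return True


gprs = [0] * 32
wdrs = [0] * 32
wdrs[0] = 0xABCD
gprs[2], gprs[3] = 8, 0   # x2 points at w8, x3 points at w0

# Equivalent to issuing "BN.MOVR x2++, x3" four times: w8..w11 receive w0.
results = [bn_movr(gprs, wdrs, grd=2, grs=3, grd_inc=True) for _ in range(4)]
```

Since only one of the two GPRs can be incremented per instruction, copying a block of WDRs to another block needs a separate instruction to advance the second pointer.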

BN.WSRR

Read WSR to register. Reads a WSR to a WDR. If wsr isn’t the index of a valid WSR, this results in an error (setting bit illegal_insn in ERR_BITS).

Errors

BN.WSRR might cause the following software errors:

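An ILLEGAL_INSN error if wsr doesn’t name a valid WSR.
A KEY_INVALID error if the WSR has no valid value (for example, a sideload key register for which keymgr hasn’t provided a value).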

Syntax

BN.WSRR <wrd>, <wsr>

Operands

Operand	Description
`wrd`	Destination WDR. Decode as `unsigned(WRD)`
`wsr`	The WSR to read. Valid range: `0` to `255`. Decode as `unsigned(WSR)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.WSRR	0				WSR													1	1	1	WRD					0	0	0	1	0	1	1

Operation

# The first, and possibly only, cycle of execution.
if not state.wsrs.check_idx(wsr):
    # Invalid WSR index. Stop with an illegal instruction error.
    state.stop_at_end_of_cycle(ErrBits.ILLEGAL_INSN)
    return None

if wsr == 0x1:
    # A read from RND. If a RND value is not available, request_value()
    # initiates or continues an EDN request and returns False. If a RND
    # value is available, it returns True.
    while not state.wsrs.RND.request_value():
        # There's a pending EDN request. Stall for a cycle.
        yield None

# At this point, the WSR is ready. Does it have a valid value? (It
# might not if this is a sideload key register and keymgr hasn't
# provided us with a value). If not, fail with a KEY_INVALID error.
if not state.wsrs.has_value_at_idx(wsr):
    state.stop_at_end_of_cycle(ErrBits.KEY_INVALID)
    return None

# The WSR is ready and has a value. Read it.
val = WSRs[wsr]
WDRs[wrd] ⇐ val
return None
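
In the listing above, each `yield None` marks the end of a cycle in which the instruction stalls. The sketch below imitates the RND stall with a Python generator. `FakeRnd` is a hypothetical stand-in that becomes ready after a fixed number of polls. It is not the simulator's RND model and ignores the EDN interface entirely.

```python
class FakeRnd:
    """Hypothetical RND source that is ready after a fixed number of polls."""

    def __init__(self, value: int, cycles_until_ready: int):
        self.value = value
        self._remaining = cycles_until_ready

    def request_value(self) -> bool:
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def bn_wsrr_rnd(rnd: FakeRnd, wdrs: list, wrd: int):
    """Generator model of a BN.WSRR read from RND."""
    while not rnd.request_value():
        # No value available yet. Stall for a cycle.
        yield None
    wdrs[wrd] = rnd.value


wdrs = [0] * 32
insn = bn_wsrr_rnd(FakeRnd(value=0x1234, cycles_until_ready=3), wdrs, wrd=7)

# Driving the generator to completion counts one stall per yield.
stall_cycles = sum(1 for _ in insn)
```

With a source that is ready immediately, the generator finishes without yielding, which corresponds to the instruction completing in its first cycle.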

BN.WSRW

Write WSR from register. Writes a WDR to a WSR. If wsr isn’t the index of a valid WSR, this results in an error (setting bit illegal_insn in ERR_BITS).

Errors

BN.WSRW might cause the following software errors:

An ILLEGAL_INSN error if wsr doesn’t name a valid WSR.

Syntax

BN.WSRW <wsr>, <wrs>

Operands

Operand	Description
`wsr`	The WSR to write. Valid range: `0` to `255`. Decode as `unsigned(WSR)`
`wrs`	Source WDR. Decode as `unsigned(WRS)`

Encoding

	31	30	29	28	27	26	25	24	23	22	21	20	19	18	17	16	15	14	13	12	11	10	9	8	7	6	5	4	3	2	1	0
BN.WSRW	1				WSR								WRS					1	1	1						0	0	0	1	0	1	1

Operation

val = WDRs[wrs]
WSRs[wsr] ⇐ val