Whitepaper
Suggested ARM Assembler Directives to
Allow Assembly Language Code to be
Independent of Processor Mode
Kade Hansson BComp(Hons.)
Kasoft Software, Australia
Document Number: KW10208,02
First Issued: Aug 2002
Issued: Jan 2003, 2nd Edition
Copyright 2002, 2003 Kasoft Software
All rights reserved
Abstract
This technical whitepaper discusses a possible solution to the ARM code
migration problem facing software developers who wish to take advantage of the
significant improvements in ARM processors which are based on the ARM version
5 architectures. In particular, Pace Micro Technology, Plc, have finally
committed RISC OS to a 32-bit PC mode future, with no backwards compatibility
with 26-bit PC mode code. Therefore, now is the time for all RISC OS code to
become PC width neutral.
Disclaimer
All particulars of the products referenced in this whitepaper are given by
Kasoft in good faith. However, all warranties implied or expressed, including
but not limited to implied warranties of merchantability, or fitness for
purpose, are excluded.
This whitepaper is intended only to assist the reader in the migration of
legacy 26-bit PC mode ARM source code to a PC width neutral form. Kasoft shall
not be liable for any loss or damage arising from the use of any information
in this whitepaper, or any error or omission in such information.
Kasoft Software has no connection with ARM Limited, and merely has expertise
in assembly programming on ARM devices. Kasoft Software was founded in 1986,
and has made no releases of commercial software. Kasoft specializes in
compilers, language systems and programmer-level utilities. Some applications
and libraries developed by Kasoft to run under RISC OS have had limited
freeware releases. Kasoft's work to this point could largely be categorized as
research, and much of the software developed has yet to be formed into
releasable products, although this is the long term goal.
Kasoft also has no connection with Pace Micro Technology, Plc.
ARM is a trademark of Advanced RISC Machines Limited.
Kasoft is a trademark of Kade Hansson in the category of Computer Software.
RISC OS is a trademark of Pace Micro Technology, Plc.
Definitions
- ARM version 2
- The sole characteristic of ARMv2 that this paper is interested
in is its exclusive use of a 26-bit program counter (PC) and
combined status register in 32 bits, and all the instructions
which rely on this. The absence of MSR and MRS instructions is
relevant also.
- 26-bit PC
- The only possible configuration of the program counter (PC) in
ARMv2. See ARM version 2.
- ARM version 3
- The sole characteristic of ARMv3 that this paper is interested
in is its recommendation for code to become 32-bit neutral
using new modes which expand the program counter (PC) to 32
bits and store current and saved flags in separate registers
(the PSRs).
- 32-bit PC
- A new possible configuration of the program counter (PC) in
ARMv3. See ARM version 3.
- 32-bit or PC width neutral code
- Code which does not rely on there being a particular number or
arrangement of bits in the program counter (PC).
- 26-bit or 26-bit PC processor modes
- SVC26, USR26, IRQ26, FIQ26.
- 32-bit or 32-bit PC processor modes
- SVC32, USR32, IRQ32, FIQ32, ABT32, UND32 and SYS32.
- banked registers
- R8-R12 (in FIQ modes), SP, LR and SPSR.
- unbanked registers
- R0-R7, PC and CPSR.
- S bit instructions
- Instructions like MOVS (where PC is the target) or LDM...^
(where PC is in the register list). For the purpose of this
paper we ignore the use of the S flag to cause arithmetic
instructions to affect the flags.
- SVC26
- Supervisor26. A mode which uses a 26-bit program counter (PC)
and combined status register in 32-bits. Instructions like
MOVS and LDM...^ are encouraged to allow SVC26 code to be
reentrant. Most of RISC OS (before version 5.00, at least)
runs in SVC26.
- USR26
- User26. A mode which uses a 26-bit program counter (PC) and
combined status register in 32-bits. Instructions like MOVS
and LDM...^ are permitted, and can make recursive subroutines
easier to write and debug. User26 is preferable to SVC26 for
all but low-level hardware device drivers and OS memory
management routines, because it facilitates stronger memory
protection.
- SVC32
- Supervisor32. A mode which uses a 32-bit program counter (PC),
a separate status register (CPSR) for the current flags, and has its own
saved status register (SPSR_svc) to save the caller's flags. Most of
RISC OS 5.00 runs in SVC32. Instructions like MOVS and LDM...^
are discouraged because they alone do not allow for reentrancy.
- USR32
- User32. A mode which uses a 32-bit program counter (PC)
and a separate status register (CPSR) for the current flags. Instructions
like MOVS and LDM...^ are UNPREDICTABLE, and must not be used.
To save the caller's flags requires the use of a main register
(if recursion is not on the cards) or the stack. USR32 is
preferable to SVC32 for all but low-level hardware device
drivers and OS memory management routines, because it
facilitates stronger memory protection.
- callee saves flags
- RISC OS versions before the Select versions maintain that the
callee should save the flags. That is, all OS entry points,
including those added by third party relocatable modules,
should use the SPSR (saved flags in R14) to return the
caller's state unchanged, or some defined result dependent on
the routine (e.g. V bit set to indicate an error condition.)
- callee corrupts flags
- RISC OS versions from the Select scheme, and the new Pace
5.00 version, maintain that the callee may corrupt the flags
in the SPSR and return state via the CPSR. That is, all OS
entry points, including those added by third party modules,
MAY corrupt the SPSR (saved flags in R14 or SPSR_svc), but
should return with V in the CPSR clear unless indicating an
error condition. All Select versions at the time of writing
use V from the SPSR (saved flags in R14) and no routines in
these Select distributions take the opportunity to corrupt the flags.
OS routines in RISC OS 5.00 must run in SVC32 mode, so they
necessarily corrupt the flags in the SPSR (SPSR_svc) and
use the CPSR to return state.
- SVC/USR-adaptable subroutine
- An entry point which can simultaneously be an OS entry point,
entered in SVC26 and saving flags, or a library entry point,
entered in USR26 and saving flags. This mode neutrality is
not possible without processor introspection (and significant
overhead) when we consider the 32-bit program counter (PC)
analogues of these modes: SVC32 and USR32. Unless, at least,
we adopt callee corrupts flags.
- AddressDevice DA
- A Kasoft project, not yet considered release ready, which had
as one of its initial goals to run relocatable module code
outside 26-bit space. This goal has now been realised within
RISC OS 5.00, so this goal has been modified and it is now to
allow relocatable module code outside RMA. AddressDevice DA
is a memory area filing system for RISC OS which blurs the
distinction between files and memory areas.
- Kasm
- Kasoft ARM Assembler. An assembler used internally at Kasoft
which has little likelihood of release in its current form.
It supports the macros described in this document directly in
its preprocessor.
The Problem
By 1995, Kasoft had made a significant investment in producing low-level code
for the ARM version 2 architecture. In 1995, it was apparent that the ARM
version 2 architecture was likely to be completely unsupported within 5 years.
Kasoft therefore obtained version 3 hardware to facilitate the beginning of
the migration of legacy code to the new class of ARM devices. Much of this
code consists of routines which run in SVC26 mode, which posed unique
challenges as code in this mode relies on features which are unique in the
ARM architectures up to version 3 and not supported in ARM architectures
beyond version 3.
The ARM version 3 architecture added support for pure 32-bit processor modes.
Kasoft wanted to adopt these modes in new software as soon as possible, but
as RISC OS was the supporting operating system for all Kasoft ARM code, and it
had not adopted the new modes, it was not possible to amend sources in a
backwards compatible manner, that is, one that simultaneously, without processor
introspection, satisfies the 26-bit PC only processors and the 32-bit PC only
processors and retains the RISC OS callee saves flags semantics.
It is worth pointing out here that it was Kasoft's situation of
being reliant on SVC26, and the "for free" compatibility of such routines with
USR26, that led us to the conclusion that Kasoft assembly routines were
intended (at least subconsciously) to be mode neutral. The challenge with
architecture version 3 was that it led to a dichotomy between SVC32 (and
other supervisor and abort modes) and USR32, at least if one assumed that
callee saves flags would continue to prevail in the new environment. If we had
known then that callee saves flags was doomed, we might have simply redefined
our APIs and removed all the S bit usage from our subroutine entry and exit
veneers.
The 32-bit processor modes are significantly different in their configuration
from their 26-bit counterparts, and there are very few SVC/USR-adaptable
subroutine entry and exit veneers using 26-bit PC ARM compatible instruction
sequences which can be directly executed in a 32-bit mode. The central reason
for this is the introduction of two new registers (to facilitate the
restoration and alteration of status flag and mode state), one of which is
banked, and the altered interpretation of PC[27:31]. Only routines which do
not attempt to modify the status flags in the 26-bit link register or PC
during their exit veneer will succeed without modification.
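To make the difference concrete, the following Python sketch unpacks an R15 value as it is laid out in a 26-bit PC mode: flags in the top bits, the word-aligned address in the middle, and the mode in the bottom two bits. It is an illustration only and not part of Kasm; the function and table names are hypothetical. In a 32-bit PC mode the same bits are interpreted as address bits, and the flags and mode live in the CPSR instead.

```python
# Layout of R15 in a 26-bit PC mode (sketch; names are illustrative only).
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
PC_MASK = 0x03FFFFFC    # bits 25..2: word-aligned program counter
MODE_MASK = 0x00000003  # bits 1..0: processor mode
MODE_NAMES = ["USR26", "FIQ26", "IRQ26", "SVC26"]


def split_r15(r15):
    """Split a 26-bit mode R15 (or link register) value into its fields."""
    flags = {name: (r15 >> bit) & 1 for name, bit in FLAG_BITS.items()}
    return {
        "pc": r15 & PC_MASK,
        "mode": MODE_NAMES[r15 & MODE_MASK],
        "flags": flags,
    }


# A link value saved in SVC26 with V set and a return address of &8000.
fields = split_r15(0x10008003)
print(hex(fields["pc"]), fields["mode"], fields["flags"]["V"])
```

An exit veneer such as MOVS PC,lr relies on this packing to restore flags and address in one step, which is why it cannot carry over unchanged to a mode where R15 holds only an address.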
It was apparent that every entry and exit veneer to SVC/USR-adaptable
subroutines would need to be changed in order, at a minimum, to allow that
code to run in 32-bit configuration. Rather than recoding every veneer by
hand, it was decided to standardize subroutine entry and return by using user-
defined assembler macros which would expand to either a 26-bit PC compatible
sequence (to allow code to continue to operate on RISC OS) or a 32-bit PC
compatible sequence (assuming SVC32 and callee saves flags semantics). The
macros created allow a macro assembler (like extASM) to produce the correct
code based upon the setting of an assembly source variable which reflects
the target processor mode for the object code. The macros interrogate this
"mode" variable and produce the correct entry or exit veneer.
The in-house development of AddressDevice DA at one point allowed our modified
RISC OS relocatable modules to run in SVC32 mode and use callee saves flags
semantics, though with some restrictions. No RISC OS modules outside Kasoft
will ever run in this configuration, however, as RISC OS has since been
changed so that all APIs are cast in the callee corrupts flags mould. In line
with this, the macros now expand every sequence that would use the S bit
into one that does not when the "mode" variable is set to "USR32". To
differentiate modules assembled with callee saves flags semantics
from those assembled with callee corrupts flags semantics, the designation
"[SVC32]" is applied to the module help string after the version date for the
former, and the designation "[USR32]" for the latter. The designation "[SVC26]"
is applied to backwards compatible modules, but these are no longer
maintained.
The Kasm Solution
The Kasoft ARM Assembler (Kasm) adds three pseudo-instructions (or assembler
macros or directives) to cater for the construction of assembly routines which
can ignore the processor mode in which they are running. These are SBR, which
caters for subroutine entry, and RET and RTS, which cater for subroutine exit.
In some cases, the expansion of SBR, RET or RTS requires a temporary register
in order to access the CPSR or SPSR. This temporary register is allocated from
the pool of those designated by TEMP directives, and not subsequently removed
by LOCK directives. (The TEMP and LOCK directives are borrowed from extASM.)
The directive MODE is used to specify the processor mode of the assembly
source code following it. Which expansion of SBR, RET or RTS is used depends
on this directive preceding any code containing any of them. For example,
MODE USR32 declares that the following code is to be executed in USR32 mode.
You can simulate the MODE directive in extASM by setting up a variable "mode"
by using #set mode="USR32".
Most expansions of SBR, RET and RTS utilize the stack of the callee mode. In
Kasm, you can specify the type of stack (e.g. full descending) by using an
STYPE directive. For example, STYPE FD sets a full descending stack as the default
stack type for all subsequent STM, LDM, SBR, RET and RTS instructions.
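As a rough illustration of how the MODE and STYPE directives might be tracked by an assembler front end, here is a minimal Python sketch. It is not the Kasm implementation: the class name, method name and error handling are assumptions made for the example. The mode names and the mapping from stack type to STM/LDM addressing suffix are taken from this document.

```python
# Sketch of MODE/STYPE directive tracking (hypothetical names throughout).
VALID_MODES = {
    "USR26", "FIQ26", "IRQ26", "SVC26",
    "USR32", "FIQ32", "IRQ32", "SVC32", "ABT32", "UND32", "SYS32",
}

# Stack type -> (push instruction, pop instruction)
STACK_TYPES = {
    "FD": ("STMDB", "LDMIA"),  # full descending
    "EA": ("STMIA", "LDMDB"),  # empty ascending
    "ED": ("STMDA", "LDMIB"),  # empty descending
    "FA": ("STMIB", "LDMDA"),  # full ascending
}


class DirectiveState:
    """Remembers the most recent MODE and STYPE settings."""

    def __init__(self):
        self.mode = None
        self.stype = None

    def feed(self, line):
        """Consume one source line; return True if it was a directive."""
        parts = line.split()
        if len(parts) != 2:
            return False
        name, arg = parts
        if name == "MODE":
            if arg not in VALID_MODES:
                raise ValueError(f"unknown processor mode: {arg}")
            self.mode = arg
            return True
        if name == "STYPE":
            if arg not in STACK_TYPES:
                raise ValueError(f"unknown stack type: {arg}")
            self.stype = arg
            return True
        return False


state = DirectiveState()
state.feed("MODE USR32")
state.feed("STYPE FD")
print(state.mode, STACK_TYPES[state.stype])
```

Keeping both settings in one small state object means each later SBR, RET or RTS expansion only has to consult the current values rather than re-scan the source.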
It is the programmer's responsibility to deal with the non-atomic nature of
the macro expansions in some cases. For example, the possibilities of
interrupts or abort conditions affecting the SPSR need to be considered and
dealt with appropriately, perhaps by disabling interrupts at the point a macro
is used. The macros which manipulate the SPSR are best suited to an operating
environment where the callee saves flags semantic prevails.
SBR—SuBRoutine entry
The assembly syntax is as follows:
SBR{cond} Rn!,<Rlist>
An SBR instruction has the same form as an STM instruction under the Kasm
assembler, except that while the base register increment/decrement semantics
are optionally specified for an STM, these are always implicit for an SBR. The
definitions of cond, Rn and Rlist are as given in the ARM data sheets.
Writeback must always be used or it is an error.
One instruction (being STM Rn!,<Rlist>) is generated if this directive occurs
within USR32 mode code.
Otherwise, an SBR instruction assembles into that same instruction in 26-bit
PC mode code, and into three instructions in SVC32, IRQ32, FIQ32, ABT32,
UND32 and SYS32 code. In the one instruction case, SBR is equivalent to STMDB
(after an STYPE FD directive), STMIA (after an STYPE EA), STMDA (after an
STYPE ED), or STMIB (after an STYPE FA) instruction of the same form.
One example of the three instruction case for a full descending stack is:
STMFD <as for SBR instruction>
MRS rtemp,SPSR
STR rtemp,[sp,#-4]!; Stack SPSR
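The SBR rules above can be sketched as a small Python function that returns the instructions an expansion would produce. This is a model written for illustration, not Kasm's preprocessor: the function name, its parameters and the placeholder register name rtemp are assumptions. Only the full descending form of the three instruction case is given in this paper, so the sketch declines to guess the others. The "^" suffix shown in the appendix examples is not modelled.

```python
# Sketch of SBR expansion (hypothetical function; rules follow the text above).
PUSH = {"FD": "STMDB", "EA": "STMIA", "ED": "STMDA", "FA": "STMIB"}
MODES_26 = {"USR26", "FIQ26", "IRQ26", "SVC26"}


def expand_sbr(mode, stype, rn, rlist, rtemp="rtemp"):
    """Return the list of instructions that SBR rn!,{rlist} stands for."""
    store = f"{PUSH[stype]} {rn}!,{{{rlist}}}"
    # USR32 and all 26-bit PC modes: a single store multiple.
    if mode == "USR32" or mode in MODES_26:
        return [store]
    # Remaining 32-bit PC modes: also stack the SPSR.
    if stype != "FD":
        raise NotImplementedError("only the FD sequence is shown in the paper")
    return [store, f"MRS {rtemp},SPSR", f"STR {rtemp},[{rn},#-4]!"]


for mode in ("SVC26", "USR32", "SVC32"):
    print(mode, expand_sbr(mode, "FD", "sp", "R0,lr"))
```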
RET—subroutine RETurn
The assembly syntax is as follows:
RET{cond}{S} {Rn!,<Rlist>{^}}
A RET instruction usually has the same form as an LDM instruction under the
Kasm assembler, except that while the base register increment/decrement
semantics are optionally specified for an LDM, these are always implicit for a
RET. The definitions of cond, Rn and Rlist are as given in the ARM data sheets.
Writeback must always be used and PC must appear in Rlist or it is an error.
The S bit determines whether it is the SPSR or the CPSR which is returned to
the caller. i.e. the S bit can only be set when returning from SVC, IRQ, FIQ,
ABT32 or UND32 modes.
The S bit is ignored and one instruction (being LDM Rn!,<Rlist>) generated
if this directive occurs within USR32 mode code.
Otherwise, a RET instruction assembles into one instruction if the S bit is
unset, or if the S bit is set and it appears in 26-bit PC mode code, and into
three instructions when the S bit is set in 32-bit PC mode code. In the one
instruction case, RET is
equivalent to LDMIA (after an STYPE FD directive), LDMDB (after an STYPE EA),
LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA) instruction of the
same form.
One example of the three instruction case for a full descending stack is:
LDR rtemp,[sp],#4; Pop stacked SPSR
MSR SPSR,rtemp; Restore SPSR
LDMFD <as for RET instruction>
It is an error for either of these forms to include the letter "S" in the
mnemonic. They must use the "^" symbol to indicate the S bit is set.
A RET instruction with no parameters restores PC from the link register and,
if the S bit is set, restores the CPSR from the SPSR of the callee mode. The
former case assembles to MOV PC,lr, and the latter case assembles as
MOVS PC,lr.
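The RET cases can be modelled in the same way. The Python sketch below is an illustration under stated assumptions rather than Kasm's actual logic: the function name and parameters are hypothetical, the pop of the stacked SPSR is written as a post-indexed LDR (our reading of the listing above), and the S bit is assumed to be ignored in USR32 for the operand-free form as well as the register list form.

```python
# Sketch of RET expansion (hypothetical function; rules follow the text above).
POP = {"FD": "LDMIA", "EA": "LDMDB", "ED": "LDMIB", "FA": "LDMDA"}
MODES_26 = {"USR26", "FIQ26", "IRQ26", "SVC26"}


def expand_ret(mode, stype="FD", rn=None, rlist=None, s_bit=False,
               rtemp="rtemp"):
    """Return the list of instructions that a RET stands for."""
    if mode == "USR32":
        s_bit = False  # S bit is ignored in USR32
    if rlist is None:
        # No operands: return through the link register.
        return ["MOVS PC,lr" if s_bit else "MOV PC,lr"]
    if "pc" not in [r.strip().lower() for r in rlist.split(",")]:
        raise ValueError("PC must appear in Rlist")
    load = f"{POP[stype]} {rn}!,{{{rlist}}}" + ("^" if s_bit else "")
    if not s_bit or mode in MODES_26:
        return [load]
    # S bit set in a privileged 32-bit PC mode: restore the stacked SPSR.
    if stype != "FD":
        raise NotImplementedError("only the FD sequence is shown in the paper")
    return [f"LDR {rtemp},[{rn}],#4", f"MSR SPSR,{rtemp}", load]


print(expand_ret("SVC32", "FD", "sp", "R0,pc", s_bit=True))
```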
RTS—subroutine ReTurn with flags Set
The assembly syntax is as follows:
RTS{cond}{S} {Rn!,<Rlist>{^},}<#flags>
An RTS instruction usually has a similar form to an LDM instruction under the
Kasm assembler, except that while the base register increment/decrement
semantics are optionally specified for an LDM, these are always implicit for an
RTS. One notable addition is the inclusion of an immediate constant, #flags,
which is used to set the status flags. The definitions of cond, Rn and Rlist
are as given in the ARM data sheets. Writeback must always be used and PC must
appear in Rlist or it is an error. The S bit determines whether it is the SPSR
or the CPSR which is returned to the caller. i.e. the S bit can only be set
when returning from SVC, IRQ, FIQ, ABT32 or UND32 modes.
The S bit is ignored and two instructions (being MSR CPSR_flg,#flags and LDM
Rn!,<Rlist>) generated if this directive occurs within USR32 mode code and
a register list is used.
Otherwise, an RTS instruction assembles into two instructions if the S
bit is unset, or if the S bit is set and it appears in 26-bit PC mode code,
and into four instructions when the S bit is set in 32-bit PC mode code. In
the 32-bit mode two instruction case, RTS is an MSR CPSR_flg,#flags
instruction, followed by an LDMIA (after an STYPE FD directive), LDMDB (after
an STYPE EA), LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA)
instruction with Rn as the base register, the writeback flag set, the S bit
unset and the register list as for the RTS. In the 26-bit mode two instruction
case, RTS is an LDMIA (after an STYPE FD directive), LDMDB (after an STYPE
EA), LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA) instruction of
the same form but with lr substituted where PC appears in Rlist, followed by
an ORRS PC,lr,#flags instruction. If an RTS instruction is used in a 26-bit PC
mode with the S bit unset, it is an error.
One example of the four instruction case for a full descending stack is:
LDR rtemp,[sp],#4; Pop stacked SPSR
ORR rtemp,rtemp,#flags; Change flags
MSR SPSR,rtemp; Restore SPSR with changed flags
LDMFD <as for RTS instruction>
It is an error for either of these forms to include the letter "S" in the
mnemonic. They must use the "^" symbol to indicate the S bit is set.
An RTS instruction with just an immediate constant restores PC from the link
register and, if the S bit is set, restores the CPSR from the SPSR of the
callee mode, but affects the CPSR of the caller by updating the flags, as
indicated by #flags, in either case. The former case assembles to:
MSR CPSR_flg,#flags; Change CPSR flags
MOV PC,lr
In 32-bit PC mode (except USR32), the latter case assembles as:
MSR SPSR_flg,#flags; Change SPSR flags
MOVS PC,lr; Restore changed SPSR to CPSR
In 26-bit mode:
ORRS PC,lr,#flags
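Because RTS has the most cases, a table-like summary in code may help. The Python sketch below models the expansions described in this section. It is an illustration only: the function name and parameters are hypothetical, the post-indexed LDR is our reading of the four instruction listing, and treating the S bit as ignored in USR32 for the operand-free form is an assumption, since the text only excludes USR32 from the SPSR_flg sequence.

```python
# Sketch of RTS expansion (hypothetical function; rules follow the text above).
POP = {"FD": "LDMIA", "EA": "LDMDB", "ED": "LDMIB", "FA": "LDMDA"}
MODES_26 = {"USR26", "FIQ26", "IRQ26", "SVC26"}


def expand_rts(mode, flags, stype="FD", rn=None, rlist=None, s_bit=False,
               rtemp="rtemp"):
    """Return the list of instructions that an RTS stands for."""
    is_26 = mode in MODES_26
    if mode == "USR32":
        s_bit = False  # S bit is ignored in USR32
    if is_26 and not s_bit:
        raise ValueError("RTS without the S bit is an error in 26-bit PC modes")
    if rlist is None:
        # Immediate constant only: return through the link register.
        if not s_bit:
            return [f"MSR CPSR_flg,#{flags}", "MOV PC,lr"]
        if is_26:
            return [f"ORRS PC,lr,#{flags}"]
        return [f"MSR SPSR_flg,#{flags}", "MOVS PC,lr"]
    regs = [r.strip() for r in rlist.split(",")]
    if "pc" not in [r.lower() for r in regs]:
        raise ValueError("PC must appear in Rlist")
    op = POP[stype]
    if not s_bit:
        return [f"MSR CPSR_flg,#{flags}", f"{op} {rn}!,{{{rlist}}}"]
    if is_26:
        # Load the return address into lr, then merge flags during return.
        with_lr = ",".join("lr" if r.lower() == "pc" else r for r in regs)
        return [f"{op} {rn}!,{{{with_lr}}}", f"ORRS PC,lr,#{flags}"]
    if stype != "FD":
        raise NotImplementedError("only the FD sequence is shown in the paper")
    return [
        f"LDR {rtemp},[{rn}],#4",
        f"ORR {rtemp},{rtemp},#{flags}",
        f"MSR SPSR,{rtemp}",
        f"{op} {rn}!,{{{rlist}}}^",
    ]


print(expand_rts("SVC26", "vbit", rn="sp", rlist="R0,pc", s_bit=True))
print(expand_rts("SVC32", "vbit", rn="sp", rlist="R0,pc", s_bit=True))
```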
Appendix—Example Usage of SBR, RET and RTS
SBR{cond} Rn!,<Rlist>
e.g. SBR sp!,{lr}; Push link
SBR sp!,{lr}^; Push link and SPSR
SBR sp!,{R0,lr}; Push R0 and link
SBR sp!,{R0,lr}^; Push R0, link and SPSR
Note: Because extASM doesn't support macros in this form, you will need
to create a class of instructions for each possible configuration
of the register list (S bit can be inferred from processor mode).
e.g. (SubroutiNe entry)
SNA one register besides link
SNB two registers besides link
SNC three registers besides link
SND four registers besides link
SNE five registers besides link
SNO one range and one register besides link
SNP one range and two registers besides link
SNQ one range and three registers besides link
SNR one range besides link
SNS two ranges besides link
SNT two ranges and one register besides link
SNV one range, one register, another range, besides link
SNZ zero registers besides link
RET{cond}{S} {Rn!,<Rlist>{^}}
e.g. RET; Return to link address
RETS; Restore SPSR to CPSR of calling mode and
; return to link address
RET sp!,{pc}; Return to popped link address
RET sp!,{pc}^; Restore popped SPSR to CPSR of calling
; mode and return to popped link address
RETVC sp!,{R0,pc}; Pop R0 and link, then return to link
; address
RETVC sp!,{R0,pc}^; Pop R0, link and SPSR, then restore SPSR
; to CPSR of calling mode during return to
; link address
Note: Because extASM doesn't support macros in this form, you will need
to create a class of instructions for each possible configuration
of the register list and S bit.
e.g. (S unset- REturn)
REA one register besides PC
REB two registers besides PC
REC three registers besides PC
RED four registers besides PC
REE five registers besides PC
REO one range and one register besides PC
REP one range and two registers besides PC
REQ one range and three registers besides PC
RER one range besides PC
RES two ranges besides PC
RET two ranges and one register besides PC
REV one range, one register, another range, besides PC
REZ zero registers besides PC
(S set- Status Return)
SRA one register besides PC
SRB two registers besides PC
SRC three registers besides PC
SRD four registers besides PC
SRE five registers besides PC
SRO one range and one register besides PC
SRP one range and two registers besides PC
SRQ one range and three registers besides PC
SRR one range besides PC
SRS two ranges besides PC
SRT two ranges and one register besides PC
SRV one range, one register, another range, besides PC
SRZ zero registers besides PC
RTS{cond}{S} {Rn!,<Rlist>{^},}<#flags>
e.g. RTS #vbit; Return to link address and set CPSR V
; bit
RTSS #vbit; Return to link address, but set SPSR V
; bit before return
RTSVS sp!,{pc},#vbit; Pop link, set CPSR V bit and return to
; link address
RTSVS sp!,{pc}^,#vbit; Pop link and SPSR, set SPSR V bit, then
; restore SPSR to CPSR of calling mode
; during return to link address
RTSVS sp!,{r0,pc},#vbit; Pop R0 and link, set CPSR V bit and return
; to link address
RTSVS sp!,{r0,pc}^,#vbit; Pop R0, link and SPSR, set SPSR V bit,
; then restore SPSR to CPSR of calling mode
; during return to link address
Note: Because extASM doesn't support macros in this form, you will need
to create a class of instructions for each possible configuration
of the register list and immediate constant (S bit can be inferred
from processor mode.)
e.g. (V set in immediate constant- Return with V set)
RVA one register besides PC
RVB two registers besides PC
RVC three registers besides PC
RVD four registers besides PC
RVE five registers besides PC
RVO one range and one register besides PC
RVP one range and two registers besides PC
RVQ one range and three registers besides PC
RVR one range besides PC
RVS two ranges besides PC
RVT two ranges and one register besides PC
RVV one range, one register, another range, besides PC
RVZ zero registers besides PC