KASOFT
Documentation

Whitepaper

Suggested ARM Assembler Directives to Allow Assembly Language Code to be Independent of Processor Mode

Kade Hansson BComp(Hons.)
Kasoft Software, Australia

Document Number: KW10208,02
First Issued: Aug 2002
Issued: Jan 2003, 2nd Edition
Copyright 2002, 2003 Kasoft Software
All rights reserved

Abstract

This technical whitepaper discusses a possible solution to the ARM code migration problem facing software developers who wish to take advantage of the significant improvements in ARM processors which are based on the ARM version 5 architectures. In particular, Pace Micro Technology, Plc, have finally committed RISC OS to a 32-bit PC mode future, with no backwards compatibility with 26-bit PC mode code. Therefore, now is the time for all RISC OS code to become PC width neutral.

Disclaimer

All particulars of the products referenced in this whitepaper are given by Kasoft in good faith. However, all warranties implied or expressed, including but not limited to implied warranties of merchantability, or fitness for purpose, are excluded.

This whitepaper is intended only to assist the reader in the migration of legacy 26-bit PC mode ARM source code to a PC width neutral form. Kasoft shall not be liable for any loss or damage arising from the use of any information in this whitepaper, or any error or omission in such information.

Kasoft Software has no connection with ARM Limited, and merely has expertise in assembly programming on ARM devices. Kasoft Software was founded in 1986, and has made no releases of commercial software. Kasoft specializes in compilers, language systems and programmer-level utilities. Some applications and libraries developed by Kasoft to run under RISC OS have had limited freeware releases. Kasoft's work to this point could largely be categorized as research, and much of the software developed has yet to be formed into releasable products, although this is the long term goal.

Kasoft also has no connection with Pace Micro Technology, Plc.

ARM is a trademark of Advanced RISC Machines Limited.

Kasoft is a trademark of Kade Hansson in the category of Computer Software.

RISC OS is a trademark of Pace Micro Technology, Plc.

Definitions

ARM version 2
The sole characteristic of ARMv2 that this paper is interested in is its exclusive use of a 26-bit program counter (PC) and combined status register in 32 bits, and all the instructions which rely on this. The absence of MSR and MRS instructions is also relevant.
26-bit PC
The only possible configuration of the program counter (PC) in ARMv2. See ARM version 2.
ARM version 3
The sole characteristic of ARMv3 that this paper is interested in is its recommendation for code to become 32-bit neutral using new modes which expand the program counter (PC) to 32 bits and store current and saved flags in separate registers (the PSRs).
32-bit PC
A new possible configuration of the program counter (PC) in ARMv3. See ARM version 3.
32-bit or PC width neutral code
Code which does not rely on there being a particular number or arrangement of bits in the program counter (PC).
26-bit or 26-bit PC processor modes
SVC26, USR26, IRQ26, FIQ26.
32-bit or 32-bit PC processor modes
SVC32, USR32, IRQ32, FIQ32, ABT32, UND32 and SYS32.
banked registers
R8-R12, SP, LR, PC and SPSR.
unbanked registers
R0-R7 and CPSR.
S bit instructions
Instructions like MOVS (where PC is the target) or LDM...^ (where PC is in the register list). For the purpose of this paper we ignore the use of the S flag to cause arithmetic instructions to affect the flags.
SVC26
Supervisor26. A mode which uses a 26-bit program counter (PC) and combined status register in 32-bits. Instructions like MOVS and LDM...^ are encouraged to allow SVC26 code to be reentrant. Most of RISC OS (before version 5.00, at least) runs in SVC26.
USR26
User26. A mode which uses a 26-bit program counter (PC) and combined status register in 32-bits. Instructions like MOVS and LDM...^ are permitted, and can make recursive subroutines easier to write and debug. User26 is preferable to SVC26 for all but low-level hardware device drivers and OS memory management routines, because it facilitates stronger memory protection.
SVC32
Supervisor32. A mode which uses a 32-bit program counter (PC), a separate current program status register (CPSR) for the current flags, and its own saved program status register (SPSR_svc) to save the caller's flags. Most of RISC OS 5.00 runs in SVC32. Instructions like MOVS and LDM...^ are discouraged because they alone do not allow for reentrancy.
USR32
User32. A mode which uses a 32-bit program counter (PC) and a separate current program status register (CPSR) for the current flags. Instructions like MOVS and LDM...^ are UNPREDICTABLE, and must not be used. Saving the caller's flags requires the use of a main register (if recursion is not on the cards) or the stack. USR32 is preferable to SVC32 for all but low-level hardware device drivers and OS memory management routines, because it facilitates stronger memory protection.
callee saves flags
RISC OS versions before the Select versions maintain that the callee should save the flags. That is, all OS entry points, including those added by third party relocatable modules, should use the SPSR (saved flags in R14) to return the caller's state unchanged, or some defined result dependent on the routine (e.g. V bit set to indicate an error condition.)
callee corrupts flags
RISC OS versions from the Select scheme, and the new Pace 5.00 version, maintain that the callee may corrupt the flags in the SPSR and return state via the CPSR. That is, all OS entry points, including those added by third party modules, MAY corrupt the SPSR (saved flags in R14 or SPSR_svc), but should return with V in the CPSR clear unless indicating an error condition. All Select versions at the time of writing use V from the SPSR (saved flags in R14) and no routines in these Select distributions take the opportunity to corrupt the flags. OS routines in RISC OS 5.00 must run in SVC32 mode, so they necessarily corrupt the flags in the SPSR (SPSR_svc) and use the CPSR to return state.
SVC/USR-adaptable subroutine
An entry point which can simultaneously be an OS entry point, entered in SVC26 and saving flags, or a library entry point, entered in USR26 and saving flags. When we consider the 32-bit program counter (PC) analogues of these modes, SVC32 and USR32, this mode neutrality is not possible without processor introspection (and significant overhead), unless we adopt callee corrupts flags.
AddressDevice DA
A Kasoft project, not yet considered release ready, which had as one of its initial goals to run relocatable module code outside 26-bit space. This goal has now been realised within RISC OS 5.00, so this goal has been modified and it is now to allow relocatable module code outside RMA. AddressDevice DA is a memory area filing system for RISC OS which blurs the distinction between files and memory areas.
Kasm
Kasoft ARM Assembler. An assembler used internally at Kasoft which has little likelihood of release in its current form. It supports the macros described in this document directly in its preprocessor.

The Problem

By 1995, Kasoft had made a significant investment in producing low-level code for the ARM version 2 architecture. In 1995, it was apparent that the ARM version 2 architecture was likely to be completely unsupported within 5 years. Kasoft therefore obtained version 3 hardware to facilitate the beginning of the migration of legacy code to the new class of ARM devices. Much of this code consists of routines which run in SVC26 mode, which posed unique challenges as code in this mode relies on features which are unique in the ARM architectures up to version 3 and not supported in ARM architectures beyond version 3.

The ARM version 3 architecture added support for pure 32-bit processor modes. Kasoft wanted to adopt these modes in new software as soon as possible, but as RISC OS was the supporting operating system for all Kasoft ARM code, and it had not adopted the new modes, it was not possible to amend sources in a backwards compatible manner: one that simultaneously, without processor introspection, satisfies the 26-bit PC only processors and the 32-bit PC only processors and retains the RISC OS callee saves flags semantics.

It is worth pointing out here that it was Kasoft's reliance on SVC26, and the "for free" compatibility of such routines with USR26, that led us to the conclusion that Kasoft assembly routines were intended (at least subconsciously) to be mode neutral. The challenge with architecture version 3 was that it led to a dichotomy between SVC32 (and other supervisor and abort modes) and USR32, at least if one assumed that callee saves flags would continue to prevail in the new environment. If we had known then that callee saves flags was doomed, we might have simply redefined our APIs and removed all the S bit usage from our subroutine entry and exit veneers.

The 32-bit processor modes are significantly different in their configuration from their 26-bit counterparts, and there are very few SVC/USR-adaptable subroutine entry and exit veneers using 26-bit PC ARM compatible instruction sequences which can be directly executed in a 32-bit mode. The central reason for this is the introduction of two new registers (to facilitate the restoration and alteration of status flag and mode state), one of which is banked, and the altered interpretation of PC[26:31], which hold status flags in 26-bit PC modes but address bits in 32-bit PC modes. Only routines which do not attempt to modify the status flags in the 26-bit link register or PC during their exit veneer will succeed without modification.
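To make the clash over the top bits of R15 concrete, here is a minimal Python sketch that splits a 26-bit-mode R15 value into its parts. It assumes the standard ARMv2 layout (flags N, Z, C, V, I, F in bits 31 to 26, word address in bits 25 to 2, mode in bits 1 to 0), which this paper does not spell out. The name decode_r15 is purely illustrative.

```python
# Assumed ARMv2 layout of R15 in a 26-bit PC mode (not stated in this paper):
# flags N, Z, C, V, I, F in bits 31..26, word address in bits 25..2,
# processor mode in bits 1..0.
FLAG_BITS = {"N": 31, "Z": 30, "C": 29, "V": 28, "I": 27, "F": 26}
MODES_26 = ["USR26", "FIQ26", "IRQ26", "SVC26"]  # indexed by bits 1..0


def decode_r15(r15):
    """Split a combined 26-bit-mode R15 value into (pc, flags, mode)."""
    pc = r15 & 0x03FFFFFC
    flags = {name for name, bit in FLAG_BITS.items() if r15 & (1 << bit)}
    return pc, flags, MODES_26[r15 & 3]


pc, flags, mode = decode_r15(0x10008003)
print(hex(pc), sorted(flags), mode)
```

In a 32-bit PC mode the same top bits would simply be part of the address, which is why an exit veneer that rewrites them cannot be shared between the two configurations.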

It was apparent that every entry and exit veneer to SVC/USR-adaptable subroutines would need to be changed in order, at a minimum, to allow that code to run in 32-bit configuration. Rather than recoding every veneer by hand, it was decided to standardize subroutine entry and return by using user-defined assembler macros which would expand to either a 26-bit PC compatible sequence (to allow code to continue to operate on RISC OS) or a 32-bit PC compatible sequence (assuming SVC32 and a caller saves flags semantics). The macros created allow a macro assembler (like extASM) to produce the correct code based upon the setting of an assembly source variable which reflects the target processor mode for the object code. The macros interrogate this "mode" variable and produce the correct entry or exit veneer.

The in-house development of AddressDevice DA at one point allowed our modified RISC OS relocatable modules to run in SVC32 mode and use callee saves flags semantics, though with some restrictions. No RISC OS modules outside Kasoft will ever run in this configuration, however, as RISC OS has since been changed so that all APIs are cast in the callee corrupts flags mould. In line with this, the macros now adapt all code sequences which use the S flag in their instructions to not use that facility when the "mode" variable is set to "USR32". To differentiate modules assembled with callee saves flags semantics from those assembled with callee corrupts flags semantics, the designation "[SVC32]" is applied to the module help string after the version date, and in the reverse, the designation "[USR32]" is applied. The designation "[SVC26]" is applied to backwards compatible modules, but these are no longer maintained.

The Kasm Solution

The Kasoft ARM Assembler (Kasm) adds three pseudo-instructions (or assembler macros or directives) to cater for the construction of assembly routines which can ignore the processor mode in which they are running. These are SBR, which caters for subroutine entry, and RET and RTS, which cater for subroutine exit.

In some cases, the expansion of SBR, RET or RTS requires a temporary register in order to access the CPSR or SPSR. This temporary register is allocated from the pool of those designated by TEMP directives, and not subsequently removed by LOCK directives. (The TEMP and LOCK directives are borrowed from extASM.)

The directive MODE is used to specify the processor mode of the assembly source code following it. Which expansion of SBR, RET or RTS is used depends on this directive preceding any code containing any of them. For example, MODE USR32 declares that the following code is to be executed in USR32 mode. You can simulate the MODE directive in extASM by setting up a variable "mode" by using #set mode="USR32".

Most expansions of SBR, RET and RTS utilize the stack of the callee mode. In Kasm, you can specify the type of stack (e.g. full descending) by using an STYPE directive. For example, STYPE FD sets a full descending stack as the default stack type for all subsequent STM, LDM, SBR, RET and RTS instructions. It is the programmer's responsibility to deal with the non-atomic nature of the macro expansions in some cases. For example, the possibilities of interrupts or abort conditions affecting the SPSR need to be considered and dealt with appropriately, perhaps by disabling interrupts at the point a macro is used. The macros which manipulate the SPSR are best suited to an operating environment where the callee saves flags semantic prevails.
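The effect of STYPE can be pictured as a lookup from stack type to the store and load addressing modes, using the equivalences this paper gives for SBR and RET. The following Python sketch is only an illustration of that table; the names STACK_TYPES and stack_ops are hypothetical and say nothing about how Kasm is implemented.

```python
# STM/LDM addressing modes implied by each stack type, as listed in this
# paper for the one-instruction cases of SBR (store) and RET (load).
STACK_TYPES = {
    "FD": ("STMDB", "LDMIA"),  # full descending
    "EA": ("STMIA", "LDMDB"),  # empty ascending
    "ED": ("STMDA", "LDMIB"),  # empty descending
    "FA": ("STMIB", "LDMDA"),  # full ascending
}


def stack_ops(stype):
    """Return the (push, pop) mnemonics for a stack type such as 'FD'."""
    try:
        return STACK_TYPES[stype.upper()]
    except KeyError:
        raise ValueError(f"unknown stack type: {stype}") from None


push, pop = stack_ops("FD")
print(push, pop)
```

Keeping the two mnemonics as a pair makes it harder to push with one convention and pop with another.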

SBR—SuBRoutine entry

The assembly syntax is as follows:

SBR{cond} Rn!,<Rlist>

An SBR instruction has the same form as an STM instruction under the Kasm assembler, except that while the base register increment/decrement semantics are optionally specified for an STM, these are always implicit for an SBR. The definitions of cond, Rn and Rlist are as given in the ARM data sheets. Writeback must always be used or it is an error.

One instruction (being STM Rn!,<Rlist>) is generated if this directive occurs within USR32 mode code.

Otherwise, a SBR instruction assembles into that same instruction in 26-bit PC mode code, and into three instructions in SVC32, IRQ32, FIQ32, ABT32, UND32 and SYS32 code. In the one instruction case, SBR is equivalent to STMDB (after an STYPE FD directive), STMIA (after an STYPE EA), STMDA (after an STYPE ED), or STMIB (after an STYPE FA) instruction of the same form.

One example of the three instruction case for a full descending stack is:

        STMFD   <as for SBR instruction>
        MRS     rtemp,SPSR
        STR     rtemp,[sp,#-4]!;                Stack SPSR
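The choice between the one-instruction and three-instruction forms of SBR can be sketched as a small Python generator. This is an illustration under stated assumptions, not Kasm's implementation: the function name expand_sbr is hypothetical, a full descending stack is assumed, and the temporary register is simply called rtemp as in the listing above.

```python
MODES_26 = {"SVC26", "USR26", "IRQ26", "FIQ26"}
# Modes in which this paper says SBR expands to three instructions.
SPSR_MODES = {"SVC32", "IRQ32", "FIQ32", "ABT32", "UND32", "SYS32"}


def expand_sbr(mode, base, rlist, temp="rtemp"):
    """Return the instructions SBR base!,{rlist} might expand to (FD stack)."""
    push = f"STMFD   {base}!,{{{rlist}}}"
    if mode == "USR32" or mode in MODES_26:
        return [push]  # one-instruction case
    if mode in SPSR_MODES:
        return [
            push,
            f"MRS     {temp},SPSR",
            f"STR     {temp},[{base},#-4]!",  # stack SPSR
        ]
    raise ValueError(f"unknown mode: {mode}")


for line in expand_sbr("SVC32", "sp", "R0,lr"):
    print(line)
```

Calling expand_sbr with "USR32" or any 26-bit mode returns the single STMFD, so the same source line serves every target.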

RET—subroutine RETurn

The assembly syntax is as follows:

RET{cond}{S} {Rn!,<Rlist>{^}}

A RET instruction usually has the same form as an LDM instruction under the Kasm assembler, except that while the base register increment/decrement semantics are optionally specified for an LDM, these are always implicit for a RET. The definitions of cond, Rn and Rlist are as given in the ARM data sheets. Writeback must always be used and PC must appear in Rlist or it is an error. The S bit determines whether it is the SPSR or the CPSR which is returned to the caller; that is, the S bit can only be set when returning from SVC, IRQ, FIQ, ABT32 or UND32 modes.

The S bit is ignored and one instruction (being LDM Rn!,<Rlist>) generated if this directive occurs within USR32 mode code.

Otherwise, a RET instruction assembles into one instruction if the S bit is unset, or if the S bit is set and it appears in 26-bit PC mode code, and into three instructions when the S bit is set in 32-bit PC mode code. In the one instruction case, RET is equivalent to LDMIA (after an STYPE FD directive), LDMDB (after an STYPE EA), LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA) instruction of the same form.

One example of the three instruction case for a full descending stack is:

        LDR     rtemp,[sp],#4;                  Unstack SPSR
        MSR     SPSR,rtemp;                     Restore SPSR
        LDMFD   <as for RET instruction>

It is an error for either of these forms to include the letter "S" in the mnemonic. They must use the "^" symbol to indicate the S bit is set.

A RET instruction with no parameters restores PC from the link register and, if the S bit is set, restores the CPSR from the SPSR of the callee mode. The former case assembles to MOV PC,lr, and the latter case assembles to MOVS PC,lr.
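The RET rules in this section can be collected into one Python sketch. It is a hedged model rather than Kasm's code: expand_ret is a hypothetical name, a full descending stack is assumed, and the treatment of a parameterless RETS in USR32 (falling back to MOV PC,lr) is an assumption made by analogy with the register list rule, since MOVS PC,lr must not be used there.

```python
MODES_26 = {"SVC26", "USR26", "IRQ26", "FIQ26"}


def expand_ret(mode, s_bit=False, base=None, rlist=None, temp="rtemp"):
    """Return the instructions a RET might expand to (FD stack assumed)."""
    if mode == "USR32":
        s_bit = False  # the S bit is ignored in USR32 code
    if rlist is None:
        return ["MOVS    PC,lr" if s_bit else "MOV     PC,lr"]
    if "pc" not in [r.strip().lower() for r in rlist.split(",")]:
        raise ValueError("PC must appear in Rlist")
    pop = f"LDMFD   {base}!,{{{rlist}}}" + ("^" if s_bit else "")
    if s_bit and mode not in MODES_26:
        # 32-bit privileged mode: unstack the SPSR before the final LDM
        return [f"LDR     {temp},[{base}],#4", f"MSR     SPSR,{temp}", pop]
    return [pop]


for line in expand_ret("SVC32", True, "sp", "R0,pc"):
    print(line)
```

The 26-bit and USR32 cases both collapse to a single LDMFD, differing only in whether the "^" survives.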

RTS—subroutine ReTurn with flags Set

The assembly syntax is as follows:

RTS{cond}{S} {Rn!,<Rlist>{^},}<#flags>

A RTS instruction usually has a similar form to an LDM instruction under the Kasm assembler, except that while the base register increment/decrement semantics are optionally specified for an LDM, these are always implicit for an RTS. One notable addition is the inclusion of an immediate constant, #flags, which is used to set the status flags. The definitions of cond, Rn and Rlist are as given in the ARM data sheets. Writeback must always be used and PC must appear in Rlist or it is an error. The S bit determines whether it is the SPSR or the CPSR which is returned to the caller; that is, the S bit can only be set when returning from SVC, IRQ, FIQ, ABT32 or UND32 modes.

The S bit is ignored and two instructions (being MSR CPSR_flg,#flags and LDM Rn!,<Rlist>) generated if this directive occurs within USR32 mode code and a register list is used.

Otherwise, a RTS instruction assembles into those same instructions if the S bit is unset, or if the S bit is set and it appears in 26-bit PC mode code, and into four instructions when it is set. In the 32-bit mode two instruction case, RTS is a MSR CPSR_flg,#flags instruction, followed by an LDMIA (after an STYPE FD directive), LDMDB (after an STYPE EA), LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA) instruction with Rn as the base register, the writeback flag set, the S bit unset and the register list as for the RTS. In the 26-bit mode two instruction case, RTS is a LDMIA (after an STYPE FD directive), LDMDB (after an STYPE EA), LDMIB (after an STYPE ED), or LDMDA (after an STYPE FA) instruction of the same form but with lr substituted where PC appears in Rlist, followed by an ORRS PC,lr,#flags instruction. If a RTS instruction is used in a 26-bit PC mode with the S bit unset, it is an error.

One example of the four instruction case for a full descending stack is:

        LDR     rtemp,[sp],#4;                  Unstack SPSR
        ORR     rtemp,rtemp,#flags;             Change flags
        MSR     SPSR,rtemp;                     Restore SPSR with changed flags
        LDMFD   <as for RTS instruction>

It is an error for either of these forms to include the letter "S" in the mnemonic. They must use the "^" symbol to indicate the S bit is set.
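The register list forms of RTS described above can be sketched in Python as follows. This is one reading of the rules, not Kasm's implementation: expand_rts_list is a hypothetical name, a full descending stack is assumed, and the flags argument is passed through as text (for example "#vbit").

```python
MODES_26 = {"SVC26", "USR26", "IRQ26", "FIQ26"}


def expand_rts_list(mode, base, rlist, flags, s_bit=False, temp="rtemp"):
    """Return the instructions RTS base!,{rlist},flags might expand to."""
    if mode in MODES_26:
        if not s_bit:
            raise ValueError("RTS without the S bit is an error in 26-bit PC modes")
        # Pop into lr instead of PC, then return with the flags ORed in.
        regs = ",".join("lr" if r.strip().lower() == "pc" else r.strip()
                        for r in rlist.split(","))
        return [f"LDMFD   {base}!,{{{regs}}}", f"ORRS    PC,lr,{flags}"]
    if s_bit and mode != "USR32":
        # Four-instruction case: adjust the stacked SPSR, then restore it.
        return [
            f"LDR     {temp},[{base}],#4",
            f"ORR     {temp},{temp},{flags}",
            f"MSR     SPSR,{temp}",
            f"LDMFD   {base}!,{{{rlist}}}^",
        ]
    # USR32, or S unset in a 32-bit mode: set CPSR flags, then plain pop.
    return [f"MSR     CPSR_flg,{flags}", f"LDMFD   {base}!,{{{rlist}}}"]


for line in expand_rts_list("SVC26", "sp", "R0,pc", "#vbit", s_bit=True):
    print(line)
```

The 26-bit branch shows why lr is substituted for PC: the flags can only be merged in by the final ORRS, so the return address has to pass through lr first.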

A RTS instruction with just an immediate constant restores PC from the link register and, if the S bit is set, restores the CPSR from the SPSR of the callee mode, but affects the CPSR of the caller by updating the flags, as indicated by #flags, in either case. The former case assembles to:

        MSR     CPSR_flg,#flags;                Change CPSR flags
        MOV     PC,lr

In 32-bit PC mode (except USR32), the latter case assembles as:

        MSR     SPSR_flg,#flags;                Change SPSR flags
        MOVS    PC,lr;                          Restore changed SPSR to CPSR

In 26-bit mode:

        ORRS    PC,lr,#flags
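The immediate-only forms of RTS can be modelled the same way. The Python sketch below is an interpretation under assumptions: expand_rts_flags is a hypothetical name, and rejecting an RTS without the S bit in a 26-bit PC mode follows the rule stated for the register list form.

```python
MODES_26 = {"SVC26", "USR26", "IRQ26", "FIQ26"}


def expand_rts_flags(mode, flags, s_bit=False):
    """Return the instructions RTS{S} flags (no register list) might expand to."""
    if mode in MODES_26:
        if not s_bit:
            raise ValueError("RTS without the S bit is an error in 26-bit PC modes")
        return [f"ORRS    PC,lr,{flags}"]
    if s_bit and mode != "USR32":
        return [f"MSR     SPSR_flg,{flags}", "MOVS    PC,lr"]
    return [f"MSR     CPSR_flg,{flags}", "MOV     PC,lr"]


for mode in ("SVC26", "SVC32", "USR32"):
    print(mode, expand_rts_flags(mode, "#vbit", s_bit=True))
```

Only the 26-bit form can do the whole job in one instruction, because there the flags and the return address live in the same register.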

Appendix—Example Usage of SBR, RET and RTS

SBR{cond} Rn!,<Rlist>
e.g.    SBR     sp!,{lr};               Push link
        SBR     sp!,{lr}^;              Push link and SPSR
        SBR     sp!,{R0,lr};            Push R0 and link
        SBR     sp!,{R0,lr}^;           Push R0, link and SPSR

Note: Because extASM doesn't support macros in this form, you will need to create a class of instructions for each possible configuration of the register list (S bit can be inferred from processor mode).

e.g. (SubroutiNe entry)
SNA     one register besides link
SNB     two registers besides link
SNC     three registers besides link
SND     four registers besides link
SNE     five registers besides link
SNO     one range and one register besides link
SNP     one range and two registers besides link
SNQ     one range and three registers besides link
SNR     one range besides link
SNS     two ranges besides link
SNT     two ranges and one register besides link
SNV     one range, one register, another range, besides link
SNZ     zero registers besides link

RET{cond}{S} {Rn!,<Rlist>{^}}
e.g.    RET;                            Return to link address
        RETS;                           Restore SPSR to CPSR of calling mode and
        ;                               return to link address
        RET     sp!,{pc};               Return to popped link address
        RET     sp!,{pc}^;              Restore popped SPSR to CPSR of calling
        ;                               mode and return to popped link address
        RETVC   sp!,{R0,pc};            Pop R0 and link, then return to link 
        ;                               address
        RETVC   sp!,{R0,pc}^;           Pop R0, link and SPSR, then restore SPSR
        ;                               to CPSR of calling mode during return to
        ;                               link address

Note: Because extASM doesn't support macros in this form, you will need to create a class of instructions for each possible configuration of the register list and S bit.

e.g. (S unset- REturn)
REA     one register besides PC
REB     two registers besides PC
REC     three registers besides PC
RED     four registers besides PC
REE     five registers besides PC
REO     one range and one register besides PC
REP     one range and two registers besides PC
REQ     one range and three registers besides PC
RER     one range besides PC
RES     two ranges besides PC
RET     two ranges and one register besides PC
REV     one range, one register, another range, besides PC
REZ     zero registers besides PC

(S set- Status Return)
SRA     one register besides PC
SRB     two registers besides PC
SRC     three registers besides PC
SRD     four registers besides PC
SRE     five registers besides PC
SRO     one range and one register besides PC
SRP     one range and two registers besides PC
SRQ     one range and three registers besides PC
SRR     one range besides PC
SRS     two ranges besides PC
SRT     two ranges and one register besides PC
SRV     one range, one register, another range, besides PC
SRZ     zero registers besides PC

RTS{cond}{S} {Rn!,<Rlist>{^},}<#flags>
e.g.    RTS     #vbit;                  Return to link address and set CPSR V
        ;                               bit
        RTSS    #vbit;                  Return to link address, but set SPSR V
        ;                               bit before return
        RTSVS   sp!,{pc},#vbit;         Pop link, set CPSR V bit and return to
        ;                               link address
        RTSVS   sp!,{pc}^,#vbit;        Pop link and SPSR, set SPSR V bit, then
        ;                               restore SPSR to CPSR of calling mode 
        ;                               during return to link address
        RTSVS   sp!,{r0,pc},#vbit;      Pop R0 and link, set CPSR V bit and
        ;                               return to link address
        RTSVS   sp!,{r0,pc}^,#vbit;     Pop R0, link and SPSR, set SPSR V bit,
        ;                               then restore SPSR to CPSR of calling mode
        ;                               during return to link address

Note: Because extASM doesn't support macros in this form, you will need to create a class of instructions for each possible configuration of the register list and immediate constant (S bit can be inferred from processor mode.)

e.g. (V set in immediate constant- Return with V set)
RVA     one register besides PC
RVB     two registers besides PC
RVC     three registers besides PC
RVD     four registers besides PC
RVE     five registers besides PC
RVO     one range and one register besides PC
RVP     one range and two registers besides PC
RVQ     one range and three registers besides PC
RVR     one range besides PC
RVS     two ranges besides PC
RVT     two ranges and one register besides PC
RVV     one range, one register, another range, besides PC
RVZ     zero registers besides PC
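
The letter suffixes in the tables above follow one pattern across the SN, RE, SR and RV classes, so choosing a mnemonic can be sketched as a lookup on the shape of the register list. The Python below is a hypothetical helper, not part of extASM or Kasm, and it assumes that the order of ranges and single registers matches the order in which each table entry lists them.

```python
# Shape of the register list besides link/PC: "r" = range, "s" = single register.
SHAPES = {
    "": "Z", "s": "A", "ss": "B", "sss": "C", "ssss": "D", "sssss": "E",
    "r": "R", "rs": "O", "rss": "P", "rsss": "Q",
    "rr": "S", "rrs": "T", "rsr": "V",
}


def class_mnemonic(prefix, items):
    """Pick a mnemonic such as 'SNO' for items like ['R0-R3', 'R5']."""
    shape = "".join("r" if "-" in item else "s" for item in items)
    if shape not in SHAPES:
        raise ValueError(f"no instruction class for register list shape {shape!r}")
    return prefix + SHAPES[shape]


print(class_mnemonic("SN", ["R0-R3", "R5"]))
print(class_mnemonic("RV", []))
```

A list whose shape is not in the table, such as six single registers, has no class and would need to be split or reordered by hand.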
Maintained by Kade Hansson; e-mail: archer@kasoft.info; Updated on 27 January 2003; XHTML 1.0 Transitional.