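Special notes:
1. This tutorial is an original work by Armfly (安富莱电子).
2. All materials for the Armfly STM32F407 development board are open source. Open-source address: (link)
3. The board currently ships with more than 300 examples and 4 user manuals.
Chapter 10: Enabling the FPU for μCOS-III on Newer MDK Versions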
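The official μCOS-III port handles the pushing and popping of the floating-point registers incorrectly, so various corrected versions circulate online. Those corrections, however, run correctly only on MDK versions below 4.7; on MDK 4.7 and later they fail. This tutorial was written to address that. In the scheme presented here, the floating-point registers are pushed only for tasks that actually use them (that is, tasks that perform floating-point operations); tasks that do not use them push nothing extra. This point is important. The scheme has been tested successfully on MDK 4.54, 4.73, and 5.10, and on IAR 6.3 and 6.7.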
10.1 The official port
10.2 Solution for enabling the FPU
10.3 Pros and cons of enabling the FPU
10.4 Summary
10.1 The official port
In the official port, only the IAR project contains functions for pushing and popping the floating-point registers; the MDK project has none. The following function is from the file os_cpu_c.c:
/*
*********************************************************************************************************
*                                        INITIALIZE A TASK'S STACK
*
* Description: This function is called by either OSTaskCreate() or OSTaskCreateExt() to initialize the
*              stack frame of the task being created.  This function is highly processor specific.
*
* Arguments  : task          is a pointer to the task code
*
*              p_arg         is a pointer to a user supplied data area that will be passed to the task
*                            when the task first executes.
*
*              ptos          is a pointer to the top of stack.  It is assumed that 'ptos' points to
*                            a 'free' entry on the task stack.  If OS_STK_GROWTH is set to 1 then
*                            'ptos' will contain the HIGHEST valid address of the stack.  Similarly, if
*                            OS_STK_GROWTH is set to 0, the 'ptos' will contain the LOWEST valid address
*                            of the stack.
*
*              opt           specifies options that can be used to alter the behavior of OSTaskStkInit().
*                            (see uCOS_II.H for OS_TASK_OPT_xxx).
*
* Returns    : Always returns the location of the new top-of-stack once the processor registers have
*              been placed on the stack in the proper order.
*
* Note(s)    : (1) Interrupts are enabled when task starts executing.
*
*              (2) All tasks run in Thread mode, using process stack.
*
*              (3) There are two different stack frames depending on whether the Floating-Point(FP)
*                  co-processor is enabled or not.
*
*                  (a) The stack frame shown in the diagram is used when the FP co-processor is not present and
*                      OS_TASK_OPT_SAVE_FP is disabled. In this case, the FP registers and FP Status Control
*                      register are not saved in the stack frame.
*
*                  (b) If the FP co-processor is present but the OS_TASK_OPT_SAVE_FP is not set, then the stack
*                      frame is saved as shown in diagram (a). Moreover, if OS_TASK_OPT_SAVE_FP is set, then the
*                      FP registers and FP Status Control register are saved in the stack frame.
*
*                      (1) When enabling the FP co-processor, make sure to clear bits ASPEN and LSPEN in the
*                          Floating-Point Context Control Register (FPCCR).
*
*                    +------------+       +------------+
*                    |            |       |            |
*                    +------------+       +------------+
*                    |    xPSR    |       |    xPSR    |
*                    +------------+       +------------+
*                    |Return Addr |       |Return Addr |
*                    +------------+       +------------+
*                    |  LR(R14)   |       |  LR(R14)   |
*                    +------------+       +------------+
*                    |    R12     |       |    R12     |
*                    +------------+       +------------+
*                    |    R3      |       |    R3      |
*                    +------------+       +------------+
*                    |    R2      |       |    R2      |
*                    +------------+       +------------+
*                    |    R1      |       |    R1      |
*                    +------------+       +------------+
*                    |    R0      |       |    R0      |
*                    +------------+       +------------+
*                    |    R11     |       |    R11     |
*                    +------------+       +------------+
*                    |    R10     |       |    R10     |
*                    +------------+       +------------+
*                    |    R9      |       |    R9      |
*                    +------------+       +------------+
*                    |    R8      |       |    R8      |
*                    +------------+       +------------+
*                    |    R7      |       |    R7      |
*                    +------------+       +------------+
*                    |    R6      |       |    R6      |
*                    +------------+       +------------+
*                    |    R5      |       |    R5      |
*                    +------------+       +------------+
*                    |    R4      |       |    R4      |
*                    +------------+       +------------+
*                         (a)             |   FPSCR    |
*                                         +------------+
*                                         |    S31     |
*                                         +------------+
*                                                .
*                                                .
*                                                .
*                                         +------------+
*                                         |    S1      |
*                                         +------------+
*                                         |    S0      |
*                                         +------------+
*                                              (b)
*
*              (4) The SP must be 8-byte aligned in conforming to the Procedure Call Standard for the ARM architecture
*
*                  (a) Section 2.1 of the ABI for the ARM Architecture Advisory Note. SP must be 8-byte aligned
*                      on entry to AAPCS-Conforming functions states :
*
*                      The Procedure Call Standard for the ARM Architecture [AAPCS] requires primitive
*                      data types to be naturally aligned according to their sizes (for size = 1, 2, 4, 8 bytes).
*                      Doing otherwise creates more problems than it solves.
*
*                      In return for preserving the natural alignment of data, conforming code is permitted
*                      to rely on that alignment. To support aligning data allocated on the stack, the stack
*                      pointer (SP) is required to be 8-byte aligned on entry to a conforming function. In
*                      practice this requirement is met if:
*
*                      (1) At each call site, the current size of the calling function's stack frame is a multiple of 8 bytes.
*                          This places an obligation on compilers and assembly language programmers.
*
*                      (2) SP is a multiple of 8 when control first enters a program.
*                          This places an obligation on authors of low level OS, RTOS, and runtime library
*                          code to align SP at all points at which control first enters
*                          a body of (AAPCS-conforming) code.
*
*                      In turn, this requires the value of SP to be aligned to 0 modulo 8:
*
*                      (3) By exception handlers, before calling AAPCS-conforming code.
*
*                      (4) By OS/RTOS/run-time system code, before giving control to an application.
*
*                  (b) Section 2.3.1 corrective steps from the the SP must be 8-byte aligned on entry
*                      to AAPCS-conforming functions advisory note also states.
*
*                      " This requirement extends to operating systems and run-time code for all architecture versions
*                        prior to ARMV7 and to the A, R and M architecture profiles thereafter. Special considerations
*                        associated with ARMV7M are discussed in §2.3.3"
*
*                      (1) Even if the SP 8-byte alignment is not a requirement for the ARMv7M profile, the stack is aligned
*                          to 8-byte boundaries to support legacy execution environments.
*
*                  (c) Section 5.2.1.2 from the Procedure Call Standard for the ARM
*                      architecture states : "The stack must also conform to the following
*                      constraint at a public interface:
*
*                      (1) SP mod 8 = 0. The stack must be double-word aligned"
*
*                  (d) From the ARM Technical Support Knowledge Base. 8 Byte stack alignment.
*
*                      "8 byte stack alignment is a requirement of the ARM Architecture Procedure
*                       Call Standard [AAPCS]. This specifies that functions must maintain an 8 byte
*                       aligned stack address (e.g. 0x00, 0x08, 0x10, 0x18, 0x20) on all external
*                       interfaces. In practice this requirement is met if:
*
*                      (1) At each external interface, the current stack pointer
*                          is a multiple of 8 bytes.
*
*                      (2) Your OS maintains 8 byte stack alignment on its external interfaces
*                          e.g. on task switches"
*
**********************************************************************************************************
*/
OS_STK *OSTaskStkInit (void (*task)(void *p_arg), void *p_arg, OS_STK *ptos, INT16U opt)
{
    OS_STK *p_stk;

    p_stk      = ptos + 1u;                                 /* Load stack pointer                          */
                                                            /* Align the stack to 8-bytes.                 */
    p_stk      = (OS_STK *)((OS_STK)(p_stk) & 0xFFFFFFF8u);
                                                            /* Registers stacked as if auto-saved on exception */
    *(--p_stk) = (OS_STK)0x01000000uL;                      /* xPSR                                        */
    *(--p_stk) = (OS_STK)task;                              /* Entry Point                                 */
    *(--p_stk) = (OS_STK)OS_TaskReturn;                     /* R14 (LR)                                    */
    *(--p_stk) = (OS_STK)0x12121212uL;                      /* R12                                         */
    *(--p_stk) = (OS_STK)0x03030303uL;                      /* R3                                          */
    *(--p_stk) = (OS_STK)0x02020202uL;                      /* R2                                          */
    *(--p_stk) = (OS_STK)0x01010101uL;                      /* R1                                          */
    *(--p_stk) = (OS_STK)p_arg;                             /* R0 : argument                               */
                                                            /* Remaining registers saved on process stack  */
    *(--p_stk) = (OS_STK)0x11111111uL;                      /* R11                                         */
    *(--p_stk) = (OS_STK)0x10101010uL;                      /* R10                                         */
    *(--p_stk) = (OS_STK)0x09090909uL;                      /* R9                                          */
    *(--p_stk) = (OS_STK)0x08080808uL;                      /* R8                                          */
    *(--p_stk) = (OS_STK)0x07070707uL;                      /* R7                                          */
    *(--p_stk) = (OS_STK)0x06060606uL;                      /* R6                                          */
    *(--p_stk) = (OS_STK)0x05050505uL;                      /* R5                                          */
    *(--p_stk) = (OS_STK)0x04040404uL;                      /* R4                                          */

#if (OS_CPU_ARM_FP_EN > 0u)
    if ((opt & OS_TASK_OPT_SAVE_FP) != (INT16U)0) {
        *--p_stk = (OS_STK)0x02000000u;                     /* FPSCR                                       */
                                                            /* Initialize S0-S31 floating point registers  */
        *--p_stk = (OS_STK)0x41F80000u;                     /* S31                                         */
        *--p_stk = (OS_STK)0x41F00000u;                     /* S30                                         */
        *--p_stk = (OS_STK)0x41E80000u;                     /* S29                                         */
        *--p_stk = (OS_STK)0x41E00000u;                     /* S28                                         */
        *--p_stk = (OS_STK)0x41D80000u;                     /* S27                                         */
        *--p_stk = (OS_STK)0x41D00000u;                     /* S26                                         */
        *--p_stk = (OS_STK)0x41C80000u;                     /* S25                                         */
        *--p_stk = (OS_STK)0x41C00000u;                     /* S24                                         */
        *--p_stk = (OS_STK)0x41B80000u;                     /* S23                                         */
        *--p_stk = (OS_STK)0x41B00000u;                     /* S22                                         */
        *--p_stk = (OS_STK)0x41A80000u;                     /* S21                                         */
        *--p_stk = (OS_STK)0x41A00000u;                     /* S20                                         */
        *--p_stk = (OS_STK)0x41980000u;                     /* S19                                         */
        *--p_stk = (OS_STK)0x41900000u;                     /* S18                                         */
        *--p_stk = (OS_STK)0x41880000u;                     /* S17                                         */
        *--p_stk = (OS_STK)0x41800000u;                     /* S16                                         */
        *--p_stk = (OS_STK)0x41700000u;                     /* S15                                         */
        *--p_stk = (OS_STK)0x41600000u;                     /* S14                                         */
        *--p_stk = (OS_STK)0x41500000u;                     /* S13                                         */
        *--p_stk = (OS_STK)0x41400000u;                     /* S12                                         */
        *--p_stk = (OS_STK)0x41300000u;                     /* S11                                         */
        *--p_stk = (OS_STK)0x41200000u;                     /* S10                                         */
        *--p_stk = (OS_STK)0x41100000u;                     /* S9                                          */
        *--p_stk = (OS_STK)0x41000000u;                     /* S8                                          */
        *--p_stk = (OS_STK)0x40E00000u;                     /* S7                                          */
        *--p_stk = (OS_STK)0x40C00000u;                     /* S6                                          */
        *--p_stk = (OS_STK)0x40A00000u;                     /* S5                                          */
        *--p_stk = (OS_STK)0x40800000u;                     /* S4                                          */
        *--p_stk = (OS_STK)0x40400000u;                     /* S3                                          */
        *--p_stk = (OS_STK)0x40000000u;                     /* S2                                          */
        *--p_stk = (OS_STK)0x3F800000u;                     /* S1                                          */
        *--p_stk = (OS_STK)0x00000000u;                     /* S0                                          */
    }
#endif

    return (p_stk);
}
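This official stack initialization is wrong. Why? Because it does not match the order in which the floating-point registers are pushed and popped. Another part of the code is in the file os_cpu_a.asm, shown below: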
- #ifdef __ARMVFP__
- PUBLIC OS_CPU_FP_Reg_Push
- PUBLIC OS_CPU_FP_Reg_Pop
- #endif
-
- ;********************************************************************************************************
- ; EQUATES
- ;********************************************************************************************************
-
- NVIC_INT_CTRL EQU 0xE000ED04 ; Interrupt control state register.
- NVIC_SYSPRI14 EQU 0xE000ED22 ; System priority register (priority 14).
- NVIC_PENDSV_PRI EQU 0xFF ; PendSV priority value (lowest).
- NVIC_PENDSVSET EQU 0x10000000 ; Value to trigger PendSV exception.
-
-
- ;********************************************************************************************************
- ; CODE GENERATION DIRECTIVES
- ;********************************************************************************************************
-
- RSEG CODE:CODE:NOROOT(2)
- THUMB
-
- #ifdef __ARMVFP__
- ;********************************************************************************************************
- ; FLOATING POINT REGISTERS PUSH
- ; void OS_CPU_FP_Reg_Push (OS_STK *stkPtr)
- ;
- ; Note(s) : 1) This function saves S0-S31, and FPSCR registers of the Floating Point Unit.
- ;
- ; 2) Pseudo-code is:
- ; a) Get FPSCR register value;
- ; b) Push value on process stack;
- ; c) Push remaining regs S0-S31 on process stack;
- ; d) Update OSTCBCur->OSTCBStkPtr;
- ;********************************************************************************************************
-
- OS_CPU_FP_Reg_Push
- MRS R1, PSP ; PSP is process stack pointer
- CBZ R1, OS_CPU_FP_nosave ; Skip FP register save the first time
-
- VMRS R1, FPSCR
- STR R1, [R0, #-4]!
- VSTMDB R0!, {S0-S31}
- LDR R1, =OSTCBCur
- LDR R2, [R1]
- STR R0, [R2]
- OS_CPU_FP_nosave
- BX LR
-
- ;********************************************************************************************************
- ; FLOATING POINT REGISTERS POP
- ; void OS_CPU_FP_Reg_Pop (OS_STK *stkPtr)
- ;
- ; Note(s) : 1) This function restores S0-S31, and FPSCR registers of the Floating Point Unit.
- ;
- ; 2) Pseudo-code is:
- ; a) Restore regs S0-S31 of new process stack;
- ; b) Restore FPSCR reg value
- ; c) Update OSTCBHighRdy->OSTCBStkPtr pointer of new process stack;
- ;********************************************************************************************************
-
- OS_CPU_FP_Reg_Pop
- VLDMIA R0!, {S0-S31}
- LDMIA R0!, {R1}
- VMSR FPSCR, R1
- LDR R1, =OSTCBHighRdy
- LDR R2, [R1]
- STR R0, [R2]
- BX LR
- #endif
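If it is not clear why this pushing and popping of the floating-point registers is wrong, study Chapter 5, Task Switching Design, carefully. Chapter 5 explains this issue in depth.
10.2 Solution for enabling the FPU
To solve the FPU problem, two functions need to be modified: one is CPU_STK *OSTaskStkInit(), the other is the PendSV interrupt handler.
10.2.1 Modifying the function CPU_STK *OSTaskStkInit()
The function is located as follows: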
The modified content is as follows:
- CPU_STK *OSTaskStkInit (OS_TASK_PTR p_task,
- void *p_arg,
- CPU_STK *p_stk_base,
- CPU_STK *p_stk_limit,
- CPU_STK_SIZE stk_size,
- OS_OPT opt)
- {
- CPU_STK *p_stk;
-
-
- (void)opt; /* Prevent compiler warning */
-
- p_stk = &p_stk_base[stk_size]; /* Load stack pointer */
- /* Align the stack to 8-bytes. */
- p_stk = (CPU_STK *)((CPU_STK)(p_stk) & 0xFFFFFFF8);
- /* Registers stacked as if auto-saved on exception */
-
- *--p_stk = (CPU_STK)0x01000000u; /* xPSR */
- *--p_stk = (CPU_STK)p_task; /* Entry Point */
- *--p_stk = (CPU_STK)OS_TaskReturn; /* R14 (LR) */
- *--p_stk = (CPU_STK)0x12121212u; /* R12 */
- *--p_stk = (CPU_STK)0x03030303u; /* R3 */
- *--p_stk = (CPU_STK)0x02020202u; /* R2 */
- *--p_stk = (CPU_STK)p_stk_limit; /* R1 */
- *--p_stk = (CPU_STK)p_arg; /* R0 : argument */
- /* Remaining registers saved on process stack */
- *--p_stk = (CPU_STK)0x11111111u; /* R11 */
- *--p_stk = (CPU_STK)0x10101010u; /* R10 */
- *--p_stk = (CPU_STK)0x09090909u; /* R9 */
- *--p_stk = (CPU_STK)0x08080808u; /* R8 */
- *--p_stk = (CPU_STK)0x07070707u; /* R7 */
- *--p_stk = (CPU_STK)0x06060606u; /* R6 */
- *--p_stk = (CPU_STK)0x05050505u; /* R5 */
- *--p_stk = (CPU_STK)0x04040404u; /* R4 */
-
- *--p_stk = (CPU_STK)0xFFFFFFFDUL; /* (1) EXC_RETURN */
-
- return (p_stk);
- }
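1. This statement is the most important one: it pushes EXC_RETURN onto the stack as well. EXC_RETURN was covered earlier in Section 4.2.5, Special Function Registers. One addition: on the M4 core, bit 4 of EXC_RETURN is also meaningful.
When bit4 = 1, 8 registers (R0-R3, R12, LR, PC, xPSR) are stacked automatically, and another 8 registers (R4-R11) must be stacked manually.
When bit4 = 0, 18 floating-point words (S0-S15, FPSCR, and one reserved word) plus the 8 registers are stacked automatically, and another 16 floating-point registers (S16-S31) plus the 8 registers must be stacked manually.
These registers are explained in detail in Chapters 4 and 5, so they are not repeated here.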
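The role of the EXC_RETURN bits can be sketched in C. This is a minimal host-side illustration, not part of the port: the type `exc_return_info_t` and the function `exc_return_decode()` are hypothetical names introduced here, and the bit meanings (bit 2 selects PSP, bit 3 selects Thread mode, bit 4 cleared means an extended floating-point frame) follow the ARMv7-M exception return encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: decoded view of an EXC_RETURN value. */
typedef struct {
    bool thread_mode;   /* bit 3 = 1: return to Thread mode          */
    bool uses_psp;      /* bit 2 = 1: return using the process stack */
    bool fp_context;    /* bit 4 = 0: frame includes FP registers    */
} exc_return_info_t;

/* Decode the three low bits that matter to the context switch. */
static exc_return_info_t exc_return_decode(uint32_t exc_return)
{
    exc_return_info_t info;

    info.thread_mode = (exc_return & (1u << 3)) != 0u;
    info.uses_psp    = (exc_return & (1u << 2)) != 0u;
    info.fp_context  = (exc_return & (1u << 4)) == 0u;
    return info;
}
```

With the initial value 0xFFFFFFFD written by OSTaskStkInit(), a new task starts in Thread mode on the process stack with no floating-point context; bit 4 only becomes 0 once the task has executed a floating-point instruction.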
10.2.2 Modifying the function OS_CPU_PendSVHandler
The function is located as follows:
The changes needed in the PendSV handler are as follows:
- OS_CPU_PendSVHandler
- CPSID I ; Prevent interruption during context switch
- MRS R0, PSP ; PSP is process stack pointer
- CBZ R0, OS_CPU_PendSVHandler_nosave ; Skip register save the first time
-
- TST LR, #0x10 ; (1)
- IT EQ
- VSTMDBEQ R0!, {S16-S31}
-
- MOV R3, LR ; (2)
- STMDB R0!,{R3-R11}
-
- LDR R1, =OSTCBCurPtr ; OSTCBCurPtr->OSTCBStkPtr = SP;
- LDR R1, [R1]
- STR R0, [R1] ; R0 is SP of process being switched out
-
- ; At this point, entire context of process has been saved
- OS_CPU_PendSVHandler_nosave
- PUSH {R14} ; Save LR exc_return value
- LDR R0, =OSTaskSwHook ; OSTaskSwHook();
- BLX R0
- POP {R14}
-
- LDR R0, =OSPrioCur ; OSPrioCur = OSPrioHighRdy;
- LDR R1, =OSPrioHighRdy
- LDRB R2, [R1]
- STRB R2, [R0]
-
- LDR R0, =OSTCBCurPtr ; OSTCBCurPtr = OSTCBHighRdyPtr;
- LDR R1, =OSTCBHighRdyPtr
- LDR R2, [R1]
- STR R2, [R0]
- LDR R0, [R2] ; R0 is new process SP; SP = OSTCBHighRdyPtr->StkPtr;
-
-
- LDMIA R0!,{R3-R11} ; (3)
- MOV LR, R3
-
- TST LR, #0x10 ; (4)
- IT EQ
- VLDMIAEQ R0!, {S16-S31}
-
- MSR PSP, R0 ; Load PSP with new process SP
-
- CPSIE I
- BX LR ; Exception return will restore remaining context
-
- END
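1. Bit 4 of EXC_RETURN (LR) is tested to see whether this task has used the floating-point registers; if it has, the remaining 16 floating-point registers must be pushed.
TST LR, #0x10:
The TST instruction is usually used together with the EQ and NE condition codes. EQ holds when all tested bits are 0; NE holds as soon as any tested bit is nonzero. Here the values of LR and 0x10 are bitwise ANDed; only the flags are updated and the result itself is discarded.
IT EQ:
IT is short for IF-THEN. The IF-THEN (IT) instruction encloses a block of at most 4 instructions that can be executed conditionally. IT already carries one "T", so up to 3 more "T" or "E" letters can follow, in any order. T marks an instruction executed when the condition holds, and E marks an instruction executed when it does not. Every instruction inside the IF-THEN block must carry a condition suffix: instructions matching a T must use the same condition as the IT instruction, and instructions matching an E must use the opposite condition.
The forms of IT are summarized below:
IT <cond>           ; IF-THEN block enclosing 1 instruction
IT<x> <cond>        ; IF-THEN block enclosing 2 instructions
IT<x><y> <cond>     ; IF-THEN block enclosing 3 instructions
IT<x><y><z> <cond>  ; IF-THEN block enclosing 4 instructions
Here <x>, <y>, and <z> can each be "T" or "E", and <cond> is one of the conditions listed in the table below (except AL).
That is still rather abstract, so here is a simple example that uses the IT instruction to implement some C pseudocode: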
if(R0==R1)
{
R3= R4 + R5;
R3= R3 / 2;
}
else
{
R3= R6 + R7;
R3= R3 / 2;
}
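This can be written as:
CMP R0, R1          ; Compare R0 and R1
ITTEE EQ            ; If R0 == R1: Then-Then-Else-Else
ADDEQ R3, R4, R5    ; Add when equal
ASREQ R3, R3, #1    ; Arithmetic shift right when equal
ADDNE R3, R6, R7    ; Add when not equal
ASRNE R3, R3, #1    ; Arithmetic shift right when not equal
With this background in place, look at the following instruction.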
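VSTMDBEQ R0!, {S16-S31}:
Combined with the IT EQ above, this means: if (LR & 0x10) == 0 then VSTMDB R0!, {S16-S31}. In other words, as stated earlier, bit4 = 0 in EXC_RETURN (LR) indicates that the floating-point registers are in use, so they must be pushed here.
2. This part is straightforward; the only difference is that EXC_RETURN is pushed as well.
3. See item 2 above; this is the corresponding pop.
4. See item 1 above; this is the corresponding pop.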
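To check that the initial frame built by the modified OSTaskStkInit() matches what the restore half of the PendSV handler pops, the two can be modeled on a plain array. This is a host-side sketch under stated assumptions: `stk_init_model()` and `pendsv_restore_model()` are hypothetical names, the stack is an array of 32-bit words indexed instead of addressed, the array base is assumed to be 8-byte aligned, and placeholder values stand in for OS_TaskReturn and p_stk_limit.

```c
#include <stdint.h>

#define STK_WORDS 64u   /* hypothetical stack size in 32-bit words */

/* Model of the modified OSTaskStkInit(): returns the index of the new
 * top of stack inside stk[]. The stack grows toward lower indices. */
static unsigned stk_init_model(uint32_t stk[STK_WORDS], uint32_t entry, uint32_t arg)
{
    static const uint32_t r11_to_r4[8] = {
        0x11111111u, 0x10101010u, 0x09090909u, 0x08080808u,
        0x07070707u, 0x06060606u, 0x05050505u, 0x04040404u
    };
    unsigned sp = STK_WORDS & ~1u;  /* even word index == 8-byte aligned */
    unsigned i;

    stk[--sp] = 0x01000000u;        /* xPSR                               */
    stk[--sp] = entry;              /* entry point (PC)                   */
    stk[--sp] = 0u;                 /* LR: placeholder for OS_TaskReturn  */
    stk[--sp] = 0x12121212u;        /* R12                                */
    stk[--sp] = 0x03030303u;        /* R3                                 */
    stk[--sp] = 0x02020202u;        /* R2                                 */
    stk[--sp] = 0x01010101u;        /* R1: placeholder for p_stk_limit    */
    stk[--sp] = arg;                /* R0: task argument                  */
    for (i = 0u; i < 8u; ++i) {
        stk[--sp] = r11_to_r4[i];   /* R11 down to R4                     */
    }
    stk[--sp] = 0xFFFFFFFDu;        /* EXC_RETURN, popped into R3 first   */
    return sp;
}

/* Model of "LDMIA R0!,{R3-R11}; MOV LR,R3; TST LR,#0x10; VLDMIAEQ ...":
 * returns the index that would be written to PSP. */
static unsigned pendsv_restore_model(const uint32_t *stk, unsigned sp, uint32_t *exc_return)
{
    *exc_return = stk[sp];          /* lowest word is EXC_RETURN          */
    sp += 9u;                       /* skip EXC_RETURN and R4-R11         */
    if ((*exc_return & 0x10u) == 0u) {
        sp += 16u;                  /* S16-S31 present only if bit4 == 0  */
    }
    return sp;
}
```

After the restore, the modeled PSP points at R0 of the hardware frame, which is what the exception return expects. Because EXC_RETURN sits at the lowest address of every saved context, the handler can decide whether S16-S31 are present before touching them.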
10.2.3 Enabling the FPU
After modifying the two places above, do not forget to enable the FPU:
These three steps complete the required changes.
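At register level, enabling the Cortex-M4 FPU means granting full access to coprocessors CP10 and CP11 in the Coprocessor Access Control Register (CPACR, at 0xE000ED88). The sketch below is an illustration under assumptions, not the step the original shows: `cpacr_with_fpu_enabled()` is a hypothetical helper that only computes the new register value so it can be checked on a host, and the actual register write is left as a comment. In many STM32 projects the startup code performs this write when the toolchain's FPU option is selected, so an explicit write may not be needed.

```c
#include <stdint.h>

#define SCB_CPACR_ADDR        0xE000ED88u   /* CPACR address on ARMv7-M      */
#define CPACR_CP10_CP11_FULL  (0xFu << 20)  /* CP10 and CP11: full access    */

/* Hypothetical helper: returns the CPACR value with the FPU enabled,
 * leaving all other bits unchanged. */
static uint32_t cpacr_with_fpu_enabled(uint32_t cpacr)
{
    return cpacr | CPACR_CP10_CP11_FULL;
}

/* On the target (not on a host PC) the value would be applied like this,
 * followed by DSB and ISB barriers before the first FP instruction:
 *
 *     volatile uint32_t *cpacr = (volatile uint32_t *)SCB_CPACR_ADDR;
 *     *cpacr = cpacr_with_fpu_enabled(*cpacr);
 */
```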
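10.3 Pros and cons of enabling the FPU
The benefit of enabling the FPU is faster floating-point computation. The drawback is a larger task stack, because 34 additional floating-point words (S0-S31, FPSCR, and one reserved word) must also be pushed, and task switching takes longer as well. Below is a comparison in which the μCOS-III GUI task and start task (App Task GUI and App Task Start) use floating-point operations.
The first screenshot is with the FPU enabled:
The second screenshot is with the FPU disabled:
Compare in particular the two tasks that use floating-point operations.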
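The stack cost can be worked out from the word counts in Section 10.2. The following is a small sketch with hypothetical names (`context_bytes()` and the enum constants), assuming 4-byte stack words and the frame layout used in this chapter: 8 words stacked by hardware plus 9 saved by software (EXC_RETURN and R4-R11), and, for a task that uses the FPU, 18 more from hardware plus 16 more from software.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    WORD_BYTES      = 4,        /* size of one stack word                  */
    BASIC_CTX_WORDS = 8 + 9,    /* hardware frame + EXC_RETURN and R4-R11  */
    FP_EXTRA_WORDS  = 18 + 16   /* S0-S15/FPSCR/reserved + S16-S31         */
};

/* Bytes of task stack consumed by one saved context. */
static unsigned context_bytes(bool task_uses_fp)
{
    unsigned words = (unsigned)BASIC_CTX_WORDS;

    if (task_uses_fp) {
        words += (unsigned)FP_EXTRA_WORDS;
    }
    return words * (unsigned)WORD_BYTES;
}
```

Under these assumptions a task that uses floating-point operations needs 136 bytes more per saved context than one that does not, which is worth adding to the stack size of only those tasks.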
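10.4 Summary
The scheme presented in this tutorial has been tested successfully on MDK 4.54, 4.73, and 5.10, and on IAR 6.3 and 6.7. Users only need to make the three changes described above.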