Advice on designing a simple task scheduler in embedded C that can handle asynchronous tasks?

8 Posts 3 Posters 1 Views 1 Watching
    arnold_w
    wrote on last edited by
    #1

    I am working on an embedded product with an ARM processor, and I need to come up with an approach to task scheduling and then implement it. The following requirements apply:

    • Standard C (no assembler).
    • Can run on "any" microcontroller (of course, it needs to have a certain amount of RAM and flash).
    • No use of malloc.
    • If running on ARM and there's nothing to do and all timers are turned off, then it should enter stop mode (=system clock is turned off).
    • If running on ARM and there's nothing to do but at least one timer is running, then it should enter sleep mode (=some peripherals are turned off, but the system clock and timers are still working).
    • Needs to have some sort of prioritization: for example, low-level driver tasks need to be handled before higher-level tasks (such as tasks that arise from a push-button event) may be handled.
    • It needs to, in some way, support calling asynchronous functions, such as starting a DMA transfer, executing other tasks in the meantime, and getting an interrupt when the transfer has finished.

    My project is fairly limited in scope (not a lot of things going on simultaneously), but I'm still struggling a bit to meet all the requirements. I will share some of my thoughts below. I was thinking of something like a modified round-robin scheduler with priorities and a number of queues:

    typedef void (*taskFunctionPtr_t)(uint32_t arg1, uint32_t arg2);
    typedef enum {LOW_LEVEL_DRIVERS, USER_EVENTS} queueSelector_e;
    void addTaskToQueue(taskFunctionPtr_t task, queueSelector_e queue);

    In main.c:

    while (TRUE) {
        if (0 < numQueuedTasks(LOW_LEVEL_DRIVERS)) {
            handleTask(LOW_LEVEL_DRIVERS);
            continue;
        }
        if (0 < numQueuedTasks(USER_EVENTS)) {
            handleTask(USER_EVENTS);
            continue;
        }
        if (anyTimerRunning()) {
            enterIdleMode(); // System clock is still running, but some peripherals (not the timers!) are turned off
        } else {
            enterStopMode(); // System clock is turned off and can only be woken up by a user event (external int)
        }
    }

    In order to support asynchronous tasks, I probably need to make my tasks and queues more advanced:

    typedef void (*asynchronousTaskFinishedFunctionPtr)(asynchronousTaskResult_e result);
    void someAsynchronousFunction(asynchronousTaskFinishedFunctionPtr callback);
    enum asyncTaskStatus_e {TASK_NOT_STARTED, TASK_IN_WAIT_PHASE, TASK_FINISHED_NEEDS_REPEAT_DUE_TO_ERROR, TASK_SUCCESSFULLY_FIN
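
A minimal sketch of how the queue idea above could be written without malloc, assuming one fixed-size ring buffer per priority level. The names `QUEUE_DEPTH`, `task_t`, `taskQueue_t`, `runOneTask`, `recordTask` and `fakeDmaCompleteIsr` are hypothetical, and `addTaskToQueue` is given two extra argument parameters and a `bool` result that the post does not specify. The asynchronous case is modelled by letting the completion interrupt do nothing except post a task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u  /* assumed capacity; a power of two keeps index wrap-around consistent */

typedef void (*taskFunctionPtr_t)(uint32_t arg1, uint32_t arg2);
typedef enum { LOW_LEVEL_DRIVERS, USER_EVENTS, NUM_QUEUES } queueSelector_e;

typedef struct {
    taskFunctionPtr_t fn;
    uint32_t arg1;
    uint32_t arg2;
} task_t;

typedef struct {
    task_t slots[QUEUE_DEPTH];   /* statically allocated, no malloc */
    volatile uint32_t head;      /* advanced only by the main loop */
    volatile uint32_t tail;      /* advanced only by whoever posts a task */
} taskQueue_t;

static taskQueue_t queues[NUM_QUEUES];

uint32_t numQueuedTasks(queueSelector_e q)
{
    assert(q < NUM_QUEUES);
    return queues[q].tail - queues[q].head;
}

/* Returns false when the queue is full. On a real target, guard the body with
   a critical section if more than one context can post to the same queue. */
bool addTaskToQueue(taskFunctionPtr_t fn, uint32_t arg1, uint32_t arg2, queueSelector_e q)
{
    taskQueue_t *tq = &queues[q];
    if (fn == NULL || numQueuedTasks(q) >= QUEUE_DEPTH) {
        return false;
    }
    task_t *slot = &tq->slots[tq->tail % QUEUE_DEPTH];
    slot->fn = fn;
    slot->arg1 = arg1;
    slot->arg2 = arg2;
    tq->tail++;                  /* publish the slot only after it is filled */
    return true;
}

/* Runs the oldest task of one queue; returns false if that queue is empty. */
bool handleTask(queueSelector_e q)
{
    taskQueue_t *tq = &queues[q];
    if (numQueuedTasks(q) == 0u) {
        return false;
    }
    task_t task = tq->slots[tq->head % QUEUE_DEPTH];
    tq->head++;                  /* free the slot before running the task */
    task.fn(task.arg1, task.arg2);
    return true;
}

/* Runs one task from the highest-priority non-empty queue (lowest enum value first). */
bool runOneTask(void)
{
    for (int q = 0; q < NUM_QUEUES; q++) {
        if (handleTask((queueSelector_e)q)) {
            return true;
        }
    }
    return false;                /* nothing queued: the caller may enter a sleep mode */
}

/* Demo only: tasks record the order in which they ran. */
static uint32_t runOrder[4];
static uint32_t runCount;

static void recordTask(uint32_t id, uint32_t unused)
{
    (void)unused;
    if (runCount < 4u) {
        runOrder[runCount++] = id;
    }
}

/* Stand-in for a DMA-complete interrupt handler: it only posts a task. */
void fakeDmaCompleteIsr(uint32_t status)
{
    (void)addTaskToQueue(recordTask, 1u, status, LOW_LEVEL_DRIVERS);
}
```

With this shape the main loop can call `runOneTask()` repeatedly and only consider a sleep mode once it returns false. A driver task posted from an interrupt is picked up before any user-event task that was queued earlier.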

      leon de boer
      wrote on last edited by
      #2

      All of the code you have written is a co-operative system. For longer tasks you need pre-emptive context switches, and for the asynchronous ordering problem you need a semaphore, mutex or spinlock implementation.

      For pre-emptive context switching you usually have a timer interrupt that triggers 10-1000 times a second. When the interrupt triggers, the scheduler decides which task should run next, then forcibly saves every CPU register and every FPU register to a context stack so the program can resume from that point. Next it loads the CPU and FPU registers for the next task from that task's context stack and runs that task until a subsequent timer tick stops it and passes control to another task. You will also need critical sections: small areas of code during which the task switcher cannot interrupt. Often that is as simple as switching off interrupts so the timer interrupt doesn't fire.

      Semaphores or spinlocks are the usual way of sharing I/O functionality. In your DMA example, use of the DMA would have an acquire and release process. Any function wanting to do a DMA transfer would first ask for ownership of the DMA; if the DMA is in use it will already be acquired, and the caller must wait for whoever has it to release it. Releasing it then allows a waiting caller to execute, so it looks like this:

      void DMA_Write ( /* some variables */)
      {
          SpinLock_Acquire ();
          ActualDMAFunction(/* some variables */);
          SpinLock_Release ();
      }
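
The acquire/release pattern above can be sketched in portable C11 with `atomic_flag`, assuming a toolchain that provides `<stdatomic.h>` (on Cortex-M3/M4 compilers typically implement it with the exclusive load/store instructions). This is a variation rather than the poster's code: it uses a non-blocking try-acquire, because a blocking spin in a single-core cooperative loop would never let the current owner run, and it holds the lock until a hypothetical transfer-complete handler releases it. `DmaLock_TryAcquire`, `DmaLock_Release`, `DMA_TransferCompleteHandler`, `transfersStarted` and the parameters of `DMA_Write` are illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static atomic_flag dmaLock = ATOMIC_FLAG_INIT;

/* Returns true if the caller now owns the DMA, false if someone else holds it. */
bool DmaLock_TryAcquire(void)
{
    return !atomic_flag_test_and_set_explicit(&dmaLock, memory_order_acquire);
}

void DmaLock_Release(void)
{
    atomic_flag_clear_explicit(&dmaLock, memory_order_release);
}

/* Hypothetical stand-in for the hardware-specific code that starts a transfer. */
static size_t transfersStarted;

static void ActualDMAFunction(const uint8_t *src, size_t len)
{
    (void)src;
    (void)len;
    transfersStarted++;
}

/* Starts a transfer if the DMA is free; otherwise reports "busy" so the
   caller can queue a retry instead of spinning. */
bool DMA_Write(const uint8_t *src, size_t len)
{
    assert(src != NULL);
    if (!DmaLock_TryAcquire()) {
        return false;
    }
    ActualDMAFunction(src, len);
    return true;   /* ownership is kept until the transfer has finished */
}

/* Would be called from the DMA transfer-complete interrupt. */
void DMA_TransferCompleteHandler(void)
{
    DmaLock_Release();
}
```

Holding the lock across the whole transfer, instead of releasing it right after the start call, keeps a second caller from reprogramming the channel while data is still moving.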

      For the processor you are on, search for "ARM Synchronization Primitives"; ARM has a whitepaper on typical setups and minimum requirements. It will usually involve the special opcodes LDREX/STREX (as well as WFE/WFI on multicore systems if you want to sleep the core while waiting). You will also find code for the context switch by searching for "ARM context switch" with the processor name, which should turn up an ARM white paper describing it. They all generally work by entering the call with one register pointing to where to save all the current registers and another pointing to where to load the registers from. There are some really simple AVR task switcher projects which would be easy to adapt .. you simply need to change the context_switch assembler and replace the ATOMIC_BLOCK store and release with the interrupt enable/disable (it is all it generally does on smaller AVR processo

        arnold_w
        wrote on last edited by
        #3

        How is this different from using an off-the-shelf RTOS such as FreeRTOS?

          leon de boer
          wrote on last edited by
          #4

          It's similar if you look only at the very basic kernel, but things can be much more complex and painful, and an existing RTOS may already have known problems. Some examples:

          1.) The ARM security/protection ring system: usually at least two levels, EL0 and EL1, and possibly EL2 & EL3 on an ARMv8. As an example, FreeRTOS was not designed with those in mind, and none of the existing ports utilize the scheme with any intelligence.

          2.) Multicore support on ARM CPUs like ARMv7 & ARMv8. You usually have to bring the MMU up to get cache coherency, and that often means memory virtualization will need to be operating. For example, on FreeRTOS I know of only a simple SMP port on a multicore system, and no AMP or BMP implementations. So FreeRTOS was not designed in an era with multicore support in mind.

          3.) Multiple implementations, which add a whole layer of complexity as your implementation becomes one of x number and not the most efficient or best suited to your CPU. You may struggle with massive code complexity in code that may not even compile in your implementation.

          If you are on a multicore CPU like a Cortex-A53, points 1 & 2 will probably cause you a great deal of trouble. That is why there are no real fully functional ports of FreeRTOS on things like the Raspberry Pi 3 and BeagleBone Black (there are a couple of half-functional ports). So FreeRTOS is easy to port to a simple single CPU; it is harder to port to, and often at odds with how you want to operate, complex multicore CPUs. Hence we come back to the problem: what CPU are you talking about?

          I would also add that FreeRTOS lacks some features you may want, such as any concept of task aging (see "Aging (scheduling)" on Wikipedia). As with everything, you are better off playing and working out what is a good system rather than mindlessly porting some system and not understanding it properly. So I might, for example, port FreeRTOS to a simple ARMv5 or ARMv6 CPU, but I wouldn't bother with it on an ARMv8, the latter being far too different from what FreeRTOS was designed around. Again I say, if we know what CPU we are dealing with it is easier to make more detailed and helpful suggestions.

          In vino veritas
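
For readers unfamiliar with the task aging mentioned above, here is a minimal sketch of the general idea, not taken from any particular RTOS: a ready task that keeps being passed over gains effective priority until it finally runs. The struct `agedTask_t`, the function `pickWithAging` and the "one point per round waited" rule are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t basePriority;  /* higher number = more urgent */
    uint8_t waited;        /* scheduling rounds spent ready but not chosen */
    uint8_t ready;         /* non-zero when the task wants to run */
} agedTask_t;

/* Picks the ready task with the highest (basePriority + waited) and returns
   its index, or `count` if no task is ready. Ties go to the lower index.
   Every ready task that was not picked ages by one. */
size_t pickWithAging(agedTask_t tasks[], size_t count)
{
    assert(tasks != NULL);
    size_t best = count;
    unsigned bestScore = 0u;

    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready) {
            continue;
        }
        unsigned score = (unsigned)tasks[i].basePriority + tasks[i].waited;
        if (best == count || score > bestScore) {
            best = i;
            bestScore = score;
        }
    }

    for (size_t i = 0; i < count; i++) {
        if (!tasks[i].ready) {
            continue;
        }
        if (i == best) {
            tasks[i].waited = 0u;          /* the chosen task starts over */
        } else if (tasks[i].waited < UINT8_MAX) {
            tasks[i].waited++;             /* passed over: raise its score */
        }
    }
    return best;
}
```

With two always-ready tasks of base priority 5 and 3, the first wins three rounds in a row, after which the second has aged enough to get one turn.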

            arnold_w
            wrote on last edited by
            #5

            leon de boer wrote:

            Again I say if we know what CPU we are dealing with it is easier to make more detailed and helpful suggestions.

            Single core (STM32F4 and STM32F7). Since I'm working with rather basic microcontrollers and my applications really aren't that heavy, it seems a bit overkill to do the context switching like an RTOS would do.

              leon de boer
              wrote on last edited by
              #6

              You can't reach your goals without some switcher; I believe you have already worked that out yourself, as it gets more and more complex. There is actually less code in writing the switcher: it's a couple of hundred lines of code, and it's faster as your CPU is designed to context switch :-) From "The Definitive Guide to ARM Cortex-M3 and Cortex-M4", here is a 4-task round-robin switcher in 150 lines of code. You just need to change the scheduler (SysTick_Handler) to a priority-based one .. you will find dozens on the net. You can extend the number of tasks etc.; the code is very obvious and easier than the scheme you have arrived at.

              #include "stm32f4xx.h"
              // Keil::Device:STM32Cube HAL:Common

              #define LED0 (1<<7)
              #define LED1 (1<<8)
              #define LED2 (1<<9)
              #define LED3 (1<<10)

              /* Macros for word accesses */
              #define HW32_REG(ADDRESS) (*((volatile unsigned long *)(ADDRESS)))
              /* Use Breakpoint to stop when error is detected
                 (KEIL MDK specific intrinsic) */
              /* it can be changed to while(1) if needed */
              #define stop_cpu __breakpoint(0)
              void LED_initialize(void); // Initialize LED
              void task0(void); // Toggle LED0
              void task1(void); // Toggle LED1
              void task2(void); // Toggle LED2
              void task3(void); // Toggle LED3
              // Event to tasks
              volatile uint32_t systick_count=0;
              // Stack for each task (8Kbytes each - 1024 x 8 bytes)
              long long task0_stack[1024], task1_stack[1024],
                        task2_stack[1024], task3_stack[1024];
              // Data use by OS
              uint32_t curr_task=0; // Current task
              uint32_t next_task=1; // Next task
              uint32_t PSP_array[4]; // Process Stack Pointer for each task
              // -------------------------------------------------------------
              // Start of main program
              int main(void)
              {
                  SCB->CCR |= SCB_CCR_STKALIGN_Msk; // Enable double word stack alignment
                  //(recommended in Cortex-M3 r1p1, default in Cortex-M3 r2px and Cortex-M4)
                  LED_initialize();
                  // Starting the task scheduler
                  // Create stack frame for task0
                  PSP_array[0] = ((unsigned int) task0_stack)
                      + (sizeof task0_stack) - 16*4;
                  HW32_REG((PSP_array[0] + (14<<2))) = (unsigned long) task0;
                  // initial Program Counter
                  HW32_REG((PSP_array[0] + (15<<2))) = 0x01000000; // initial xPSR
                  // Create stack frame for task1
                  PSP_array[1] = ((unsigned int) task1_stack)
                      + (sizeof task1_stack) - 16*4;
                  HW32_REG((PSP_array[1] + (14<<2))) = (unsigned long) task1;
                  // initial Program Counter
                  HW32
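
The repeated "create stack frame" steps in the listing can be read as one helper, sketched here under stated assumptions. The frame is 16 words: eight software-saved registers (R4-R11) at the lowest addresses, followed by the eight words the Cortex-M hardware stacks on exception entry (R0-R3, R12, LR, PC, xPSR). That puts the PC at word 14 and xPSR at word 15, matching the `14<<2` and `15<<2` byte offsets above. The names `initTaskFrame`, `taskEntry_t`, `demoTask` and the `FRAME_*` constants are hypothetical. `uintptr_t` is used only so the sketch also builds on a host; on the target a stack word is 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_WORDS 16u          /* R4-R11 plus the 8 hardware-stacked words */
#define FRAME_PC    14u          /* word index of the initial program counter */
#define FRAME_XPSR  15u          /* word index of the initial xPSR */
#define XPSR_THUMB  0x01000000u  /* T bit (bit 24): the core runs Thumb code */

typedef void (*taskEntry_t)(void);

/* Builds the initial frame at the top of `stack` and returns the value to
   store as that task's process stack pointer. The caller must supply an
   8-byte aligned stack, as the long long arrays in the listing do. */
uintptr_t *initTaskFrame(uintptr_t *stack, size_t stackWords, taskEntry_t entry)
{
    assert(stack != NULL && entry != NULL && stackWords >= FRAME_WORDS);

    uintptr_t *frame = stack + stackWords - FRAME_WORDS;
    for (size_t i = 0; i < FRAME_WORDS; i++) {
        frame[i] = 0u;                   /* registers start out as zero */
    }
    /* Converting a function pointer to an integer is implementation-defined,
       but it is what the original listing does as well. */
    frame[FRAME_PC] = (uintptr_t)entry;
    frame[FRAME_XPSR] = XPSR_THUMB;
    return frame;
}

/* Demo only: an empty task body to take the address of. */
static void demoTask(void)
{
}
```

Calling the helper once per task would replace the four near-identical blocks in `main`, which also makes it easier to extend the number of tasks.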
              
                arnold_w
                wrote on last edited by
                #7

                Thank you for your very elaborate answers.

                  supercat9
                  wrote on last edited by
                  #8

                  I've done cooperative multitaskers on a number of platforms (not ARM, but I don't think it should present any unusual problems) and found them to be extremely handy in cases where the set of tasks is fixed. Even if a task would spend most of its time in a loop:

                  while(!x_ready)
                    task_spin();
                  

                  the cost to switch to the task, check x_ready, and then switch to the next task may be less than the cost of a more complicated scheduler trying to decide if it should switch to that task. The biggest design issue with a cooperative task switcher is deciding what invariants are going to hold any time code does a task_spin(). While a preemptive multitasker would require that functions acquire locks before breaking any invariant even temporarily, and re-establish the invariant before releasing the lock, cooperative task switching doesn't require that. More significantly, it doesn't require that tasks do anything special to handle the fact that a lock isn't available. The gotcha is that if an invariant can't be upheld during some operation, and the time required to perform that operation could grow beyond the maximum amount of time one wants to go without a task_spin(), handling that situation may be complicated.
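
A `task_spin()` that returns in the middle of a function implies switching stacks, which plain standard C cannot do. A rough approximation that stays within standard C, offered as an assumption rather than as the design described above, is to make every task a short function that the main loop calls in turn. Returning early plays the role of one `task_spin()`. All names here (`waitForXTask`, `heartbeatTask`, `taskTable`, `runOneRound`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static volatile bool x_ready;   /* would be set by an interrupt handler */
static uint32_t xHandled;       /* how many times the awaited event was processed */
static uint32_t heartbeat;      /* work done by an unrelated task */

/* One "turn" of the waiting task: poll the flag and give the CPU back at
   once if it is not set. */
static void waitForXTask(void)
{
    if (!x_ready) {
        return;                 /* stands in for one task_spin() */
    }
    x_ready = false;
    xHandled++;
}

static void heartbeatTask(void)
{
    heartbeat++;
}

typedef void (*coopTask_t)(void);

/* Fixed set of tasks, known at build time. */
static const coopTask_t taskTable[] = { waitForXTask, heartbeatTask };

/* Gives every task exactly one turn, in table order. */
void runOneRound(void)
{
    for (size_t i = 0; i < sizeof taskTable / sizeof taskTable[0]; i++) {
        assert(taskTable[i] != NULL);
        taskTable[i]();
    }
}
```

Because a task only gives up the CPU at points it chooses, shared data needs no locks as long as each task leaves it consistent before returning. The price is that a task must keep its own state between turns instead of relying on local variables.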
