Learning operating system development using Linux kernel and Raspberry Pi

View on GitHub

4.4: Scheduler

We have already learned a lot of details about the Linux scheduler inner workings, so there is not so much left for us. To make the whole picture complete in this chapter we will take a look at 2 important scheduler entry points:

  1. scheduler_tick() function, which is called at each timer interrupt.
  2. schedule() function, which is called each time when the current task needs to be rescheduled.

The third major thing that we are going to investigate in this chapter is the concept of context switch. A context switch is the process that suspends the current task and runs another task instead - this process is highly architecture specific and closely correlates with what we have been doing when working with RPi OS.


This function is important for 2 reasons:

  1. It provides a way for the scheduler to update time statistics and runtime information for the current task.
  2. Runtime information then is used to determine whether the current task needs to be preempted, and if so schedule() function is called.

As well as most of the previously explored functions, scheduler_tick is too complex to be fully explained - instead, as usual, I will just highlight the most important parts.

  1. The main work is done inside CFS method task_tick_fair. This method calls entity_tick for the sched_entity corresponding to the current task. When looking at the source code, you may be wondering why instead of just calling entry_tick for the current sched_entry, for_each_sched_entity macro is used instead? for_each_sched_entity doesn’t iterate over all sched_entry in the system. Instead, it only traverses the sched_entry inheritance tree up to the root. This is useful when tasks are grouped - after updating runtime information for a particular task, sched_entry corresponding to the whole group is also updated.

  2. entity_tick does 2 main things:

    • Calls update_curr, which is responsible for updating task’s vruntime as well as runqueue’s min_vruntime. An important thing to remember here is that vruntime is always based on 2 things: how long task has actually been executed and tasks priority.
    • Calls check_preempt_tick, which checks whether the current task needs to be preempted. Preemption happens in 2 cases:
    1. If the current task has been running for too long (the comparison is made using normal time, not vruntime). link
    2. If there is a task with smaller vruntime and the difference between vruntime values is greater than some threshold. link

    In both cases the current task is marked for preemption by calling resched_curr function.

We have already seen in the previous chapter how calling resched_curr leads to TIF_NEED_RESCHED flag being set for the current task and eventually schedule being called.

That’s it about schedule_tick now we are finally ready to take a look at the schedule function.


We have already seen so many examples were schedule is used, so now you are probably anxious to see how this function actually works. You will be surprised to know that the internals of this function are rather simple.

  1. The main work is done inside __schedule function.
  2. __schedule calls pick_next_task which redirect most of the work to the pick_next_task_fair method of the CFS scheduler.
  3. As you might expect pick_next_task_fair in a normal case just selects the leftmost element from the red-black tree and returns it. It happens here.
  4. __schedule calls context_switch, which does some preparation work and calls architecture specific __switch_to function, where low-level arch specific task parameters are prepared to the switch.
  5. __switch_to first switches some additional task components, like, for example, TLS (Thread-local Store) and saved floating point and NEON registers.
  6. Actual switch takes place in the assembler cpu_switch_to function. This function should be already familiar to you because I copied it almost without any changes to the RPi OS. As you might remember, this function switches callee-saved registers and task stack. After it returns, the new task will be running using its own kernel stack.


Now we are done with the Linux scheduler. The good thing is that it appears to be not so difficult if you focus only on the very basic workflow. After you understand the basic workflow you probably might want to to make another path through the schedule code and pay more attention to the details, because there are so many of them. But for now, we are happy with our current understanding and ready to move to the following lesson, which describes user processes and system calls.

Previous Page

4.3 Process scheduler: Forking a task

Next Page

4.5 Process scheduler: Exercises