Learning operating system development using Linux kernel and Raspberry Pi

4.3: Forking a task

Scheduling is all about selecting a proper task to run from the list of available tasks. But before the scheduler can do its job, we need to fill this list somehow. The way in which new tasks are created is the main topic of this chapter.

For now, we want to focus only on kernel threads and postpone the discussion of user-mode functionality until the next lesson. However, that won't be possible everywhere, so be prepared to learn a little about executing tasks in user mode as well.

Init task

When the kernel is started there is a single task running: the init task. The corresponding task_struct is defined here and is initialized by the INIT_TASK macro. This task is critical for the system because all other tasks in the system are derived from it.

Creating new tasks

In Linux it is not possible to create a new task from scratch - instead, all tasks are forked from a currently running task. Now that we've seen where the initial task comes from, we can explore how new tasks are created from it.

There are 4 ways in which a new task can be created.

  1. The fork system call creates a full copy of the current process, including its virtual memory, and is used to create new processes (not threads). This syscall is defined here.
  2. The vfork system call is similar to fork, but differs in that the child reuses the parent's virtual memory as well as its stack, and the parent is blocked until the child finishes execution. The definition of this syscall can be found here.
  3. The clone system call is the most flexible one - it also copies the current task, but it allows customizing the process via the flags parameter and configuring the entry point for the child task. In the next lesson, we will see how the glibc clone wrapper function is implemented - this wrapper allows using the clone syscall to create new threads.
  4. Finally, the kernel_thread function can be used to create new kernel threads.

All of the above functions end up calling _do_fork, which accepts a set of arguments describing how the new task should be created.
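As of Linux v4.14, the _do_fork prototype in kernel/fork.c looks roughly like this (paraphrased with comments added; check the source for the exact version):

```
long _do_fork(unsigned long clone_flags,   /* what to share or copy (CLONE_* bits) */
              unsigned long stack_start,   /* new stack, or fn pointer for kernel threads */
              unsigned long stack_size,    /* stack size, or fn argument for kernel threads */
              int __user *parent_tidptr,   /* where to store the child tid for the parent */
              int __user *child_tidptr,    /* where to store the child tid for the child */
              unsigned long tls);          /* TLS value, used when CLONE_SETTLS is set */
```

Note how stack_start and stack_size are overloaded: for kernel threads they carry the function to execute and its argument, which is exactly what copy_thread relies on below.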

Fork procedure

Next, I want to highlight the most important events that take place during _do_fork execution, preserving their order.

  1. _do_fork calls copy_process, which is responsible for configuring the new task_struct.
  2. copy_process calls dup_task_struct, which allocates a new task_struct and copies all fields from the original one. The actual copying takes place in the architecture-specific arch_dup_task_struct.
  3. A new kernel stack is allocated. If CONFIG_VMAP_STACK is enabled, the kernel uses virtually mapped stacks to protect against kernel stack overflow. link
  4. The task's credentials are copied. link
  5. The scheduler is notified that a new task is forked. link
  6. The task_fork_fair method of the CFS scheduler class is called. This method updates the vruntime value for the currently running task (this is done inside the update_curr function) and updates the min_vruntime value for the current runqueue (inside update_min_vruntime). Then the min_vruntime value is assigned to the forked task - this ensures that this task will be picked up next. Note that at this point the new task still hasn't been added to the task_timeline.
  7. A lot of different properties, such as information about filesystems, open files, virtual memory, signals and namespaces, are either reused or copied from the current task. The decision whether to copy something or reuse the current property is usually made based on the clone_flags parameter. link
  8. copy_thread_tls is called, which in turn calls the architecture-specific copy_thread function. This function deserves special attention because it serves as the prototype for the copy_process function in the RPi OS, and I want to investigate it more deeply.

copy_thread

The whole function is listed below.

int copy_thread(unsigned long clone_flags, unsigned long stack_start,
        unsigned long stk_sz, struct task_struct *p)
{
    struct pt_regs *childregs = task_pt_regs(p);

    memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));

    if (likely(!(p->flags & PF_KTHREAD))) {
        *childregs = *current_pt_regs();
        childregs->regs[0] = 0;

        /*
         * Read the current TLS pointer from tpidr_el0 as it may be
         * out-of-sync with the saved value.
         */
        *task_user_tls(p) = read_sysreg(tpidr_el0);

        if (stack_start) {
            if (is_compat_thread(task_thread_info(p)))
                childregs->compat_sp = stack_start;
            else
                childregs->sp = stack_start;
        }

        /*
         * If a TLS pointer was passed to clone (4th argument), use it
         * for the new thread.
         */
        if (clone_flags & CLONE_SETTLS)
            p->thread.tp_value = childregs->regs[3];
    } else {
        memset(childregs, 0, sizeof(struct pt_regs));
        childregs->pstate = PSR_MODE_EL1h;
        if (IS_ENABLED(CONFIG_ARM64_UAO) &&
            cpus_have_const_cap(ARM64_HAS_UAO))
            childregs->pstate |= PSR_UAO_BIT;
        p->thread.cpu_context.x19 = stack_start;
        p->thread.cpu_context.x20 = stk_sz;
    }
    p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
    p->thread.cpu_context.sp = (unsigned long)childregs;

    ptrace_hw_copy_thread(p);

    return 0;
}

Some of this code may already look a little familiar to you. Let's dig deeper into it.

struct pt_regs *childregs = task_pt_regs(p);

The function starts by obtaining a pointer to the new task's pt_regs struct. This struct provides access to the registers saved during kernel_entry. The childregs variable can then be used to prepare whatever state we need for the newly created task. If the task later decides to move to user mode, this state will be restored by the kernel_exit macro. An important thing to understand here is that the task_pt_regs macro doesn't allocate anything - it just calculates the position on the kernel stack where kernel_entry stores the registers, and for a newly created task this position is always at the top of the kernel stack.

memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context));

Next, the forked task's cpu_context is cleared.

if (likely(!(p->flags & PF_KTHREAD))) {

Then a check is made to determine whether we are creating a kernel thread or a user thread. For now, we are interested only in the kernel thread case; the second option will be discussed in the next lesson.

  memset(childregs, 0, sizeof(struct pt_regs));
  childregs->pstate = PSR_MODE_EL1h;
  if (IS_ENABLED(CONFIG_ARM64_UAO) &&
      cpus_have_const_cap(ARM64_HAS_UAO))
          childregs->pstate |= PSR_UAO_BIT;
  p->thread.cpu_context.x19 = stack_start;
  p->thread.cpu_context.x20 = stk_sz;

If we are creating a kernel thread, the x19 and x20 registers of the cpu_context are set to point to the function that needs to be executed (stack_start) and its argument (stk_sz). After the CPU is switched to the forked task, ret_from_fork will use those registers to jump to that function. (I don't quite understand why we also need to set childregs->pstate here: ret_from_fork will not call kernel_exit before jumping to the function stored in x19, and even if the kernel thread later decides to move to user mode, childregs will be overwritten anyway. Any ideas?)

p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
p->thread.cpu_context.sp = (unsigned long)childregs;

Next, cpu_context.pc is set to the ret_from_fork pointer - this ensures that we return to ret_from_fork after the first context switch. cpu_context.sp is set to the location just below childregs. We still need childregs at the top of the stack because after the kernel thread finishes its execution the task will be moved to user mode and the childregs structure will be used. In the next lesson, we will discuss in detail how this happens.

That's it for the copy_thread function. Now let's return to the point in the fork procedure where we left off.

Fork procedure (continued)

  1. After copy_process successfully prepares the task_struct for the forked task, _do_fork can run it by calling wake_up_new_task. This is done here. The task state is changed to TASK_RUNNING and the enqueue_task_fair CFS method is called, which triggers execution of __enqueue_entity, which actually adds the task to the task_timeline red-black tree.

  2. At this line, check_preempt_curr is called, which in turn calls the check_preempt_wakeup CFS method. This method is responsible for checking whether the current task should be preempted by some other task. That is exactly what is going to happen, because we have just put a new task on the timeline with the minimal possible vruntime. So the resched_curr function is triggered, which sets the TIF_NEED_RESCHED flag for the current task.

  3. TIF_NEED_RESCHED is checked just before the current task exits from an exception handler (fork, vfork and clone are all system calls, and each system call is a special type of exception). The check is made here. Note that _TIF_WORK_MASK includes _TIF_NEED_RESCHED. It is also important to understand that in the case of kernel thread creation, the new thread will not be started until the next timer tick or until the parent task voluntarily calls schedule().

  4. If the current task needs to be rescheduled, do_notify_resume is triggered, which in turn calls schedule. We have finally reached the point where task scheduling is triggered, and we are going to stop here.

Conclusion

Now that you understand how new tasks are created and added to the scheduler, it is time to take a look at how the scheduler itself works and how the context switch is implemented. That is something we are going to explore in the next chapter.
