Zephyr内核之CPU使用率

本文分析说明Zephyr中Thread的CPU使用率统计实现方式。

Zephyr LTS2 release后,大约在2021年11月引入了thread usage功能https://github.com/zephyrproject-rtos/zephyr/pull/38876这对调试分析多线程系统性能很有必要。本文分析thread usage的实现方式,同时说明如何使用shell和thread 分析模块来显示thread cpu loading,并分析说明目前显示thread cpu loading的问题和改进办法。

使用率数据收集 链接到标题

线程调度时间的计算 链接到标题

Zephyr内核在切换上下文时记录CPU执行的cycle变化量,该变化量就是被切出线程最近一次的调度时间。 Zephyr通过do_swap切换上下文,向下的调用顺序为:zephyr/kernel/include/kswap.h do_swap->z_sched_usage_switch

static inline void z_sched_usage_switch(struct k_thread *thread)
{
	ARG_UNUSED(thread);
#ifdef CONFIG_SCHED_THREAD_USAGE
	z_sched_usage_stop();
	z_sched_usage_start(thread);
#endif
}

z_sched_usage_stop计算当前要被切出的线程被调度了多长时间

void z_sched_usage_stop(void)
{
	k_spinlock_key_t k   = k_spin_lock(&usage_lock);

	struct _cpu     *cpu = _current_cpu;

    //usage0在上一次上下文切换时调用z_sched_usage_start中被赋值
	uint32_t u0 = cpu->usage0;

	if (u0 != 0) {
        //此时计算距离上一次上下文切换过去了多少cycles,这个cycles也就是当前要被切出线程被调度的时间
		uint32_t cycles = usage_now() - u0;

        //将线程被调度的时间更新到当前线程中
		if (cpu->current->base.usage.track_usage) {
			sched_thread_update_usage(cpu->current, cycles);
		}

        //将线程被调度的时间更新到当前cpu中
		sched_cpu_update_usage(cpu, cycles);
	}

    //清空usage0记录
	cpu->usage0 = 0;
	k_spin_unlock(&usage_lock, k);
}

z_sched_usage_switchthread参数为即将被调度的线程,z_sched_usage_start记录被调度线程的起始cycle数

void z_sched_usage_start(struct k_thread *thread)
{
#ifdef CONFIG_SCHED_THREAD_USAGE_ANALYSIS
	k_spinlock_key_t  key;

	key = k_spin_lock(&usage_lock);

    //记录当前cycle数,也就是thread调度一开始的数量
	_current_cpu->usage0 = usage_now();   /* Always update */

    //num_windows记录thread被调度的次数
    //current记录thread最近一次被调度的时间长度,这里才开始调度因此被清0
	if (thread->base.usage.track_usage) {
		thread->base.usage.num_windows++;
		thread->base.usage.current = 0;
	}

	k_spin_unlock(&usage_lock, key);
#else
	/* One write through a volatile pointer doesn't require
	 * synchronization as long as _usage() treats it as volatile
	 * (we can't race with _stop() by design).
	 */

	_current_cpu->usage0 = usage_now();
#endif
}

调度时间的更新 链接到标题

无论是cpu还是线程都使用zephyr/include/zephyr/kernel/stats.h内的struct k_cycle_stats记录执行

struct k_cycle_stats {
	uint64_t  total;        /* total usage in cycles */
#ifdef CONFIG_SCHED_THREAD_USAGE_ANALYSIS
	uint64_t  current;      /* # of cycles in current usage window */
	uint64_t  longest;      /* # of cycles in longest usage window */
	uint32_t  num_windows;  /* # of usage windows */
#endif
	bool      track_usage;  /* true if gathering usage stats */
};

线程调度时间 链接到标题

线程被调度的时间以cycle为单位记录在下面的结构体关系中

struct k_thread->struct _thread_base base->struct k_cycle_stats  usage;

针对线程的usage通过void sched_thread_update_usage(struct k_thread *thread, uint32_t cycles)将线程被调度的cycle数更新到线程的struct k_cycle_stats usage中,对于线程来说,该结构体各个字段的含义如下

  • total 该线程总计执行了多少个cycle
  • current 该线程最近一次调度执行了多少个cycle
  • longest 该线程多次调度中最长执行了的cycle数
  • num_windows 线程被调度的次数 包括idle线程在内的所有线程都会记录以上数据

CPU时间 链接到标题

CPU执行的时间以cycle为单位记录在下面的结构体中

struct _cpu -> struct k_cycle_stats usage;

针对CPU的usage通过void sched_cpu_update_usage(struct _cpu *cpu, uint32_t cycles)将线程被调度的cycle数更新到cpu的struct k_cycle_stats usage中,对于cpu来说,该结构体各个字段的含义如下

  • total 示该cpu总计执行了多少个cycle,不包含idle线程的
  • current 距离上一次idle该cpu执行了多少个cycle
  • longest 两次idle之间该cpu最长执行了的cycle数
  • num_windows 该cpu上Idle多少次调度

运行时获取使用率数据 链接到标题

无论是cpu还是线程都使用zephyr/include/zephyr/kernel/thread.h内的struct k_thread_runtime_stats来存储返回运行时数据

typedef struct k_thread_runtime_stats {
#ifdef CONFIG_SCHED_THREAD_USAGE
	uint64_t execution_cycles;
	uint64_t total_cycles;        /* total # of non-idle cycles */
	/*
	 * In the context of thread statistics, [execution_cycles] is the same
	 * as the total # of non-idle cycles. In the context of CPU statistics,
	 * it refers to the sum of non-idle + idle cycles.
	 */
#endif

#ifdef CONFIG_SCHED_THREAD_USAGE_ANALYSIS
	/*
	 * For threads, the following fields refer to the time spent executing
	 * as bounded by when the thread was scheduled in and scheduled out.
	 * For CPUs, the same fields refer to the time spent executing
	 * non-idle threads as bounded by the idle thread(s).
	 */

	uint64_t current_cycles;      /* current # of non-idle cycles */
	uint64_t peak_cycles;         /* peak # of non-idle cycles */
	uint64_t average_cycles;      /* average # of non-idle cycles */
#endif

#ifdef CONFIG_SCHED_THREAD_USAGE_ALL
	/*
	 * This field is always zero for individual threads. It only comes
	 * into play when gathering statistics for the CPU. In that case it
	 * represents the total number of cycles spent idling.
	 */

	uint64_t idle_cycles;
#endif

#if defined(__cplusplus) && !defined(CONFIG_SCHED_THREAD_USAGE) &&                                 \
	!defined(CONFIG_SCHED_THREAD_USAGE_ANALYSIS) && !defined(CONFIG_SCHED_THREAD_USAGE_ALL)
	/* If none of the above Kconfig values are defined, this struct will have a size 0 in C
	 * which is not allowed in C++ (it'll have a size 1). To prevent this, we add a 1 byte dummy
	 * variable when the struct would otherwise be empty.
	 */
	uint8_t dummy;
#endif
}  k_thread_runtime_stats_t;

运行时线程数据获取 链接到标题

z_sched_thread_usage用于获取线程的usage,如果是线程自己调用该API,会计算到调用这一刻

void z_sched_thread_usage(struct k_thread *thread,
			  struct k_thread_runtime_stats *stats)
{
	struct _cpu *cpu;
	k_spinlock_key_t  key;

	key = k_spin_lock(&usage_lock);
	cpu = _current_cpu;

	//如果是当前线程,计算从开始调度到这一刻的cycles并进行更新
	if (thread == cpu->current) {
		uint32_t now = usage_now();
		uint32_t cycles = now - cpu->usage0;

		/*
		 * Getting stats for the current thread. Update both the
		 * current thread stats and its CPU stats as the CPU's
		 * [usage0] field will also get updated. This keeps all
		 * that information up-to-date.
		 */

		if (thread->base.usage.track_usage) {
			sched_thread_update_usage(thread, cycles);
		}

		sched_cpu_update_usage(cpu, cycles);

		cpu->usage0 = now;
	}

    //转换为运行时统计数据
	stats->execution_cycles = thread->base.usage.total;
	stats->total_cycles     = thread->base.usage.total;

	/* Copy-out the thread's usage stats */

#ifdef CONFIG_SCHED_THREAD_USAGE_ANALYSIS
	stats->current_cycles = thread->base.usage.current;
	stats->peak_cycles    = thread->base.usage.longest;

    //计算平局调度时间
	if (thread->base.usage.num_windows == 0) {
		stats->average_cycles = 0;
	} else {
		stats->average_cycles = stats->total_cycles /
					thread->base.usage.num_windows;
	}
#endif

#ifdef CONFIG_SCHED_THREAD_USAGE_ALL
	stats->idle_cycles = 0;
#endif
	stats->execution_cycles = thread->base.usage.total;

	k_spin_unlock(&usage_lock, key);
}

对于单个线程来说 k_thread_runtime_stats_t各字段的含义如下

  • execution_cycles 该线程总计调度的cycle数
  • total_cycles 该线程总计调度的cycle数
  • current_cycles 该线程最近一次调度的cycle数
  • peak_cycles 线程最长一次调度的cycle数
  • average_cycles 线程调度平均的cycle数
  • idle_cycles 为0,无效

运行时CPU数据获取 链接到标题

z_sched_cpu_usage用于获取CPU的usage,会计算到调用API这一刻

void z_sched_cpu_usage(uint8_t cpu_id, struct k_thread_runtime_stats *stats)
{
	k_spinlock_key_t  key;
	struct _cpu *cpu;

	key = k_spin_lock(&usage_lock);
	cpu = _current_cpu;

	//计算从开始调度到这一刻的cycles并进行更新
	if (&_kernel.cpus[cpu_id] == cpu) {
		uint32_t  now = usage_now();
		uint32_t cycles = now - cpu->usage0;

		/*
		 * Getting stats for the current CPU. Update both its
		 * current thread stats and the CPU stats as the CPU's
		 * [usage0] field will also get updated. This keeps all
		 * that information up-to-date.
		 */

		if (cpu->current->base.usage.track_usage) {
			sched_thread_update_usage(cpu->current, cycles);
		}

		sched_cpu_update_usage(cpu, cycles);

		cpu->usage0 = now;
	}

	stats->total_cycles     = cpu->usage.total;
#ifdef CONFIG_SCHED_THREAD_USAGE_ANALYSIS
	stats->current_cycles   = cpu->usage.current;
	stats->peak_cycles      = cpu->usage.longest;

    //计算该CPU上调度的平均执行时间
	if (cpu->usage.num_windows == 0) {
		stats->average_cycles = 0;
	} else {
		stats->average_cycles = stats->total_cycles /
					cpu->usage.num_windows;
	}
#endif

	stats->idle_cycles =
		_kernel.cpus[cpu_id].idle_thread->base.usage.total;

	stats->execution_cycles = stats->total_cycles + stats->idle_cycles;

	k_spin_unlock(&usage_lock, key);
}

对于CPU来说 k_thread_runtime_stats_t各字段的含义如下

  • total_cycles cpu总计调度的cycle,不包含idle
  • idle_cycles cpu上的idle线程执总计执行了多少tick
  • execution_cycles cpu总计执行了多少cycles,包含idle和非idle
  • current_cycles 距离上一次idle该cpu执行了多少个cycle
  • peak_cycles 表示cpu上两次idle之间该cpu最长多少个cycle
  • average_cycles 该cpu上平均调度时间

usage的使用 链接到标题

在zephyr中进行如下配置可以启动usage的统计

CONFIG_SCHED_THREAD_USAGE_ANALYSIS=y
CONFIG_SCHED_THREAD_USAGE_ALL=y

shell中usage的使用 链接到标题

开启如下配置项会有cpu Loading和usage出现

CONFIG_SHELL=y
CONFIG_KERNEL_SHELL=y
CONFIG_THREAD_RUNTIME_STATS=y

执行kernel threads可以看到

uart:~$ kernel threads
Scheduler: 38 since last call
Threads:
 0x3fc90ac0 temp_task_id
        options: 0x0, priority: 3 timeout: 0
        state: pending, entry: 0x4201277c
        Total execution cycles: 217 (0 %)
        Current execution cycles: 217
        Peak execution cycles: 217
        Average execution cycles: 217
        stack size 2048, unused 1864, usage 184 / 2048 (8 %)

 0x3fc90568 sys_stress
        options: 0x0, priority: 14 timeout: 52
        state: suspended, entry: 0x4201121c
        Total execution cycles: 2920152707 (92 %)
        Current execution cycles: 800216
        Peak execution cycles: 802088
        Average execution cycles: 255291
        stack size 2048, unused 1740, usage 308 / 2048 (15 %)

 0x3fc90480 sys_ps
        options: 0x0, priority: 0 timeout: 499
        state: suspended, entry: 0x4201115e
        Total execution cycles: 491107 (0 %)
        Current execution cycles: 835
        Peak execution cycles: 4225
        Average execution cycles: 626
        stack size 2048, unused 1688, usage 360 / 2048 (17 %)

 0x3fc938d0 input
        options: 0x0, priority: 14 timeout: 0
        state: pending, entry: 0x4201f8ec
        Total execution cycles: 216 (0 %)
        Current execution cycles: 216
        Peak execution cycles: 216
        Average execution cycles: 216
        stack size 512, unused 344, usage 168 / 512 (32 %)

线程分析功能 链接到标题

zephyr的debug子系统中有一个线程分析功能zephyr/subsys/debug/thread_analyzer.c,提供的功能和shell命令类似,主要是用于没有开启shell的情况下。 当配置CONFIG_THREAD_ANALYZER=y后就可以调用thread_analyzer_print()显示线程信息,其中报告cpu的usage。当多配置CONFIG_THREAD_ANALYZER_AUTO=y后线程分析会多开一个thread,每隔CONFIG_THREAD_ANALYZER_AUTO_INTERVAL调用一次thread_analyzer_print()来自动显示线程信息,如下

Thread analyze:
 temp_task_id        : STACK: unused 1864 usage 184 / 2048 (8 %); CPU: 0 %
      : Total CPU cycles used: 204
         - Current Frame: 204; Longest Frame: 204; Average Frame: 204
 sys_stress          : STACK: unused 1868 usage 180 / 2048 (8 %); CPU: 0 %
      : Total CPU cycles used: 13642
         - Current Frame: 124; Longest Frame: 322; Average Frame: 128
 input               : STACK: unused 344 usage 168 / 512 (32 %); CPU: 0 %
      : Total CPU cycles used: 204
         - Current Frame: 204; Longest Frame: 204; Average Frame: 204
 display_task_id     : STACK: unused 1880 usage 168 / 2048 (8 %); CPU: 0 %
      : Total CPU cycles used: 160
         - Current Frame: 160; Longest Frame: 160; Average Frame: 160
 check_task_id       : STACK: unused 1880 usage 168 / 2048 (8 %); CPU: 0 %
      : Total CPU cycles used: 513
         - Current Frame: 513; Longest Frame: 513; Average Frame: 513
 sysworkq            : STACK: unused 840 usage 184 / 1024 (17 %); CPU: 0 %
      : Total CPU cycles used: 60
         - Current Frame: 60; Longest Frame: 60; Average Frame: 60
 shell_uart          : STACK: unused 968 usage 1080 / 2048 (52 %); CPU: 0 %
      : Total CPU cycles used: 227831
         - Current Frame: 21963; Longest Frame: 21963; Average Frame: 240
 logging             : STACK: unused 456 usage 312 / 768 (40 %); CPU: 0 %
      : Total CPU cycles used: 28826
         - Current Frame: 143; Longest Frame: 3228; Average Frame: 134
 idle                : STACK: unused 300 usage 212 / 512 (41 %); CPU: 99 %
      : Total CPU cycles used: 169747177
         - Current Frame: 40345; Longest Frame: 815888; Average Frame: 218184
 ISR0                : STACK: unused 0 usage 2048 / 2048 (100 %)

usage功能的问题和处理 链接到标题

shell命令中线程的cpu loading百分比计算方式如下:

pcnt = (rt_stats_thread.execution_cycles * 100U) /
		       rt_stats_all.execution_cycles;

在线程分析中线程的cpu loading百分比计算方式如下:

info.utilization = (info.usage.execution_cycles * 100U) /
			rt_stats_all.execution_cycles;

二者都是以线程的总计被调度cycle去除于cpu的总计执行cycle,也就是说线程的cpu loading是从开机到当前的总计平均情况。如果某个线程在开机后loading一直很低,后来突然增高,shell和线程分析显示该线程的cpu loading只会慢慢增加,在增高一瞬间是看不出来的,此时可以通过观察Current execution cycles,如果该值较平时大说明该线程的瞬时loading上涨,但这样并不直观,我们还是希望能像ps一样得到一个百分比。shell和线程分析的百分比不能反映瞬时的线程loading是因为统计时间太长,因此我们另外写一个cpu loading的计算方法就是缩短统计时间, 以下是伪代码表示方法,1s计算一次loading:

while(1){
    //获取当前执行的cycles数量
    k_thread_runtime_stats_all_get(&cpu_stats);
    k_thread_runtime_stats_get(thread, &thread_stats);
    //通过两次的差值计算线程的loading百分比
    usage = (thread_stats.execution_cycles - last_thread_stats.execution_cycles)/
            (cpu_stats.execution_cycles - last_cpu_stats.execution_cycles);
    //保存本次的采样
    last_cpu_stats = cpu_stats;
    last_thread_stats = thread_stats;
    k_msleep(1000);
}

此时就比较贴近当前的CPU Loading了

temp_task_id         CPU 0 %
sys_stress           CPU 99 %
sys_ps               CPU 0 %
input                CPU 0 %
display_task_id      CPU 0 %
check_task_id        CPU 0 %
sysworkq             CPU 0 %
shell_uart           CPU 0 %
logging              CPU 0 %
idle                 CPU 1 %

增补 链接到标题

如果你对上面的解释还不太能理解,下面的付费内容从一副图片举例说明上面各字段的含义:

调度流程说明 链接到标题

在每次上下文切换时通过zephyr/kernel/usage.c的usage_now获取当前cpu已经执行了多少个cycle数量,也就是图中cycle0~7。 在每次上下文切换时会调用z_sched_usage_stop计算该线程这一次调度占用的cycle数,也就是cycle[n]-cycle[n-1]:

  • B0 = cycle1-cycle0
  • A0 = cycle2-cycle1
  • I0 = cycle3-cycle2
  • B1 = cycle4-cycle3
  • A1 = cycle5-cycle4
  • I1 = cycle6-cycle5
  • B2 = cycle7-cycle6

每次上下文切换时,都会将计算出的被调度时间记录到对应的线程状态和cpu状态中,都以struct k_cycle_stats格式存储,在cycle7这一时间点上 A线程:

  • total=A0+A1 ,该线程总计执行了多少个cycle
  • current=A1 ,该线程最近一次调度执行了多少个cycle
  • longest=A1, 该线程多次调度中最长执行了的cycle数
  • num_windows=2 线程被调度的次数

B线程:

  • total=B0+B1+B2 ,该线程总计执行了多少个cycle
  • current=B2 ,该线程最近一次调度执行了多少个cycle
  • longest=B0, 该线程多次调度中最长执行了的cycle数
  • num_windows=3 线程被调度的次数

Idle线程:

  • total=I0+I1 ,该线程总计执行了多少个cycle
  • current=I1 ,该线程最近一次调度执行了多少个cycle
  • longest=I1, 该线程多次调度中最长执行了的cycle数
  • num_windows=2 线程被调度的次数

CPU:

  • total=A0+A1+B0+B1+B2 示该cpu总计执行了多少个cycle,不包含idle线程的
  • current=B2 ,距离上一次idle该cpu执行了多少个cycle,如果当前是从cycle5看的话,就是current=A1+B1
  • longest=A0+B0 ,两次idle之间该cpu最长执行了的cycle数
  • num_windows=7 ,该cpu上发生了多少次调度

运行时获取流程说明 链接到标题

在cycle8这个位置,并没有发生调度,此时只是想通过k_thread_runtime_stats_get/k_thread_runtime_stats_all_get获取当前对线程和CPU的统计,这两个函数会分别调用到z_sched_thread_usage/z_sched_cpu_usage,也就是对应上图中cycle8的时刻,虽然没有发生上下文切换,为了准确还是会将已经调度的时间A2=cycle8-cycle7计入到线程和CPU状态内,此时B线程和Idle线程的统计信息不变, A线程:

  • total=A0+A1+A2 ,该线程总计执行了多少个cycle
  • current=A2 ,该线程最近一次调度执行了多少个cycle
  • longest=A1, 该线程多次调度中最长执行了的cycle数
  • num_windows=3 线程被调度的次数

CPU:

  • total=A0+A1+B0+B1+B2+A2 示该cpu总计执行了多少个cycle,不包含idle线程的
  • current=A2 ,距离上一次idle该cpu执行了多少个cycle
  • longest=A0+B0 ,两次idle之间该cpu最长执行了的cycle数
  • num_windows=8 ,该cpu上发生了多少次调度

k_thread_runtime_stats_get/k_thread_runtime_stats_all_get得到的数据都由usage统计数据转换而来,存放在k_thread_runtime_stats_t中,在cycle8这一时刻执行时: A线程

  • execution_cycles=A0+A1+A2 该线程总计调度的cycle数
  • total_cycles=A0+A1+A2 该线程总计调度的cycle数
  • current_cycles=A2 该线程最近一次调度的cycle数
  • peak_cycles=A1 线程最长一次调度的cycle数
  • average_cycles=(A0+A1+A2)/3 线程调度平均的cycle数
  • idle_cycles=0 无效

B线程

  • execution_cycles=B0+B1+B2 该线程总计调度的cycle数
  • total_cycles=B0+B1+B2 该线程总计调度的cycle数
  • current_cycles=B2 该线程最近一次调度的cycle数
  • peak_cycles=B0 线程最长一次调度的cycle数
  • average_cycles=(B0+B1+B2)/3 线程调度平均的cycle数
  • idle_cycles=0 无效

Idle线程

  • execution_cycles=I0+I1 该线程总计调度的cycle数
  • total_cycles=I0+I1 该线程总计调度的cycle数
  • current_cycles=I1 该线程最近一次调度的cycle数
  • peak_cycles=I1 线程最长一次调度的cycle数
  • average_cycles=(I0+I1)/2 线程调度平均的cycle数
  • idle_cycles=0 无效

CPU线程

  • execution_cycles=A0+A1+A2+B0+B1+B2+I0+I1 cpu总计执行了多少cycles,包含idle和非idle
  • total_cycles=A0+A1+A2+B0+B1+B2 cpu总计调度的cycle,不包含idle
  • current_cycles=A2+B2 距离上一次idle该cpu执行了多少个cycle
  • peak_cycles=B0+A0 表示cpu上两次idle之间该cpu最长多少个cycle
  • average_cycles=(A0+A1+A2+B0+B1+B2)/2 该cpu上平均调度时间
  • idle_cycles=I0+I1 cpu上的idle线程执总计执行了多少tick

average_cycles的时间是用total_cycles/num_windows得到,num_windows是idle的次数,对于cpu来说只有调度和idle两个状态,因此idle多少次就对应有多少次调度。