Re: [PATCH 0/6] Add ACRN CPU frequency management

Eddie Dong

-----Original Message-----
From: acrn-dev@... <acrn-dev@...> On Behalf Of Zhou, Wu
Sent: Wednesday, August 10, 2022 2:25 AM
To: acrn-dev@...
Cc: Zhou, Wu <wu.zhou@...>
Subject: [acrn-dev] [PATCH 0/6] Add ACRN CPU frequency management

1. ACRN CPU frequency management's design
The base design is to let ACRN own CPU frequency control with two governors:
- Performance: CPU can run at its max possible frequency (turbo boost
will be activated if enabled).
- Nominal: CPU runs at its base frequency.
Users can choose which of the governors to use.
Users must also choose which frequency interface to use:
- ACPI P-state
- HWP (Hardware-controlled Performance states)
The HWP enable MSR IA32_PM_ENABLE is a global control (once enabled, it
bypasses the ACPI cpufreq interface), so the frequency interface has to be a
global setting.
This is somewhat hard to understand. Please revisit it carefully.

In general, I think a hardware platform may support either the HWP or the ACPI method for frequency control.
The ACRN HV should be able to manage the frequency automatically (there is no need for the configuration tool to set it). Am I missing anything?
To me, if the HWP method is available, it is preferred. If not, the ACPI method is the choice.
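A minimal sketch of the automatic selection suggested above, in C. It assumes the caller has already read CPUID leaf 06H (bit 7 of EAX reports HWP support) and knows whether the board's ACPI tables provide _PSS; `pick_freq_interface` and `enum freq_interface` are hypothetical names, not existing ACRN code. Because IA32_PM_ENABLE is global, the result would have to be computed once for the whole platform, not per core.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.06H:EAX bit 7: HWP base registers (IA32_PM_ENABLE,
 * IA32_HWP_CAPABILITIES, IA32_HWP_REQUEST, IA32_HWP_STATUS) are supported. */
#define CPUID06_EAX_HWP (1U << 7)

/* Hypothetical result type for the platform-wide interface choice. */
enum freq_interface { FREQ_IF_NONE, FREQ_IF_HWP, FREQ_IF_ACPI_PSTATE };

/* Hypothetical helper: choose the interface at boot instead of asking the user.
 * cpuid06_eax  - EAX value returned by CPUID leaf 06H
 * has_acpi_pss - whether the ACPI namespace provides _PSS for the CPUs */
static enum freq_interface pick_freq_interface(uint32_t cpuid06_eax, bool has_acpi_pss)
{
	if ((cpuid06_eax & CPUID06_EAX_HWP) != 0U) {
		return FREQ_IF_HWP;          /* preferred when the hardware supports it */
	}
	if (has_acpi_pss) {
		return FREQ_IF_ACPI_PSTATE;  /* fall back to ACPI P-states */
	}
	return FREQ_IF_NONE;                 /* no frequency control available */
}
```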

RTVMs need certainty in latency, so their CPUs always run at the base/nominal
frequency. Here are the combinations:
Cores not running any RTVM:
- Performance + HWP: range from the lowest to the highest performance level.
- Nominal + HWP: fixed at the nominal performance level.
- Performance + P-state: fixed at the highest P-state.
- Nominal + P-state: fixed at the base P-state.
Cores running an RTVM:
- P-state: fixed at the base P-state.
- HWP: fixed at the nominal performance level.
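The combination table above can be read as a small decision function. The sketch below is illustrative only: the type and function names are hypothetical, the numeric levels are abstract (higher means faster), and it assumes the HWP "nominal" level corresponds to Guaranteed_Performance in IA32_HWP_CAPABILITIES. The IA32_HWP_REQUEST packing follows the architectural field layout (minimum in bits 7:0, maximum in bits 15:8).

```c
#include <stdbool.h>
#include <stdint.h>

enum governor { GOV_PERFORMANCE, GOV_NOMINAL };
enum freq_interface { FREQ_IF_HWP, FREQ_IF_ACPI_PSTATE };

/* Hypothetical per-core limits; higher value means higher performance. */
struct core_limits {
	uint8_t lowest;
	uint8_t nominal;
	uint8_t highest;
};

/* A [min, max] range; a fixed level has min == max. */
struct freq_range {
	uint8_t min;
	uint8_t max;
};

static struct freq_range select_range(enum governor gov, enum freq_interface intf,
				      bool runs_rtvm, const struct core_limits *lim)
{
	struct freq_range r;

	if (runs_rtvm || (gov == GOV_NOMINAL)) {
		/* RTVM cores and the Nominal governor: fixed at nominal/base. */
		r.min = lim->nominal;
		r.max = lim->nominal;
	} else if (intf == FREQ_IF_HWP) {
		/* Performance + HWP: hardware may range from lowest to highest. */
		r.min = lim->lowest;
		r.max = lim->highest;
	} else {
		/* Performance + P-state: fixed at the highest P-state. */
		r.min = lim->highest;
		r.max = lim->highest;
	}
	return r;
}

/* Pack a range into IA32_HWP_REQUEST (MSR 0x774): Minimum_Performance in
 * bits 7:0, Maximum_Performance in bits 15:8; all other fields are left 0
 * in this sketch. */
static uint64_t hwp_request(struct freq_range r)
{
	return (uint64_t)r.min | ((uint64_t)r.max << 8);
}
```

For example, with limits {lowest 8, nominal 20, highest 45}, Performance + HWP on a non-RT core yields the range [8, 45], which packs to 0x2D08.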

As in the Linux cpufreq driver, a 'policy' object type is introduced to help
ACRN manage CPU frequency. 'policy' is a per-CPU data type which indicates
"the highest/lowest/base CPU frequency limits under the current HW and scenario
setup". It is statically allocated by config-tools, and does not need to be
With a policy given for each core, the hypervisor doesn't have to deal with
hardware or scenario settings. It only chooses the highest or base frequency to run
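One way to picture the 'policy' idea, assuming a hypothetical `struct cpufreq_policy` and a table emitted by config-tools (field names and values are made up for illustration, not taken from the patch): the scenario is baked into the limits, so an RTVM core simply gets highest == nominal, and the runtime governor needs no scenario knowledge.

```c
#include <stdint.h>

/* Hypothetical per-CPU policy. The limits already reflect hardware and
 * scenario, e.g. an RTVM core has highest == lowest == nominal. */
struct cpufreq_policy {
	uint8_t highest;
	uint8_t nominal;
	uint8_t lowest;
};

#define NR_PCPUS 4U

/* Example of a statically generated table (values are illustrative). */
static const struct cpufreq_policy cpufreq_policies[NR_PCPUS] = {
	{ .highest = 45, .nominal = 20, .lowest = 8 },  /* non-RT core */
	{ .highest = 45, .nominal = 20, .lowest = 8 },  /* non-RT core */
	{ .highest = 20, .nominal = 20, .lowest = 20 }, /* RTVM core: pinned */
	{ .highest = 20, .nominal = 20, .lowest = 20 }, /* RTVM core: pinned */
};

enum governor { GOV_PERFORMANCE, GOV_NOMINAL };

/* The governor only picks "highest" or "base (nominal)" from the policy
 * as the maximum level to request; no scenario checks are needed here. */
static uint8_t governor_target(enum governor gov, uint32_t pcpu)
{
	const struct cpufreq_policy *p = &cpufreq_policies[pcpu];

	return (gov == GOV_PERFORMANCE) ? p->highest : p->nominal;
}
```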

The frequency management system is like this:

        VM0            ...            VM*        (VMs have no CPU freq control)

                Governor (Performance/Nominal)

      policy0          policy1      ...      policy*
         |                |                     |
         |                |                     |
   HWP or p-state   HWP or p-state   ...  HWP or p-state
Does the guest care which method is exposed?
It seems either way works. If so, we don't want to bother users with configuring it in the tools.

The general policy is to configure the system "automatically" whenever possible.

We use the configuration tool to save hypervisor code. Here, I don't see any such saving.

Major changes made to the code are:
- Add a hypervisor config item which specifies the governor.
- Add a boolean config item which specifies whether P-state is preferred
over HWP.
- Parse and hand over P-state and HWP info from the ACPI namespace to the
hypervisor.
- Add the CPU frequency governors.
- Hide P-state and HWP in guest CPUID.
- Inject #GP(0) upon accesses to P-state and HWP related MSRs.
Device Model:
- Do not generate _PSS and _PPC for post-launched VMs.
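The CPUID hiding and MSR blocking could look roughly like this. The bit positions and MSR addresses are the architectural ones from the Intel SDM, but which bits and MSRs the patch actually covers is an assumption here, and `hide_freq_cpuid`/`is_blocked_freq_msr` are hypothetical helpers; a real vCPU exit handler would inject #GP(0) when the latter returns true.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID01_ECX_EIST     (1U << 7)    /* Enhanced Intel SpeedStep */
/* CPUID.06H:EAX bits 7..11: HWP, HWP notification, activity window,
 * energy-performance preference, package-level request. */
#define CPUID06_EAX_HWP_MASK (0x1FU << 7)

#define MSR_IA32_PERF_STATUS      0x198U
#define MSR_IA32_PERF_CTL         0x199U
#define MSR_IA32_PM_ENABLE        0x770U
#define MSR_IA32_HWP_CAPABILITIES 0x771U
#define MSR_IA32_HWP_REQUEST_PKG  0x772U
#define MSR_IA32_HWP_INTERRUPT    0x773U
#define MSR_IA32_HWP_REQUEST      0x774U
#define MSR_IA32_HWP_STATUS       0x777U

/* Clear the EIST and HWP feature bits from the guest's CPUID view. */
static void hide_freq_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ecx)
{
	if (leaf == 0x1U) {
		*ecx &= ~CPUID01_ECX_EIST;
	} else if (leaf == 0x6U) {
		*eax &= ~CPUID06_EAX_HWP_MASK;
	}
}

/* Return true if a guest access to this MSR should get #GP(0). */
static bool is_blocked_freq_msr(uint32_t msr)
{
	switch (msr) {
	case MSR_IA32_PERF_STATUS:
	case MSR_IA32_PERF_CTL:
	case MSR_IA32_PM_ENABLE:
	case MSR_IA32_HWP_CAPABILITIES:
	case MSR_IA32_HWP_REQUEST_PKG:
	case MSR_IA32_HWP_INTERRUPT:
	case MSR_IA32_HWP_REQUEST:
	case MSR_IA32_HWP_STATUS:
		return true;
	default:
		return false;
	}
}
```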

2. ACPI P-state pass-through
ACPI P-state control can be passed through to a guest if the guest does not
share pCPUs with other VMs. In this patch, CPU sharing is detected by analyzing
the cpu_affinity config in config-tools.
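The pass-through condition amounts to a bitmask overlap test. The config-tools implementation is in Python; the C sketch below only illustrates the check and assumes each VM's cpu_affinity has already been turned into a pCPU bitmask (`vm_shares_pcpus` is a hypothetical name).

```c
#include <stdbool.h>
#include <stdint.h>

/* cpu_affinity[i] is a bitmask of the pCPUs assigned to VM i.
 * Returns true if VM vm_id shares at least one pCPU with another VM,
 * in which case P-state control must not be passed through to it. */
static bool vm_shares_pcpus(const uint64_t *cpu_affinity, uint32_t nr_vms, uint32_t vm_id)
{
	for (uint32_t i = 0U; i < nr_vms; i++) {
		if ((i != vm_id) && ((cpu_affinity[i] & cpu_affinity[vm_id]) != 0U)) {
			return true;
		}
	}
	return false;
}
```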

3. Dealing with CPU frequency domains
When CPUs are in a frequency domain, they share frequency at the HW level and
always run at the same frequency (the highest one requested in the group).
A typical example is a cluster of 4 E-cores on ADL.

Those CPUs could be assigned to different VMs. This is no problem for
non-RTVMs, because they do not mind running at a higher frequency.
But if one of those VMs is an RTVM, we must choose between:
- Letting all those CPUs run at base frequency for certainty.
- Letting all those CPUs run at the non-guaranteed turbo max frequency.
This patch chooses base frequency for certainty.
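The "base frequency for certainty" choice can be expressed as widening the RTVM pCPU set to cover every frequency domain it touches. A minimal sketch, assuming domains and RTVM pCPUs are given as bitmasks (`pcpus_fixed_to_base` is a hypothetical name):

```c
#include <stdint.h>

/* domains[i] is a bitmask of pCPUs sharing one frequency domain;
 * rtvm_pcpus is the bitmask of pCPUs running an RTVM. Returns the set of
 * pCPUs that must be fixed at base frequency: the RTVM pCPUs plus every
 * pCPU in a domain containing at least one RTVM pCPU. */
static uint64_t pcpus_fixed_to_base(const uint64_t *domains, uint32_t nr_domains,
				    uint64_t rtvm_pcpus)
{
	uint64_t fixed = rtvm_pcpus;

	for (uint32_t i = 0U; i < nr_domains; i++) {
		if ((domains[i] & rtvm_pcpus) != 0U) {
			fixed |= domains[i];
		}
	}
	return fixed;
}
```

For instance, if pCPU 5 runs an RTVM and pCPUs 4-7 form one E-core domain, all of pCPUs 4-7 end up fixed at base frequency.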

Signed-off-by: Wu Zhou <wu.zhou@...>

Wu Zhou (6):
config_tools: add HV CPU frequency options to configurator
config_tools: extract CPU frequency info in board_inspector
config_tools: allocate CPU frequency policy
config_tools & hv: generate CPU frequency info code
hv: add CPU frequency driver in hv
hv: block guest eist/hwp cpuids and MSRs

devicemodel/include/types.h | 1 +
hypervisor/arch/x86/cpu.c | 5 +
hypervisor/arch/x86/guest/vcpuid.c | 10 +
hypervisor/arch/x86/guest/vmsr.c | 38 +++-
hypervisor/arch/x86/pm.c | 59 ++++++
hypervisor/common/hypercall.c | 2 +-
hypervisor/include/arch/x86/asm/board.h | 2 +
hypervisor/include/arch/x86/asm/cpuid.h | 2 +
hypervisor/include/arch/x86/asm/guest/vcpu.h | 2 +-
hypervisor/include/arch/x86/asm/host_pm.h | 2 +
hypervisor/include/arch/x86/asm/per_cpu.h | 1 +
hypervisor/include/arch/x86/asm/vm_config.h | 2 +
hypervisor/include/public/acrn_common.h | 29 +++
misc/config_tools/board_config/ | 35 ++++
.../board_inspector/ | 2 +-
.../board_inspector/cpuparser/ | 6 +
.../board_inspector/cpuparser/ | 31 ++-
.../board_inspector/cpuparser/ | 12 ++
.../extractors/ | 45 +++-
misc/config_tools/library/ | 37 ++++
misc/config_tools/schema/config.xsd | 23 ++
misc/config_tools/schema/types.xsd | 36 ++++
.../static_allocators/ | 198 ++++++++++++++++++
.../xforms/vm_configurations.c.xsl | 8 +
24 files changed, 580 insertions(+), 8 deletions(-)
create mode 100644

