[PATCH 0/6] Add ACRN CPU frequency management


Zhou, Wu
 

1. Design of ACRN CPU frequency management
The basic design lets ACRN own CPU frequency control, with two
governors:
- Performance: the CPU can run at up to its maximum possible frequency
(turbo boost is used if enabled).
- Nominal: the CPU runs at its base frequency.
Users can choose which governor to use.
Users must also choose which frequency interface to use:
- HWP
- ACPI P-state
The HWP enable MSR, IA32_PM_ENABLE, is a global control: once HWP is
enabled, the ACPI cpufreq (P-state) interface is bypassed. Therefore the
frequency interface cannot be chosen per CPU or per VM; it has to be a
single global setting.
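Because of this, the interface choice can be thought of as one platform-wide decision made once at boot. The following is only an illustrative sketch: the enum, the function name and the fallback order (use HWP unless the user explicitly prefers P-states and a P-state table exists) are assumptions, not code taken from this series.

```c
#include <stdbool.h>

/* Hypothetical enum: the single, system-wide frequency interface. */
enum freq_interface {
	FREQ_IF_NONE,   /* neither HWP nor ACPI P-states usable */
	FREQ_IF_HWP,    /* IA32_PM_ENABLE set: ACPI P-state control bypassed */
	FREQ_IF_PSTATE  /* ACPI P-state control */
};

/*
 * Pick one interface for the whole platform. Enabling HWP is a global
 * switch, so this is decided once, not per pCPU or per VM.
 *   hwp_supported: HWP reported by CPUID.06H:EAX[bit 7]
 *   pss_available: the board's ACPI tables provide a P-state table
 *   prefer_pstate: the Boolean "prefer P-state over HWP" config item
 */
enum freq_interface select_freq_interface(bool hwp_supported,
					  bool pss_available,
					  bool prefer_pstate)
{
	if (prefer_pstate && pss_available)
		return FREQ_IF_PSTATE;
	if (hwp_supported)
		return FREQ_IF_HWP;
	if (pss_available)
		return FREQ_IF_PSTATE;
	return FREQ_IF_NONE;
}
```

Note that the preference only wins when a P-state table actually exists; otherwise the sketch falls back to HWP, which the thread agrees is the better default.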

RTVMs need deterministic latency, so their CPUs always run at the
base/nominal frequency. The resulting combinations are:
Cores not running any RTVM:
- Performance + HWP: range between the lowest and highest performance levels.
- Nominal + HWP: fixed at the nominal performance level.
- Performance + P-state: fixed at the highest P-state.
- Nominal + P-state: fixed at the base P-state.
Cores running an RTVM:
- P-state: fixed at the base P-state.
- HWP: fixed at the nominal performance level.

As in the Linux cpufreq driver, a 'policy' object type is introduced to
help ACRN manage CPU frequency. A 'policy' is a per-CPU data type that
holds the highest/lowest/base CPU frequency limits under the current
hardware and scenario setup. Policies are statically allocated by the
config tools and need no user configuration.
With a policy given for each core, the hypervisor does not have to deal
with hardware or scenario settings. It only chooses whether to run at
the highest or the base frequency.
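The split of responsibilities described above (config tools compute the limits, the hypervisor only picks highest or base) might look roughly like the sketch below. The struct layout, field names and helper names are hypothetical and are not the definitions this series adds to acrn_common.h. The sketch assumes the config tools collapse an RTVM pCPU's limits to its base frequency, which makes the whole combination table reduce to a two-way choice.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pCPU policy; field names are illustrative only. */
struct cpufreq_policy {
	uint8_t highest_perf;   /* HWP: highest performance level allowed */
	uint8_t nominal_perf;   /* HWP: nominal (base) performance level */
	uint8_t lowest_perf;    /* HWP: lowest performance level */
	uint8_t highest_pstate; /* ACPI: _PSS index of fastest allowed P-state (P0 = 0) */
	uint8_t base_pstate;    /* ACPI: _PSS index of the base P-state */
};

enum governor { GOV_PERFORMANCE, GOV_NOMINAL };

/* What the hypervisor would program for one pCPU. */
struct freq_target {
	uint8_t min_perf; /* HWP request: minimum performance */
	uint8_t max_perf; /* HWP request: maximum performance */
	uint8_t pstate;   /* P-state interface: _PSS index to select */
};

/*
 * Config-tool side (offline): pin a pCPU that runs an RTVM to its base
 * frequency by collapsing its limits, so no governor can leave base.
 */
void policy_pin_to_base(struct cpufreq_policy *p)
{
	p->highest_perf = p->nominal_perf;
	p->lowest_perf = p->nominal_perf;
	p->highest_pstate = p->base_pstate;
}

/* Hypervisor side: only choose "highest" or "base" from the policy. */
struct freq_target governor_target(const struct cpufreq_policy *p,
				   enum governor gov, bool use_hwp)
{
	struct freq_target t = {0};

	if (use_hwp) {
		if (gov == GOV_PERFORMANCE) {
			/* let the hardware range over the whole window */
			t.min_perf = p->lowest_perf;
			t.max_perf = p->highest_perf;
		} else {
			t.min_perf = p->nominal_perf;
			t.max_perf = p->nominal_perf;
		}
	} else {
		t.pstate = (gov == GOV_PERFORMANCE) ? p->highest_pstate
						    : p->base_pstate;
	}
	return t;
}
```

With the RTVM pinning done offline, the hypervisor never needs to know whether a pCPU belongs to an RTVM; it applies the same two-way rule everywhere.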

The frequency management system is like this:

         VM0        ...        VM*       (VMs have no CPU freq control)

                             ACRN
               Governor (Performance/Nominal)
       policy0            policy1       ...       policy*
          |                  |                       |
          |                  |                       |
   HWP or P-state     HWP or P-state    ...   HWP or P-state
        pCPU0              pCPU1                   pCPU*

The major code changes are:
Configuration:
- Add a hypervisor config item that specifies the governor.
- Add a Boolean config item that specifies whether P-state is preferred
over HWP.
- Parse P-state and HWP information from the ACPI namespace and hand it
over to the hypervisor.
Hypervisor:
- Add the CPU frequency governors.
- Hide P-state and HWP in the guest CPUID.
- Inject #GP(0) upon accesses to P-state and HWP related MSRs.
Device Model:
- Do not generate _PSS and _PPC for post-launched VMs.
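The two guest-facing hypervisor changes (hiding the features in CPUID and faulting on the MSRs) can be sketched as below. The CPUID bit positions and MSR addresses follow the Intel SDM; the function names are hypothetical, the exact MSR list the patch blocks may differ, and the sketch ignores the P-state pass-through case described in section 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID feature bits (Intel SDM) */
#define CPUID01_ECX_EIST     (1U << 7)     /* Enhanced Intel SpeedStep */
#define CPUID06_EAX_HWP_MASK (0x1FU << 7)  /* bits 7..11: HWP and sub-features */

/* MSR addresses (Intel SDM) */
#define MSR_IA32_PERF_STATUS      0x198U
#define MSR_IA32_PERF_CTL         0x199U
#define MSR_IA32_PM_ENABLE        0x770U
#define MSR_IA32_HWP_CAPABILITIES 0x771U
#define MSR_IA32_HWP_REQUEST_PKG  0x772U
#define MSR_IA32_HWP_INTERRUPT    0x773U
#define MSR_IA32_HWP_REQUEST      0x774U
#define MSR_IA32_HWP_STATUS       0x777U

/* Hypothetical hook: scrub frequency-control features from guest CPUID. */
void filter_guest_cpuid(uint32_t leaf, uint32_t *eax, uint32_t *ecx)
{
	if (leaf == 0x1U)
		*ecx &= ~CPUID01_ECX_EIST;      /* hide EIST (P-states) */
	else if (leaf == 0x6U)
		*eax &= ~CPUID06_EAX_HWP_MASK;  /* hide HWP; turbo bit untouched */
}

/* Hypothetical predicate: should a guest access to @msr raise #GP(0)? */
bool is_blocked_freq_msr(uint32_t msr)
{
	switch (msr) {
	case MSR_IA32_PERF_STATUS:
	case MSR_IA32_PERF_CTL:
	case MSR_IA32_PM_ENABLE:
	case MSR_IA32_HWP_CAPABILITIES:
	case MSR_IA32_HWP_REQUEST_PKG:
	case MSR_IA32_HWP_INTERRUPT:
	case MSR_IA32_HWP_REQUEST:
	case MSR_IA32_HWP_STATUS:
		return true;
	default:
		return false;
	}
}
```

Hiding the CPUID bits keeps well-behaved guests from trying; the #GP(0) path covers guests that probe the MSRs anyway.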

2. ACPI P-state pass-through
ACPI P-state control can be passed through to a guest if the guest does
not share its pCPUs with other VMs. In this patch series, CPU sharing is
detected by the config tools by analyzing each VM's cpu_affinity setting.
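The real check lives in the Python config tools (static_allocators/cpu_freq.py). For illustration only, the same idea in C, assuming each VM's cpu_affinity is represented as one 64-bit mask with bit N standing for pCPU N; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: VM @vm may get ACPI P-state pass-through only if no
 * other VM's cpu_affinity mask overlaps its own.
 */
bool can_passthrough_pstate(const uint64_t *affinity, size_t nr_vms, size_t vm)
{
	for (size_t i = 0; i < nr_vms; i++) {
		/* any shared pCPU disqualifies pass-through */
		if (i != vm && (affinity[i] & affinity[vm]) != 0U)
			return false;
	}
	return true;
}
```

For example, with masks {0x3, 0xC, 0x8}, VM0 qualifies while VM1 and VM2 do not, because they share pCPU3.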

3. Dealing with CPU frequency domains
CPUs in the same frequency domain share a frequency at the hardware
level, so they always run at the same frequency (the highest one
requested within the group). A typical example is a group of four
E-cores on Alder Lake (ADL).

These CPUs can be assigned to different VMs. This is not a problem for
non-RTVMs, because they do not mind running at a higher frequency.
But if one of those VMs is an RTVM, we must choose between:
- Letting all those CPUs run at the base frequency, for determinism.
- Letting all those CPUs run at the non-guaranteed maximum turbo frequency.
This patch series chooses the base frequency, for determinism.
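The rule "one RTVM pCPU pins its whole frequency domain to base frequency" could be computed offline roughly as follows. The array representation and function name are hypothetical; the actual allocation is done in Python by the config tools.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * domain[i]: frequency-domain ID of pCPU i
 * rt[i]:     true if pCPU i is assigned to an RTVM
 * pin[i]:    output, true if pCPU i must be fixed at base frequency
 *            because it, or any pCPU in its domain, runs an RTVM
 */
void mark_base_freq_cpus(const uint32_t *domain, const bool *rt,
			 size_t nr_pcpus, bool *pin)
{
	for (size_t i = 0; i < nr_pcpus; i++) {
		pin[i] = false;
		for (size_t j = 0; j < nr_pcpus; j++) {
			if (domain[j] == domain[i] && rt[j]) {
				pin[i] = true;
				break;
			}
		}
	}
}
```

For example, if pCPU2..pCPU5 form one E-core domain and only pCPU4 runs an RTVM, all four end up pinned, while an unrelated non-RT pCPU in its own domain stays free to use turbo.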


Signed-off-by: Wu Zhou <wu.zhou@...>


Wu Zhou (6):
config_tools: add HV CPU frequency options to configurator
config_tools: extract CPU frequency info in board_inspector
config_tools: allocate CPU frequency policy
config_tools & hv: generate CPU frequency info code
hv: add CPU frequency driver in hv
hv: block guest eist/hwp cpuids and MSRs

devicemodel/include/types.h | 1 +
hypervisor/arch/x86/cpu.c | 5 +
hypervisor/arch/x86/guest/vcpuid.c | 10 +
hypervisor/arch/x86/guest/vmsr.c | 38 +++-
hypervisor/arch/x86/pm.c | 59 ++++++
hypervisor/common/hypercall.c | 2 +-
hypervisor/include/arch/x86/asm/board.h | 2 +
hypervisor/include/arch/x86/asm/cpuid.h | 2 +
hypervisor/include/arch/x86/asm/guest/vcpu.h | 2 +-
hypervisor/include/arch/x86/asm/host_pm.h | 2 +
hypervisor/include/arch/x86/asm/per_cpu.h | 1 +
hypervisor/include/arch/x86/asm/vm_config.h | 2 +
hypervisor/include/public/acrn_common.h | 29 +++
misc/config_tools/board_config/board_c.py | 35 ++++
.../board_inspector/board_inspector.py | 2 +-
.../board_inspector/cpuparser/cpuids.py | 6 +
.../board_inspector/cpuparser/msr.py | 31 ++-
.../board_inspector/cpuparser/platformbase.py | 12 ++
.../extractors/10-processors.py | 45 +++-
misc/config_tools/library/board_cfg_lib.py | 37 ++++
misc/config_tools/schema/config.xsd | 23 ++
misc/config_tools/schema/types.xsd | 36 ++++
.../static_allocators/cpu_freq.py | 198 ++++++++++++++++++
.../xforms/vm_configurations.c.xsl | 8 +
24 files changed, 580 insertions(+), 8 deletions(-)
create mode 100644 misc/config_tools/static_allocators/cpu_freq.py

--
2.25.1


Eddie Dong
 

-----Original Message-----
From: acrn-dev@... <acrn-dev@...> On
Behalf Of Zhou, Wu
Sent: Wednesday, August 10, 2022 2:25 AM
To: acrn-dev@...
Cc: Zhou, Wu <wu.zhou@...>
Subject: [acrn-dev] [PATCH 0/6] Add ACRN CPU frequency management

1. ACRN CPU frequency management's design
The base design is to let ACRN own CPU frequency control with those two
governors:
- Performance: CPU can run at its max possible frequency (turbo boost
will be activated if enabled).
- Nominal: CPU runs at its base frequency.
Users can choose which of the governors to use.
Users also must choose which frequency interface to use:
- HWP
- ACPI p-state
The HWP switch MSR IA32_PM_ENABLE is a global control (and it by-passes
ACPI cpufreq interface once enabled), so the frequency interface has to be a
global setting.
Somehow it is hard to understand... Please revisit carefully.


In general, I think a hardware platform may support either the HWP or the ACPI method for frequency control.
The ACRN HV should be able to manage the frequency automatically (no need for the configuration tool to set it). Am I missing anything?
To me, if the HWP method is available, it is preferred. If not, the ACPI method is the choice.


RTVMs need certainty in latency, so their CPUs always run at base/nominal
frequency. Here are the combinations:
Cores not running any RTVM:
- Performance + HWP: Range from lowest to highest performance levels.
- Nominal + HWP: Fix to nominal performance level.
- Performance + P-state: Fix to highest p-state.
- Nominal + P-state: Fix to base P-state.
Cores running an RTVM:
- P-state: Fix to base P-state
- HWP: Fix to nominal performance level.

Just like the Linux cpufreq driver, a 'policy' object type is introduced to help
ACRN manage CPU frequency. 'policy' is a per-CPU data type that indicates
"the highest/lowest/base CPU frequency limits under the current HW and scenario
setup". It is statically allocated by config-tools, and does not need to be
configured.
With a policy given for each core, the hypervisor doesn't have to deal with
hardware or scenario settings. It only chooses the highest or base frequency to
run at.

The frequency management system is like this:

         VM0        ...        VM*       (VMs have no CPU freq control)

                             ACRN
               Governor (Performance/Nominal)
       policy0            policy1       ...       policy*
          |                  |                       |
          |                  |                       |
   HWP or p-state     HWP or p-state    ...   HWP or p-state
        pCPU0              pCPU1                   pCPU*
Does the guest care which method is exposed?
It seems either way works. If so, we don't want to bother users with configuring it in the tools.

The general policy is to configure the system "automatically" whenever it can.

We use the tool (to configure) in order to save HV code. Here, I don't see any such saving.




Zhou, Wu
 

Hi Eddie,

Thanks for the reply, please see my inline comments.

Regards
Zhou, Wu

-----Original Message-----
From: Dong, Eddie <eddie.dong@...>
Sent: Saturday, August 13, 2022 4:58 AM
To: acrn-dev@...
Cc: Zhou, Wu <wu.zhou@...>
Subject: RE: [acrn-dev] [PATCH 0/6] Add ACRN CPU frequency management



-----Original Message-----
From: acrn-dev@... <acrn-dev@...>
On Behalf Of Zhou, Wu
Sent: Wednesday, August 10, 2022 2:25 AM
To: acrn-dev@...
Cc: Zhou, Wu <wu.zhou@...>
Subject: [acrn-dev] [PATCH 0/6] Add ACRN CPU frequency management

1. ACRN CPU frequency management's design
The base design is to let ACRN own CPU frequency control with those two
governors:
- Performance: CPU can run at its max possible frequency (turbo boost
will be activated if enabled).
- Nominal: CPU runs at its base frequency.
Users can choose which of the governors to use.
Users also must choose which frequency interface to use:
- HWP
- ACPI p-state
The HWP switch MSR IA32_PM_ENABLE is a global control (and it
by-passes ACPI cpufreq interface once enabled), so the frequency
interface has to be a global setting.
Somehow it is hard to understand... Please revisit carefully.


In general, I think a hardware platform may support either the HWP or the ACPI
method for frequency control.
The ACRN HV should be able to manage the frequency automatically (no need for the
configuration tool to set it). Am I missing anything?
To me, if the HWP method is available, it is preferred. If not, the ACPI method is the
choice.
Yes, HWP should be preferred if available.

But if we want to pass through ACRN's legacy ACPI P-state tables to guests, the ACPI
option must exist. I think this is the only reason.

HWP enabling is global, and it bypasses the ACPI method. For ACPI pass-through
to work, the entire CPU frequency system must be in ACPI mode.

If we don't provide the ACPI option, we must either:
- Not support ACPI P-state pass-through, or
- Tell users to disable HWP in the BIOS. However, this BIOS option is not available on all
platforms.



RTVMs need certainty in latency, so their CPUs always run at
base/nominal frequency. Here are the combinations:
Cores not running any RTVM:
- Performance + HWP: Range from lowest to highest performance levels.
- Nominal + HWP: Fix to nominal performance level.
- Performance + P-state: Fix to highest p-state.
- Nominal + P-state: Fix to base P-state.
Cores running an RTVM:
- P-state: Fix to base P-state
- HWP: Fix to nominal performance level.

Just like the Linux cpufreq driver, a 'policy' object type is
introduced to help ACRN manage CPU frequency. 'policy' is a per-CPU
data type that indicates "the highest/lowest/base CPU frequency
limits under the current HW and scenario setup". It is statically
allocated by config-tools, and does not need to be configured.
With a policy given for each core, the hypervisor doesn't have to deal
with hardware or scenario settings. It only chooses the highest or base
frequency to run at.

The frequency management system is like this:

         VM0        ...        VM*       (VMs have no CPU freq control)

                             ACRN
               Governor (Performance/Nominal)
       policy0            policy1       ...       policy*
          |                  |                       |
          |                  |                       |
   HWP or p-state     HWP or p-state    ...   HWP or p-state
        pCPU0              pCPU1                   pCPU*
Does the guest care which method is exposed?
It seems either way works. If so, we don't want to bother users with
configuring it in the tools.

The general policy is to configure the system "automatically" whenever it can.

We use the tool (to configure) in order to save HV code. Here, I
don't see any such saving.
As explained above, HWP should be the better choice unless a guest needs
ACPI P-state tables passed through. The user has to choose between:
- HWP, which can adjust the frequency automatically.
- ACPI, which can only provide a fixed frequency, but can be passed through.

Our general policy is "automatic" HWP, but there are many complications:
- We need to decide which frequency to run at under the 'nominal' governor.
This depends on the HWP capabilities MSR, the ACPI P-state tables, whether turbo
boost is enabled, the maximum non-turbo ratio, and other factors.
- We want to fix an RTVM's pCPUs at the 'highest guaranteed' frequency.
- We have to deal with CPU frequency domains. A typical scenario is ADL, whose
E-cores share a frequency in groups of four at the hardware level. Those
frequency-linked cores could be assigned to different RTVMs and non-RTVMs.

If all of this were done in the hypervisor, it would take a lot of code.
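To make the first item concrete, here is a sketch of decoding two of the raw inputs involved. The bit layouts follow the Intel SDM (IA32_HWP_CAPABILITIES at MSR 0x771, MSR_PLATFORM_INFO at 0xCE); the struct and function names are hypothetical. How these values are combined into a 'nominal' level is intentionally not shown, since that logic is exactly what the config tools take on.

```c
#include <stdint.h>

/* IA32_HWP_CAPABILITIES (MSR 0x771) fields, per the Intel SDM. */
struct hwp_caps {
	uint8_t highest;        /* bits 7:0   */
	uint8_t guaranteed;     /* bits 15:8  */
	uint8_t most_efficient; /* bits 23:16 */
	uint8_t lowest;         /* bits 31:24 */
};

/* Split a raw IA32_HWP_CAPABILITIES value into its four levels. */
struct hwp_caps decode_hwp_caps(uint64_t msr)
{
	struct hwp_caps c = {
		.highest        = (uint8_t)(msr & 0xFFU),
		.guaranteed     = (uint8_t)((msr >> 8) & 0xFFU),
		.most_efficient = (uint8_t)((msr >> 16) & 0xFFU),
		.lowest         = (uint8_t)((msr >> 24) & 0xFFU),
	};
	return c;
}

/* MSR_PLATFORM_INFO (0xCE) bits 15:8: maximum non-turbo ratio. */
uint8_t max_non_turbo_ratio(uint64_t platform_info)
{
	return (uint8_t)((platform_info >> 8) & 0xFFU);
}
```

Decoding is the easy part; choosing among these values per pCPU, together with the ACPI tables, turbo settings and frequency domains, is what makes doing it offline attractive.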


