build: move e2e dependencies into e2e/go.mod

Several packages are only used while running the e2e suite. These
packages are less important to update, as the they can not influence the
final executable that is part of the Ceph-CSI container-image.

By moving these dependencies out of the main Ceph-CSI go.mod, it is
easier to identify if a reported CVE affects Ceph-CSI, or only the
testing (like most of the Kubernetes CVEs).

Signed-off-by: Niels de Vos <ndevos@ibm.com>
This commit is contained in:
Niels de Vos
2025-03-04 08:57:28 +01:00
committed by mergify[bot]
parent 15da101b1b
commit bec6090996
8047 changed files with 1407827 additions and 3453 deletions

202
e2e/vendor/k8s.io/component-helpers/LICENSE generated vendored Normal file
View File

@ -0,0 +1,202 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -0,0 +1,58 @@
/*
Copyright 2020 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package topology
import (
"k8s.io/api/core/v1"
)
// GetZoneKey is a helper function that builds a string identifier that is unique per failure-zone;
// it returns empty-string for no zone.
// Since there are currently two separate zone keys:
// - "failure-domain.beta.kubernetes.io/zone"
// - "topology.kubernetes.io/zone"
//
// GetZoneKey will first check failure-domain.beta.kubernetes.io/zone and if not exists, will then check
// topology.kubernetes.io/zone
func GetZoneKey(node *v1.Node) string {
labels := node.Labels
if labels == nil {
return ""
}
// TODO: "failure-domain.beta..." names are deprecated, but will
// stick around a long time due to existing on old extant objects like PVs.
// Maybe one day we can stop considering them (see #88493).
zone, ok := labels[v1.LabelFailureDomainBetaZone]
if !ok {
zone, _ = labels[v1.LabelTopologyZone]
}
region, ok := labels[v1.LabelFailureDomainBetaRegion]
if !ok {
region, _ = labels[v1.LabelTopologyRegion]
}
if region == "" && zone == "" {
return ""
}
// We include the null character just in case region or failureDomain has a colon
// (We do assume there's no null characters in a region or failureDomain)
// As a nice side-benefit, the null character is not printed by fmt.Print or glog
return region + ":\x00:" + zone
}

View File

@ -0,0 +1,104 @@
/*
Copyright 2023 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package sysctl
import (
"strings"
)
// Namespace represents a kernel namespace name.
type Namespace string
const (
// refer to https://man7.org/linux/man-pages/man7/ipc_namespaces.7.html
// the Linux IPC namespace
IPCNamespace = Namespace("IPC")
// refer to https://man7.org/linux/man-pages/man7/network_namespaces.7.html
// the network namespace
NetNamespace = Namespace("Net")
// the zero value if no namespace is known
UnknownNamespace = Namespace("")
)
var nameToNamespace = map[string]Namespace{
// kernel semaphore parameters: SEMMSL, SEMMNS, SEMOPM, and SEMMNI.
"kernel.sem": IPCNamespace,
// kernel shared memory limits include shmall, shmmax, shmmni, and shm_rmid_forced.
"kernel.shmall": IPCNamespace,
"kernel.shmmax": IPCNamespace,
"kernel.shmmni": IPCNamespace,
"kernel.shm_rmid_forced": IPCNamespace,
// make backward compatibility to know the namespace of kernel.shm*
"kernel.shm": IPCNamespace,
// kernel messages include msgmni, msgmax and msgmnb.
"kernel.msgmax": IPCNamespace,
"kernel.msgmnb": IPCNamespace,
"kernel.msgmni": IPCNamespace,
// make backward compatibility to know the namespace of kernel.msg*
"kernel.msg": IPCNamespace,
}
var prefixToNamespace = map[string]Namespace{
"net": NetNamespace,
// mqueue filesystem provides the necessary kernel features to enable the creation
// of a user space library that implements the POSIX message queues API.
"fs.mqueue": IPCNamespace,
}
// namespaceOf returns the namespace of the Linux kernel for a sysctl, or
// unknownNamespace if the sysctl is not known to be namespaced.
// The second return is prefixed bool.
// It returns true if the key is prefixed with a key in the prefix map
func namespaceOf(val string) Namespace {
if ns, found := nameToNamespace[val]; found {
return ns
}
for p, ns := range prefixToNamespace {
if strings.HasPrefix(val, p+".") {
return ns
}
}
return UnknownNamespace
}
// GetNamespace extracts information from a sysctl string. It returns:
// 1. The sysctl namespace, which can be one of the following: IPC, Net, or unknown.
// 2. sysctlOrPrefix: the prefix of the sysctl parameter until the first '*'.
// If there is no '*', it will be the original string.
// 3. 'prefixed' is set to true if the sysctl parameter contains '*' or it is in the prefixToNamespace key list, in most cases, it is a suffix *.
//
// For example, if the input sysctl is 'net.ipv6.neigh.*', GetNamespace will return:
// - The Net namespace
// - The sysctlOrPrefix as 'net.ipv6.neigh'
// - 'prefixed' set to true
//
// For the input sysctl 'net.ipv6.conf.all.disable_ipv6', GetNamespace will return:
// - The Net namespace
// - The sysctlOrPrefix as 'net.ipv6.conf.all.disable_ipv6'
// - 'prefixed' set to false.
func GetNamespace(sysctl string) (ns Namespace, sysctlOrPrefix string, prefixed bool) {
sysctlOrPrefix = NormalizeName(sysctl)
firstIndex := strings.IndexAny(sysctlOrPrefix, "*")
if firstIndex != -1 {
sysctlOrPrefix = sysctlOrPrefix[:firstIndex]
prefixed = true
}
ns = namespaceOf(sysctlOrPrefix)
return
}

View File

@ -0,0 +1,131 @@
/*
Copyright 2015 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package sysctl
import (
"os"
"path"
"strconv"
"strings"
)
const (
sysctlBase = "/proc/sys"
// VMOvercommitMemory refers to the sysctl variable responsible for defining
// the memory over-commit policy used by kernel.
VMOvercommitMemory = "vm/overcommit_memory"
// VMPanicOnOOM refers to the sysctl variable responsible for defining
// the OOM behavior used by kernel.
VMPanicOnOOM = "vm/panic_on_oom"
// KernelPanic refers to the sysctl variable responsible for defining
// the timeout after a panic for the kernel to reboot.
KernelPanic = "kernel/panic"
// KernelPanicOnOops refers to the sysctl variable responsible for defining
// the kernel behavior when an oops or BUG is encountered.
KernelPanicOnOops = "kernel/panic_on_oops"
// RootMaxKeys refers to the sysctl variable responsible for defining
// the maximum number of keys that the root user (UID 0 in the root user namespace) may own.
RootMaxKeys = "kernel/keys/root_maxkeys"
// RootMaxBytes refers to the sysctl variable responsible for defining
// the maximum number of bytes of data that the root user (UID 0 in the root user namespace)
// can hold in the payloads of the keys owned by root.
RootMaxBytes = "kernel/keys/root_maxbytes"
// VMOvercommitMemoryAlways represents that kernel performs no memory over-commit handling.
VMOvercommitMemoryAlways = 1
// VMPanicOnOOMInvokeOOMKiller represents that kernel calls the oom_killer function when OOM occurs.
VMPanicOnOOMInvokeOOMKiller = 0
// KernelPanicOnOopsAlways represents that kernel panics on kernel oops.
KernelPanicOnOopsAlways = 1
// KernelPanicRebootTimeout is the timeout seconds after a panic for the kernel to reboot.
KernelPanicRebootTimeout = 10
// RootMaxKeysSetting is the maximum number of keys that the root user (UID 0 in the root user namespace) may own.
// Needed since docker creates a new key per container.
RootMaxKeysSetting = 1000000
// RootMaxBytesSetting is the maximum number of bytes of data that the root user (UID 0 in the root user namespace)
// can hold in the payloads of the keys owned by root.
// Allocate 25 bytes per key * number of MaxKeys.
RootMaxBytesSetting = RootMaxKeysSetting * 25
)
// Interface is an injectable interface for running sysctl commands.
type Interface interface {
// GetSysctl returns the value for the specified sysctl setting
GetSysctl(sysctl string) (int, error)
// SetSysctl modifies the specified sysctl flag to the new value
SetSysctl(sysctl string, newVal int) error
}
// New returns a new Interface for accessing sysctl
func New() Interface {
return &procSysctl{}
}
// procSysctl implements Interface by reading and writing files under /proc/sys
type procSysctl struct {
}
// GetSysctl returns the value for the specified sysctl setting
func (*procSysctl) GetSysctl(sysctl string) (int, error) {
data, err := os.ReadFile(path.Join(sysctlBase, sysctl))
if err != nil {
return -1, err
}
val, err := strconv.Atoi(strings.Trim(string(data), " \n"))
if err != nil {
return -1, err
}
return val, nil
}
// SetSysctl modifies the specified sysctl flag to the new value
func (*procSysctl) SetSysctl(sysctl string, newVal int) error {
return os.WriteFile(path.Join(sysctlBase, sysctl), []byte(strconv.Itoa(newVal)), 0640)
}
// NormalizeName can return sysctl variables in dots separator format.
// The '/' separator is also accepted in place of a '.'.
// Convert the sysctl variables to dots separator format for validation.
// More info:
//
// https://man7.org/linux/man-pages/man8/sysctl.8.html
// https://man7.org/linux/man-pages/man5/sysctl.d.5.html
func NormalizeName(val string) string {
if val == "" {
return val
}
firstSepIndex := strings.IndexAny(val, "./")
// if the first found is `.` like `net.ipv4.conf.eno2/100.rp_filter`
if firstSepIndex == -1 || val[firstSepIndex] == '.' {
return val
}
// for `net/ipv4/conf/eno2.100/rp_filter`, swap the use of `.` and `/`
// to `net.ipv4.conf.eno2/100.rp_filter`
f := func(r rune) rune {
switch r {
case '.':
return '/'
case '/':
return '.'
}
return r
}
return strings.Map(f, val)
}

13
e2e/vendor/k8s.io/component-helpers/resource/OWNERS generated vendored Normal file
View File

@ -0,0 +1,13 @@
# See the OWNERS docs at https://go.k8s.io/owners
options:
no_parent_owners: true
approvers:
- api-approvers
reviewers:
- sig-node-reviewers
- sig-scheduling
labels:
- sig/node
- sig/scheduling
- kind/api-change

382
e2e/vendor/k8s.io/component-helpers/resource/helpers.go generated vendored Normal file
View File

@ -0,0 +1,382 @@
/*
Copyright 2024 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package resource
import (
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/util/sets"
)
// ContainerType signifies container type
type ContainerType int
const (
// Containers is for normal containers
Containers ContainerType = 1 << iota
// InitContainers is for init containers
InitContainers
)
// PodResourcesOptions controls the behavior of PodRequests and PodLimits.
type PodResourcesOptions struct {
// Reuse, if provided will be reused to accumulate resources and returned by the PodRequests or PodLimits
// functions. All existing values in Reuse will be lost.
Reuse v1.ResourceList
// UseStatusResources indicates whether resources reported by the PodStatus should be considered
// when evaluating the pod resources. This MUST be false if the InPlacePodVerticalScaling
// feature is not enabled.
UseStatusResources bool
// ExcludeOverhead controls if pod overhead is excluded from the calculation.
ExcludeOverhead bool
// ContainerFn is called with the effective resources required for each container within the pod.
ContainerFn func(res v1.ResourceList, containerType ContainerType)
// NonMissingContainerRequests if provided will replace any missing container level requests for the specified resources
// with the given values. If the requests for those resources are explicitly set, even if zero, they will not be modified.
NonMissingContainerRequests v1.ResourceList
// SkipPodLevelResources controls whether pod-level resources should be skipped
// from the calculation. If pod-level resources are not set in PodSpec,
// pod-level resources will always be skipped.
SkipPodLevelResources bool
}
var supportedPodLevelResources = sets.New(v1.ResourceCPU, v1.ResourceMemory)
func SupportedPodLevelResources() sets.Set[v1.ResourceName] {
return supportedPodLevelResources
}
// IsSupportedPodLevelResources checks if a given resource is supported by pod-level
// resource management through the PodLevelResources feature. Returns true if
// the resource is supported.
func IsSupportedPodLevelResource(name v1.ResourceName) bool {
return supportedPodLevelResources.Has(name)
}
// IsPodLevelResourcesSet check if PodLevelResources pod-level resources are set.
// It returns true if either the Requests or Limits maps are non-empty.
func IsPodLevelResourcesSet(pod *v1.Pod) bool {
if pod.Spec.Resources == nil {
return false
}
if (len(pod.Spec.Resources.Requests) + len(pod.Spec.Resources.Limits)) == 0 {
return false
}
for resourceName := range pod.Spec.Resources.Requests {
if IsSupportedPodLevelResource(resourceName) {
return true
}
}
for resourceName := range pod.Spec.Resources.Limits {
if IsSupportedPodLevelResource(resourceName) {
return true
}
}
return false
}
// IsPodLevelRequestsSet checks if pod-level requests are set. It returns true if
// Requests map is non-empty.
func IsPodLevelRequestsSet(pod *v1.Pod) bool {
if pod.Spec.Resources == nil {
return false
}
if len(pod.Spec.Resources.Requests) == 0 {
return false
}
for resourceName := range pod.Spec.Resources.Requests {
if IsSupportedPodLevelResource(resourceName) {
return true
}
}
return false
}
// PodRequests computes the total pod requests per the PodResourcesOptions supplied.
// If PodResourcesOptions is nil, then the requests are returned including pod overhead.
// If the PodLevelResources feature is enabled AND the pod-level resources are set,
// those pod-level values are used in calculating Pod Requests.
// The computation is part of the API and must be reviewed as an API change.
func PodRequests(pod *v1.Pod, opts PodResourcesOptions) v1.ResourceList {
reqs := AggregateContainerRequests(pod, opts)
if !opts.SkipPodLevelResources && IsPodLevelRequestsSet(pod) {
for resourceName, quantity := range pod.Spec.Resources.Requests {
if IsSupportedPodLevelResource(resourceName) {
reqs[resourceName] = quantity
}
}
}
// Add overhead for running a pod to the sum of requests if requested:
if !opts.ExcludeOverhead && pod.Spec.Overhead != nil {
addResourceList(reqs, pod.Spec.Overhead)
}
return reqs
}
// AggregateContainerRequests computes the total resource requests of all the containers
// in a pod. This computation folows the formula defined in the KEP for sidecar
// containers. See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers#resources-calculation-for-scheduling-and-pod-admission
// for more details.
func AggregateContainerRequests(pod *v1.Pod, opts PodResourcesOptions) v1.ResourceList {
// attempt to reuse the maps if passed, or allocate otherwise
reqs := reuseOrClearResourceList(opts.Reuse)
var containerStatuses map[string]*v1.ContainerStatus
if opts.UseStatusResources {
containerStatuses = make(map[string]*v1.ContainerStatus, len(pod.Status.ContainerStatuses))
for i := range pod.Status.ContainerStatuses {
containerStatuses[pod.Status.ContainerStatuses[i].Name] = &pod.Status.ContainerStatuses[i]
}
}
for _, container := range pod.Spec.Containers {
containerReqs := container.Resources.Requests
if opts.UseStatusResources {
cs, found := containerStatuses[container.Name]
if found && cs.Resources != nil {
if pod.Status.Resize == v1.PodResizeStatusInfeasible {
containerReqs = cs.Resources.Requests.DeepCopy()
} else {
containerReqs = max(container.Resources.Requests, cs.Resources.Requests)
}
}
}
if len(opts.NonMissingContainerRequests) > 0 {
containerReqs = applyNonMissing(containerReqs, opts.NonMissingContainerRequests)
}
if opts.ContainerFn != nil {
opts.ContainerFn(containerReqs, Containers)
}
addResourceList(reqs, containerReqs)
}
restartableInitContainerReqs := v1.ResourceList{}
initContainerReqs := v1.ResourceList{}
// init containers define the minimum of any resource
// Note: In-place resize is not allowed for InitContainers, so no need to check for ResizeStatus value
//
// Let's say `InitContainerUse(i)` is the resource requirements when the i-th
// init container is initializing, then
// `InitContainerUse(i) = sum(Resources of restartable init containers with index < i) + Resources of i-th init container`.
//
// See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers#exposing-pod-resource-requirements for the detail.
for _, container := range pod.Spec.InitContainers {
containerReqs := container.Resources.Requests
if len(opts.NonMissingContainerRequests) > 0 {
containerReqs = applyNonMissing(containerReqs, opts.NonMissingContainerRequests)
}
if container.RestartPolicy != nil && *container.RestartPolicy == v1.ContainerRestartPolicyAlways {
// and add them to the resulting cumulative container requests
addResourceList(reqs, containerReqs)
// track our cumulative restartable init container resources
addResourceList(restartableInitContainerReqs, containerReqs)
containerReqs = restartableInitContainerReqs
} else {
tmp := v1.ResourceList{}
addResourceList(tmp, containerReqs)
addResourceList(tmp, restartableInitContainerReqs)
containerReqs = tmp
}
if opts.ContainerFn != nil {
opts.ContainerFn(containerReqs, InitContainers)
}
maxResourceList(initContainerReqs, containerReqs)
}
maxResourceList(reqs, initContainerReqs)
return reqs
}
// applyNonMissing will return a copy of the given resource list with any missing values replaced by the nonMissing values
func applyNonMissing(reqs v1.ResourceList, nonMissing v1.ResourceList) v1.ResourceList {
cp := v1.ResourceList{}
for k, v := range reqs {
cp[k] = v.DeepCopy()
}
for k, v := range nonMissing {
if _, found := reqs[k]; !found {
rk := cp[k]
rk.Add(v)
cp[k] = rk
}
}
return cp
}
// PodLimits computes the pod limits per the PodResourcesOptions supplied. If PodResourcesOptions is nil, then
// the limits are returned including pod overhead for any non-zero limits. The computation is part of the API and must be reviewed
// as an API change.
func PodLimits(pod *v1.Pod, opts PodResourcesOptions) v1.ResourceList {
// attempt to reuse the maps if passed, or allocate otherwise
limits := AggregateContainerLimits(pod, opts)
if !opts.SkipPodLevelResources && IsPodLevelResourcesSet(pod) {
for resourceName, quantity := range pod.Spec.Resources.Limits {
if IsSupportedPodLevelResource(resourceName) {
limits[resourceName] = quantity
}
}
}
// Add overhead to non-zero limits if requested:
if !opts.ExcludeOverhead && pod.Spec.Overhead != nil {
for name, quantity := range pod.Spec.Overhead {
if value, ok := limits[name]; ok && !value.IsZero() {
value.Add(quantity)
limits[name] = value
}
}
}
return limits
}
// AggregateContainerLimits computes the aggregated resource limits of all the containers
// in a pod. This computation follows the formula defined in the KEP for sidecar
// containers. See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers#resources-calculation-for-scheduling-and-pod-admission
// for more details.
func AggregateContainerLimits(pod *v1.Pod, opts PodResourcesOptions) v1.ResourceList {
// attempt to reuse the maps if passed, or allocate otherwise
limits := reuseOrClearResourceList(opts.Reuse)
var containerStatuses map[string]*v1.ContainerStatus
if opts.UseStatusResources {
containerStatuses = make(map[string]*v1.ContainerStatus, len(pod.Status.ContainerStatuses))
for i := range pod.Status.ContainerStatuses {
containerStatuses[pod.Status.ContainerStatuses[i].Name] = &pod.Status.ContainerStatuses[i]
}
}
for _, container := range pod.Spec.Containers {
containerLimits := container.Resources.Limits
if opts.UseStatusResources {
cs, found := containerStatuses[container.Name]
if found && cs.Resources != nil {
if pod.Status.Resize == v1.PodResizeStatusInfeasible {
containerLimits = cs.Resources.Limits.DeepCopy()
} else {
containerLimits = max(container.Resources.Limits, cs.Resources.Limits)
}
}
}
if opts.ContainerFn != nil {
opts.ContainerFn(containerLimits, Containers)
}
addResourceList(limits, containerLimits)
}
restartableInitContainerLimits := v1.ResourceList{}
initContainerLimits := v1.ResourceList{}
// init containers define the minimum of any resource
//
// Let's say `InitContainerUse(i)` is the resource requirements when the i-th
// init container is initializing, then
// `InitContainerUse(i) = sum(Resources of restartable init containers with index < i) + Resources of i-th init container`.
//
// See https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/753-sidecar-containers#exposing-pod-resource-requirements for the detail.
for _, container := range pod.Spec.InitContainers {
containerLimits := container.Resources.Limits
// Is the init container marked as a restartable init container?
if container.RestartPolicy != nil && *container.RestartPolicy == v1.ContainerRestartPolicyAlways {
addResourceList(limits, containerLimits)
// track our cumulative restartable init container resources
addResourceList(restartableInitContainerLimits, containerLimits)
containerLimits = restartableInitContainerLimits
} else {
tmp := v1.ResourceList{}
addResourceList(tmp, containerLimits)
addResourceList(tmp, restartableInitContainerLimits)
containerLimits = tmp
}
if opts.ContainerFn != nil {
opts.ContainerFn(containerLimits, InitContainers)
}
maxResourceList(initContainerLimits, containerLimits)
}
maxResourceList(limits, initContainerLimits)
return limits
}
// addResourceList adds the resources in newList to list.
func addResourceList(list, newList v1.ResourceList) {
for name, quantity := range newList {
if value, ok := list[name]; !ok {
list[name] = quantity.DeepCopy()
} else {
value.Add(quantity)
list[name] = value
}
}
}
// maxResourceList sets list to the greater of list/newList for every resource in newList
func maxResourceList(list, newList v1.ResourceList) {
for name, quantity := range newList {
if value, ok := list[name]; !ok || quantity.Cmp(value) > 0 {
list[name] = quantity.DeepCopy()
}
}
}
// max returns the result of max(a, b) for each named resource and is only used if we can't
// accumulate into an existing resource list
func max(a v1.ResourceList, b v1.ResourceList) v1.ResourceList {
result := v1.ResourceList{}
for key, value := range a {
if other, found := b[key]; found {
if value.Cmp(other) <= 0 {
result[key] = other.DeepCopy()
continue
}
}
result[key] = value.DeepCopy()
}
for key, value := range b {
if _, found := result[key]; !found {
result[key] = value.DeepCopy()
}
}
return result
}
// reuseOrClearResourceList is a helper for avoiding excessive allocations of
// resource lists within the inner loop of resource calculations.
func reuseOrClearResourceList(reuse v1.ResourceList) v1.ResourceList {
if reuse == nil {
return make(v1.ResourceList, 4)
}
for k := range reuse {
delete(reuse, k)
}
return reuse
}

View File

@ -0,0 +1,23 @@
/*
Copyright 2020 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Package corev1 defines functions which should satisfy one of the following:
//
// - Be used by more than one core component (kube-scheduler, kubelet, kube-apiserver, etc.)
// - Be used by a core component and another kubernetes project (cluster-autoscaler, descheduler)
//
// And be a scheduling feature.
package corev1 // import "k8s.io/component-helpers/scheduling/corev1"

View File

@ -0,0 +1,101 @@
/*
Copyright 2020 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package corev1
import (
"encoding/json"
v1 "k8s.io/api/core/v1"
"k8s.io/component-helpers/scheduling/corev1/nodeaffinity"
)
// PodPriority returns priority of the given pod.
func PodPriority(pod *v1.Pod) int32 {
if pod.Spec.Priority != nil {
return *pod.Spec.Priority
}
// When priority of a running pod is nil, it means it was created at a time
// that there was no global default priority class and the priority class
// name of the pod was empty. So, we resolve to the static default priority.
return 0
}
// MatchNodeSelectorTerms checks whether the node labels and fields match node selector terms in ORed;
// nil or empty term matches no objects.
func MatchNodeSelectorTerms(
node *v1.Node,
nodeSelector *v1.NodeSelector,
) (bool, error) {
if node == nil {
return false, nil
}
return nodeaffinity.NewLazyErrorNodeSelector(nodeSelector).Match(node)
}
// GetAvoidPodsFromNodeAnnotations scans the list of annotations and
// returns the pods that needs to be avoided for this node from scheduling
func GetAvoidPodsFromNodeAnnotations(annotations map[string]string) (v1.AvoidPods, error) {
var avoidPods v1.AvoidPods
if len(annotations) > 0 && annotations[v1.PreferAvoidPodsAnnotationKey] != "" {
err := json.Unmarshal([]byte(annotations[v1.PreferAvoidPodsAnnotationKey]), &avoidPods)
if err != nil {
return avoidPods, err
}
}
return avoidPods, nil
}
// TolerationsTolerateTaint checks if taint is tolerated by any of the tolerations.
func TolerationsTolerateTaint(tolerations []v1.Toleration, taint *v1.Taint) bool {
for i := range tolerations {
if tolerations[i].ToleratesTaint(taint) {
return true
}
}
return false
}
type taintsFilterFunc func(*v1.Taint) bool
// FindMatchingUntoleratedTaint checks if the given tolerations tolerates
// all the filtered taints, and returns the first taint without a toleration
// Returns true if there is an untolerated taint
// Returns false if all taints are tolerated
func FindMatchingUntoleratedTaint(taints []v1.Taint, tolerations []v1.Toleration, inclusionFilter taintsFilterFunc) (v1.Taint, bool) {
filteredTaints := getFilteredTaints(taints, inclusionFilter)
for _, taint := range filteredTaints {
if !TolerationsTolerateTaint(tolerations, &taint) {
return taint, true
}
}
return v1.Taint{}, false
}
// getFilteredTaints returns a list of taints satisfying the filter predicate
func getFilteredTaints(taints []v1.Taint, inclusionFilter taintsFilterFunc) []v1.Taint {
if inclusionFilter == nil {
return taints
}
filteredTaints := []v1.Taint{}
for _, taint := range taints {
if !inclusionFilter(&taint) {
continue
}
filteredTaints = append(filteredTaints, taint)
}
return filteredTaints
}

View File

@ -0,0 +1,333 @@
/*
Copyright 2020 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package nodeaffinity
import (
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/fields"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/selection"
"k8s.io/apimachinery/pkg/util/errors"
"k8s.io/apimachinery/pkg/util/validation/field"
)
// NodeSelector is a runtime representation of v1.NodeSelector.
type NodeSelector struct {
lazy LazyErrorNodeSelector
}
// LazyErrorNodeSelector is a runtime representation of v1.NodeSelector that
// only reports parse errors when no terms match.
type LazyErrorNodeSelector struct {
terms []nodeSelectorTerm
}
// NewNodeSelector returns a NodeSelector or aggregate parsing errors found.
func NewNodeSelector(ns *v1.NodeSelector, opts ...field.PathOption) (*NodeSelector, error) {
lazy := NewLazyErrorNodeSelector(ns, opts...)
var errs []error
for _, term := range lazy.terms {
if len(term.parseErrs) > 0 {
errs = append(errs, term.parseErrs...)
}
}
if len(errs) != 0 {
return nil, errors.Flatten(errors.NewAggregate(errs))
}
return &NodeSelector{lazy: *lazy}, nil
}
// NewLazyErrorNodeSelector creates a NodeSelector that only reports parse
// errors when no terms match.
func NewLazyErrorNodeSelector(ns *v1.NodeSelector, opts ...field.PathOption) *LazyErrorNodeSelector {
p := field.ToPath(opts...)
parsedTerms := make([]nodeSelectorTerm, 0, len(ns.NodeSelectorTerms))
path := p.Child("nodeSelectorTerms")
for i, term := range ns.NodeSelectorTerms {
// nil or empty term selects no objects
if isEmptyNodeSelectorTerm(&term) {
continue
}
p := path.Index(i)
parsedTerms = append(parsedTerms, newNodeSelectorTerm(&term, p))
}
return &LazyErrorNodeSelector{
terms: parsedTerms,
}
}
// Match checks whether the node labels and fields match the selector terms, ORed;
// nil or empty term matches no objects.
func (ns *NodeSelector) Match(node *v1.Node) bool {
// parse errors are reported in NewNodeSelector.
match, _ := ns.lazy.Match(node)
return match
}
// Match checks whether the node labels and fields match the selector terms, ORed;
// nil or empty term matches no objects.
// Parse errors are only returned if no terms matched.
func (ns *LazyErrorNodeSelector) Match(node *v1.Node) (bool, error) {
if node == nil {
return false, nil
}
nodeLabels := labels.Set(node.Labels)
nodeFields := extractNodeFields(node)
var errs []error
for _, term := range ns.terms {
match, tErrs := term.match(nodeLabels, nodeFields)
if len(tErrs) > 0 {
errs = append(errs, tErrs...)
continue
}
if match {
return true, nil
}
}
return false, errors.Flatten(errors.NewAggregate(errs))
}
// PreferredSchedulingTerms is a runtime representation of []v1.PreferredSchedulingTerms.
type PreferredSchedulingTerms struct {
terms []preferredSchedulingTerm
}
// NewPreferredSchedulingTerms returns a PreferredSchedulingTerms or all the parsing errors found.
// If a v1.PreferredSchedulingTerm has a 0 weight, its parsing is skipped.
func NewPreferredSchedulingTerms(terms []v1.PreferredSchedulingTerm, opts ...field.PathOption) (*PreferredSchedulingTerms, error) {
p := field.ToPath(opts...)
var errs []error
parsedTerms := make([]preferredSchedulingTerm, 0, len(terms))
for i, term := range terms {
path := p.Index(i)
if term.Weight == 0 || isEmptyNodeSelectorTerm(&term.Preference) {
continue
}
parsedTerm := preferredSchedulingTerm{
nodeSelectorTerm: newNodeSelectorTerm(&term.Preference, path),
weight: int(term.Weight),
}
if len(parsedTerm.parseErrs) > 0 {
errs = append(errs, parsedTerm.parseErrs...)
} else {
parsedTerms = append(parsedTerms, parsedTerm)
}
}
if len(errs) != 0 {
return nil, errors.Flatten(errors.NewAggregate(errs))
}
return &PreferredSchedulingTerms{terms: parsedTerms}, nil
}
// Score returns a score for a Node: the sum of the weights of the terms that
// match the Node.
func (t *PreferredSchedulingTerms) Score(node *v1.Node) int64 {
var score int64
nodeLabels := labels.Set(node.Labels)
nodeFields := extractNodeFields(node)
for _, term := range t.terms {
// parse errors are reported in NewPreferredSchedulingTerms.
if ok, _ := term.match(nodeLabels, nodeFields); ok {
score += int64(term.weight)
}
}
return score
}
func isEmptyNodeSelectorTerm(term *v1.NodeSelectorTerm) bool {
return len(term.MatchExpressions) == 0 && len(term.MatchFields) == 0
}
func extractNodeFields(n *v1.Node) fields.Set {
f := make(fields.Set)
if len(n.Name) > 0 {
f["metadata.name"] = n.Name
}
return f
}
type nodeSelectorTerm struct {
matchLabels labels.Selector
matchFields fields.Selector
parseErrs []error
}
func newNodeSelectorTerm(term *v1.NodeSelectorTerm, path *field.Path) nodeSelectorTerm {
var parsedTerm nodeSelectorTerm
var errs []error
if len(term.MatchExpressions) != 0 {
p := path.Child("matchExpressions")
parsedTerm.matchLabels, errs = nodeSelectorRequirementsAsSelector(term.MatchExpressions, p)
if errs != nil {
parsedTerm.parseErrs = append(parsedTerm.parseErrs, errs...)
}
}
if len(term.MatchFields) != 0 {
p := path.Child("matchFields")
parsedTerm.matchFields, errs = nodeSelectorRequirementsAsFieldSelector(term.MatchFields, p)
if errs != nil {
parsedTerm.parseErrs = append(parsedTerm.parseErrs, errs...)
}
}
return parsedTerm
}
func (t *nodeSelectorTerm) match(nodeLabels labels.Set, nodeFields fields.Set) (bool, []error) {
if t.parseErrs != nil {
return false, t.parseErrs
}
if t.matchLabels != nil && !t.matchLabels.Matches(nodeLabels) {
return false, nil
}
if t.matchFields != nil && len(nodeFields) > 0 && !t.matchFields.Matches(nodeFields) {
return false, nil
}
return true, nil
}
var validSelectorOperators = []v1.NodeSelectorOperator{
v1.NodeSelectorOpIn,
v1.NodeSelectorOpNotIn,
v1.NodeSelectorOpExists,
v1.NodeSelectorOpDoesNotExist,
v1.NodeSelectorOpGt,
v1.NodeSelectorOpLt,
}
// nodeSelectorRequirementsAsSelector converts the []NodeSelectorRequirement api type into a struct that implements
// labels.Selector.
func nodeSelectorRequirementsAsSelector(nsm []v1.NodeSelectorRequirement, path *field.Path) (labels.Selector, []error) {
if len(nsm) == 0 {
return labels.Nothing(), nil
}
var errs []error
selector := labels.NewSelector()
for i, expr := range nsm {
p := path.Index(i)
var op selection.Operator
switch expr.Operator {
case v1.NodeSelectorOpIn:
op = selection.In
case v1.NodeSelectorOpNotIn:
op = selection.NotIn
case v1.NodeSelectorOpExists:
op = selection.Exists
case v1.NodeSelectorOpDoesNotExist:
op = selection.DoesNotExist
case v1.NodeSelectorOpGt:
op = selection.GreaterThan
case v1.NodeSelectorOpLt:
op = selection.LessThan
default:
errs = append(errs, field.NotSupported(p.Child("operator"), expr.Operator, validSelectorOperators))
continue
}
r, err := labels.NewRequirement(expr.Key, op, expr.Values, field.WithPath(p))
if err != nil {
errs = append(errs, err)
} else {
selector = selector.Add(*r)
}
}
if len(errs) != 0 {
return nil, errs
}
return selector, nil
}
var validFieldSelectorOperators = []v1.NodeSelectorOperator{
v1.NodeSelectorOpIn,
v1.NodeSelectorOpNotIn,
}
// nodeSelectorRequirementsAsFieldSelector converts the []NodeSelectorRequirement core type into a struct that implements
// fields.Selector.
func nodeSelectorRequirementsAsFieldSelector(nsr []v1.NodeSelectorRequirement, path *field.Path) (fields.Selector, []error) {
if len(nsr) == 0 {
return fields.Nothing(), nil
}
var errs []error
var selectors []fields.Selector
for i, expr := range nsr {
p := path.Index(i)
switch expr.Operator {
case v1.NodeSelectorOpIn:
if len(expr.Values) != 1 {
errs = append(errs, field.Invalid(p.Child("values"), expr.Values, "must have one element"))
} else {
selectors = append(selectors, fields.OneTermEqualSelector(expr.Key, expr.Values[0]))
}
case v1.NodeSelectorOpNotIn:
if len(expr.Values) != 1 {
errs = append(errs, field.Invalid(p.Child("values"), expr.Values, "must have one element"))
} else {
selectors = append(selectors, fields.OneTermNotEqualSelector(expr.Key, expr.Values[0]))
}
default:
errs = append(errs, field.NotSupported(p.Child("operator"), expr.Operator, validFieldSelectorOperators))
}
}
if len(errs) != 0 {
return nil, errs
}
return fields.AndSelectors(selectors...), nil
}
type preferredSchedulingTerm struct {
nodeSelectorTerm
weight int
}
type RequiredNodeAffinity struct {
labelSelector labels.Selector
nodeSelector *LazyErrorNodeSelector
}
// GetRequiredNodeAffinity returns the parsing result of pod's nodeSelector and nodeAffinity.
func GetRequiredNodeAffinity(pod *v1.Pod) RequiredNodeAffinity {
var selector labels.Selector
if len(pod.Spec.NodeSelector) > 0 {
selector = labels.SelectorFromSet(pod.Spec.NodeSelector)
}
// Use LazyErrorNodeSelector for backwards compatibility of parsing errors.
var affinity *LazyErrorNodeSelector
if pod.Spec.Affinity != nil &&
pod.Spec.Affinity.NodeAffinity != nil &&
pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
affinity = NewLazyErrorNodeSelector(pod.Spec.Affinity.NodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution)
}
return RequiredNodeAffinity{labelSelector: selector, nodeSelector: affinity}
}
// Match checks whether the pod is schedulable onto nodes according to
// the requirements in both nodeSelector and nodeAffinity.
func (s RequiredNodeAffinity) Match(node *v1.Node) (bool, error) {
if s.labelSelector != nil {
if !s.labelSelector.Matches(labels.Set(node.Labels)) {
return false, nil
}
}
if s.nodeSelector != nil {
return s.nodeSelector.Match(node)
}
return true, nil
}

View File

@ -0,0 +1,57 @@
/*
Copyright 2021 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
// Package ephemeral provides code that supports the usual pattern
// for accessing the PVC that provides a generic ephemeral inline volume:
//
// - determine the PVC name that corresponds to the inline volume source
// - retrieve the PVC
// - verify that the PVC is owned by the pod
// - use the PVC
package ephemeral
import (
"fmt"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// VolumeClaimName returns the name of the PersistentVolumeClaim
// object that gets created for the generic ephemeral inline volume. The
// name is deterministic and therefore this function does not need any
// additional information besides the Pod name and volume name and it
// will never fail.
//
// Before using the PVC for the Pod, the caller must check that it is
// indeed the PVC that was created for the Pod by calling IsUsable.
func VolumeClaimName(pod *v1.Pod, volume *v1.Volume) string {
return pod.Name + "-" + volume.Name
}
// VolumeIsForPod checks that the PVC is the ephemeral volume that
// was created for the Pod. It returns an error that is informative
// enough to be returned by the caller without adding further details
// about the Pod or PVC.
func VolumeIsForPod(pod *v1.Pod, pvc *v1.PersistentVolumeClaim) error {
// Checking the namespaces is just a precaution. The caller should
// never pass in a PVC that isn't from the same namespace as the
// Pod.
if pvc.Namespace != pod.Namespace || !metav1.IsControlledBy(pvc, pod) {
return fmt.Errorf("PVC %s/%s was not created for pod %s/%s (pod is not owner)", pvc.Namespace, pvc.Name, pod.Namespace, pod.Name)
}
return nil
}

View File

@ -0,0 +1,84 @@
/*
Copyright 2020 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package volume
import (
"fmt"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/component-helpers/scheduling/corev1"
)
// PersistentVolumeClaimHasClass returns true if given claim has set StorageClassName field.
func PersistentVolumeClaimHasClass(claim *v1.PersistentVolumeClaim) bool {
// Use beta annotation first
if _, found := claim.Annotations[v1.BetaStorageClassAnnotation]; found {
return true
}
if claim.Spec.StorageClassName != nil {
return true
}
return false
}
// GetPersistentVolumeClaimClass returns StorageClassName. If no storage class was
// requested, it returns "".
func GetPersistentVolumeClaimClass(claim *v1.PersistentVolumeClaim) string {
// Use beta annotation first
if class, found := claim.Annotations[v1.BetaStorageClassAnnotation]; found {
return class
}
if claim.Spec.StorageClassName != nil {
return *claim.Spec.StorageClassName
}
return ""
}
// GetPersistentVolumeClass returns StorageClassName.
func GetPersistentVolumeClass(volume *v1.PersistentVolume) string {
// Use beta annotation first
if class, found := volume.Annotations[v1.BetaStorageClassAnnotation]; found {
return class
}
return volume.Spec.StorageClassName
}
// CheckNodeAffinity looks at the PV node affinity, and checks if the node has the same corresponding labels
// This ensures that we don't mount a volume that doesn't belong to this node
func CheckNodeAffinity(pv *v1.PersistentVolume, nodeLabels map[string]string) error {
if pv.Spec.NodeAffinity == nil {
return nil
}
if pv.Spec.NodeAffinity.Required != nil {
node := &v1.Node{ObjectMeta: metav1.ObjectMeta{Labels: nodeLabels}}
terms := pv.Spec.NodeAffinity.Required
if matches, err := corev1.MatchNodeSelectorTerms(node, terms); err != nil {
return err
} else if !matches {
return fmt.Errorf("no matching NodeSelectorTerms")
}
}
return nil
}

View File

@ -0,0 +1,363 @@
/*
Copyright 2019 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package volume
import (
"fmt"
v1 "k8s.io/api/core/v1"
storage "k8s.io/api/storage/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/client-go/kubernetes/scheme"
storagelisters "k8s.io/client-go/listers/storage/v1"
"k8s.io/client-go/tools/reference"
"k8s.io/utils/ptr"
)
const (
// AnnBindCompleted Annotation applies to PVCs. It indicates that the lifecycle
// of the PVC has passed through the initial setup. This information changes how
// we interpret some observations of the state of the objects. Value of this
// Annotation does not matter.
AnnBindCompleted = "pv.kubernetes.io/bind-completed"
// AnnBoundByController annotation applies to PVs and PVCs. It indicates that
// the binding (PV->PVC or PVC->PV) was installed by the controller. The
// absence of this annotation means the binding was done by the user (i.e.
// pre-bound). Value of this annotation does not matter.
// External PV binders must bind PV the same way as PV controller, otherwise PV
// controller may not handle it correctly.
AnnBoundByController = "pv.kubernetes.io/bound-by-controller"
// AnnSelectedNode annotation is added to a PVC that has been triggered by scheduler to
// be dynamically provisioned. Its value is the name of the selected node.
AnnSelectedNode = "volume.kubernetes.io/selected-node"
// NotSupportedProvisioner is a special provisioner name which can be set
// in storage class to indicate dynamic provisioning is not supported by
// the storage.
NotSupportedProvisioner = "kubernetes.io/no-provisioner"
// AnnDynamicallyProvisioned annotation is added to a PV that has been dynamically provisioned by
// Kubernetes. Its value is name of volume plugin that created the volume.
// It serves both user (to show where a PV comes from) and Kubernetes (to
// recognize dynamically provisioned PVs in its decisions).
AnnDynamicallyProvisioned = "pv.kubernetes.io/provisioned-by"
// AnnMigratedTo annotation is added to a PVC and PV that is supposed to be
// dynamically provisioned/deleted by by its corresponding CSI driver
// through the CSIMigration feature flags. When this annotation is set the
// Kubernetes components will "stand-down" and the external-provisioner will
// act on the objects
AnnMigratedTo = "pv.kubernetes.io/migrated-to"
// AnnStorageProvisioner annotation is added to a PVC that is supposed to be dynamically
// provisioned. Its value is name of volume plugin that is supposed to provision
// a volume for this PVC.
// TODO: remove beta anno once deprecation period ends
AnnStorageProvisioner = "volume.kubernetes.io/storage-provisioner"
AnnBetaStorageProvisioner = "volume.beta.kubernetes.io/storage-provisioner"
//PVDeletionProtectionFinalizer is the finalizer added by the external-provisioner on the PV
PVDeletionProtectionFinalizer = "external-provisioner.volume.kubernetes.io/finalizer"
// PVDeletionInTreeProtectionFinalizer is the finalizer added to protect PV deletion for in-tree volumes.
PVDeletionInTreeProtectionFinalizer = "kubernetes.io/pv-controller"
)
// IsDelayBindingProvisioning checks if claim provisioning with selected-node annotation
func IsDelayBindingProvisioning(claim *v1.PersistentVolumeClaim) bool {
// When feature VolumeScheduling enabled,
// Scheduler signal to the PV controller to start dynamic
// provisioning by setting the "AnnSelectedNode" annotation
// in the PVC
_, ok := claim.Annotations[AnnSelectedNode]
return ok
}
// IsDelayBindingMode checks if claim is in delay binding mode.
func IsDelayBindingMode(claim *v1.PersistentVolumeClaim, classLister storagelisters.StorageClassLister) (bool, error) {
className := GetPersistentVolumeClaimClass(claim)
if className == "" {
return false, nil
}
class, err := classLister.Get(className)
if err != nil {
if apierrors.IsNotFound(err) {
return false, nil
}
return false, err
}
if class.VolumeBindingMode == nil {
return false, fmt.Errorf("VolumeBindingMode not set for StorageClass %q", className)
}
return *class.VolumeBindingMode == storage.VolumeBindingWaitForFirstConsumer, nil
}
// GetBindVolumeToClaim returns a new volume which is bound to given claim. In
// addition, it returns a bool which indicates whether we made modification on
// original volume.
func GetBindVolumeToClaim(volume *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) (*v1.PersistentVolume, bool, error) {
dirty := false
// Check if the volume was already bound (either by user or by controller)
shouldSetBoundByController := false
if !IsVolumeBoundToClaim(volume, claim) {
shouldSetBoundByController = true
}
// The volume from method args can be pointing to watcher cache. We must not
// modify these, therefore create a copy.
volumeClone := volume.DeepCopy()
// Bind the volume to the claim if it is not bound yet
if volume.Spec.ClaimRef == nil ||
volume.Spec.ClaimRef.Name != claim.Name ||
volume.Spec.ClaimRef.Namespace != claim.Namespace ||
volume.Spec.ClaimRef.UID != claim.UID {
claimRef, err := reference.GetReference(scheme.Scheme, claim)
if err != nil {
return nil, false, fmt.Errorf("unexpected error getting claim reference: %w", err)
}
volumeClone.Spec.ClaimRef = claimRef
dirty = true
}
// Set AnnBoundByController if it is not set yet
if shouldSetBoundByController && !metav1.HasAnnotation(volumeClone.ObjectMeta, AnnBoundByController) {
metav1.SetMetaDataAnnotation(&volumeClone.ObjectMeta, AnnBoundByController, "yes")
dirty = true
}
return volumeClone, dirty, nil
}
// IsVolumeBoundToClaim returns true, if given volume is pre-bound or bound
// to specific claim. Both claim.Name and claim.Namespace must be equal.
// If claim.UID is present in volume.Spec.ClaimRef, it must be equal too.
func IsVolumeBoundToClaim(volume *v1.PersistentVolume, claim *v1.PersistentVolumeClaim) bool {
if volume.Spec.ClaimRef == nil {
return false
}
if claim.Name != volume.Spec.ClaimRef.Name || claim.Namespace != volume.Spec.ClaimRef.Namespace {
return false
}
if volume.Spec.ClaimRef.UID != "" && claim.UID != volume.Spec.ClaimRef.UID {
return false
}
return true
}
// FindMatchingVolume goes through the list of volumes to find the best matching volume
// for the claim.
//
// This function is used by both the PV controller and scheduler.
//
// delayBinding is true only in the PV controller path. When set, prebound PVs are still returned
// as a match for the claim, but unbound PVs are skipped.
//
// node is set only in the scheduler path. When set, the PV node affinity is checked against
// the node's labels.
//
// excludedVolumes is only used in the scheduler path, and is needed for evaluating multiple
// unbound PVCs for a single Pod at one time. As each PVC finds a matching PV, the chosen
// PV needs to be excluded from future matching.
func FindMatchingVolume(
claim *v1.PersistentVolumeClaim,
volumes []*v1.PersistentVolume,
node *v1.Node,
excludedVolumes map[string]*v1.PersistentVolume,
delayBinding bool,
vacEnabled bool) (*v1.PersistentVolume, error) {
if !vacEnabled {
claimVAC := ptr.Deref(claim.Spec.VolumeAttributesClassName, "")
if claimVAC != "" {
return nil, fmt.Errorf("unsupported volumeAttributesClassName is set on claim %s when the feature-gate VolumeAttributesClass is disabled", claimToClaimKey(claim))
}
}
var smallestVolume *v1.PersistentVolume
var smallestVolumeQty resource.Quantity
requestedQty := claim.Spec.Resources.Requests[v1.ResourceName(v1.ResourceStorage)]
requestedClass := GetPersistentVolumeClaimClass(claim)
var selector labels.Selector
if claim.Spec.Selector != nil {
internalSelector, err := metav1.LabelSelectorAsSelector(claim.Spec.Selector)
if err != nil {
return nil, fmt.Errorf("error creating internal label selector for claim: %v: %v", claimToClaimKey(claim), err)
}
selector = internalSelector
}
// Go through all available volumes with two goals:
// - find a volume that is either pre-bound by user or dynamically
// provisioned for this claim. Because of this we need to loop through
// all volumes.
// - find the smallest matching one if there is no volume pre-bound to
// the claim.
for _, volume := range volumes {
if _, ok := excludedVolumes[volume.Name]; ok {
// Skip volumes in the excluded list
continue
}
if volume.Spec.ClaimRef != nil && !IsVolumeBoundToClaim(volume, claim) {
continue
}
volumeQty := volume.Spec.Capacity[v1.ResourceStorage]
if volumeQty.Cmp(requestedQty) < 0 {
continue
}
// filter out mismatching volumeModes
if CheckVolumeModeMismatches(&claim.Spec, &volume.Spec) {
continue
}
claimVAC := ptr.Deref(claim.Spec.VolumeAttributesClassName, "")
volumeVAC := ptr.Deref(volume.Spec.VolumeAttributesClassName, "")
// filter out mismatching volumeAttributesClassName
if vacEnabled && claimVAC != volumeVAC {
continue
}
if !vacEnabled && volumeVAC != "" {
// when the feature gate is disabled, the PV object has VAC set, then we should not bind at all.
continue
}
// check if PV's DeletionTimeStamp is set, if so, skip this volume.
if volume.ObjectMeta.DeletionTimestamp != nil {
continue
}
nodeAffinityValid := true
if node != nil {
// Scheduler path, check that the PV NodeAffinity
// is satisfied by the node
// CheckNodeAffinity is the most expensive call in this loop.
// We should check cheaper conditions first or consider optimizing this function.
err := CheckNodeAffinity(volume, node.Labels)
if err != nil {
nodeAffinityValid = false
}
}
if IsVolumeBoundToClaim(volume, claim) {
// If PV node affinity is invalid, return no match.
// This means the prebound PV (and therefore PVC)
// is not suitable for this node.
if !nodeAffinityValid {
return nil, nil
}
return volume, nil
}
if node == nil && delayBinding {
// PV controller does not bind this claim.
// Scheduler will handle binding unbound volumes
// Scheduler path will have node != nil
continue
}
// filter out:
// - volumes in non-available phase
// - volumes whose labels don't match the claim's selector, if specified
// - volumes in Class that is not requested
// - volumes whose NodeAffinity does not match the node
if volume.Status.Phase != v1.VolumeAvailable {
// We ignore volumes in non-available phase, because volumes that
// satisfies matching criteria will be updated to available, binding
// them now has high chance of encountering unnecessary failures
// due to API conflicts.
continue
} else if selector != nil && !selector.Matches(labels.Set(volume.Labels)) {
continue
}
if GetPersistentVolumeClass(volume) != requestedClass {
continue
}
if !nodeAffinityValid {
continue
}
if node != nil {
// Scheduler path
// Check that the access modes match
if !CheckAccessModes(claim, volume) {
continue
}
}
if smallestVolume == nil || smallestVolumeQty.Cmp(volumeQty) > 0 {
smallestVolume = volume
smallestVolumeQty = volumeQty
}
}
if smallestVolume != nil {
// Found a matching volume
return smallestVolume, nil
}
return nil, nil
}
// CheckVolumeModeMismatches is a convenience method that checks volumeMode for PersistentVolume
// and PersistentVolumeClaims
func CheckVolumeModeMismatches(pvcSpec *v1.PersistentVolumeClaimSpec, pvSpec *v1.PersistentVolumeSpec) bool {
// In HA upgrades, we cannot guarantee that the apiserver is on a version >= controller-manager.
// So we default a nil volumeMode to filesystem
requestedVolumeMode := v1.PersistentVolumeFilesystem
if pvcSpec.VolumeMode != nil {
requestedVolumeMode = *pvcSpec.VolumeMode
}
pvVolumeMode := v1.PersistentVolumeFilesystem
if pvSpec.VolumeMode != nil {
pvVolumeMode = *pvSpec.VolumeMode
}
return requestedVolumeMode != pvVolumeMode
}
// CheckAccessModes returns true if PV satisfies all the PVC's requested AccessModes
func CheckAccessModes(claim *v1.PersistentVolumeClaim, volume *v1.PersistentVolume) bool {
pvModesMap := map[v1.PersistentVolumeAccessMode]bool{}
for _, mode := range volume.Spec.AccessModes {
pvModesMap[mode] = true
}
for _, mode := range claim.Spec.AccessModes {
_, ok := pvModesMap[mode]
if !ok {
return false
}
}
return true
}
func claimToClaimKey(claim *v1.PersistentVolumeClaim) string {
return fmt.Sprintf("%s/%s", claim.Namespace, claim.Name)
}