This commit adds a gRPC middleware that logs calls that
keep running after their deadline.
Adds --logslowopinterval cmdline argument to pass the log rate.
Signed-off-by: Robert Vasek <robert.vasek@clyso.com>
There has been some confusion about using different variables for the
InstanceID of the RBD-driver. By removing the global variable
CSIInstanceID, there should be no confusion anymore what variable to
use.
Signed-off-by: Niels de Vos <ndevos@ibm.com>
setting pod limit only for nodeserver for below
reasons
* We dont execute any commands with CLI
anymore in controller service
* Controller deployment is not privileged
enough to set the pid limits.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
This commit makes use of crush location labels from node
labels to supply `crush_location` and `read_from_replica=localize`
options during rbd map cmd. Using these options, ceph
will be able to redirect reads to the closest OSD,
improving performance.
Signed-off-by: Rakshith R <rar@redhat.com>
As we added support to set the metadata on the rbd images created for
the PVC and volume snapshot, by default metadata is set on all the images.
As we have seen we are hitting issues#2327 a lot of times with this,
we start to leave a lot of stale images. Currently, we rely on
`--extra-create-metadata=true` to decide to set the metadata or not,
we cannot set this option to false to disable setting metadata because we
use this for encryption too.
This changes is to provide an option to disable setting the image
metadata when starting cephcsi.
Fixes: #3009
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at an other
location, compared to previous versions of Kubernetes. CSI-drivers
implementing the volumeHealer, must receive the correct path, otherwise
the after a nodeplugin restart the NBD mounts will bailout attempting
to NodeStageVolume() call and return an error.
See-also: kubernetes/kubernetes#107065
Fixes: #3176
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This change helps read the cluster name from the cmdline args,
the provisioner will set the same on the RBD images.
Fixes: #2973
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Move the printing of the version and other information to its own
function. This reduces the complexity enough so that golang-ci does not
complain about it anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The --csi-addons-endpoint= option has been added recently, but was not
configured in the deployment files yet. The socket was incorrectly
created as `/csi-addons.sock`, by correcting the URL to the socket, the
socket now gets created as `/tmp/csi-addons.sock` in the same directory
as other sockets.
If an endpoint is a UNIX Domain Socket, the format needs to be
`unix:///path/to/socket`. The `unix://` URL format allows an
authentication provider to be added directly after the `//`. If there is
no authentication provider needed, the field can remain empty. After the
authetication provider, the full path needs to be specified, starting
with a `/`. This means that URLs to a UDS should start with `unix:///`
in normal cases.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The rbd package contains several functions that can be used by
CSI-Addons Service implmentations. Unfortunately it is not possible to
do this, as the rbd-driver needs to import the csi-addons/rbd package to
provide the CSI-Addons server. This causes a circular import when
services use the rbd package:
- rbd/driver.go import csi-addons/rbd
- csi-addons/rbd import rbd (including the driver)
By moving rbd/driver.go into its own package, the circular import can be
prevented.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
This commit calls WriteCephConfig() in cephcsi.go to
create ceph.conf and keyring if it is not mounted to
be used by all cli calls and conn cmds.
Before this change, rbd-controller/omap-generator did not create
ceph.conf on startup.
Signed-off-by: Rakshith R <rar@redhat.com>
Moving the log functions into its own internal/util/log package makes it
possible to split out the humongous internal/util packages in further
smaller pieces. This reduces the inter-dependencies between utility
functions and components, preventing circular dependencies which are not
allowed in Go.
Updates: #852
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Problem:
-------
For rbd nbd userspace mounter backends, after a restart of the nodeplugin
all the mounts will start seeing IO errors. This is because, for rbd-nbd
backends there will be a userspace mount daemon running per volume, post
restart of the nodeplugin pod, there is no way to restore the daemons
back to life.
Solution:
--------
The volume healer is a one-time activity that is triggered at the startup
time of the rbd nodeplugin. It navigates through the list of volume
attachments on the node and acts accordingly.
For now, it is limited to nbd type storage only, but it is flexible and
can be extended in the future for other backend types as needed.
From a few feets above:
This solves a severe problem for nbd backed csi volumes. The healer while
going through the list of volume attachments on the node, if finds the
volume is in attached state and is of type nbd, then it will attempt to
fix the rbd-nbd volumes by sending a NodeStageVolume request with the
required volume attributes like secrets, device name, image attributes,
and etc.. which will finally help start the required rbd-nbd daemons in
the nodeplugin csi-rbdplugin container. This will allow reattaching the
backend images with the right nbd device, thus allowing the applications
to perform IO without any interruptions even after a nodeplugin restart.
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
This commit resolves godot linter issue
which says "Comment should end in a period (godot)".
Updates: #1586
Signed-off-by: Yati Padia <ypadia@redhat.com>
We have many declarations and invocations..etc
with long lines which are very difficult to follow while doing
code reading. This address the issues in 'cmd' package to restrict
the line length to 120 chars.
Signed-off-by: Humble Chirammal <hchiramm@redhat.com>
An rbd image can have a maximum number of
snapshots defined by maxsnapshotsonimage
On the limit is reached the cephcsi will
start flattening the older snapshots and
returns the ABORT error message, The Request
comes after this as to wait till all the
images are flattened (this will increase the
PVC creation time. Instead of waiting till
the maximum snapshots on an RBD image, we can
have a soft limit, once the limit reached
cephcsi will start flattening the task to
break the chain. With this PVC creation time
will only be affected when the hard limit
(minsnapshotsonimage) reached.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
When the cephcsi executable detects an error when calling
validateMaxSnaphostFlag(), it panics due to klog.Fatalln(). The error
that validateMaxSnaphostFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very userfriendly, and does not provide any additional usefil
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
validateCloneDepthFlag(), it panics due to klog.Fatalln(). The errors
that validateCloneDepthFlag() logs should be understandable enough, so
that users know what to investigate. A Go panic on a user error is not
very userfriendly, and does not provide any additional usefil
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
util.ValidateURL() on the optional "metricspath". The error that
util.ValidateURL() returns should be understandable enough, so that
users know what to investigate. A Go panic on a user error is not very
userfriendly, and does not provide any additional usefil information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When the cephcsi executable receives an error when calling
util.ValidateDriverName(), it panics due to klog.Fatalln(). The error
that util.ValidateDriverName() returns should be understandable enough,
so that users know what to investigate. A Go panic on a user error is
not very userfriendly, and does not provide any additional usefil
information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
When running the 'cephcsi' executable without arguments, a Go panic is
reported:
$ ./_output/cephcsi
F1026 13:59:04.302740 3409054 cephcsi.go:126] driver type not specified
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc000010001, 0xc0000520a0, 0x48, 0x9a)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2370360, 0xc000000003, 0x0, 0x0, 0xc000194770, 0x20cb265, 0xa, 0x7e, 0x413500)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:945 +0x191
k8s.io/klog/v2.(*loggingT).println(0x2370360, 0x3, 0x0, 0x0, 0xc000163e08, 0x1, 0x1)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:699 +0x11a
k8s.io/klog/v2.Fatalln(...)
/go/src/github.com/ceph/ceph-csi/vendor/k8s.io/klog/v2/klog.go:1456
main.main()
/go/src/github.com/ceph/ceph-csi/cmd/cephcsi.go:126 +0xafa
Just logging the error and exiting should be sufficient. This stack-trace
from the Go panic does not add any useful information.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The main() function of the cephcsi executable calls klog.Fatalln() to
report certain errors. This causes the executable to panic which is not
helpful to users that only need the error message.
By introducing logAndExit(), there is no need to call klog.Fatalln()
anymore.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Direct usage of numbers should be avoided.
Issue reported:
mnd: Magic number: X, in <argument> detected (gomnd)
Signed-off-by: Yug <yuggupta27@gmail.com>
as v1.0.0 is deprecated we need to remove the support
for it in the Next coming (v3.0.0) release. This PR
removes the support for the same.
closes#882
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
Added maxsnapshotsonimage flag to flatten
the older rbd images on the chain to avoid
issue in krbd.The limit is in krbd since it
only allocate 1 4KiB page to handle all the
snapshot ids for an image.
The max limit is 510 as per
https://github.com/torvalds/linux/blob/
aaa2faab4ed8e5fe0111e04d6e168c028fe2987f/drivers/block/rbd.c#L98
in cephcsi we arekeeping the default to 450 to reserve 10%
to avoid issues.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added skipForceFlatten flag to skip
the image deptha and skip image flattening.
This will be very useful if the kernel is
not listed in cephcsi which supports deep
flatten fauture.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added Hardlimit and Softlimit flags for cephcsi
arguments. When the Softlimit is reached cephcsi
will start a background task to flatten the rbd
image and return success and if the hardlimit
is reached it will start a background task
to flatten the rbd image and return ready
to use as false to make sure that the image
will not be used until it is flatten.
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
It is useful to have the kernel version logged while starting binaries.
Some functionality depends on the version of the kernel, debugging
issues related to this will be easier.
Signed-off-by: Niels de Vos <ndevos@redhat.com>
The internal/ directory in Go has a special meaning, and indicates that
those packages are not meant for external consumption. Ceph-CSI does
provide public APIs for other projects to consume. There is no plan to
keep the API of the internally used packages stable.
Closes: #903
Signed-off-by: Niels de Vos <ndevos@redhat.com>
Fix below error in current codebase
File is not `goimports`-ed with -local
github.com/ceph/ceph-csi
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
As kubernetes CSI sidecar is exposing the
GRPC mertics we can make use of the same in
ceph-csi we dont need to expose our own.
update: #881
Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>