Kubernetes discovery protocol for JGroups
KUBE_PING is a discovery protocol for JGroups cluster nodes managed by Kubernetes.
Since Kubernetes is in charge of launching nodes, it knows the IP addresses of all pods it started, and is therefore the best place to ask for cluster discovery. Discovery is done by asking Kubernetes for a list of IP addresses of all cluster nodes. Combined with bind_port / port_range, the protocol then sends a discovery request to all instances and waits for the responses.
A sample configuration looks like this:
```xml
<TCP
    bind_addr="loopback,match-interface:eth0"
    bind_port="7800"
    ...
/>
<org.jgroups.protocols.kubernetes.KUBE_PING
    port_range="1"
    namespace="${KUBERNETES_NAMESPACE:production}"
    labels="${KUBERNETES_LABELS:cluster=nyc}"
/>
```
...
When a discovery is started, KUBE_PING asks Kubernetes for a list of the IP addresses of all pods it launched, matching the given namespace and labels (see below).

Let's say Kubernetes launched a cluster of 3 pods with IP addresses 172.17.0.2, 172.17.0.3 and 172.17.0.5 (all launched into the same namespace and without any (or with the same) labels). On a discovery request, Kubernetes returns a list of 3 IP addresses. JGroups knows that the ports have been allocated from the range [7800 .. 7801] (bind_port .. bind_port + port_range in TCP).

KUBE_PING therefore sends discovery requests to members at addresses 172.17.0.2:7800, 172.17.0.2:7801, 172.17.0.3:7800, 172.17.0.3:7801, 172.17.0.5:7800 and 172.17.0.5:7801.
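The target computation above can be sketched in Java as follows. This is only an illustration of the arithmetic (pod IPs crossed with the inclusive port range), not KUBE_PING's actual implementation; the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscoveryTargets {

    // Expand the pod IPs returned by Kubernetes into the socket addresses
    // that discovery requests are sent to: each IP is probed on the ports
    // [bind_port .. bind_port + port_range], inclusive.
    static List<String> expand(List<String> podIps, int bindPort, int portRange) {
        List<String> targets = new ArrayList<>();
        for (String ip : podIps)
            for (int port = bindPort; port <= bindPort + portRange; port++)
                targets.add(ip + ":" + port);
        return targets;
    }

    public static void main(String[] args) {
        // 3 pods, bind_port=7800, port_range=1 -> 6 discovery targets
        expand(List.of("172.17.0.2", "172.17.0.3", "172.17.0.5"), 7800, 1)
            .forEach(System.out::println);
    }
}
```

Note that port_range="1" therefore doubles the number of discovery requests; keeping the range small keeps discovery fast.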
Note: Clients need permission to ask Kubernetes for a list of pod IPs. This is done by creating various policies, e.g. with YAML. See the 3 example policies at the top of the Demo section below.
Separating different clusters
If pods with containers in different clusters are launched, we'd get a list of the IP addresses of all nodes, not just the ones in the same cluster, as Kubernetes knows nothing about JGroups clusters. If we start multiple clusters, we therefore need to separate them using namespaces and/or labels. Without namespaces or labels, things would still work, but we'd see warning messages in the logs about discarded messages from members of different clusters.
Namespaces
Namespaces can be used to separate deployments, services and pods. The example below shows the namespaces default (the default namespace) and kube-system (the namespace used by Kubernetes for its own deployments, pods etc.):
```
[belasmac] /Users/bela/IspnPerfTest$ kubectl get services,deployments,pods --all-namespaces
NAMESPACE     NAME                        CLUSTER-IP   EXTERNAL-IP   PORT(S)          AGE
default       svc/ispn-perf-test          10.0.0.137   <nodes>       8090:32700/TCP   8h
default       svc/kubernetes              10.0.0.1     <none>        443/TCP          53d
kube-system   svc/kube-dns                10.0.0.10    <none>        53/UDP,53/TCP    53d
kube-system   svc/kubernetes-dashboard    10.0.0.30    <nodes>       80:30000/TCP     53d

NAMESPACE   NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
default     deploy/ispn-perf-test2   3         3         3            3           8h

NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE
default       po/ispn-perf-test2-2831437780-g07c5   1/1     Running   1          8h
default       po/ispn-perf-test2-2831437780-np83d   1/1     Running   0          6h
default       po/ispn-perf-test2-2831437780-szl60   1/1     Running   1          8h
kube-system   po/kube-addon-manager-minikube        1/1     Running   12         53d
kube-system   po/kube-dns-v20-b63lk                 3/3     Running   34         53d
kube-system   po/kubernetes-dashboard-73c7v         1/1     Running   11         53d
```
The last section shows 6 pods that have been started: 3 in the default namespace and 3 by Kubernetes itself in the kube-system namespace. The following 3 namespaces exist:
```
[belasmac] /Users/bela$ kubectl get namespaces
NAME          STATUS    AGE
belaban       Active    2d
default       Active    53d
kube-system   Active    53d
```
To create a new namespace foo, run kubectl create namespace foo.
Namespaces have to be created and deleted manually via kubectl.
When launching a new deployment in a pod, the namespace can be given with -n <namespace> or --namespace=<namespace>. Alternatively, the namespace can be defined in the YAML or JSON config passed to kubectl create -f <config>.
In the JGroups configuration, attribute namespace defines the namespace to be used for discovery. The example above used namespace="${KUBERNETES_NAMESPACE:production}". This means that the namespace is

- the value of system property KUBERNETES_NAMESPACE, if set,
- otherwise the value of environment variable KUBERNETES_NAMESPACE, if set,
- otherwise "production", if neither the system property nor the env var is set.

In the example above, Kubernetes will return only IP addresses of pods created in namespace "production".
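The resolution order above can be sketched like this. This is a simplified illustration of the ${NAME:default} substitution semantics, not JGroups' actual substitution code:

```java
public class PropertyResolution {

    // Resolve ${NAME:default}: consult the system property first,
    // then the environment variable, then fall back to the default.
    static String resolve(String name, String defaultValue) {
        String val = System.getProperty(name);
        if (val != null)
            return val;
        val = System.getenv(name);
        if (val != null)
            return val;
        return defaultValue;
    }

    public static void main(String[] args) {
        // Neither the system property nor the env var is set here,
        // so the default applies.
        System.out.println(resolve("KUBERNETES_NAMESPACE", "production"));
    }
}
```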
Labels
Labels are similar to namespaces; different labels can separate clusters running inside the same namespace.
Similarly to namespaces, labels also have to be defined when launching a pod, either via --labels=<label> passed to kubectl run, or in the YAML or JSON configuration file passed to kubectl create -f <config>.
In the sample configuration above, labels were defined as labels="${KUBERNETES_LABELS:cluster=nyc}". This means that Kubernetes will only return IP addresses of pods started with label cluster=nyc.

Namespace and labels can both be set; in this case, Kubernetes will return the IP addresses of all pods started in the given namespace that match the given label(s). If neither namespace nor labels are set, the IP addresses of all pods created in the default namespace "default" will be returned.
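Under the hood, namespace and labels translate into a pod-list query against the Kubernetes API: GET /api/<version>/namespaces/<namespace>/pods, with the labels passed as the labelSelector query parameter. The helper below is hypothetical and only illustrates the shape of such a URL; it is not KUBE_PING's actual request code:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PodQuery {

    // Build the URL for listing pods in a namespace, optionally filtered
    // by a label selector such as "cluster=nyc". The master and apiVersion
    // arguments correspond to the masterHost/masterPort/apiVersion attributes.
    static String podListUrl(String master, String apiVersion, String namespace, String labels) {
        String url = master + "/api/" + apiVersion + "/namespaces/" + namespace + "/pods";
        if (labels != null && !labels.isEmpty())
            url += "?labelSelector=" + URLEncoder.encode(labels, StandardCharsets.UTF_8);
        return url;
    }

    public static void main(String[] args) {
        System.out.println(podListUrl("https://10.0.0.1:443", "v1", "production", "cluster=nyc"));
        // https://10.0.0.1:443/api/v1/namespaces/production/pods?labelSelector=cluster%3Dnyc
    }
}
```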
KUBE_PING configuration
| Attribute name | System property | Default | Description |
|---|---|---|---|
| port_range | | 1 | Number of additional ports to be probed for membership. A port_range of 0 does not probe additional ports. |
| connectTimeout | KUBERNETES_CONNECT_TIMEOUT | 5000 | Maximum time (in milliseconds) to wait for a connection to the Kubernetes server. If exceeded, an exception will be thrown. |
| readTimeout | KUBERNETES_READ_TIMEOUT | 30000 | Maximum time (in milliseconds) to wait for a response from the Kubernetes server. |
| operationAttempts | KUBERNETES_OPERATION_ATTEMPTS | 3 | Maximum number of attempts to send discovery requests. |
| operationSleep | KUBERNETES_OPERATION_SLEEP | 1000 | Time (in milliseconds) between operation attempts. |
| masterProtocol | KUBERNETES_MASTER_PROTOCOL | "https" | Scheme (http or https) used to send the initial discovery request to the Kubernetes server. |
| masterHost | KUBERNETES_SERVICE_HOST | | The URL of the Kubernetes server. |
| masterPort | KUBERNETES_SERVICE_PORT | | The port on which the Kubernetes server is listening. |
| apiVersion | KUBERNETES_API_VERSION | "v1" | The version of the protocol to the Kubernetes server. |
| namespace | KUBERNETES_NAMESPACE | "default" | The namespace to be used. |
| labels | KUBERNETES_LABELS | | The labels to use in the discovery request to the Kubernetes server. |
| clientCertFile | KUBERNETES_CLIENT_CERTIFICATE_FILE | | Certificate to access the Kubernetes server. |
| clientKeyFile | KUBERNETES_CLIENT_KEY_FILE | | Client key file (store). |
| clientKeyPassword | KUBERNETES_CLIENT_KEY_PASSWORD | | The password to access the client key store. |
| clientKeyAlgo | KUBERNETES_CLIENT_KEY_ALGO | "RSA" | The algorithm used by the client. |
| caCertFile | KUBERNETES_CA_CERTIFICATE_FILE | "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt" | Client CA certificate. |
| saTokenFile | SA_TOKEN_FILE | "/var/run/secrets/kubernetes.io/serviceaccount/token" | Token file. |
| dump_requests | | false | If true, dumps all discovery requests to and responses from the Kubernetes server to stdout. |
| split_clusters_during_rolling_update | KUBERNETES_SPLIT_CLUSTERS_DURING_ROLLING_UPDATE | false | During a rolling update, prevents putting all pods into a single cluster. |
| useNotReadyAddresses | KUBERNETES_USE_NOT_READY_ADDRESSES | true | Whether initial discovery should take unready pods into consideration. |
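Putting several of these attributes together, a stack configuration might look like the sketch below. All values shown are illustrative examples taken from the defaults in the table, not recommendations:

```xml
<TCP
    bind_addr="loopback,match-interface:eth0"
    bind_port="7800"
/>
<org.jgroups.protocols.kubernetes.KUBE_PING
    port_range="1"
    namespace="${KUBERNETES_NAMESPACE:production}"
    labels="${KUBERNETES_LABELS:cluster=nyc}"
    connectTimeout="5000"
    readTimeout="30000"
    operationAttempts="3"
    dump_requests="false"
/>
```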
Demo
In this demo, we’re going to let Kubernetes start 3 instances of IspnPerfTest via a YAML configuration. Then we’ll run a separate instance interactively and confirm that the instances have formed a cluster of 4. All instances are created in the default namespace and no labels are used.
Copy and paste the snippet below into a terminal where kubectl is configured against your Kubernetes cluster:
```shell
# ---------------------------------------------------------------------
# This demo assumes that RBAC is enabled on the Kubernetes cluster.
#
# The serviceaccount, clusterrole and clusterrolebinding provide
# permission for the pods to query the K8S api
# ---------------------------------------------------------------------

# Change to a Kubernetes namespace of your preference
export TARGET_NAMESPACE=default

kubectl create serviceaccount jgroups-kubeping-service-account -n $TARGET_NAMESPACE

cat <<EOF | kubectl apply -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: jgroups-kubeping-pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: jgroups-kubeping-api-access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: jgroups-kubeping-pod-reader
subjects:
- kind: ServiceAccount
  name: jgroups-kubeping-service-account
  namespace: $TARGET_NAMESPACE
---
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
    name: ispn-perf-test
    namespace: $TARGET_NAMESPACE
  spec:
    replicas: 3
    selector:
      matchLabels:
        run: ispn-perf-test
    template:
      metadata:
        labels:
          run: ispn-perf-test
      spec:
        serviceAccountName: jgroups-kubeping-service-account
        containers:
        - args:
          - /opt/ispn/IspnPerfTest/bin/kube.sh
          - -nohup
          env:
          - name: KUBERNETES_NAMESPACE
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
          image: belaban/ispn_perf_test
          name: ispn-perf-test
          resources: {}
          terminationMessagePath: /dev/termination-log
kind: List
metadata: {}
EOF
```
To remove the resources when demo time is over:
kubectl delete deployment/ispn-perf-test clusterrolebinding/jgroups-kubeping-api-access clusterrole/jgroups-kubeping-pod-reader serviceaccount/jgroups-kubeping-service-account -n $TARGET_NAMESPACE
The image is belaban/ispn_perf_test, which contains the IspnPerfTest project plus some scripts to start nodes. 3 instances are started, and the start command is kube.sh -nohup; this launches the program without the loop that reads commands from stdin.
kubectl get pods confirms that 3 instances have been created:
```
[belasmac] /Users/bela/kubetest$ kubectl get pods
NAME                              READY     STATUS    RESTARTS   AGE
ispn-perf-test-2224433472-6l456   1/1       Running   0          29s
ispn-perf-test-2224433472-ksh58   1/1       Running   0          29s
ispn-perf-test-2224433472-rlr0m   1/1       Running   0          29s
```
We can now run a shell in one of the nodes and confirm that a cluster of 3 has formed. First, we have to exec a bash shell in one of the 3 nodes:
```
[belasmac] /Users/bela/kubetest$ kubectl exec -it ispn-perf-test-2224433472-rlr0m bash
bash-4.3$
```
Now probe can be used to list all cluster members:
```
bash-4.3$ cd IspnPerfTest/
bash-4.3$ bin/probe.sh
-- sending probe request to /224.0.75.75:7500

#1 (300 bytes):
local_addr=ispn-perf-test-2224433472-rlr0m-12151
physical_addr=172.17.0.5:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#2 (299 bytes):
local_addr=ispn-perf-test-2224433472-ksh58-1200
physical_addr=172.17.0.6:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

#3 (300 bytes):
local_addr=ispn-perf-test-2224433472-6l456-41832
physical_addr=172.17.0.7:7800
view=[ispn-perf-test-2224433472-ksh58-1200|2] (3) [ispn-perf-test-2224433472-ksh58-1200, ispn-perf-test-2224433472-6l456-41832, ispn-perf-test-2224433472-rlr0m-12151]
cluster=default
version=4.0.3-SNAPSHOT (Schiener Berg)

3 responses (3 matches, 0 non matches)
```
As can be seen, every member has the same view [ispn-perf-test-2224433472-ksh58-1200|2] (3) containing 3 members, so the cluster has formed correctly.
Now a fourth instance can be created, but this time we'll enable the event loop reading from stdin. To this end, we have to use kubectl run -it (-it for interactive mode):
```
[belasmac] /Users/bela/kubetest$ kubectl run ispn -it --rm=true --image=belaban/ispn_perf_test kube.sh
Waiting for pod default/ispn-3105267510-nr9dp to be running, status is Pending, pod ready: false
If you don't see a command prompt, try pressing enter.

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-29942, cluster=default, physical address=172.17.0.8:7800
-------------------------------------------------------------------

-------------------------------------------------------------------
GMS: address=ispn-3105267510-nr9dp-43008, cluster=cfg, physical address=172.17.0.8:7900
-------------------------------------------------------------------
created 100,000 keys: [1-100,000], old key set size: 0
Fetched config from ispn-perf-test-2224433472-ksh58-51617: {print_details=true, num_threads=100, print_invokers=false, num_keys=100000, time_secs=60, msg_size=1000, read_percentage=1.0}
created 100,000 keys: [1-100,000]
[1] Start test [2] View [3] Cache size [4] Threads (100)
[5] Keys (100,000) [6] Time (secs) (60) [7] Value size (1.00KB) [8] Validate
[p] Populate cache [c] Clear cache [v] Versions
[r] Read percentage (1.00)
[d] Details (true)  [i] Invokers (false) [l] dump local cache
[q] Quit [X] Quit all
```
This starts the instance, which should have joined the cluster, giving it 4 nodes. This can be confirmed by running probe.sh again in the other shell, or by pressing [2] (View):
2 -- local: ispn-3105267510-nr9dp-43008 -- view: [ispn-perf-test-2224433472-ksh58-51617|3] (4) [ispn-perf-test-2224433472-ksh58-51617, ispn-perf-test-2224433472-rlr0m-11878, ispn-perf-test-2224433472-6l456-28251, ispn-3105267510-nr9dp-43008]
We can see that the view is now [ispn-perf-test-2224433472-ksh58-51617|3] (4), so the cluster has correctly added the fourth member.
Running on Google Container Engine
The commands for running on Google Container Engine (GKE) are the same as when running locally in minikube.
The only difference is that on GKE, contrary to minikube, IP multicasting is not available. This means that probe.sh has to be run as probe.sh -addr localhost instead of simply probe.sh.