VMware offline deployment guide
Prerequisites
- Ubuntu 22.04 or Red Hat 9.5 VMs
  - Standalone: 5 Kubernetes nodes, 1 registry, 1 DB+etcd
  - HA (DC/DR): 10 Kubernetes nodes, 2 registries, 2 DB+etcd, 1 etcd
- Download the offline tar files (they will be provided) and upload them to the registry VM, into the home directory of the user that will run the deployment scripts:
  - db-dependencies.tar.gz
  - k8snode-dependencies.tar.gz
  - semaphore-dependencies.tar.gz
  - cluster_infra_images_kubespray.tar.gz (Cilium, MetalLB, etc.)
  - ct_service_images_static_files_repo.tar.gz
  - ct_observability_service_images.tar.gz
  - ct_third_party_service_images.tar.gz (Vault, Redis, Kyverno, etc.)
  - registry-semaphore-images.tar (registry, offline-files, semaphore images)
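Before extracting anything, it can help to confirm that every archive actually landed in the home directory. A minimal sketch (the file list mirrors the one above; the helper name `check_archives` is illustrative, not part of the deployment scripts):

```shell
#!/usr/bin/env bash
# Illustrative helper: report which of the expected offline archives
# are present in $HOME. The file list mirrors the one above.
check_archives() {
  local f
  for f in db-dependencies.tar.gz k8snode-dependencies.tar.gz \
           semaphore-dependencies.tar.gz cluster_infra_images_kubespray.tar.gz \
           ct_service_images_static_files_repo.tar.gz \
           ct_observability_service_images.tar.gz \
           ct_third_party_service_images.tar.gz registry-semaphore-images.tar; do
    if [ -f "$HOME/$f" ]; then
      echo "found   $f"
    else
      echo "MISSING $f"
    fi
  done
}
```

Run `check_archives` on the registry VM and resolve any `MISSING` lines before moving on.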
XShield Deployment Steps
1. SSH into the registry VM and extract the semaphore dependencies:

```bash
cd $HOME
tar -xvf semaphore-dependencies.tar.gz
mv opt/semaphore-dependencies/onprem-infrastructure/ .
rm -rf opt/
```
2. SSH into the DB VM and find the backup disk name using `lsblk`:

```
ubuntu@db0:~$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
fd0       2:0    1    4K  0 disk
loop0     7:0    0 89.4M  1 loop /snap/lxd/31333
loop1     7:1    0 63.7M  1 loop /snap/core20/2434
loop2     7:2    0 44.4M  1 loop /snap/snapd/23545
sda       8:0    0   64G  0 disk
├─sda1    8:1    0 63.9G  0 part /
├─sda14   8:14   0    4M  0 part
└─sda15   8:15   0  106M  0 part /boot/efi
sdb       8:16   0   64G  0 disk
sr0      11:0    1   44K  0 rom
```

   The unformatted backup disk has an empty MOUNTPOINTS column and no partitions under it; in the example above it is `sdb`. Note the disk name, as it goes into customer.yaml in the next step.
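To double-check the choice, the same "empty MOUNTPOINTS and no partitions" rule can be applied mechanically to raw `lsblk` output. A sketch (the helper name is illustrative, and the parent-name logic assumes sdX-style device names, not nvme):

```shell
#!/usr/bin/env bash
# Illustrative filter: print whole disks that have no mountpoint and no
# partitions, i.e. candidates for the unformatted backup disk.
# Feed it raw output such as: lsblk -rno NAME,TYPE,MOUNTPOINTS
find_backup_disks() {
  awk '
    # remember parents of partitions (sdX-style names only)
    $2 == "part" { parent = $1; sub(/[0-9]+$/, "", parent); has_part[parent] = 1 }
    # remember whole disks whose MOUNTPOINTS column is empty
    $2 == "disk" && $3 == "" { disks[$1] = 1 }
    END { for (d in disks) if (!(d in has_part)) print d }
  '
}

# Usage on the DB VM:
#   lsblk -rno NAME,TYPE,MOUNTPOINTS | find_backup_disks
```

Removable devices such as floppy drives can also match, so treat the output as candidates and confirm the size against the expected backup disk.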
3. Copy `config/customer.yaml.sample` to `config/customer.yaml`:

```bash
cd ~/onprem-infrastructure
cp config/customer.yaml.sample config/customer.yaml
```
4. Edit `config/customer.yaml` with the following required variables.

   Note: Variable names are given in JSON path expression format. A blank Value cell means the user must fill in a value appropriate to the environment; it does not necessarily mean that the entry should be left blank.

   Required variables:

| JSON query path of variable | Description | Value |
|---|---|---|
| .tenant.name | Name of the tenant created in Xshield | "Tenant Name" |
| .tenant.codename | Codename for the tenant, used for vSphere | "ct" |
| .ansible.sshMethod | Login method for the VMs | accepted values: "password" and "privatekey" |
| .ansible.sshPrivateKeyPath | Path to the SSH private RSA key | - |
| .databaseVmBackupDisk | Name of the backup disk on the DB VM | value from step 2 above |
| .cluster.kind | Type of Kubernetes cluster | "kubespray" |
| .cluster.nodeIps[] | Array of static IPs for nodes in the current environment | - |
| .database.datacenterLocalPostgresNodeIP | Static IP of the database node in the current environment | - |
| .database.etcdNodeIPs[] | Array of static IPs for etcd nodes in the current environment | - |
| .database.postgresNodeIPs[] | Array of static IPs for database nodes in the current environment | - |
| .domains.platform | Platform domain | e.g. onprem.colortokens.com |
| .domains.monitoring | Grafana domain | e.g. monitoring-onprem.colortokens.com |
| .domains.pmm | PMM domain | e.g. pmm-onprem.colortokens.com |
| .domains.objectStore | Rook/Ceph domain | e.g. artifacts-onprem.colortokens.com |
| .domains.registry | Registry domain | e.g. registry-onprem.colortokens.com |
| .domains.platformPrimary | Platform active domain | e.g. onprem.colortokens.com |
| .loadBalancerIps.platform | Static IP for the platform | - |
| .loadBalancerIps.monitoring | Static IP for monitoring | - |
| .loadBalancerIps.objectStore | Static IP for the object store | - |
| .objectStore.cephRgw.enabled | Enable the Ceph object store | - |
| .objectStore.cephRgw.multisite.enabled | Enable Ceph RADOS Gateway multisite | - |
| .objectStore.cephRgw.multisite.kind | Configure as the Ceph RADOS Gateway master site | "master" |
| .prometheus.etcd.endpoints[] | Array of static IPs of the control plane nodes; these are the first three entries of .cluster.nodeIps[] | - |
| .vault.kind | Kind of vault | "master" |
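Putting the table together, a filled-in customer.yaml might look roughly like the excerpt below. This is a hedged sketch only: the exact schema comes from `config/customer.yaml.sample`, and every IP, path, and domain here is a placeholder.

```yaml
# Excerpt only; start from config/customer.yaml.sample and keep its full structure.
tenant:
  name: "Tenant Name"
  codename: "ct"
ansible:
  sshMethod: "privatekey"
  sshPrivateKeyPath: "/home/ctuser/.ssh/id_rsa"   # placeholder path
databaseVmBackupDisk: "sdb"                       # value from step 2
cluster:
  kind: "kubespray"
  nodeIps: ["10.0.0.11", "10.0.0.12", "10.0.0.13", "10.0.0.14", "10.0.0.15"]
domains:
  platform: "onprem.colortokens.com"
  registry: "registry-onprem.colortokens.com"
vault:
  kind: "master"
```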
5. Verify that DNS records for all the domains mentioned in `config/customer.yaml` are added and resolving as expected.
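The DNS verification can be scripted. A minimal sketch (`check_dns` is a hypothetical helper, not part of the deployment scripts, and the domains shown are illustrative; substitute the values from your own customer.yaml):

```shell
#!/usr/bin/env bash
# Illustrative check: does a given name resolve to at least one address?
check_dns() {
  getent hosts "$1" > /dev/null   # exit 0 if the name resolves
}

# Example (substitute the domains from your config/customer.yaml):
#   for d in onprem.colortokens.com monitoring-onprem.colortokens.com \
#            pmm-onprem.colortokens.com; do
#     if check_dns "$d"; then echo "OK   $d"; else echo "FAIL $d"; fi
#   done
```

Any `FAIL` line should be fixed in DNS before deploying semaphore.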
6. Deploy semaphore:

```bash
# Setup Semaphore
bash ~/onprem-infrastructure/deploy-semaphoreui.sh
```

   If you notice any schema errors, validate the customer.yaml file.
7. Log in to the semaphore UI using port 3000 on the registry domain, e.g. http://registry-onprem.colortokens.com:3000
   - username: admin
   - password: colors321
8. Run the following tasks in semaphoreui in sequence:
   - pull-images-and-helm-charts (only run during upgrades, to fetch the latest images)
   - deploy-registry-offline
   - deploy-db
   - deploy-k8s-cluster
   - deploy-ct-platform
   - post-install-tasks
If the configure-db task fails with an error like `usermod: user etcd is currently used by process 478351`, stop etcd on the DB VM and then rerun the task.
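In practice that means something like the following on the DB VM (a sketch; the service name `etcd`, the use of systemd, and the helper name are assumptions about the DB VM layout):

```shell
#!/usr/bin/env bash
# Illustrative helper: list processes still running as a given user,
# to confirm usermod is safe to retry.
procs_running_as() {
  pgrep -u "$1"   # prints matching PIDs; exit status 0 if any exist
}

# On the DB VM (assumes etcd runs as a systemd service):
#   sudo systemctl stop etcd
#   procs_running_as etcd || echo "no etcd processes; rerun configure-db"
```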
Upgrade flow
To be filled (work in progress)
DR / Replica site steps
1. Follow the VM setup steps from before to deploy the required VMs for the DR site, similar to the DC site.
2. Clone the onprem-infrastructure repo at ~/onprem-infrastructure on the registry VM.
3. Copy `/home/ctuser/onprem-infrastructure/config/customer.yaml` over from the DC site registry VM and update only the following values accordingly:

```bash
ssh ctuser@<DR-site-registry-vm-ip>
scp ctuser@<DC-site-registry-vm-ip>:/home/ctuser/onprem-infrastructure/config/customer.yaml ~/onprem-infrastructure/config/customer.yaml
```

   Note: Variable names are given in JSON path expression format. A blank Value cell means the user must fill in a value appropriate to the environment; it does not necessarily mean that the entry should be left blank.
   Required variables:

| JSON query path of variable | Description | Value |
|---|---|---|
| .cluster.nodeIps[] | Array of static IPs for nodes in the current environment | - |
| .domains.platform | Platform domain | e.g. onprem-dr.colortokens.com |
| .domains.monitoring | Grafana domain | e.g. monitoring-onprem-dr.colortokens.com |
| .domains.pmm | PMM domain | e.g. pmm-onprem-dr.colortokens.com |
| .domains.objectStore | Rook/Ceph domain | e.g. artifacts-onprem-dr.colortokens.com |
| .domains.registry | Registry domain | e.g. registry-onprem-dr.colortokens.com |
| .loadBalancerIps.platform | Static IP for the platform | - |
| .loadBalancerIps.monitoring | Static IP for monitoring | - |
| .loadBalancerIps.objectStore | Static IP for the object store | - |
| .database.datacenterLocalPostgresNodeIP | Static IP of the database node in the current environment | - |
| .objectStore.cephRgw.multisite.kind | Configure as the Ceph RADOS Gateway replica site | "replica" |
| .objectStore.cephRgw.multisite.replica.realmPullEndpoint | Realm pull endpoint for the replica site; equal to the Ceph external endpoint of the master site | e.g. "http://<IP of one of the master site nodes>:30180/". On the master site, run the invite-tenant task through semaphoreui; its logs contain the realm pull endpoint, system access key id, and system secret access key |
| .objectStore.cephRgw.multisite.replica.systemAccessKeyId | System access key id for the replica site; must equal the system access key id from the master site | - |
| .objectStore.cephRgw.multisite.replica.systemSecretAccessKey | System secret access key for the replica site; must equal the system secret access key from the master site | - |
| .prometheus.etcd.endpoints[] | Array of static IPs of the control plane nodes; these are the first three entries of .cluster.nodeIps[] | - |
| .vault.kind | Kind of vault | "replica" |
4. After configuring the above variables, run the following script to test the configuration before proceeding:

```bash
cd ~/onprem-infrastructure
bash ./test-config.sh
```
5. Copy the vault keys file from the DC site to the DR site using the paths specified below.

   Note: You will require root privileges, as the source and destination folders are owned by root.
   - DC bastion (source): ~/onprem-infrastructure/ansible-outputs/vault-keys.json
   - DR bastion (destination): ~/onprem-infrastructure/ansible-outputs/vault-keys.master.json
6. Repeat steps 4 - 7 from the DC site steps.
7. Run the following tasks in semaphoreui in sequence:
   - configure-registry
   - pull-images-and-helm-charts
   - deploy-k8s-cluster
   - deploy-ct-platform
   - post-install-tasks