ONE Discovery, Inc.

LightningIQ Deployment Scripts

This repository hosts the deployment and migration scripts for LightningIQ, served from deploy.lightningiq.io. There are three main scripts:

| Script | Purpose |
|---|---|
| run | Deploy or upgrade LightningIQ v1.7+ on a Kubernetes cluster |
| v1.6/run | Deploy or upgrade LightningIQ v1.6 on a Kubernetes cluster |
| migrate-1.6-to-1.7.sh | Migrate an existing 1.6 install to 1.7 |

Table of Contents

- Prerequisites
- run — LightningIQ v1.7+ Deployer
- v1.6/run — LightningIQ v1.6 Deployer
- Differences Between v1.6 and v1.7 Deployers
- migrate-1.6-to-1.7.sh — Migration Script
- Environment Variables Reference
- Troubleshooting

Prerequisites

All scripts require the following tools installed and available in $PATH:

| Tool | Purpose |
|---|---|
| kubectl | Kubernetes cluster management |
| helm | Kubernetes package manager (used for operator deployment) |
| aws (AWS CLI) | ECR registry authentication |
| curl | Fetching remote imports and GitHub files |
| git | Cloning the deployer repository |
| base64 | Decoding Kubernetes secrets (migration script only) |

For Kubernetes-level deployments (k/k8s, f/full), Ansible is used under the hood via the k8s-deployer child script. The req action installs all required tools (kubectl, helm, Ansible, and others) via ASDF.


run — LightningIQ v1.7+ Deployer

Overview

The run script is the primary entry point for deploying and managing LightningIQ v1.7+. It can be run directly from a local clone, or piped from deploy.lightningiq.io via curl.

Usage

Remote (recommended):

export GH_TOKEN=ghp_xxx
export SSH_PASSWORD=<ssh_password>
export VAULT_PASSWORD=<vault_password>
export ENV_REPO_NAME=<client-repo-name>

curl -s https://deploy.lightningiq.io/run | bash -s -- \
  -r "v1.7.145" \
  -e <environment_name> \
  -d <action>

With all arguments inline:

curl -s https://deploy.lightningiq.io/run | bash -s -- \
  -r "v1.7.145" \
  -e prod-usw2 \
  -er client-repo \
  -d k8s \
  -t $GH_TOKEN \
  -s "<ssh_password>" \
  -vp "<vault_password>"

Local (for development/debugging):

export USE_LOCAL_IMPORTS=true
./run -r "v1.7.145" -e qa -er client-repo -d sync

Arguments

| Flag | Long form | Required | Description |
|---|---|---|---|
| -r | --release | Yes | Release version tag, e.g. v1.7.145 |
| -e | --env | Yes | Environment name, e.g. prod-usw2 (or ENV_NAME env var) |
| -er | --env-repo | Yes | Client environment repository name (or ENV_REPO_NAME env var) |
| -d | --deploy | Yes | Action to perform (see Actions) |
| -t | --token | No* | GitHub token (or GH_TOKEN / GITHUB_TOKEN env var) |
| -s | --ssh-password | No* | SSH password for Ansible (or SSH_PASSWORD env var) |
| -vp | --vault-password | No* | Ansible vault password (or VAULT_PASSWORD env var) |
| -v | --verbose | No | Enable verbose mode; show Ansible task progress |
* If not provided as a flag or environment variable, the script will interactively prompt for the value.
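The prompt-or-env fallback can be sketched as follows; `require_value` is an illustrative name, not necessarily the script's actual helper:

```shell
# Hypothetical sketch of the "flag or env var, else prompt" behavior.
# require_value is an illustrative helper name.
require_value() {
  local name=$1
  local val=${!name:-}          # indirect expansion: read the env var by name
  if [ -z "$val" ]; then
    # Prompt on the controlling terminal so this works even when stdin is a
    # pipe, as in `curl ... | bash`.
    read -rsp "Enter ${name}: " val < /dev/tty
    echo >&2
  fi
  printf '%s' "$val"
}
```

Usage: `GH_TOKEN=$(require_value GH_TOKEN)` returns the existing value when the variable is already exported, and only prompts otherwise.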

Actions

| Action | Short | Description |
|---|---|---|
| req | r | Check and install prerequisites (runs Ansible req playbook) |
| k8s | k | Deploy the Kubernetes cluster (runs req first) |
| sync | s | Deploy/update LightningIQ applications via Helm (runs req first) |
| full | f | Full deployment: req → k8s → sync |
| upgrade | u | Upgrade the Kubernetes cluster version (requires UPGRADE_VERSION) |

How It Works

  1. Imports variables.sh and functions.sh from deploy.lightningiq.io (or locally).
  2. Parses and validates arguments. Prompts interactively for any missing credentials.
  3. Sources update_variables.sh to export runtime variables (inventory paths, ECR registry, etc.).
  4. Clones OneDiscovery/k8s-deployer from the main branch into /tmp/k8s-deployer.
  5. Clones the client environment repository into /tmp/<ENV_REPO_NAME>.
  6. Detects ansible_become_method from the inventory’s connection_settings.yml to set SUDO.
  7. Executes the selected action by delegating to the child script /tmp/k8s-deployer/run.sh.
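Step 6 above can be sketched as follows. This assumes connection_settings.yml carries a plain `ansible_become_method: <value>` line; the fixture path below is illustrative, and the real script's parsing may differ:

```shell
# Sketch of step 6 (SUDO detection). The path and fixture file are stand-ins
# for the inventory's connection_settings.yml.
settings=/tmp/demo-inventory/connection_settings.yml
mkdir -p "$(dirname "$settings")"
printf 'ansible_become_method: sudo\n' > "$settings"   # demo fixture

if [ -f "$settings" ]; then
  # Take the value after the colon, trimming surrounding whitespace.
  SUDO=$(awk -F': *' '/^ansible_become_method:/ {print $2}' "$settings")
fi
SUDO=${SUDO:-sudo}   # v1.7 behavior: fall back to sudo if the file is absent
echo "become method: $SUDO"
```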

The sync Action — Helm Deployment Detail

The sync (and full) action runs start_apps_deployment, which:

  1. Prompts for AWS credentials if not already configured.
  2. Authenticates with ECR: aws ecr get-login-password | helm registry login.
  3. Reads values from /tmp/<ENV_REPO_NAME>/deploy/<environment>/values.yml.
  4. Fresh install path: If lightningiq-operator Helm release does not exist, bootstraps the operator with cr.enabled=false first (registers the CRD), waits for the operator to be ready, then applies the full values with cr.spec.keycloak.initialAdminPassword=oned-pass.
  5. Upgrade path: If the release already exists, runs helm upgrade --install with the environment values directly.
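The branching between the fresh-install and upgrade paths can be sketched as below. This is a hedged sketch, not the script's actual code: `RUN=echo` turns it into a dry run, and the `oci://REGISTRY/...` chart reference and values path are illustrative assumptions.

```shell
# Dry-run sketch of the sync branching. RUN=echo prints commands instead of
# executing them; REGISTRY and the values path are placeholders.
RUN=${RUN:-echo}
NS=lightning-iq
VALUES=/tmp/client-repo/deploy/qa/values.yml   # example path

if ! helm status -n "$NS" lightningiq-operator >/dev/null 2>&1; then
  # Fresh install: bootstrap with the CR disabled so the CRD registers first,
  # then apply the full values with the initial admin password.
  $RUN helm install lightningiq-operator oci://REGISTRY/lightningiq-operator \
    -n "$NS" --set cr.enabled=false
  $RUN helm upgrade lightningiq-operator oci://REGISTRY/lightningiq-operator \
    -n "$NS" -f "$VALUES" \
    --set cr.spec.keycloak.initialAdminPassword=oned-pass
else
  # Upgrade path: apply the environment values directly.
  $RUN helm upgrade --install lightningiq-operator \
    oci://REGISTRY/lightningiq-operator -n "$NS" -f "$VALUES"
fi
```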

The upgrade Action

Requires UPGRADE_VERSION to be set (e.g. 1.28). The script will interactively prompt if it is not set. Delegates to the child script’s Ansible playbook with -t upgrade -e common_k8s_ver=<version>.
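As a rough sketch of that delegation (the playbook name `site.yml` is an illustrative assumption; only the `-t upgrade -e common_k8s_ver=...` flags come from the text above):

```shell
# Dry-run sketch of the upgrade delegation. RUN=echo prints the command.
RUN=${RUN:-echo}
UPGRADE_VERSION=${UPGRADE_VERSION:-1.28}
$RUN ansible-playbook site.yml -t upgrade -e "common_k8s_ver=$UPGRADE_VERSION"
```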

Logs

Logs are written to /tmp/onediscovery-init-logs/init.log. The log directory is recreated fresh on each run.
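A minimal sketch of that log setup (the exact bootstrap code may differ):

```shell
# Recreate the log directory fresh on each run, as described above.
LOG_DIR=/tmp/onediscovery-init-logs
rm -rf "$LOG_DIR" && mkdir -p "$LOG_DIR"
echo "run started $(date -u +%FT%TZ)" >> "$LOG_DIR/init.log"
```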


v1.6/run — LightningIQ v1.6 Deployer

Overview

The v1.6/run script is the deployer for LightningIQ v1.6. It follows the same interface as the v1.7 deployer but has key behavioral differences (see Differences).

Usage

Remote:

export GH_TOKEN=ghp_xxx
export SSH_PASSWORD=<ssh_password>
export VAULT_PASSWORD=<vault_password>
export ENV_REPO_NAME=<client-repo-name>

curl -s https://deploy.lightningiq.io/v1.6/run | bash -s -- \
  -r "v1.6.x" \
  -e <environment_name> \
  -d <action>

Local:

export USE_LOCAL_IMPORTS=true
./v1.6/run -r "v1.6.55" -e qa -er client-repo -d sync

Arguments

Identical to the v1.7 run script. See Arguments above.

Actions

| Action | Short | Description |
|---|---|---|
| req | r | Check and install prerequisites |
| k8s | k | Deploy the Kubernetes cluster |
| test | t | Run test suite |
| sync | s | Deploy/update LightningIQ applications |
| full | f | Full deployment: req → k8s → sync |
| upgrade | u | Upgrade Kubernetes cluster version |

How It Works

  1. Imports variables.sh and functions.sh from deploy.lightningiq.io/v1.6/.
  2. Parses and validates arguments.
  3. Sources update_variables.sh for runtime variables.
  4. Release resolution: If BRANCH is not set, fetches release info from GitHub API (latest or specified tag), downloads the release tarball, and extracts it to /tmp/k8s-deployer. If BRANCH is set, clones that branch directly.
  5. Sets up authentications (.netrc, SSH password file, Vault password file).
  6. Clones the client environment repository.
  7. Extracts ansible_become_method from connection_settings.yml (no file-existence guard — the file is expected to always exist in v1.6).
  8. Delegates to the child script.
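Step 4's release resolution can be sketched with the network call stubbed out; `resolve_release` is an illustrative helper name, and the URLs follow GitHub's public releases API:

```shell
# Sketch of v1.6 release resolution. Prints what would be cloned or fetched
# instead of performing the network call.
resolve_release() {   # resolve_release [tag]
  local org=${ORG:-OneDiscovery} repo=k8s-deployer
  if [ -n "${BRANCH:-}" ]; then
    echo "git clone -b $BRANCH https://github.com/$org/$repo"
  elif [ -n "${1:-}" ]; then
    echo "https://api.github.com/repos/$org/$repo/releases/tags/$1"
  else
    echo "https://api.github.com/repos/$org/$repo/releases/latest"
  fi
}

BRANCH= resolve_release v1.6.55   # prints the tagged-release API URL
```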

Differences Between v1.6 and v1.7 Deployers

| Behavior | v1.6/run | run (v1.7+) |
|---|---|---|
| Imports URL | .../v1.6/imports/... | .../imports/... (no version prefix) |
| Deployer fetch | Release tarball or branch clone | Always clones main branch |
| Release handling | fetch_release_info → get_release_tarball (or get_deployer_branch) | get_deployer_main only |
| test action | Supported (t or test) | Not supported |
| SUDO extraction | Always reads connection_settings.yml (no guard) | Only reads if file exists; falls back to ${SUDO:-sudo} |
| Default admin password | N/A (not set) | Set to oned-pass on fresh install |
| OPERATOR_VERSION | Derived from RELEASE in update_variables.sh | Derived from RELEASE after get_deployer_main |

migrate-1.6-to-1.7.sh — Migration Script

Overview

migrate-1.6-to-1.7.sh automates the full migration of an existing LightningIQ v1.6 installation to v1.7. It is destructive: the old cluster is wiped and rebuilt. The migration is organized into 8 sequential phases with a persistent state file so interrupted runs can be safely resumed.

Warning: This script wipes the entire v1.6 cluster. The backups created in Phase 2 are the only recovery path once Phase 3 begins.

Usage

Remote (recommended):

curl -s https://deploy.lightningiq.io/migrate-1.6-to-1.7.sh | bash -s -- \
  --version 1.7.145 \
  --env-name qa \
  --env-repo client-repo \
  --token ghp_xxx

Local:

./migrate-1.6-to-1.7.sh \
  --version 1.7.145 \
  --env-name qa \
  --env-repo client-repo \
  --token $GH_TOKEN

With environment variables:

export GH_TOKEN=ghp_xxx
export ENV_NAME=qa
export ENV_REPO_NAME=client-repo

./migrate-1.6-to-1.7.sh --version 1.7.145

Arguments

| Flag | Required | Default | Description |
|---|---|---|---|
| --version VERSION | Yes | — | Operator version in X.Y.Z format, e.g. 1.7.145 |
| --env-name NAME | Yes | ENV_NAME env var | Environment name, e.g. qa, prod-usw2 |
| --env-repo REPO | Yes | ENV_REPO_NAME env var | Client environment repository name |
| --token TOKEN | Yes | GH_TOKEN env var | GitHub personal access token |
| --branch BRANCH | No | main | Branch to fetch env values from |
| --backup-dir DIR | No | ~/liq-backup | Directory for backups and state file |
| -h, --help | No | — | Show usage |

Prerequisites

Before running, ensure:

Migration Phases

Phase 1 — Discover 1.6 Landscape

Surveys the running cluster and captures all secrets required for migration:

Security note: migration-config.txt contains secrets in plaintext. Delete it after the migration is complete.

Phase 2 — Back Up PostgreSQL and Redis

Creates off-cluster backups of all stateful data:

Backup artifacts:

$BACKUP_DIR/
  overlord_full.sql        # PostgreSQL dump
  overlord_row_counts.txt  # Row count snapshot
  redis_dump.rdb           # Redis RDB file
  redis_dbsize.txt         # Redis key count
  migration-config.txt     # Captured secrets (delete after migration!)

Phase 3 — Wipe the Cluster

DESTRUCTIVE. Requires explicit yes confirmation.

Deletes all v1.6 resources:

Phase 4 — Install Fresh 1.7 (Infrastructure Only)

Installs LightningIQ v1.7 with all application services explicitly disabled so only PostgreSQL, Redis, and Kafka are provisioned first:

  1. Authenticates with ECR.
  2. Runs helm upgrade --install lightningiq-operator with:
    • cr.enabled=true, cr.spec.storageBasePath=/mnt/nfs
    • All services (overlordApi, scanner, adminApi, metrics, metadataAnalyzer, unpack, fileProcessor, cleanup, fileRouter, datastore, ui) set to enabled=false
    • Environment values from the client repo (deploy/$ENV_NAME/values.yml)
  3. Waits up to 10 minutes for PostgreSQLReady, RedisReady, and KafkaReady conditions on the liq CR. Fails if not all three are True.
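The wait in step 3 could be expressed with `kubectl wait`, though whether the script uses `kubectl wait` or polls the CR status directly is an assumption; `RUN=echo` keeps this a dry run:

```shell
# Dry-run sketch of the Phase 4 readiness wait on the liq custom resource.
# RUN=echo prints the commands instead of running them against a cluster.
RUN=${RUN:-echo}
for cond in PostgreSQLReady RedisReady KafkaReady; do
  $RUN kubectl -n lightning-iq wait liq/liq \
    --for=condition="$cond" --timeout=600s
done
```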

Phase 5 — Restore PostgreSQL and Redis

Restores the backups created in Phase 2 into the new v1.7 infrastructure:

Phase 6 — Create External Secrets

Presents a menu to selectively create Kubernetes secrets in the lightning-iq namespace from the values captured in Phase 1:

| Secret | Contains |
|---|---|
| pgsql-ddc-lan | password (external PostgreSQL) |
| garage-ddc-lan | accessKeyId, secretAccessKey (object storage) |
Input options: individual numbers (1), comma-separated (1,2), all, or none. Selecting none skips secret creation with a warning to create them manually before starting services.

Uses --dry-run=client -o yaml | kubectl apply -f - for idempotent creation.
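The idempotent pattern can be sketched as below. `create_secret` is an illustrative helper and `RUN=echo` prints the command instead of executing it; the real script pipes the generated YAML to `kubectl apply -f -`, so re-runs update rather than fail on "already exists":

```shell
# Sketch of Phase 6's idempotent secret creation. RUN=echo makes this a dry
# run; the secret values below are placeholders.
RUN=${RUN:-echo}
create_secret() {   # create_secret <name> <key=value>...
  local name=$1; shift
  local args=()
  for kv in "$@"; do args+=(--from-literal="$kv"); done
  $RUN kubectl -n lightning-iq create secret generic "$name" "${args[@]}" \
    --dry-run=client -o yaml
  # The real script pipes this YAML into `kubectl apply -f -`.
}

create_secret pgsql-ddc-lan password=PLACEHOLDER
```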

Phase 7 — Enable All Services

Requires explicit yes confirmation.

Runs helm upgrade -i lightningiq-operator without --reuse-values, intentionally picking up chart defaults (all services enabled), applying only:

Waits up to 15 minutes for the liq CR to reach phase: Ready. Issues a warning (not a failure) if the timeout is exceeded, since services may still be starting.

Phase 8 — Verify

Final health check:

Idempotency and Resuming

The migration tracks progress in $BACKUP_DIR/.migration-state. Each phase appends PHASE_N_COMPLETE to this file when it finishes. On re-run, completed phases are automatically skipped.
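A minimal sketch of this mechanism, with illustrative helper names:

```shell
# Resume mechanism: one marker line per completed phase in the state file.
# The marker format matches the PHASE_N_COMPLETE entries described above;
# the helper names are illustrative.
BACKUP_DIR=${BACKUP_DIR:-$HOME/liq-backup}
STATE_FILE="$BACKUP_DIR/.migration-state"
mkdir -p "$BACKUP_DIR"

phase_done() { grep -qx "PHASE_${1}_COMPLETE" "$STATE_FILE" 2>/dev/null; }
mark_done()  { echo "PHASE_${1}_COMPLETE" >> "$STATE_FILE"; }

run_phase() {        # run_phase <n> <command...>
  local n=$1; shift
  if phase_done "$n"; then
    echo "Phase $n already complete, skipping"
  else
    "$@" && mark_done "$n"
  fi
}
```

On re-run, `run_phase` finds the marker and skips the phase; deleting a marker (as shown below with sed) forces that phase to run again.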

To re-run a specific phase, remove its marker from the state file:

# Example: re-run Phase 5
sed -i '/PHASE_5_COMPLETE/d' ~/liq-backup/.migration-state
./migrate-1.6-to-1.7.sh --version 1.7.145 --env-name qa --env-repo client-repo

To start over completely:

rm ~/liq-backup/.migration-state

If the script is interrupted (Ctrl+C), it prints: Interrupted. Re-run the script to resume from the last completed phase.

Output Log

All output is tee’d to migration_1.6-to-1.7.log in the current working directory. To use a different filename, set OUTPUT:

export OUTPUT=/var/log/migration.log

Post-Migration Checklist


Environment Variables Reference

| Variable | Used By | Description |
|---|---|---|
| GH_TOKEN / GITHUB_TOKEN | All scripts | GitHub personal access token |
| ENV_NAME | run, v1.6/run, migration | Environment name (alternative to -e flag) |
| ENV_REPO_NAME | run, v1.6/run, migration | Client environment repo name |
| SSH_PASSWORD | run, v1.6/run | SSH password for Ansible become |
| VAULT_PASSWORD | run, v1.6/run | Ansible vault password |
| UPGRADE_VERSION | run, v1.6/run | Target Kubernetes version for upgrade action |
| SUDO | run, v1.6/run | Override become method (default: sudo) |
| USE_LOCAL_IMPORTS | All scripts | Set to true to source imports/ from local files instead of deploy.lightningiq.io |
| SHOW_PROGRESS | run, v1.6/run | Set to true to show Ansible task progress (same as -v) |
| BRANCH | v1.6/run, migration | Branch to use for deployer or env values checkout |
| OUTPUT | Migration | Log file path (default: migration_1.6-to-1.7.log) |
| ORG | Migration | GitHub org (default: OneDiscovery) |

Troubleshooting

GH_TOKEN / credentials not found

If any required credential is missing, the deployer scripts will prompt interactively via /dev/tty. Alternatively, set the corresponding environment variable before running.

kubectl context wrong cluster

The migration script displays the current context and asks for confirmation. For the deployer scripts, ensure the correct context is active:

kubectl config use-context <cluster-name>
kubectl config current-context

Namespace stuck in Terminating (Phase 3)

Stuck namespaces usually have finalizers blocking deletion. To force-remove them:

kubectl get namespace <ns> -o json \
  | jq '.spec.finalizers = []' \
  | kubectl replace --raw "/api/v1/namespaces/<ns>/finalize" -f -

Then remove PHASE_3_COMPLETE from the state file and re-run.

Infrastructure not ready after Phase 4

Check the operator and CR status:

kubectl -n lightning-iq get liq liq -o yaml
kubectl -n lightning-iq get pods
kubectl -n lightning-iq logs deployment/lightning-iq-operator

Redis DBSIZE is 0 after Phase 5

The RDB may not have loaded if Redis restarted before the file was fully copied, or if the copied file was corrupt. Check:

kubectl -n lightning-iq exec <redis-pod> -c redis -- redis-cli -a <pass> DEBUG RELOAD

If necessary, repeat Phase 5 by removing PHASE_5_COMPLETE from the state file.

AWS ECR authentication failure

Ensure your AWS credentials are valid and have ecr:GetAuthorizationToken and ecr:BatchGetImage permissions for the 060147281721.dkr.ecr.us-east-1.amazonaws.com registry.

Local development with USE_LOCAL_IMPORTS=true

This bypasses the remote fetch of imports/ files and uses the local copies in ./imports/ (or ./v1.6/imports/ for the v1.6 script). Useful for testing changes to import files without publishing them:

export USE_LOCAL_IMPORTS=true
./run -r "v1.7.145" -e qa -er client-repo -d sync