airunway-aks-setup

Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first…

INSTALLATION
npx skills add https://github.com/microsoft/azure-skills --skill airunway-aks-setup
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


When to Use This Skill

Use this skill when the user wants to:

  • Set up AI Runway on an existing AKS cluster from scratch
  • Install the AI Runway controller and CRDs
  • Assess GPU hardware compatibility for model deployment
  • Choose and install an inference provider (KAITO, Dynamo, KubeRay)
  • Deploy their first AI model to AKS via AI Runway
  • Resume a partially-complete AI Runway setup from a specific step

MCP Tools

This skill uses no MCP tools. All cluster operations are performed directly via kubectl and make.
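
Since every operation goes through kubectl and make, it can help to confirm both are on PATH before starting. The following is a minimal sketch of such a preflight check; `check_tools` is a hypothetical helper name used for illustration, not something the skill ships.

```shell
# Hypothetical helper: report whether each required CLI tool is on PATH.
# Prints one status line per tool and returns non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '✓ %s found\n' "$tool"
    else
      printf '✗ %s missing\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `check_tools kubectl make`, stopping the setup early if it returns non-zero.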

Rules

  • Execute steps in sequence — load the reference for each step as you reach it
  • Report cluster state at each step: ✓ healthy, ✗ missing/failed
  • Ask for user confirmation before any install or deployment action
  • If a step is already complete, report status and skip to the next step
  • If the user provides skip-to-step N, start at step N; assume prior steps are complete
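
The sequencing and confirmation rules above can be sketched as two small shell helpers. This is only an illustration under stated assumptions: `STEP_REFS`, `steps_from`, and `confirm` are hypothetical names, and the reference file names are taken from the Steps section of this document.

```shell
# Hypothetical list of step references, in execution order.
STEP_REFS="step-1-verify.md step-2-controller.md step-3-gpu.md step-4-provider.md step-5-deploy.md step-6-summary.md"

# Print "<number> <reference>" for each step to run, starting at step N
# (default 1). Steps before N are assumed to be complete.
steps_from() {
  start=${1:-1}
  n=0
  for ref in $STEP_REFS; do
    n=$((n + 1))
    if [ "$n" -ge "$start" ]; then
      printf '%s %s\n' "$n" "$ref"
    fi
  done
  return 0
}

# Ask for confirmation before an install or deployment action.
# Returns 0 only on an explicit yes; anything else counts as "no".
confirm() {
  printf '%s [y/N] ' "$1" >&2
  read -r answer || return 1
  case "$answer" in
    y|Y|yes|YES) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `steps_from 4` lists steps 4 through 6, and `confirm "Install the controller?" && make ...` would gate an install on the user's answer (the make target is left elided because it is not given here).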

Steps

| # | Step | Reference |
|---|------|-----------|
| 1 | Cluster Verification: context check, node inventory, GPU detection | step-1-verify.md |
| 2 | Controller Installation: CRD + controller deployment | step-2-controller.md |
| 3 | GPU Assessment: detect GPU models, flag dtype/attention constraints | step-3-gpu.md |
| 4 | Provider Setup: recommend and install inference provider | step-4-provider.md |
| 5 | First Deployment: pick a model, deploy, verify Ready | step-5-deploy.md |
| 6 | Summary: recap, smoke test, next steps | step-6-summary.md |
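
As an illustration of what Step 1 involves, the sketch below checks the current kubeconfig context and counts nodes that advertise GPUs. It is not the content of step-1-verify.md. It assumes GPUs are exposed through the `nvidia.com/gpu` allocatable resource (the usual case when the NVIDIA device plugin is running), and `count_gpu_nodes` and `verify_cluster` are hypothetical helper names.

```shell
# Count nodes that advertise at least one allocatable nvidia.com/gpu.
# Assumption: GPUs are exposed via the NVIDIA device plugin resource name.
count_gpu_nodes() {
  kubectl get nodes --no-headers \
    -o 'custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu' |
    awk '$2 ~ /^[0-9]+$/ && $2 > 0 { n++ } END { print n + 0 }'
}

# Report context and GPU availability using the ✓ / ✗ convention.
verify_cluster() {
  if ctx=$(kubectl config current-context 2>/dev/null); then
    printf '✓ context: %s\n' "$ctx"
  else
    printf '✗ no kubeconfig context\n'
    return 1
  fi
  gpu_nodes=$(count_gpu_nodes)
  if [ "$gpu_nodes" -gt 0 ]; then
    printf '✓ GPU nodes: %s\n' "$gpu_nodes"
  else
    printf '✗ no GPU nodes detected\n'
  fi
}
```

Running `verify_cluster` against a connected cluster prints one status line per check; nodes without GPUs show `<none>` in the custom column and are not counted.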

Error Handling

| Error / Symptom | Likely Cause | Remediation |
|-----------------|--------------|-------------|
| No kubeconfig context | Not connected to a cluster | Run az aks get-credentials or equivalent |
| Controller in CrashLoopBackOff | Config or RBAC issue | Run kubectl logs -n airunway-system -l control-plane=controller-manager --previous |
| Provider not ready | Image pull or RBAC issue | Run kubectl logs <pod-name> -n <namespace> against the provider pod |
| ModelDeployment stuck in Pending | GPU scheduling failure or provider not ready | Run kubectl describe modeldeployment <name> -n <namespace> and check the events |
| bfloat16 errors at inference | T4 or V100 lacks bfloat16 support | Add --dtype float16 to serving args |

For full error handling and rollback procedures, see troubleshooting.md.
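
The bfloat16 remediation can be applied ahead of time by picking the dtype flag from the GPU model. The sketch below encodes only what the table states (T4 and V100 lack bfloat16 support); `dtype_arg` is a hypothetical helper, it matches on substrings of the product name, and it makes no claim about any other GPU.

```shell
# Hypothetical helper: print the extra serving argument for a GPU product name.
# Prints "--dtype float16" for T4 and V100; prints nothing for other GPUs,
# meaning "no override", not a guarantee of bfloat16 support.
dtype_arg() {
  case "$1" in
    *T4*|*V100*) printf '%s\n' '--dtype float16' ;;
    *) : ;;
  esac
}
```

For example, `dtype_arg "Tesla-T4"` prints `--dtype float16`. On clusters where NVIDIA GPU Feature Discovery is installed, the product name is usually available as the `nvidia.com/gpu.product` node label, visible with `kubectl get nodes -L nvidia.com/gpu.product`.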
