Claude Code for DevOps: 150+ Agent Skills for CI/CD, Kubernetes, and Infrastructure as Code

Turn Claude Code into a DevOps engineering assistant with 150+ agent skills for Docker, Kubernetes, Terraform, GitHub Actions, ArgoCD, Vault, chaos engineering, compliance, and cloud infrastructure. Install commands and real workflows.

Shirley · 11 min read

DevOps engineers already live in the terminal. Claude Code is a terminal-native agent. The fit is obvious. But "use AI to write my Dockerfile" is table stakes. The real value is in Skills that encode your infrastructure patterns: the exact Terraform module structure your team uses, the Kubernetes manifest conventions you enforce, the CI/CD pipeline stages you have standardized, and the incident response runbook that nobody reads until 3am.

j4flmao/agent-skills has 55+ DevOps skills covering Docker, Kubernetes, Terraform, Ansible, GitHub Actions, ArgoCD, Vault, chaos engineering, FinOps, and every major cloud provider. BagelHole/DevOps-Security-Agent-Skills adds 160+ skills with production-ready configs and scripts, including compliance frameworks (SOC2, HIPAA, GDPR, PCI-DSS). addyosmani/agent-skills adds 23 production-grade engineering skills with verification gates and anti-rationalization tables. Cloudflare and Netlify have published official skills. MicrosoftDocs/Agent-Skills covers 193 Azure-specific skills. Here is the full map.

Here is the difference a skill makes. Without the devops-terraform skill, Claude generates a generic Terraform resource with no lifecycle rules. With the skill installed:

Added lifecycle { prevent_destroy = true } to the RDS instance (gotcha: without this, an auto-approved terraform destroy, or any change that forces replacement, deletes the production database; with it, Terraform rejects any plan that would destroy the resource). Added ignore_changes = [engine_version] to the same lifecycle block to prevent drift on managed upgrades. Module output includes the connection string but NOT the password - that routes through Vault dynamic credentials per the devops-vault skill.

The gotchas list is the difference between "AI wrote my Terraform" and "AI wrote my Terraform the way my senior engineer would."
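The same gotcha can be backstopped outside the skill, in CI. Here is a minimal sketch, assuming a plan has been exported with `terraform show -json` and loaded into a dict. The function name and the set of protected types are hypothetical, chosen for illustration, and are not part of any skill named here.

```python
# Hypothetical guard: list protected resources that a Terraform plan would delete.
# Assumption: `plan` is the parsed output of `terraform show -json <planfile>`.
PROTECTED_TYPES = {"aws_db_instance"}  # assumption: adjust to your own stack


def planned_deletions(plan: dict, protected=PROTECTED_TYPES) -> list[str]:
    """Return addresses of protected resources whose planned actions include 'delete'."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as both "delete" and "create" in the actions list.
        if rc.get("type") in protected and "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# Trimmed-down example in the shape of the plan JSON.
example_plan = {
    "resource_changes": [
        {"address": "aws_db_instance.main", "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["update"]}},
    ]
}
print(planned_deletions(example_plan))
```

A check like this complements prevent_destroy rather than replacing it: the lifecycle rule stops the apply, while the plan check surfaces the problem earlier, at review time.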


What DevOps Skills Should You Install? (by Domain)

Containerization

| Skill | What It Does | Source |
| --- | --- | --- |
| docker-patterns | Docker best practices. Multi-stage builds, layer caching, security scanning, compose patterns, networking. | j4flmao/agent-skills |
| devops-monorepo | Monorepo tooling and build optimization. Nx, Turborepo, Bazel patterns for containerized services. | j4flmao/agent-skills |
| container-hardening | Secure container configs against CIS benchmarks. Non-root users, read-only filesystems, capability dropping. | BagelHole/DevOps-Security-Agent-Skills |
| container-scanning | Image vulnerability scanning with Trivy and Grype. CI integration for blocking vulnerable builds. | BagelHole/DevOps-Security-Agent-Skills |

What makes these useful: The docker-patterns skill prevents the mistakes that ship to production: running as root, not pinning base image versions, missing health checks, bloated layers from dev dependencies. The container-hardening skill from BagelHole goes further with CIS benchmark compliance - the kind of thing you only discover you need when a security audit lands.
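To make those mistakes concrete, here is a rough sketch of a checker for three of them: unpinned base images, running as root, and missing health checks. It is an illustration only, not how docker-patterns or container-hardening work internally. The function name and image names are hypothetical, and it ignores line continuations and anything inherited from the base image.

```python
def dockerfile_findings(text: str) -> list[str]:
    """Flag unpinned base images, a root final stage, and a missing HEALTHCHECK."""
    findings = []
    stage_names = set()   # aliases from earlier "FROM ... AS name" lines
    users = []            # USER instructions seen in the current stage
    has_healthcheck = False

    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0].startswith("#"):
            continue
        keyword = parts[0].upper()
        if keyword == "FROM":
            args = [p for p in parts[1:] if not p.startswith("--")]  # skip --platform=...
            if not args:
                continue
            image = args[0]
            alias = args[2] if len(args) >= 3 and args[1].upper() == "AS" else None
            users, has_healthcheck = [], False  # a new stage starts fresh
            tag = image.rsplit("/", 1)[-1].partition(":")[2]
            pinned = "@sha256:" in image or tag not in ("", "latest")
            if not (pinned or image == "scratch" or image in stage_names):
                findings.append(f"unpinned base image: {image}")
            if alias:
                stage_names.add(alias)
        elif keyword == "USER" and len(parts) > 1:
            users.append(parts[1])
        elif keyword == "HEALTHCHECK":
            has_healthcheck = True

    if not users or users[-1].split(":")[0] in ("root", "0"):
        findings.append("final stage runs as root")
    if not has_healthcheck:
        findings.append("no HEALTHCHECK instruction")
    return findings


bad = """FROM node:latest
COPY . /app
CMD ["node", "/app/server.js"]
"""

good = """FROM registry.example.com/base/node:20.11.1 AS build
RUN npm ci && npm run build
FROM registry.example.com/base/node:20.11.1
COPY --from=build /app/dist /app
USER app
HEALTHCHECK CMD ["node", "/app/healthcheck.js"]
CMD ["node", "/app/server.js"]
"""

print(dockerfile_findings(bad))
print(dockerfile_findings(good))
```

A real pipeline would lean on a dedicated linter or scanner. The point of the sketch is that each gotcha in a skill is a rule specific enough to check mechanically.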


Kubernetes & Orchestration

| Skill | What It Does | Source |
| --- | --- | --- |
| kubernetes-patterns | K8s deployment manifests, resource management, pod scheduling, service discovery, networking policies. | j4flmao/agent-skills |
| helm-patterns | Helm chart authoring, values templating, release management, chart repositories. | j4flmao/agent-skills |
| devops-service-mesh | Service mesh configuration. Istio, Linkerd patterns. Traffic management, mTLS, observability. | j4flmao/agent-skills |
| devops-nomad | HashiCorp Nomad job specifications, scheduling, multi-region deployment. | j4flmao/agent-skills |
| kubernetes-for-data | Kubernetes for data workloads. Spark on K8s, distributed training, GPU scheduling. | j4flmao/agent-skills |

The kubernetes-patterns skill prevents the defaults that cause production outages: no resource limits, no health checks, no pod disruption budgets. The helm-patterns skill enforces values templating discipline so your charts don't break in staging.
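Two of those defaults, missing resource limits and missing health checks, are easy to illustrate. The sketch below assumes a Deployment manifest already parsed into a dict (for example from YAML). The function name and the example manifest are hypothetical. Pod disruption budgets live in a separate object, so this sketch does not cover them.

```python
def deployment_findings(manifest: dict) -> list[str]:
    """Flag containers without resource limits or liveness/readiness probes."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if not container.get("resources", {}).get("limits"):
            findings.append(f"{name}: no resource limits")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: no {probe}")
    return findings


# Hypothetical manifest: requests but no limits, readiness but no liveness probe.
example = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {
            "name": "api",
            "image": "example/api:1.4.2",
            "resources": {"requests": {"cpu": "100m"}},
            "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        },
    ]}}},
}
print(deployment_findings(example))
```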


CI/CD Pipelines

| Skill | What It Does | Source |
| --- | --- | --- |
| cicd-pipeline | CI/CD pipeline design. Build, test, deploy stages. Artifact management, environment promotion. | j4flmao/agent-skills |
| github-actions | GitHub Actions workflow authoring. Matrix builds, caching, reusable workflows, composite actions. | j4flmao/agent-skills |
| devops-jenkins | Jenkins pipeline configuration. Declarative and scripted pipelines, shared libraries. | j4flmao/agent-skills |
| devops-argo-cd | ArgoCD GitOps deployment. Application sets, sync policies, progressive delivery, rollback strategies. | j4flmao/agent-skills |
| devops-gitops | GitOps principles and workflows. Flux, ArgoCD, branch strategies, environment management. | j4flmao/agent-skills |

The github-actions skill knows the gotchas: pin action versions to SHAs (not tags), use OIDC for cloud auth instead of long-lived secrets, cache dependency layers aggressively. The devops-argo-cd skill encodes sync wave ordering so your database migrates before your app deploys, not after.
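The SHA-pinning gotcha is simple to check mechanically. This is a line-based sketch, not a YAML parser, and it says nothing about how the github-actions skill enforces the rule. The function name, the `example-org/build-action` action, and its SHA are placeholders.

```python
import re

USES = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a full 40-character commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are out of scope for this sketch
        _, _, version = ref.partition("@")
        if not FULL_SHA.match(version):
            unpinned.append(ref)
    return unpinned


workflow = """\
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: example-org/build-action@0123456789abcdef0123456789abcdef01234567  # placeholder SHA
      - uses: ./.github/actions/local-step
"""
print(unpinned_actions(workflow))
```

A tag such as `v4` can be moved to a different commit by whoever controls the action's repository, whereas a full commit SHA cannot, which is why the skill treats tags as unpinned.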


Infrastructure as Code

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-terraform | Terraform module structure, state management, workspace patterns, provider configuration, drift detection. | j4flmao/agent-skills |
| devops-ansible | Ansible playbook authoring. Roles, inventories, vault encryption, idempotent task design. | j4flmao/agent-skills |
| devops-serverless | Serverless architecture patterns. Lambda/Cloud Functions/Azure Functions, event-driven design, cold start optimization. | j4flmao/agent-skills |
| policy-as-code | OPA Rego, Kyverno, and Checkov for automated policy enforcement on infrastructure. | BagelHole/DevOps-Security-Agent-Skills |

What makes these useful: The devops-terraform skill encodes module structure, not just syntax. State management patterns, workspace strategies, provider pinning, and drift detection. The policy-as-code skill from BagelHole adds the guardrails: "no public S3 buckets," "all RDS instances must have encryption enabled," "no EC2 instances without termination protection." Policies that catch mistakes before terraform apply.
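In practice these guardrails would be written in OPA Rego, Kyverno, or Checkov, as the skill describes. The Python stand-in below only shows the shape of such a policy: a rule keyed by resource type, evaluated against the planned state before apply. It assumes plan JSON from `terraform show -json`; the rule table and function name are hypothetical.

```python
# Stand-in for a policy engine: each rule names an attribute that must be true.
# Assumption: `plan` is the parsed output of `terraform show -json <planfile>`.
RULES = {
    "aws_db_instance": ("storage_encrypted", "RDS instance must have encryption enabled"),
    "aws_instance": ("disable_api_termination", "EC2 instance must have termination protection"),
}


def policy_violations(plan: dict) -> list[str]:
    """Return one message per planned resource that breaks a rule."""
    violations = []
    for rc in plan.get("resource_changes", []):
        rule = RULES.get(rc.get("type"))
        after = (rc.get("change") or {}).get("after")
        if rule is None or after is None:
            continue  # no rule for this type, or the resource is being deleted
        attribute, message = rule
        if after.get(attribute) is not True:
            violations.append(f"{rc['address']}: {message}")
    return violations


example_plan = {"resource_changes": [
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["create"], "after": {"storage_encrypted": False}}},
    {"address": "aws_instance.worker", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {"disable_api_termination": True}}},
    {"address": "aws_db_instance.old", "type": "aws_db_instance",
     "change": {"actions": ["delete"], "after": None}},
]}
print(policy_violations(example_plan))
```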


Cloud Platforms

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-aws | AWS infrastructure patterns. VPC, IAM, ECS/EKS, RDS, S3, CloudFormation, CDK. | j4flmao/agent-skills |
| devops-gcp | Google Cloud patterns. GKE, Cloud Run, BigQuery, IAM, Pub/Sub. | j4flmao/agent-skills |
| devops-azure | Azure infrastructure. AKS, App Service, Cosmos DB, ARM templates, Bicep. | j4flmao/agent-skills |
| cloudflare-workers | Cloudflare Workers and Pages. Edge functions, KV storage, Durable Objects, R2. | Cloudflare (official) |
| netlify-deploy | Netlify deployment. Serverless functions, edge middleware, build plugins. | Netlify (official) |

If you are on AWS, the devops-aws skill covers the full stack (VPC, IAM, ECS/EKS, RDS, S3, CDK). If you use Cloudflare, their official skill covers Workers, Pages, KV, D1, R2, and even AI agent building with the Agents SDK. Pick your cloud, install its skill.


Monitoring & Observability

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-observability | Observability stack design. Metrics (Prometheus), logs (Loki/ELK), traces (Jaeger/Tempo), SLOs/SLIs. | j4flmao/agent-skills |
| devops-monitoring | Monitoring setup. Alerting rules, dashboard design, on-call rotation, escalation policies. | j4flmao/agent-skills |
| performance-optimization | Measure-first approach. Core Web Vitals, profiling workflows, bundle analysis, anti-pattern detection. | addyosmani/agent-skills |

The devops-observability skill encodes the three pillars (metrics, logs, traces) as a unified design, not three separate tools. It includes SLO/SLI definition patterns so you measure what matters to users, not what is easy to instrument.
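An SLO only becomes actionable once it is turned into numbers. The sketch below uses standard error-budget arithmetic, which is an assumption on my part: the article does not say the devops-observability skill includes this calculation. The function names are hypothetical.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for an availability SLO over a window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1, e.g. 0.999")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget left, where the SLI is good_events / total_events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1 - actual_bad / allowed_bad


# A 99.9% target over 30 days, then a window with 500 failed requests out of 1,000,000.
print(round(error_budget_minutes(0.999), 1))
print(round(budget_remaining(0.999, good_events=999_500, total_events=1_000_000), 2))
```

A negative result from `budget_remaining` means the budget is overspent. Teams commonly use that signal to pause risky releases.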


Security & Secrets

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-vault | HashiCorp Vault. Secrets management, dynamic credentials, PKI, transit encryption. | j4flmao/agent-skills |
| security-and-hardening | OWASP Top 10 prevention, auth patterns, secrets management, dependency auditing, three-tier boundary system. | addyosmani/agent-skills |
| devops-security | DevSecOps pipeline integration. SAST, DAST, container scanning, compliance checks. | j4flmao/agent-skills |

The security-and-hardening skill from Addy Osmani uses a three-tier boundary system and includes anti-rationalization entries like: AI says "we can skip the security review, this is an internal tool" - skill responds "internal tools get compromised first because nobody reviews them." That is institutional knowledge encoded.


Reliability & Chaos

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-chaos-engineering | Chaos engineering practices. Failure injection, blast radius control, steady state hypothesis, game day planning. | j4flmao/agent-skills |
| devops-backup-dr | Backup and disaster recovery. RPO/RTO planning, cross-region replication, failover testing. | j4flmao/agent-skills |
| devops-incident-response | Incident response runbooks. Severity classification, communication templates, post-mortem structure. | j4flmao/agent-skills |

Database Operations

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-database-migration | Database migration patterns. Zero-downtime migrations, schema versioning, rollback strategies. | j4flmao/agent-skills |
| devops-dataops | DataOps practices. Data pipeline testing, data quality checks, lineage tracking. | j4flmao/agent-skills |

Cost & Operations

| Skill | What It Does | Source |
| --- | --- | --- |
| devops-finops | Cloud cost optimization. Resource right-sizing, reserved instance planning, cost allocation tags, waste detection. | j4flmao/agent-skills |
| devops-mlops | MLOps pipeline management. Model versioning, feature stores, A/B model deployment, monitoring. | j4flmao/agent-skills |
| dependency-management | Dependency management and security. Automated updates, vulnerability scanning, license compliance. | j4flmao/agent-skills |
| api-documentation | API documentation generation and maintenance. OpenAPI spec, changelog, versioning strategy. | j4flmao/agent-skills |

What Makes Addy Osmani's Engineering Skills Different?

addyosmani/agent-skills deserves its own section. This is Addy Osmani from the Google Chrome team. 23 skills that encode how senior engineers actually work. Each skill has verification gates (checkpoints before moving to the next phase) and anti-rationalization tables (common excuses the AI makes for cutting corners, with the correct response).

Skills relevant to DevOps:

| Skill | What It Does |
| --- | --- |
| security-and-hardening | OWASP Top 10, auth patterns, secrets management, dependency auditing |
| performance-optimization | Measure-first profiling, Core Web Vitals, bundle analysis |
| code-review | Change sizing (~100 lines), severity classification (Nit/Optional/FYI), verification before approval |
| code-simplification | Reduce complexity in code that works but is harder to maintain than it should be |
| technical-specification | Write specs before building. Requirements, constraints, alternatives, rollout plan |

The anti-rationalization tables are the standout feature. Example: when the AI suggests "we can skip the load test because traffic is low," the skill includes the counter: "traffic is low NOW. The load test proves the system handles the traffic it will have after launch." This is the kind of institutional knowledge that normally lives in a senior engineer's head.


Which Repos Have the Best DevOps Skills?

| Repo | Skills | Focus |
| --- | --- | --- |
| j4flmao/agent-skills | 55+ DevOps | Docker, K8s, Terraform, Ansible, CI/CD, cloud, chaos, FinOps, MLOps, incident response |
| BagelHole/DevOps-Security-Agent-Skills | 160+ | Deep DevOps + security + compliance. Production-ready configs and scripts. SOC2, HIPAA, GDPR, PCI-DSS frameworks. |
| addyosmani/agent-skills | 23 engineering | Production-grade workflows with verification gates and anti-rationalization tables |
| MicrosoftDocs/Agent-Skills | 193 Azure | Complete Azure coverage: compute, networking, security, AI/ML, data, management |
| cloudflare/skills | 8 | Workers, Pages, KV, D1, R2, AI, Agents SDK, MCP server building |
| netlify/context-and-tools | 12 | Functions, Edge Functions, Blobs, DB, Config, Caching, AI Gateway |

Install the full j4flmao DevOps set:

git clone https://github.com/j4flmao/agent-skills.git
mkdir -p ~/.claude/skills
cp -r agent-skills/presets/devops-only/* ~/.claude/skills/

Install BagelHole's DevOps + Security collection (deepest coverage):

npx skills add bagelhole/DevOps-Security-Agent-Skills

Install Addy Osmani's engineering skills:

npx skills add addyosmani/agent-skills

How Do You Stack Skills for a DevOps Workflow?

New service deployment:
  → technical-specification (write the spec first)
  → docker-patterns (containerize the service)
  → kubernetes-patterns (write the K8s manifests)
  → helm-patterns (package as a Helm chart)
  → cicd-pipeline + github-actions (build the pipeline)
  → devops-gitops + devops-argo-cd (deploy via GitOps)

Infrastructure provisioning:
  → devops-terraform (write the IaC modules)
  → devops-aws / devops-gcp / devops-azure (cloud-specific patterns)
  → devops-vault (secrets management)
  → devops-observability (monitoring + alerting)

Incident response:
  → devops-incident-response (runbook, severity, communication)
  → devops-observability (investigate with metrics/logs/traces)
  → devops-chaos-engineering (prevent recurrence with chaos tests)
  → devops-backup-dr (verify recovery procedures)

Cost optimization:
  → devops-finops (identify waste, right-size resources)
  → performance-optimization (reduce compute requirements)
  → devops-serverless (evaluate serverless migration)

Why Do DevOps Skills Hit Different?

DevOps is the vertical where agent skills make the most obvious sense:

  1. You already work in the terminal. Claude Code is not a context switch. It is a faster way to do what you already do.
  2. Infrastructure is repetitive. Every Terraform module has the same structure. Every K8s manifest follows the same patterns. Every CI/CD pipeline has the same stages. Skills encode the patterns you repeat.
  3. Tribal knowledge is the bottleneck. "We use a specific Terraform module structure." "Our K8s manifests require these annotations." "The CI pipeline needs this secret rotation step." Skills capture this institutional knowledge so every team member (and every AI agent) follows it.
  4. Mistakes are expensive. A bad Docker image costs compute. A misconfigured K8s manifest causes downtime. Unnoticed Terraform drift can turn the next apply into a destructive change. Skills include gotchas lists that prevent the mistakes the AI would otherwise make with default patterns.

Frequently Asked Questions

Can I trust AI-generated infrastructure code?

A skill does not make Claude generate infrastructure code from generic defaults. It makes Claude generate code that follows YOUR patterns and YOUR gotchas list. Review everything before applying, same as you would review a PR from a junior engineer. The skill reduces generation time; the review ensures correctness.

What about secrets and sensitive configuration?

Skills are plaintext Markdown files. They should NEVER contain secrets, API keys, or credentials. Use skills to generate the configuration structure and reference the secrets management pattern (Vault, AWS Secrets Manager, etc.). The devops-vault skill encodes the pattern for dynamic secrets without containing any actual secrets.
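Since skills are plain Markdown committed to git, a small pre-commit scan can back up the "never contain secrets" rule. The sketch below is deliberately narrow: it looks for two well-known shapes only (AWS access key IDs and PEM private key headers) and will miss most other secrets. The function name is hypothetical, and the key in the sample is the placeholder value AWS uses in its own documentation.

```python
import re

# Two recognizable secret shapes; a dedicated secret scanner covers far more.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def scan_skill_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious line in a skill file."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


sample = """# devops-vault
Request dynamic credentials from Vault; never inline them.
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
"""
print(scan_skill_text(sample))
```

To use it on real files, read each SKILL.md under your skills directory and fail the commit if `scan_skill_text` returns anything.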

Do these skills work with Cursor too?

Yes. Copy the skill folders to ~/.cursor/skills/. The SKILL.md format is an open standard. Same instructions, same patterns.

How do I customize skills for my team's conventions?

Fork the skill into your project's .claude/skills/ directory. Edit the SKILL.md to include your specific conventions: naming patterns, required annotations, mandatory labels, resource limits, and any infrastructure-specific gotchas. Commit to git so everyone on the team gets them.


Sources

  1. j4flmao/agent-skills - 423 agent skills including 55+ DevOps (Docker, K8s, Terraform, CI/CD, cloud, chaos, FinOps)
  2. addyosmani/agent-skills - 23 production-grade engineering skills with verification gates
  3. Agent Skills Specification - The open standard for portable AI agent capabilities
  4. VoltAgent/awesome-agent-skills - Curated directory of 1,424+ skills


Start Here

Install the j4flmao DevOps preset: cp -r agent-skills/presets/devops-only/* ~/.claude/skills/. Then install addyosmani's engineering skills: npx skills add addyosmani/agent-skills. Customize the devops-terraform skill with your team's module structure. That is the skill with the highest ROI for most teams.

Want to build your own infrastructure skills from scratch? The generic skills in this guide are starting points. The real value is encoding YOUR Terraform module structure, YOUR K8s annotation requirements, YOUR incident response runbook, YOUR team's gotchas list. Agent Skills 101 covers the full SKILL.md spec, progressive loading, security best practices, and hands-on skill building with real-world deconstructions. 25 lessons. Works with Claude Code, Cursor, or any compatible agent.

Start Agent Skills 101 →
