Claude Code for DevOps: 150+ Agent Skills for CI/CD, Kubernetes, and Infrastructure as Code
Turn Claude Code into a DevOps engineering assistant with 150+ agent skills for Docker, Kubernetes, Terraform, GitHub Actions, ArgoCD, Vault, chaos engineering, compliance, and cloud infrastructure. Install commands and real workflows.
DevOps engineers already live in the terminal. Claude Code is a terminal-native agent. The fit is obvious. But "use AI to write my Dockerfile" is table stakes. The real value is in Skills that encode your infrastructure patterns: the exact Terraform module structure your team uses, the Kubernetes manifest conventions you enforce, the CI/CD pipeline stages you have standardized, and the incident response runbook that nobody reads until 3am.
j4flmao/agent-skills has 55+ DevOps skills covering Docker, Kubernetes, Terraform, Ansible, GitHub Actions, ArgoCD, Vault, chaos engineering, FinOps, and every major cloud provider. BagelHole/DevOps-Security-Agent-Skills adds 160+ skills with production-ready configs and scripts, including compliance frameworks (SOC2, HIPAA, GDPR, PCI-DSS). addyosmani/agent-skills adds 23 production-grade engineering skills with verification gates and anti-rationalization tables. Cloudflare and Netlify have published official skills. MicrosoftDocs/Agent-Skills covers 193 Azure-specific skills. Here is the full map.
Here is the difference a skill makes. Without the devops-terraform skill, Claude generates a generic Terraform resource with no lifecycle rules. With the skill installed:
Added lifecycle { prevent_destroy = true } to the RDS instance (gotcha: without this, a terraform destroy, or any change that forces replacement, can take the production database with it). Added ignore_changes = [engine_version] to prevent drift on managed upgrades. Module output includes the connection string but NOT the password - that routes through Vault dynamic credentials per the devops-vault skill.
The gotchas list is the difference between "AI wrote my Terraform" and "AI wrote my Terraform the way my senior engineer would."
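A gotcha like this can also be backed by a check in CI. The following is a minimal sketch, assuming the plan has been exported with terraform show -json; the PROTECTED_TYPES set and the function name are hypothetical and not part of any skill listed here.

```python
import json

# Hypothetical list of resource types this team treats as stateful.
PROTECTED_TYPES = {"aws_db_instance", "aws_rds_cluster"}


def find_protected_deletes(plan: dict) -> list[str]:
    """Return addresses of protected resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as a delete paired with a create.
        if rc.get("type") in PROTECTED_TYPES and "delete" in actions:
            flagged.append(rc["address"])
    return flagged


# A trimmed stand-in for the JSON produced by `terraform show -json <planfile>`.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["update"]}}
  ]
}
""")

print(find_protected_deletes(sample_plan))  # ['aws_db_instance.main']
```

A check like this complements prevent_destroy rather than replacing it: the lifecycle rule stops the apply, while the plan check surfaces the problem earlier, at review time.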
What DevOps Skills Should You Install? (by Domain)
Containerization
| Skill | What It Does | Source |
|---|---|---|
| docker-patterns | Docker best practices. Multi-stage builds, layer caching, security scanning, compose patterns, networking. | j4flmao/agent-skills |
| devops-monorepo | Monorepo tooling and build optimization. Nx, Turborepo, Bazel patterns for containerized services. | j4flmao/agent-skills |
| container-hardening | Secure container configs against CIS benchmarks. Non-root users, read-only filesystems, capability dropping. | BagelHole/DevOps-Security-Agent-Skills |
| container-scanning | Image vulnerability scanning with Trivy and Grype. CI integration for blocking vulnerable builds. | BagelHole/DevOps-Security-Agent-Skills |
What makes these useful: The docker-patterns skill prevents the mistakes that ship to production: running as root, not pinning base image versions, missing health checks, bloated layers from dev dependencies. The container-hardening skill from BagelHole goes further with CIS benchmark compliance - the kind of thing you only discover you need when a security audit lands.
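To make those mistakes concrete, here is a rough sketch of the kind of check such a skill might encourage. It is an illustration only, not code from docker-patterns; the function name and finding messages are assumptions, and a real linter such as hadolint covers far more cases.

```python
def lint_dockerfile(text: str) -> list[str]:
    """Flag a few common Dockerfile mistakes. Illustrative, not exhaustive."""
    findings = []
    stage_aliases = set()
    last_user = None
    has_healthcheck = False

    for raw in text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        keyword = parts[0].upper()
        # Drop flags such as --platform=... so the image is the first argument.
        args = [p for p in parts[1:] if not p.startswith("--")]

        if keyword == "FROM" and args:
            image = args[0]
            name = image.rsplit("/", 1)[-1]  # ignore registry host and port
            pinned = "@" in image or (":" in name and not name.endswith(":latest"))
            if image not in stage_aliases and image != "scratch" and not pinned:
                findings.append(f"unpinned base image: {image}")
            if len(args) >= 3 and args[1].upper() == "AS":
                stage_aliases.add(args[2])
            last_user = None  # every stage starts from its base image's user
        elif keyword == "USER" and args:
            last_user = args[0]
        elif keyword == "HEALTHCHECK":
            has_healthcheck = True

    # Assumption: a final stage with no USER instruction is treated as root.
    if last_user is None or last_user.split(":")[0] in ("root", "0"):
        findings.append("final stage does not set a non-root USER")
    if not has_healthcheck:
        findings.append("no HEALTHCHECK instruction")
    return findings


bad = 'FROM node\nCOPY . .\nCMD ["node", "app.js"]\n'
for finding in lint_dockerfile(bad):
    print(finding)
```

Run against the three-line Dockerfile above, the sketch reports all three problems: an unpinned base image, no non-root user, and no health check.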
Kubernetes & Orchestration
| Skill | What It Does | Source |
|---|---|---|
| kubernetes-patterns | K8s deployment manifests, resource management, pod scheduling, service discovery, networking policies. | j4flmao/agent-skills |
| helm-patterns | Helm chart authoring, values templating, release management, chart repositories. | j4flmao/agent-skills |
| devops-service-mesh | Service mesh configuration. Istio, Linkerd patterns. Traffic management, mTLS, observability. | j4flmao/agent-skills |
| devops-nomad | HashiCorp Nomad job specifications, scheduling, multi-region deployment. | j4flmao/agent-skills |
| kubernetes-for-data | Kubernetes for data workloads. Spark on K8s, distributed training, GPU scheduling. | j4flmao/agent-skills |
The kubernetes-patterns skill prevents the defaults that cause production outages: no resource limits, no health checks, no pod disruption budgets. The helm-patterns skill enforces values templating discipline so your charts don't break in staging.
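As a sketch of what checking for those defaults could look like, the function below inspects a Deployment that has already been parsed into a Python dict. The function name and sample manifest are hypothetical. Pod disruption budgets are separate objects and are not covered here.

```python
def check_deployment(manifest: dict) -> list[str]:
    """Report containers missing resource limits or health probes."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: no {resource} limit")
        for probe in ("readinessProbe", "livenessProbe"):
            if probe not in container:
                findings.append(f"{name}: no {probe}")
    return findings


# Hypothetical manifest: memory limit and readiness probe set, the rest missing.
deployment = {
    "kind": "Deployment",
    "metadata": {"name": "api"},
    "spec": {"template": {"spec": {"containers": [
        {
            "name": "api",
            "image": "registry.example.com/api:1.4.2",
            "resources": {"limits": {"memory": "256Mi"}},
            "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        },
    ]}}},
}

print(check_deployment(deployment))
# ['api: no cpu limit', 'api: no livenessProbe']
```

Whether a CPU limit is mandatory is a team decision; the point is that the rule lives in one place instead of in reviewers' memories.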
CI/CD Pipelines
| Skill | What It Does | Source |
|---|---|---|
| cicd-pipeline | CI/CD pipeline design. Build, test, deploy stages. Artifact management, environment promotion. | j4flmao/agent-skills |
| github-actions | GitHub Actions workflow authoring. Matrix builds, caching, reusable workflows, composite actions. | j4flmao/agent-skills |
| devops-jenkins | Jenkins pipeline configuration. Declarative and scripted pipelines, shared libraries. | j4flmao/agent-skills |
| devops-argo-cd | ArgoCD GitOps deployment. Application sets, sync policies, progressive delivery, rollback strategies. | j4flmao/agent-skills |
| devops-gitops | GitOps principles and workflows. Flux, ArgoCD, branch strategies, environment management. | j4flmao/agent-skills |
The github-actions skill knows the gotchas: pin action versions to SHAs (not tags), use OIDC for cloud auth instead of long-lived secrets, cache dependency layers aggressively. The devops-argo-cd skill encodes sync wave ordering so your database migrates before your app deploys, not after.
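The SHA-pinning rule is easy to verify mechanically. Below is a minimal sketch that scans workflow text for uses: references not pinned to a full 40-character commit SHA. The function name is an assumption, and the SHA in the sample is a made-up placeholder.

```python
import re

USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return `uses:` references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match:
            continue
        ref = match.group(1).strip("'\"")
        # Local actions and docker:// images have no git ref to pin here.
        if ref.startswith("./") or ref.startswith("docker://"):
            continue
        _, _, version = ref.partition("@")
        if not SHA_RE.match(version):
            unpinned.append(ref)
    return unpinned


# Hypothetical workflow excerpt; the 40-character SHA is a placeholder.
workflow = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@0123456789abcdef0123456789abcdef01234567
      - uses: ./.github/actions/local-build
"""

print(unpinned_actions(workflow))  # ['actions/checkout@v4']
```

Tags are mutable, so a tag-pinned action can change underneath you; a commit SHA cannot.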
Infrastructure as Code
| Skill | What It Does | Source |
|---|---|---|
| devops-terraform | Terraform module structure, state management, workspace patterns, provider configuration, drift detection. | j4flmao/agent-skills |
| devops-ansible | Ansible playbook authoring. Roles, inventories, vault encryption, idempotent task design. | j4flmao/agent-skills |
| devops-serverless | Serverless architecture patterns. Lambda/Cloud Functions/Azure Functions, event-driven design, cold start optimization. | j4flmao/agent-skills |
| policy-as-code | OPA Rego, Kyverno, and Checkov for automated policy enforcement on infrastructure. | BagelHole/DevOps-Security-Agent-Skills |
What makes these useful: The devops-terraform skill encodes module structure, not just syntax: state management patterns, workspace strategies, provider pinning, and drift detection. The policy-as-code skill from BagelHole adds the guardrails: "no public S3 buckets," "all RDS instances must have encryption enabled," "no EC2 instances without termination protection." These are policies that catch mistakes before terraform apply runs.
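In practice these guardrails would be written in Rego, Kyverno, or Checkov, as the skill describes. Purely to show the shape of such a rule, here is a sketch in Python that evaluates two of the quoted policies against Terraform plan JSON. The RULES table and function name are hypothetical.

```python
# Each rule: (resource type, attribute, required value, message).
# Hypothetical rule set mirroring two of the policies quoted above.
RULES = [
    ("aws_db_instance", "storage_encrypted", True,
     "RDS instance must have encryption enabled"),
    ("aws_instance", "disable_api_termination", True,
     "EC2 instance must have termination protection"),
]


def evaluate_policies(plan: dict) -> list[str]:
    """Return one violation message per resource that breaks a rule."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if change.get("actions") == ["delete"]:
            continue  # nothing will exist after apply, so nothing to check
        after = change.get("after") or {}
        for rtype, attr, required, message in RULES:
            if rc.get("type") == rtype and after.get(attr) != required:
                violations.append(f"{rc['address']}: {message}")
    return violations


# Trimmed stand-in for `terraform show -json <planfile>` output.
plan = {"resource_changes": [
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["create"], "after": {"storage_encrypted": False}}},
    {"address": "aws_instance.worker", "type": "aws_instance",
     "change": {"actions": ["create"], "after": {"disable_api_termination": True}}},
    {"address": "aws_instance.old", "type": "aws_instance",
     "change": {"actions": ["delete"], "after": None}},
]}

print(evaluate_policies(plan))
# ['aws_db_instance.main: RDS instance must have encryption enabled']
```

Failing the pipeline when the returned list is non-empty gives the same "before apply" guarantee the paragraph describes.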
Cloud Platforms
| Skill | What It Does | Source |
|---|---|---|
| devops-aws | AWS infrastructure patterns. VPC, IAM, ECS/EKS, RDS, S3, CloudFormation, CDK. | j4flmao/agent-skills |
| devops-gcp | Google Cloud patterns. GKE, Cloud Run, BigQuery, IAM, Pub/Sub. | j4flmao/agent-skills |
| devops-azure | Azure infrastructure. AKS, App Service, Cosmos DB, ARM templates, Bicep. | j4flmao/agent-skills |
| cloudflare-workers | Cloudflare Workers and Pages. Edge functions, KV storage, Durable Objects, R2. | Cloudflare (official) |
| netlify-deploy | Netlify deployment. Serverless functions, edge middleware, build plugins. | Netlify (official) |
If you are on AWS, the devops-aws skill covers the full stack (VPC, IAM, ECS/EKS, RDS, S3, CDK). If you use Cloudflare, their official skill covers Workers, Pages, KV, D1, R2, and even AI agent building with the Agents SDK. Pick your cloud, install its skill.
Monitoring & Observability
| Skill | What It Does | Source |
|---|---|---|
| devops-observability | Observability stack design. Metrics (Prometheus), logs (Loki/ELK), traces (Jaeger/Tempo), SLOs/SLIs. | j4flmao/agent-skills |
| devops-monitoring | Monitoring setup. Alerting rules, dashboard design, on-call rotation, escalation policies. | j4flmao/agent-skills |
| performance-optimization | Measure-first approach. Core Web Vitals, profiling workflows, bundle analysis, anti-pattern detection. | addyosmani/agent-skills |
The devops-observability skill encodes the three pillars (metrics, logs, traces) as a unified design, not three separate tools. It includes SLO/SLI definition patterns so you measure what matters to users, not what is easy to instrument.
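The arithmetic behind an SLO is worth seeing once. This is a minimal sketch of the standard error-budget calculation, not something taken from the skill itself; the function names and sample numbers are illustrative.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability SLO."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1 - slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if overspent)."""
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("no requests in the window, so no budget to measure")
    return 1 - failed_requests / allowed_failures


# A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))

# 250 failures out of 1,000,000 requests spends about a quarter of the budget.
print(round(budget_remaining(0.999, 1_000_000, 250), 2))
```

Tying alerts to the rate at which this budget is spent, rather than to raw CPU or memory, is what "measure what matters to users" looks like in numbers.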
Security & Secrets
| Skill | What It Does | Source |
|---|---|---|
| devops-vault | HashiCorp Vault. Secrets management, dynamic credentials, PKI, transit encryption. | j4flmao/agent-skills |
| security-and-hardening | OWASP Top 10 prevention, auth patterns, secrets management, dependency auditing, three-tier boundary system. | addyosmani/agent-skills |
| devops-security | DevSecOps pipeline integration. SAST, DAST, container scanning, compliance checks. | j4flmao/agent-skills |
The security-and-hardening skill from Addy Osmani uses a three-tier boundary system and includes anti-rationalization entries like: AI says "we can skip the security review, this is an internal tool" - skill responds "internal tools get compromised first because nobody reviews them." That is institutional knowledge encoded.
Reliability & Chaos
| Skill | What It Does | Source |
|---|---|---|
| devops-chaos-engineering | Chaos engineering practices. Failure injection, blast radius control, steady state hypothesis, game day planning. | j4flmao/agent-skills |
| devops-backup-dr | Backup and disaster recovery. RPO/RTO planning, cross-region replication, failover testing. | j4flmao/agent-skills |
| devops-incident-response | Incident response runbooks. Severity classification, communication templates, post-mortem structure. | j4flmao/agent-skills |
Database Operations
| Skill | What It Does | Source |
|---|---|---|
| devops-database-migration | Database migration patterns. Zero-downtime migrations, schema versioning, rollback strategies. | j4flmao/agent-skills |
| devops-dataops | DataOps practices. Data pipeline testing, data quality checks, lineage tracking. | j4flmao/agent-skills |
Cost & Operations
| Skill | What It Does | Source |
|---|---|---|
| devops-finops | Cloud cost optimization. Resource right-sizing, reserved instance planning, cost allocation tags, waste detection. | j4flmao/agent-skills |
| devops-mlops | MLOps pipeline management. Model versioning, feature stores, A/B model deployment, monitoring. | j4flmao/agent-skills |
| dependency-management | Dependency management and security. Automated updates, vulnerability scanning, license compliance. | j4flmao/agent-skills |
| api-documentation | API documentation generation and maintenance. OpenAPI spec, changelog, versioning strategy. | j4flmao/agent-skills |
What Makes Addy Osmani's Engineering Skills Different?
addyosmani/agent-skills deserves its own section. This is Addy Osmani from the Google Chrome team. 23 skills that encode how senior engineers actually work. Each skill has verification gates (checkpoints before moving to the next phase) and anti-rationalization tables (common excuses the AI makes for cutting corners, with the correct response).
Skills relevant to DevOps:
| Skill | What It Does |
|---|---|
| security-and-hardening | OWASP Top 10, auth patterns, secrets management, dependency auditing |
| performance-optimization | Measure-first profiling, Core Web Vitals, bundle analysis |
| code-review | Change sizing (~100 lines), severity classification (Nit/Optional/FYI), verification before approval |
| code-simplification | Reduce complexity in code that works but is harder to maintain than it should be |
| technical-specification | Write specs before building. Requirements, constraints, alternatives, rollout plan |
The anti-rationalization tables are the standout feature. Example: when the AI suggests "we can skip the load test because traffic is low," the skill includes the counter: "traffic is low NOW. The load test proves the system handles the traffic it will have after launch." This is the kind of institutional knowledge that normally lives in a senior engineer's head.
Which Repos Have the Best DevOps Skills?
| Repo | Skills | Focus |
|---|---|---|
| j4flmao/agent-skills | 55+ DevOps | Docker, K8s, Terraform, Ansible, CI/CD, cloud, chaos, FinOps, MLOps, incident response |
| BagelHole/DevOps-Security-Agent-Skills | 160+ | Deep DevOps + security + compliance. Production-ready configs and scripts. SOC2, HIPAA, GDPR, PCI-DSS frameworks. |
| addyosmani/agent-skills | 23 engineering | Production-grade workflows with verification gates and anti-rationalization tables |
| MicrosoftDocs/Agent-Skills | 193 Azure | Complete Azure coverage: compute, networking, security, AI/ML, data, management |
| cloudflare/skills | 8 | Workers, Pages, KV, D1, R2, AI, Agents SDK, MCP server building |
| netlify/context-and-tools | 12 | Functions, Edge Functions, Blobs, DB, Config, Caching, AI Gateway |
Install the full j4flmao DevOps set:
git clone https://github.com/j4flmao/agent-skills.git
mkdir -p ~/.claude/skills
cp -r agent-skills/presets/devops-only/* ~/.claude/skills/
Install BagelHole's DevOps + Security collection (deepest coverage):
npx skills add bagelhole/DevOps-Security-Agent-Skills
Install Addy Osmani's engineering skills:
npx skills add addyosmani/agent-skills
How Do You Stack Skills for a DevOps Workflow?
New service deployment:
→ technical-specification (write the spec first)
→ docker-patterns (containerize the service)
→ kubernetes-patterns (write the K8s manifests)
→ helm-patterns (package as a Helm chart)
→ cicd-pipeline + github-actions (build the pipeline)
→ devops-gitops + devops-argo-cd (deploy via GitOps)
Infrastructure provisioning:
→ devops-terraform (write the IaC modules)
→ devops-aws / devops-gcp / devops-azure (cloud-specific patterns)
→ devops-vault (secrets management)
→ devops-observability (monitoring + alerting)
Incident response:
→ devops-incident-response (runbook, severity, communication)
→ devops-observability (investigate with metrics/logs/traces)
→ devops-chaos-engineering (prevent recurrence with chaos tests)
→ devops-backup-dr (verify recovery procedures)
Cost optimization:
→ devops-finops (identify waste, right-size resources)
→ performance-optimization (reduce compute requirements)
→ devops-serverless (evaluate serverless migration)
Why Do DevOps Skills Hit Different?
DevOps is the vertical where agent skills make the most obvious sense:
- You already work in the terminal. Claude Code is not a context switch. It is a faster way to do what you already do.
- Infrastructure is repetitive. Every Terraform module has the same structure. Every K8s manifest follows the same patterns. Every CI/CD pipeline has the same stages. Skills encode the patterns you repeat.
- Tribal knowledge is the bottleneck. "We use a specific Terraform module structure." "Our K8s manifests require these annotations." "The CI pipeline needs this secret rotation step." Skills capture this institutional knowledge so every team member (and every AI agent) follows it.
- Mistakes are expensive. A bad Docker image costs compute. A misconfigured K8s manifest causes downtime. An unreviewed Terraform apply can destroy infrastructure. Skills include gotchas lists that prevent the mistakes the AI would otherwise make with default patterns.
Frequently Asked Questions
Can I trust AI-generated infrastructure code?
Trust it the way you trust any unreviewed change. A skill does not make Claude generate infrastructure code from generic defaults; it steers generation toward YOUR patterns and YOUR gotchas list. Review everything before applying, same as you would review a PR from a junior engineer. The skill reduces generation time; the review ensures correctness.
What about secrets and sensitive configuration?
Skills are plaintext Markdown files. They should NEVER contain secrets, API keys, or credentials. Use skills to generate the configuration structure and reference the secrets management pattern (Vault, AWS Secrets Manager, etc.). The devops-vault skill encodes the pattern for dynamic secrets without containing any actual secrets.
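Because skills are plain text committed to git, a simple scan before committing is cheap insurance. The sketch below is a rough illustration with a handful of hypothetical patterns; dedicated scanners such as gitleaks or trufflehog are far more thorough. The key shown is the placeholder from AWS's own documentation.

```python
import re

# Hypothetical, deliberately small pattern set.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline credential": re.compile(
        r"\b(?:password|secret|token|api_key)\s*[:=]\s*['\"][^'\"\s]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def scan_skill_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# Hypothetical SKILL.md excerpt containing two things that should not be there.
skill = """\
# devops-vault (hypothetical excerpt)
Read database credentials from Vault at runtime.
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
password = "hunter2hunter2"
"""

print(scan_skill_text(skill))
# [(3, 'AWS access key ID'), (4, 'inline credential')]
```

Wiring a check like this into a pre-commit hook keeps a pasted credential from ever reaching the shared skills directory.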
Do these skills work with Cursor too?
Yes. Copy the skill folders to ~/.cursor/skills/. The SKILL.md format is an open standard. Same instructions, same patterns.
How do I customize skills for my team's conventions?
Fork the skill into your project's .claude/skills/ directory. Edit the SKILL.md to include your specific conventions: naming patterns, required annotations, mandatory labels, resource limits, and any infrastructure-specific gotchas. Commit to git so everyone on the team gets them.
Sources
- j4flmao/agent-skills - 423 agent skills including 55+ DevOps (Docker, K8s, Terraform, CI/CD, cloud, chaos, FinOps)
- addyosmani/agent-skills - 23 production-grade engineering skills with verification gates
- Agent Skills Specification - The open standard for portable AI agent capabilities
- VoltAgent/awesome-agent-skills - Curated directory of 1,424+ skills
Start Here
Install the j4flmao DevOps preset: cp -r agent-skills/presets/devops-only/* ~/.claude/skills/. Then install addyosmani's engineering skills: npx skills add addyosmani/agent-skills. Customize the devops-terraform skill with your team's module structure. That is the skill with the highest ROI for most teams.
Want to build your own infrastructure skills from scratch? The generic skills in this guide are starting points. The real value is encoding YOUR Terraform module structure, YOUR K8s annotation requirements, YOUR incident response runbook, YOUR team's gotchas list. Agent Skills 101 covers the full SKILL.md spec, progressive loading, security best practices, and hands-on skill building with real-world deconstructions. 25 lessons. Works with Claude Code, Cursor, or any compatible agent.