Make your resilience work actually work.
We fix broken resilience programs.
AI won't fix what your organization can't see.
We help engineering organizations understand why they keep having the same types of incidents — despite doing incident analysis, GameDays, operational readiness reviews, monitoring, chaos engineering, and increasingly, even AI.
Is this your reality?
The same types of incidents keep happening despite thorough postmortems and action items
You invested in chaos engineering but can't tell if it's actually making you more resilient
Your teams do incident reviews, but the learning never spreads beyond the room
You're spending more time fighting fires than improving your systems
You know something's broken organizationally, but can't pinpoint what
Leadership is asking "why does this keep happening?" and you don't have a good answer
You're deploying AI into operations but can't tell if it's making better decisions or just faster ones
You're betting on AI to improve operations but nobody can explain what happens when it's wrong
Sound familiar? These aren't isolated problems; they're all connected. Here's how we help organizations break this cycle.
NEW BOOK
Why We Still Suck at Resilience
Your organization does incident reviews, runs GameDays, and practices chaos engineering. So why do the same incidents keep happening? This book explains why, and what to do about it.
Start with diagnosis, then build from there
Most organizations start with the Resilience Assessment. Once we've identified what's actually broken, we can help you strengthen specific capabilities or partner for long-term transformation. But diagnosis comes first—you need to know what to fix.
1. Resilience Assessment
Your organization keeps having the same types of incidents despite doing retros, chaos engineering, and architecture reviews. Something organizational is broken—but what?
What we do:
We diagnose the feedback loop failures and organizational patterns that prevent your teams from learning and adapting.
We embed with your teams to see how work actually happens versus how it's described. We participate in your incident reviews, observe GameDays, sit in on chaos experiments, and join operational readiness reviews. We watch how teams interact, what incentives and pressures they face, and where the gaps appear between policy and practice.
Through this combination of observation and structured interviews, we identify exactly what's blocking resilience and give you a clear roadmap to fix it.
What you get:
✓ Embedded observation of your actual resilience practices (incident reviews, GameDays, ORRs, chaos experiments)
✓ Stakeholder interviews across engineering, ops, and leadership
✓ Deep analysis of your feedback loops and incident patterns
✓ Written report with prioritized, actionable recommendations
✓ 2-hour executive readout session with your leadership team
Timeline: 6-8 weeks from kickoff to delivery
2. Strengthen
(After Assessment)
Once we've identified what's broken, we help you build specific capabilities to address the gaps.
These are focused 2-4 month engagements that build one capability deeply.
Typical services:
Chaos Engineering Programs - Design and implement systematic resilience testing
Operational Readiness Reviews - Validate systems are actually ready for production
Incident Analysis Process - Improve how your teams learn from failures
3. Transform
(Long-term Partnership)
For organizations ready for comprehensive change, we offer ongoing strategic partnership to embed resilience into your culture and operations.
What this looks like:
Monthly strategic sessions with leadership
Quarterly organizational health assessments
Ongoing advisory as you implement changes
Access for architecture reviews and escalations
Typical engagement: 6-12+ months of collaboration
What people say about us
-
"More often than not, "consultants" can talk the talk, but cannot walk the walk. If you want/need to improve the resilience of your systems and operations, Adrian has proven that he can deliver. He is an educator at heart, with in-depth knowledge based on real experience."
-
"As I moved on to focus on sustainability and eventually retired from AWS I’ve forwarded many people to “the other Adrian” as he specialized in the AWS Fault Injection Service and has now become the go-to independent expert in this space. Most companies don’t realize that a good resilience program will speed up their time to market for everything else, and Adrian can help you get there. "
-
"I can confidently recommend Adrian for any organization seeking to improve its operational readiness and resilience. He applies his deep expertise in Chaos and Resilience Engineering methodically to help teams identify and address gaps in their systems and processes. Adrian doesn't come to you with a prescriptive checklist. Instead, he studies your organization's culture carefully to understand deep underlying contributing factors that impact resilience. Be prepared for what you need to hear, not what you want to hear. But fear not - Adrian understands human psychology and delivers his insights in a respectful and constructive manner that drives effective and sustainable change. He is an accelerator for organizational learning and improvement."
-
"Collaborating with Adrian has been transformative for our team’s and BMW Group as enterprise approach to resiliency and chaos engineering. As a fellow techy at BMW, I had the opportunity to work closely with Adrian on several key projects. His deep understanding of resiliency best practices and chaos engineering was instrumental in scaling our chaos experimentation initiatives."
-
"I’ve had the privilege of knowing Adrian Hornsby for many years, and my respect for him has only grown over time — both on a personal and a professional level. Adrian has an exceptional ability to understand the deep interplay between people, teams, and the complex technical problems they are trying to solve. What sets him apart is his remarkable sensitivity and clarity in guiding teams toward the real heart of a problem. He navigates highly complex socio-technical systems with ease and helps organizations focus on what truly matters. Adrian has a rare talent for enabling teams to work more closely together and build systems that are not only reliable, but resilient by design. Anyone who has the chance to collaborate with him will benefit from his insight, his experience, and his genuinely thoughtful way of working."
-
"Adrian brought a blend of deep expertise and practical insight to our team. He didn’t just teach resilience patterns, he challenged our engineers to think differently about operational excellence and how to design systems with failure in mind. he was engaging, thought-provoking, and left the team with actionable ways to improve how we build and operate software.”
-
"We had an offsite this week to talk about goals, etc for next year and the concepts you talked about during our workshop kept getting brought up - it’s so great that we have that common knowledge/understanding/vocabulary now - just wanted to mention that and thank you for your prep, time and sharing your knowledge with us!"
Anonymous
Our team
Adrian Hornsby spent 9 years at AWS, including 4 years as Principal Engineer on the AWS Fault Injection Service team, helping build resilience for the world's largest systems. Now he helps software engineering organizations diagnose why their resilience programs aren't working—and fix them.
Ready to stop fighting fires?
The best first step is a conversation to understand your current challenges and resilience goals. We'll explore whether our approach aligns with your needs and discuss which step in the journey makes sense for your organization.
There's no obligation, and this conversation alone often provides valuable perspectives on your resilience opportunities.
Your information remains confidential, and we'll respond promptly.
Resilium Labs Oy
☎ +358 (0)504361615
✉ adhorn@resiliumlabs.com