Make your resilience work actually work.

We fix broken resilience programs.

We help engineering organizations understand why they keep having the same types of incidents—despite doing incident analysis, architecture review, monitoring, and chaos engineering.

Is this your reality?

  • The same types of incidents keep happening despite thorough postmortems and action items

  • You invested in chaos engineering but can't tell if it's actually making you more resilient

  • Your teams do incident reviews, but the learning never spreads beyond the room

  • You're spending more time fighting fires than improving your systems

  • You know something's broken organizationally, but can't pinpoint what

  • Leadership is asking "why does this keep happening?" and you don't have a good answer

Sound familiar? These aren't isolated problems; they're all connected. Here's how we help organizations break the cycle.


Start with diagnosis, then build from there

Most organizations start with the Resilience Assessment. Once we've identified what's actually broken, we can help you strengthen specific capabilities or partner for long-term transformation. But diagnosis comes first—you need to know what to fix.

1. Resilience Assessment

  • Your organization keeps having the same types of incidents despite doing retros, chaos engineering, and architecture reviews. Something organizational is broken—but what?

    What we do:

    We diagnose the feedback loop failures and organizational patterns that prevent your teams from learning and adapting.

    We embed with your teams to see how work actually happens versus how it's described. We participate in your incident reviews, observe GameDays, sit in on chaos experiments, and join operational readiness reviews. We watch how teams interact, what incentives and pressures they face, and where the gaps appear between policy and practice.

    Through this combination of observation and structured interviews, we identify exactly what's blocking resilience and give you a clear roadmap to fix it.

    What you get:

    ✓ Embedded observation of your actual resilience practices (incident reviews, GameDays, ORRs, chaos experiments)

    ✓ Stakeholder interviews across engineering, ops, and leadership

    ✓ Deep analysis of your feedback loops and incident patterns

    ✓ Written report with prioritized, actionable recommendations

    ✓ 2-hour executive readout session with your leadership team

    Timeline: 6-8 weeks from kickoff to delivery

    Investment: Starting at €25,000

Book a diagnostic call

2. Strengthen
(After Assessment)

  • Once we've identified what's broken, we help you build specific capabilities to address the gaps.

    These are focused 2-4 month engagements that build one capability deeply.

    Typical services:
    Chaos Engineering Programs - Design and implement systematic resilience testing

    Operational Readiness Reviews - Validate systems are actually ready for production

    Incident Analysis Process - Improve how your teams learn from failures

3. Transform
(Long-term Partnership)

  • For organizations ready for comprehensive change, we offer ongoing strategic partnership to embed resilience into your culture and operations.

    What this looks like:

    Monthly strategic sessions with leadership

    Quarterly organizational health assessments

    Ongoing advisory as you implement changes

    Access for architecture reviews and escalations

    Typical engagement: 6-12+ months of collaboration


About us

At Resilium Labs, we help software organizations build systems that not only recover from failure but also expect it and thrive through it.

Drawing on twenty years of IT industry experience, including nearly a decade at AWS designing resilience for the world's most demanding companies, we provide long-term advisory partnerships that transform both technical systems and team capabilities.

Our approach combines deep technical expertise with a practical understanding of the human and organizational factors that create truly resilient systems. We don't just apply theoretical patterns; we've built and implemented resilience practices at a global scale.

We believe that resilience is not just about preventing failures; it's about creating organizations that learn, adapt, and grow stronger through challenges. Our work focuses on building durable capabilities that enable continuous improvement, not just quick fixes.

Whether you're struggling with outages, preparing for regulations, or aiming to build true operational excellence, we offer proven approaches that balance immediate improvements with long-term transformation.


What people say about us

  • "More often than not, 'consultants' can talk the talk, but cannot walk the walk. If you want/need to improve the resilience of your systems and operations, Adrian has proven that he can deliver. He is an educator at heart, with in-depth knowledge based on real experience."

  • "As I moved on to focus on sustainability and eventually retired from AWS, I’ve forwarded many people to “the other Adrian” as he specialized in the AWS Fault Injection Service and has now become the go-to independent expert in this space. Most companies don’t realize that a good resilience program will speed up their time to market for everything else, and Adrian can help you get there."

  • "I can confidently recommend Adrian for any organization seeking to improve its operational readiness and resilience. He applies his deep expertise in Chaos and Resilience Engineering methodically to help teams identify and address gaps in their systems and processes. Adrian doesn't come to you with a prescriptive checklist. Instead, he studies your organization's culture carefully to understand deep underlying contributing factors that impact resilience. Be prepared for what you need to hear, not what you want to hear. But fear not - Adrian understands human psychology and delivers his insights in a respectful and constructive manner that drives effective and sustainable change. He is an accelerator for organizational learning and improvement."

  • "Collaborating with Adrian has been transformative for our team’s and BMW Group’s enterprise approach to resiliency and chaos engineering. As a fellow techy at BMW, I had the opportunity to work closely with Adrian on several key projects. His deep understanding of resiliency best practices and chaos engineering was instrumental in scaling our chaos experimentation initiatives."

  • "I’ve had the privilege of knowing Adrian Hornsby for many years, and my respect for him has only grown over time — both on a personal and a professional level. Adrian has an exceptional ability to understand the deep interplay between people, teams, and the complex technical problems they are trying to solve. What sets him apart is his remarkable sensitivity and clarity in guiding teams toward the real heart of a problem. He navigates highly complex socio-technical systems with ease and helps organizations focus on what truly matters. Adrian has a rare talent for enabling teams to work more closely together and build systems that are not only reliable, but resilient by design. Anyone who has the chance to collaborate with him will benefit from his insight, his experience, and his genuinely thoughtful way of working."

  • "Adrian brought a blend of deep expertise and practical insight to our team. He didn’t just teach resilience patterns, he challenged our engineers to think differently about operational excellence and how to design systems with failure in mind. He was engaging, thought-provoking, and left the team with actionable ways to improve how we build and operate software."



Our team

Adrian Hornsby spent 9 years at AWS, including 4 years as Principal Engineer on the AWS Fault Injection Service team, helping build resilience for the world's largest systems. Now he helps software engineering organizations diagnose why their resilience programs aren't working—and fix them.


FAQs

  • Resilience engineering is the discipline of designing systems that can withstand, recover from, and adapt to unexpected disruptions. Unlike traditional approaches that focus on preventing failures, resilience engineering acknowledges that failures are inevitable in complex systems and focuses on building the capability to respond effectively when they occur.


  • Chaos engineering is the practice of deliberately introducing controlled failures into systems to test their resilience. Think of it as a scientific approach to building confidence in your systems' ability to withstand unexpected conditions.

    Rather than randomly breaking things, chaos engineering involves careful experimentation.

    This proactive approach allows organizations to discover and fix hidden vulnerabilities during planned exercises rather than during actual outages. Common experiments include simulating server failures, network latency, resource exhaustion, or dependency outages.

    Chaos engineering has evolved into a sophisticated discipline practiced by forward-thinking organizations across industries. It represents a fundamental shift from hoping systems will work during disruption to verifying their resilience through evidence-based testing.

  • Resilience engineering pays off in several ways. It protects revenue by preventing costly outages, creates competitive advantage by keeping your service running when competitors cannot, and enables faster innovation without sacrificing stability. It also gives you a systematic way to manage complex socio-technical risks, builds adaptive capacity for unexpected challenges, and balances efficiency with effectiveness. Beyond preventing failures, it turns your organization into one that learns from challenges and grows stronger through adversity, which strengthens your position in an increasingly unpredictable business environment.


  • Unlike traditional IT consulting that often focuses on specific technologies or isolated improvements, our approach addresses the socio-technical aspects of resilience. We combine technical expertise with organizational and cultural considerations to build sustainable capabilities that evolve with your business. We don't just implement tools; we help transform how your organization thinks about and responds to disruption.

  • While we have extensive experience with technology organizations, our resilience principles apply to any business that relies on complex systems. We've worked with clients across industries including finance, healthcare, travel, retail, and manufacturing. The common factor is organizations that depend on resilient digital systems to serve their customers.

  • Most of our partnerships begin with a 6-month commitment. Building resilience capabilities is not a quick fix but a transformational journey. Some clients continue the partnership beyond the initial period at varying levels of intensity as their needs evolve. The pace of engagement determines how quickly we can implement key improvements.

  • While our partnership offers the most comprehensive value, we understand that organizations have different needs and constraints. We offer focused training programs and targeted assessments that can serve as entry points. However, we find that sustainable resilience requires the holistic approach of a partnership.

  • Absolutely. Our approach is technology-agnostic and complements your existing investments. We'll help you maximize the resilience capabilities of your current tools while identifying any critical gaps. We don't require you to replace technologies unless they fundamentally limit your resilience capabilities.

  • The return varies based on your current state and industry, but our clients typically see significant value in three areas: reduced costs from outages (which average €100,000-300,000 per hour in many industries), improved operational efficiency, and enhanced competitive advantage. Many clients find that preventing just one major incident pays for their entire resilience investment.

  • Organizations that benefit most from our services recognize that resilience is a strategic advantage rather than just an IT concern. Readiness indicators include executive support for resilience initiatives, recent incidents that highlighted issues, growth that's testing system limits, or regulatory requirements demanding improved resilience. We can help you understand your readiness through an initial conversation.

  • We promise that our engagement will identify specific, actionable improvements to your resilience posture, with clear implementation paths. While no consultant can guarantee the elimination of all failures, we commit to measurable improvements in your ability to anticipate, respond to, and learn from disruptions.

  • You'll begin seeing insights from our assessment within the first few weeks. The timeline for meaningful improvements depends on both your chosen engagement pace and your organization's readiness to implement changes.

    With strong internal commitment and active participation, initial improvements can begin within the first month. Organizations that dedicate resources, empower decision-making, and embrace recommendations typically see significant results 2-4 times faster than those with implementation constraints or competing priorities.

    The pace of your chosen engagement option sets a baseline expectation, but your team's involvement and willingness to implement change are the most critical factors in determining how quickly we can transform your resilience capabilities.

  • The best first step is a conversation to understand your current challenges and objectives. Contact us to schedule an initial discussion, and we'll explore whether our services align with your needs. There's no obligation, and this conversation alone often provides valuable perspectives on your resilience opportunities.


Ready to stop fighting fires?

The best first step is a conversation to understand your current challenges and resilience goals. We'll explore whether our approach aligns with your needs and discuss which step in the journey makes sense for your organization.

There's no obligation, and this conversation alone often provides valuable perspectives on your resilience opportunities.

Your information remains confidential, and we'll respond promptly.

Resilium Labs Oy
+358 (0)504361615
adhorn@resiliumlabs.com

Book a diagnostic call