Blameless Summit 2019

San Francisco • Oct 24, 2019

Mind meld with SRE thought leaders and shape the future of software reliability

The Blameless Summit is a reliability conference bringing together engineers, product teams, customer success teams, and executives focused on the most important feature of any product: reliability. It's a day full of thought leadership, deep dives, best practices, networking, and fun!

Featured speakers

Dave Rensin

Senior Director of Engineering, Google

Dave Rensin is an Engineering Director in Google SRE. He has touched an eclectic mix of Google's internal systems (including the on-hold system, whose music he wrote), and he founded and leads Customer Reliability Engineering, the SRE team that teaches the world SRE principles and practices.

Lauren Rubin

Sr Software Engineer, Site Reliability, GitHub

Lauren Rubin has spent over 20 years participating in incident response at every level in organizations of every size, including EMS work. She believes even the smallest organizations can benefit from learning, early and often, what operational surprises can teach us. Currently, Lauren is helping champion SRE culture at GitHub.

Paul Osman

Senior Engineering Manager, Under Armour

Paul Osman is the Senior Manager for Site Reliability Engineering at Under Armour, where he and his team support the fitness tracking applications MapMyFitness, Endomondo, and MyFitnessPal. He has 15+ years of experience as a software engineer, focusing mostly on microservices, reliability, and DevOps practices. He is an advocate for chaos engineering and for building just cultures that prioritize safety and resiliency. Prior to joining Under Armour, he helped build platform teams at PagerDuty, SoundCloud, and 500px.

Jed Needle

Site Reliability Engineer, Procore

Jed Needle was born and raised in New York. He now calls California his home and currently works at Procore in their Carpinteria headquarters as a tech lead/manager in Site Reliability. He discovered Unix/Linux by 'accident' while working in the biological sciences field and has worked mostly as a Linux generalist ever since, at a few start-ups as well as some large enterprise companies. Today, Jed is helping establish standard SRE best practices across Procore's R&D organization.

Jonathan Solórzano-Hamilton

Senior Manager, Site Reliability Engineering, Procore

Jonathan started his career as a physicist but turned to technology. Prior to coming to Procore in 2018, he worked in the field for over 20 years at employers such as Hewlett-Packard, Stanford University, and UCLA. He alternated between operations and development roles until finding his calling in DevOps. Jonathan is the Senior Manager for Site Reliability Engineering at Procore and leads the effort to scale SRE practices across the engineering organization.

Learn from experts at top global brands

  • Google
  • GitHub
  • Under Armour
  • Procore

Agenda

Schedule is subject to change

    Time

    Session

  • 08:30 AM

    Registration / Breakfast

  • 09:30 AM
    Presentation

    Reliability: It's Personal

    Ashar Rizqi & Lyon Wong, CEO (Co-founder) & COO (Co-founder), Blameless

  • 10:00 AM

    Break

  • 10:30 AM
    Keynote

    How SRE is Transforming Traditional IT

    Is it possible to apply SRE practices to areas outside of Engineering teams, and if so, what do the benefits look like? What are examples or instances of this happening? In this talk, Dave will delve into how SRE can be a pillar to departments and disciplines outside of Engineering.

    Dave Rensin, Senior Director of Engineering, Google

  • 11:30 AM
    Presentation

    Improving Postmortems: From Chores to Masterclasses

    Postmortems are an essential tool for learning from production incidents. Unfortunately, it's common for them to become laborious chores for engineers and incident response teams. Whether it's because they feel like theatre or because blame accidentally sets in, this can have an adverse effect on the resiliency of your organization. An effective postmortem culture doesn't just happen; it has to be an intentional and ongoing effort at multiple levels of an organization.

    In this talk, I'll walk through some of our efforts at Under Armour to improve our postmortem culture. I'll discuss some of the successes we've had, lessons we've learned along the way, and areas we're excited to focus on next.

    Paul Osman, Senior Engineering Manager, Under Armour

  • 12:00 PM

    Lunch

  • 01:30 PM
    Presentation

    Human Cloning is off the table, or is it? How Procore Scales SRE

    Procore has more than tripled its engineering headcount to over 300 in the past five years. The complexity of our tech stack exploded from a monolithic Rails application to a dynamic service mesh. We progressed from a single Postgres database executing 50,000 transactions/second to a variety of additional stores including Redis, Elasticsearch, and DynamoDB. This sophistication improved performance at the cost of added complexity and reduced stability.

    During the same period, we pivoted from private to public cloud infrastructure. Our single SRE squad could no longer keep up with the rate of change. We became concerned that a human cloning side project of our most productive colleagues had begun in some SRE garage, so we decided to try a new approach.

    Learn how Procore shifted to a “zone defense” strategy by embedding SRE squads throughout the engineering organization. We will share how this change helped us rapidly advance the application stack, become more nimble, simplify our onboarding process for new hires, and reduce both the frequency and severity of incidents. As a bonus, we were able to maintain sanity and prevent unsanctioned medical experiments.

    Jonathan Solórzano-Hamilton & Jed Needle, Senior Manager, Site Reliability Engineering & Site Reliability Engineer, Procore

  • 02:00 PM
    Keynote

    Putting the Ops in Dev: How to bring operational experience to your development

    In the SRE/DevOps revolution, there has been a lot of discussion about pushing operational concerns to developers. Join me for a session on what developers can learn from operations veterans, and how they can grow their own operational outlook when making product decisions. Whether you have cultivated your own operational outlook through years of experience or are just getting started submitting code, we will look at how and why operations considerations apply to projects of even the smallest size, and offer suggestions for helping the newer members of our organizations harness the knowledge gained from past mistakes.

    Lauren Rubin, Sr Software Engineer, Site Reliability, GitHub

  • 03:00 PM

    Happy Hour

Venue

Verdi Club

2424 Mariposa St
San Francisco
CA 94110

Register Today for Your Complimentary Ticket