On-Call Onboarding: How to Prepare a New Engineer for Their First Rotation

Your new hire's first on-call rotation should not be trial by fire.

Picture this: a new engineer, three weeks into the job, gets their first page at 2 AM. They do not know the architecture. They do not know the escalation path. They do not know what "normal" looks like for the system that just fired an alert. They open a dashboard they have never seen, stare at a graph they cannot interpret, and spend 20 minutes trying to figure out who to call before they even begin investigating. By the time they reach someone senior, the incident has escalated, the client is affected, and the new engineer is convinced they are not cut out for this.

This is not a people failure. It is a process failure. The engineer did not lack ability — they lacked preparation. And most teams have no structured plan for getting a new engineer ready for on-call. They have 30-60-90 day plans for shipping code. They have onboarding checklists for tooling access and HR paperwork. But on-call readiness? That is left to "you'll pick it up as you go." This guide fixes that.

2 months: Google SRE's recommended shadow period before primary on-call (Google SRE Workbook)

58% of software engineers experience imposter syndrome (Technical.ly, 2024)

58% more likely to stay 3+ years with structured onboarding (HCI Research)

Why on-call onboarding is different from regular onboarding

Regular onboarding teaches a new engineer how to build. Learn the codebase, set up the dev environment, ship small PRs, attend standups, get comfortable with the deployment pipeline. It is structured around creation — writing code, reviewing code, merging code.

On-call onboarding teaches a new engineer how to respond when things break. It is structured around pressure — understanding failure modes, navigating monitoring tools under stress, knowing when to escalate, and communicating with stakeholders while simultaneously investigating an issue they have never seen before.

The stakes are different too. A bad PR gets caught in review. If it somehow ships, it gets reverted. A bad on-call response means downtime, SLA breaches, client impact, and — in the worst case — a new engineer who is so shaken by the experience that they dread every future rotation.

Most companies have a 30-60-90 day plan for code contributions. Almost none have one for on-call readiness. The implicit assumption is that if someone can write code for the system, they can support it in production. That assumption is wrong. Writing a service and debugging it at 2 AM under an active incident are entirely different skills, and the second one requires deliberate preparation.

The 30-60-90 day on-call ramp-up plan

This framework gives new engineers a structured path from "I just started" to "I can handle primary on-call confidently." Adjust the timelines to your team's complexity — a small startup with one service might compress this; a large MSP with 50 client environments might extend it.

Days 1-30: Learn

The first month is about absorbing context — not responding to anything.

Study the architecture. Not just the happy path. Learn which services depend on each other, where the single points of failure are, and which components have historically been the most fragile.
Read past incident postmortems. Every team has a graveyard of past incidents. Reading 10-15 postmortems teaches the new engineer more about what actually breaks than any architecture diagram.
Review runbooks. For every service they will support, the new engineer should read the runbook cover to cover. If runbooks do not exist, that is a problem worth flagging now — before they are on-call without one.
Get access to all monitoring tools. Dashboards, alerting platforms, PSA (professional services automation) systems, log aggregators. The new engineer should be able to log in and navigate each tool before they ever need to use one under pressure.
Read past handover briefs. If your team uses structured handovers (and after reading this guide, it should), two weeks of past briefs will teach the new engineer what "normal" looks like: common incidents, typical difficulty levels, system patterns, communication norms. This is one of the most underrated onboarding resources. More on this below.
Sit in on incident response calls as an observer. Do not ask them to contribute yet. Just let them watch how the team triages, investigates, communicates, and resolves. Observation builds pattern recognition.

Days 30-60: Shadow

The second month transitions from observation to participation — with a safety net.

Join the rotation as secondary/shadow. The new engineer receives pages alongside the primary engineer but does not respond alone. They see every alert in real time without the pressure of being the sole responder.
Follow the primary engineer's workflow. How do they triage an alert? What do they check first? How do they decide whether to escalate? The shadow shift is an apprenticeship — learning by watching someone experienced work through real incidents.
Write practice handover briefs. At the end of each shadow shift, the new engineer writes a handover brief as if they were the primary. The actual primary reviews it and gives feedback. This builds the muscle memory of documentation before the stakes are real.
Debrief with the primary after each shift. What went well? What was confusing? What would you have done differently? These 15-minute conversations are where the real learning happens.

Days 60-90: Reverse shadow

The third month flips the dynamic. The new engineer takes the lead; the experienced engineer becomes the safety net.

Take primary on-call during business hours. Start with daytime shifts where the backup engineer is awake, available, and likely in the same office or Slack channel. This removes the isolation of a solo night shift while giving the new engineer real responsibility.
Handle real pages with backup available. The new engineer responds to alerts, investigates, and makes decisions — but knows they can reach the backup within minutes if they get stuck. This builds confidence gradually.
Graduate to full rotation after successful business-hours shifts. Once the new engineer has handled a few real incidents during business hours without needing to escalate everything, they are ready for the full rotation including nights and weekends.
Run at least one simulated escalation before going solo. Walk through a scenario: "The database is down, the primary fix did not work, and you need to escalate. Who do you call? What information do you provide? What is the SLA deadline?" Rehearsing this once removes the panic of doing it for real.

Google's SRE teams recommend 2 months of shadowing before an engineer takes primary on-call, as documented in the Google SRE Workbook. If Google, with some of the best engineers in the world and some of the most mature operational processes, thinks it takes that long, your team probably should not be doing it faster.
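If you want the schedule to be explicit in tooling, the three phases can be looked up from a start date. The sketch below is only an illustration: the function name and the exact day boundaries (day 30 and day 60 count as the last day of their phase) are assumptions, and the date alone should never replace the readiness checks described in this guide.

```python
from datetime import date

# Phase boundaries follow the 30-60-90 plan. Treating day 30 and day 60 as the
# last day of their phase is an assumption made for this sketch.
PHASES = [
    (30, "learn"),
    (60, "shadow"),
    (90, "reverse shadow"),
]


def ramp_up_phase(start: date, today: date) -> str:
    """Return the planned ramp-up phase for an engineer who started on `start`."""
    day_number = (today - start).days + 1  # the start date counts as day 1
    if day_number < 1:
        raise ValueError("today is before the start date")
    for last_day, phase in PHASES:
        if day_number <= last_day:
            return phase
    # Past day 90 the plan allows full rotation, subject to the checklist.
    return "full rotation"


# An engineer three weeks into the job is still in the learning phase.
print(ramp_up_phase(date(2025, 1, 6), date(2025, 1, 27)))
```

A team with more complex environments could stretch the numbers in PHASES rather than change the function.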

The on-call onboarding checklist

Use this as a living document. Print it, paste it into your wiki, or add it to your team's onboarding ticket. Each item should be checked off by the new engineer and verified by their manager or on-call buddy.

Before first shadow shift

Access to alerting tools (PagerDuty, Better Stack, Opsgenie, etc.) — can log in and see active alerts
Access to monitoring dashboards — knows where to find key metrics for each service
Read all runbooks for services they will support
Read the last 10 handover briefs (or past incident summaries if briefs do not exist yet)
Know the escalation path: who to call, when, and for what severity levels
Understand SLA commitments — especially important for MSPs with client-specific SLAs
Read at least 5 past incident postmortems
Completed a walkthrough of the architecture with a senior engineer

Before first primary shift

Successfully participated in resolving at least 3 incidents during shadow shifts
Written at least 3 handover briefs (even during shadow shifts) and received feedback
Completed a simulated escalation drill — walked through the full escalation process end to end
Can navigate all monitoring dashboards without guidance
Knows how to create a ticket in the PSA (ConnectWise, Autotask, HaloPSA) if applicable
Has identified and been introduced to their on-call buddy
Understands how to write and submit a shift handover brief
Has completed at least 2 successful business-hours primary shifts with backup available
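The "before first primary shift" items are concrete enough to check mechanically. The following is a minimal sketch, assuming a hypothetical OnCallProgress record that your team fills in by hand or from its own tooling. The field names are illustrative; the thresholds come from the checklist above.

```python
from dataclasses import dataclass


@dataclass
class OnCallProgress:
    """Hypothetical progress record for one engineer; field names are illustrative."""
    incidents_resolved_while_shadowing: int = 0
    handover_briefs_with_feedback: int = 0
    escalation_drill_done: bool = False
    navigates_dashboards_unaided: bool = False
    can_create_psa_ticket: bool = True  # "if applicable": stays True when no PSA is used
    buddy_introduced: bool = False
    knows_handover_process: bool = False
    business_hours_primary_shifts: int = 0


def unmet_requirements(p: OnCallProgress) -> list[str]:
    """Return the checklist items that are not yet satisfied."""
    checks = [
        (p.incidents_resolved_while_shadowing >= 3, "resolve at least 3 incidents during shadow shifts"),
        (p.handover_briefs_with_feedback >= 3, "write at least 3 handover briefs and receive feedback"),
        (p.escalation_drill_done, "complete a simulated escalation drill"),
        (p.navigates_dashboards_unaided, "navigate all monitoring dashboards without guidance"),
        (p.can_create_psa_ticket, "create a ticket in the PSA"),
        (p.buddy_introduced, "be introduced to an on-call buddy"),
        (p.knows_handover_process, "know how to write and submit a handover brief"),
        (p.business_hours_primary_shifts >= 2, "complete at least 2 business-hours primary shifts with backup"),
    ]
    return [label for ok, label in checks if not ok]


progress = OnCallProgress(incidents_resolved_while_shadowing=3, handover_briefs_with_feedback=2)
for item in unmet_requirements(progress):
    print("Still to do:", item)
```

An empty list means every item is checked off. A manager or buddy should still verify each one, as described above.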

After first solo shift

Debrief with manager or buddy — what went well, what was difficult, what was unclear
Review the handover brief they wrote — was it complete? Did the incoming engineer have questions?
Identify gaps in runbooks or documentation discovered during the shift
Update runbooks with anything they had to figure out on their own
Rate their confidence level — are they ready for another solo shift, or do they need more shadow time?

The buddy system: why it works

Pair every new on-call engineer with an experienced buddy. Not their manager — a peer. Someone who has been in the rotation long enough to know the common failure modes, the unwritten escalation rules, and the shortcuts that are not in any runbook.

The buddy serves a specific role during the new engineer's first 2-3 solo shifts: they are available on a separate channel (Slack DM, phone, whatever works) as a lifeline. The new engineer knows that if they get stuck, they can reach someone who will answer — not the formal escalation path, not a manager who might judge them, but a peer who has been through the same learning curve.

Research backs this up. HCI research found that 87% of organizations say buddy programs accelerate new hire proficiency. The buddy does not just answer questions — they review handover briefs, give feedback on incident responses, and provide the psychological safety that new engineers need to ask "dumb" questions without fear.

The buddy relationship also benefits the experienced engineer. Teaching forces you to articulate knowledge that has become intuitive. Reviewing someone else's handover brief reveals gaps in your own documentation. And mentoring a colleague through their first on-call shifts is one of the most impactful things a senior engineer can do for team resilience.

Handling on-call anxiety

On-call anxiety is real and it is normal. Even experienced engineers feel it before a high-stakes rotation. For a new engineer who has never been paged at 2 AM, the anxiety can be overwhelming — especially when combined with the imposter syndrome that 58% of software engineers already experience.

Here is how to address it as a manager or team lead:

Normalize "I don't know yet." Explicitly tell new engineers that "I don't know yet, I'm investigating" is a completely valid first response to any page. The pressure to immediately have an answer is self-imposed and destructive. Give them permission to take a breath, read the alert, check the dashboard, and then respond.

Give explicit permission to escalate early and often. New engineers are terrified of escalating because they think it signals incompetence. Flip that narrative. Tell them directly: "During your first month of primary on-call, I would rather you escalate 10 things that did not need escalation than miss 1 thing that did. You will never be criticized for escalating too early."

Track difficulty ratings over time. If a new engineer's shifts are consistently rated 4-5 on a 1-5 difficulty scale, the team should redistribute the load. It is not the new engineer's fault that they drew three P1 incidents in their first week, but it is the team's responsibility to make sure that bad luck does not become a traumatic introduction to on-call.

Never blame someone for a bad response when you gave them no preparation. This is the most important point. If a new engineer fumbles their first on-call incident, the correct response is not "you should have known to check the runbook" but "we should have made sure you knew the runbook existed." The worst thing a team can do is throw someone into on-call with no preparation and then blame them when something goes wrong.
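The difficulty-tracking point above can be turned into a simple check. This is a rough sketch, not a prescribed rule: it assumes ratings on a 1-5 scale and reads "consistently" as "the last three shifts in a row", and both numbers are assumptions you should tune for your team.

```python
def needs_load_rebalancing(ratings, threshold=4, window=3):
    """Return True when the most recent `window` shifts were all rated >= threshold.

    `ratings` is a chronological list of per-shift difficulty ratings (1-5).
    The threshold and window defaults are assumptions made for this sketch.
    """
    recent = ratings[-window:]
    # Require a full window so one or two hard shifts do not trigger the flag.
    return len(recent) == window and all(r >= threshold for r in recent)


new_engineer_ratings = [2, 4, 5, 4]
if needs_load_rebalancing(new_engineer_ratings):
    print("Recent shifts were consistently hard: consider redistributing the load.")
```

The flag is a prompt for a conversation with the engineer, not a judgment on how they handled those shifts.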

How handover briefs accelerate onboarding

Here is something most teams do not realise: if you have been writing structured handover briefs, you already have an on-call onboarding corpus. You just have not used it that way yet.

Past handover briefs are onboarding documentation that writes itself. A new engineer who reads two weeks of handover history learns:

Common incidents: What alerts fire most often? What services are the noisiest? What is the typical resolution for a Redis memory warning vs. a database connection timeout?
Typical difficulty: Are most shifts quiet (1-2 difficulty) or consistently intense (4-5)? This sets realistic expectations.
System patterns: Does traffic spike at 9 AM? Do backup jobs cause CPU alerts every Tuesday at midnight? Knowing the patterns prevents false alarms.
Team communication norms: How much detail do engineers include? How do they describe incidents? What level of formality is expected?

This is far more valuable than a static wiki page that was written six months ago and never updated. Handover briefs are a living record of what actually happens during on-call shifts — updated every single shift by the engineers who are doing the work.
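To make that concrete, here is a small sketch of the kind of summary a new engineer could pull from a batch of past briefs. The brief structure (a dict with "difficulty" and "alerts" keys) and the alert names are invented for illustration. This is not Shiftctl's data format or API.

```python
from collections import Counter
from statistics import mean

# Hypothetical handover briefs; real briefs would carry far more detail.
briefs = [
    {"difficulty": 2, "alerts": ["redis-memory-warning", "db-connection-timeout"]},
    {"difficulty": 1, "alerts": ["redis-memory-warning"]},
    {"difficulty": 4, "alerts": ["db-connection-timeout", "redis-memory-warning"]},
    {"difficulty": 3, "alerts": ["backup-cpu-alert"]},
]


def summarize_briefs(briefs):
    """Summarize which alerts fire most often and how hard shifts typically are."""
    if not briefs:
        raise ValueError("need at least one brief to summarize")
    alert_counts = Counter(alert for b in briefs for alert in b["alerts"])
    return {
        "most_common_alerts": alert_counts.most_common(3),
        "average_difficulty": round(mean(b["difficulty"] for b in briefs), 1),
        "intense_shifts": sum(1 for b in briefs if b["difficulty"] >= 4),
    }


print(summarize_briefs(briefs))
```

Even a summary this crude answers the first two questions in the list above: which alerts are noisiest and how intense a typical shift is.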

Shiftctl stores every handover brief in a searchable, structured history. When a new engineer joins the rotation, they can read through past briefs filtered by service, severity, or engineer, building context in hours instead of weeks. It is the difference between handing someone a dusty manual and giving them access to the team's collective memory.

Build on-call confidence from day one

Shiftctl's structured handover history doubles as an onboarding tool — new engineers can read past briefs to learn what breaks, how the team responds, and what "normal" looks like. Combined with shift difficulty tracking and buddy-reviewed sign-offs, it gives new hires a clear path from shadow to solo. Free for 2 users. No credit card required.

Frequently asked questions

How long should on-call onboarding take?

Plan for 60-90 days from first day to full primary on-call. The first 30 days are learning (architecture, runbooks, postmortems), days 30-60 are shadow shifts alongside an experienced engineer, and days 60-90 are reverse shadow shifts where the new engineer takes the lead with backup available. Google SRE recommends a minimum of 2 months of shadowing before primary on-call.

What is a shadow on-call rotation?

A shadow rotation means the new engineer is added to the on-call schedule as a secondary responder. They receive the same pages as the primary engineer but are not expected to respond alone. The purpose is to expose them to real incidents, real alerts, and real workflows without the pressure of being the sole responder. They observe, learn, and practice writing handover briefs.

When is a new engineer ready for primary on-call?

A new engineer is ready for primary on-call when they have: completed shadow shifts and participated in resolving at least 3 real incidents, written multiple handover briefs that were reviewed and approved, completed a simulated escalation drill, successfully handled business-hours primary shifts with backup available, and can navigate all monitoring and alerting tools without guidance. Readiness is a checklist, not a gut feeling.

How do you reduce on-call anxiety for new hires?

Four things: (1) normalize "I don't know yet, I'm investigating" as a valid first response, (2) give explicit permission to escalate early and often during initial rotations, (3) assign an on-call buddy who is available as a lifeline during the first 2-3 solo shifts, and (4) use structured onboarding (shadow shifts, reverse shadow, checklists) so the engineer feels prepared rather than thrown into the deep end.

What should an on-call runbook include for onboarding?

For onboarding purposes, runbooks should include: a plain-language description of what the service does, common failure modes and their symptoms, step-by-step troubleshooting procedures for the most frequent alerts, escalation criteria (when to escalate, to whom, and what information to provide), links to relevant dashboards and log queries, and known workarounds for recurring issues. If a new engineer cannot follow the runbook without prior knowledge of the system, the runbook needs improvement.

Should new engineers be on-call during their first month?

No — not as primary responders. During the first month, new engineers should be learning: reading architecture docs, reviewing postmortems, getting tool access, and sitting in on incident response calls as observers. They can be added to shadow rotations starting in month two. Putting a new engineer on primary on-call during their first month is setting them up for a bad experience that will shape their attitude toward on-call for years.

What is reverse shadowing?

Reverse shadowing (or reverse shadow) flips the traditional shadow arrangement. Instead of the new engineer watching the experienced engineer handle pages, the new engineer takes the primary role while the experienced engineer acts as backup. The new engineer responds to alerts, investigates, and makes decisions — but has a safety net available if they get stuck. This typically happens during business hours first, then expands to full rotations as confidence builds.