Senior Site Reliability Engineer

⏸ Applications are temporarily paused for this position
Apptoza Inc. Toronto, Ontario, CA

Published 2026-04-27

Description

Job Title: Senior Platform Engineer / Senior SRE Developer – Observability (Dynatrace)

Location: Toronto, ON

Work Style: Hybrid (2 days per week in-person at Toronto office preferred)



Skills: Python, Node.js, Analytics Platform System (APS)

Experience Required: 8-10 years



Role Descriptions:

SRE Lead

• Deep application- and system-level knowledge across complex, end-to-end environments, including tightly integrated on-prem and cloud-native services supporting large-scale, multi-tier transaction flows

• Prior hands-on experience with APM and observability platforms (Dynatrace or comparable enterprise tools), with the ability to instrument, analyze, and troubleshoot complex distributed applications

• Proven expertise in deep troubleshooting across multi-layer, end-to-end (E2E) environments, including application, infrastructure, network, and platform layers (on-prem and cloud)

• Drive and execute the SRE / WCCS roadmap for BMO

• Hands-on role from Day 1

• Strong observability experience (refer to Observability SME expectations below)

• Deep knowledge and experience implementing SRE practices and guiding complex SRE transformations across the industry



Key Contributions:

• Assess current SRE capabilities, identify gaps, and contribute to the SRE & WCCS roadmap

• Navigate and collaborate across multi-team SRE and IT Operations environments to drive results

• Deliver creative workarounds and practical solutions to complex problems

________________________________________



SRE – Observability SME

• Hands-on role from Day 1

• Strong Day 1 Dynatrace expertise, including:

o DQL

o Gen3 Dashboards

o Traces / Grail

o ActiveGate and plugins

o SRG / Workflow development

o Business events (BizEvents)

• Prior hands-on experience with APM and observability platforms (Dynatrace or equivalent), with the ability to instrument, analyze, and troubleshoot distributed applications

• Deep troubleshooting expertise using observability signals (Metrics, Events, Logs, Traces) to identify root causes across complex, multi-layer E2E environments

• Strong foundation in Observability fundamentals (MELT)

• Expert-level dashboard design, including UI/UX best practices

• Extensive experience troubleshooting performance and non-functional issues

• Familiarity with SRE concepts as outlined in the Google SRE book/workbook

• Strong expertise in AWS Observability, including:

o CloudWatch

o Application Signals

o Metrics, Logs, and Traces

o Lambda and API Gateway

• Ability to design creative monitoring solutions for platforms with limited observability (e.g., IBM DataPower)

• Development experience with Python, AWS Lambda, ECS, and Azure Functions

• Understanding of AI-based system fundamentals, including how such systems are built and monitored

• Background or working knowledge of OpenTelemetry (OTel)

• Experience in Financial Services or equivalent highly complex environments (e.g., 50+ systems collaborating to fulfill a single customer transaction)

Attributes

Job type: Full-time
Contract type: Permanent
Salary type: Monthly
Occupation: Senior site reliability engineer