Sr. Service Reliability Engineer (Cloud)

This role is based in Dubai, UAE (not a remote role).


SSRE (Cloud):


The Service Reliability Engineering (SRE) Unit is responsible for the availability, performance, monitoring and maintainability of services, applications and technology across the Group. The unit ensures that customer-facing and core business applications are reliable and always available, with uptimes that meet defined service levels. It acts as a conduit between the Development, Platform and IT Operations teams to proactively identify technical debt, report non-conformance to design standards and drive remediation actions with the Platform team. This ensures highly scalable and reliable systems in Production, upholding the ‘Always Available Bank’ objective.


Senior Service Reliability Engineers (SSREs) advocate and augment reliability engineering principles, guidelines and standards. SSREs partner with Product Owners and with Platform and Engineering teams to drive the availability, reliability, scalability, usability and recoverability of application services and technologies in the production environment. They combine engineering and development experience with an innate drive to improve existing and new systems and processes. They collaborate with the Development, Platform and Operations teams to build and run scalable, sustainable production services that can advance and adapt to evolving business needs.



Candidate requirements and responsibilities:

- Responsible for the overall health, high availability, service resilience, capacity and performance of customer-facing services and business platforms on cloud infrastructure.

- Ability to maintain SRE guardrails and achieve a highly scalable, highly available and resilient complex application ecosystem.

- Working experience with public cloud platforms such as Azure and AWS, Kubernetes (K8s)/OpenShift Container Platform, and hybrid cloud.

- Experience assisting in the roll-out, deployment and maintenance of microservices on a large-scale container platform.

- Integration experience with APM tools, logging tools and DevOps automation tools to support proactive detection of issues and a closed feedback loop that safeguards the availability of business-critical applications deployed on the cloud platform.

- Proactive mindset to detect configuration drift and design deviations, and to identify technical debt in the platform that could lead to catastrophic failure.

- Hands-on and automation experience in chaos testing.

- Work closely with the development and architecture teams on technical debt and design deviations.

- Ability to function in a fast-paced and dynamic environment.

- Participate in a 24x7 rotation for any critical ecosystem/platform-level changes.

Post date: 13 January 2025
Publisher: LinkedIn