Nomba is a leading payments company with a mission to revolutionise the way businesses manage their financial transactions and affairs. We provide innovative, secure, and user-friendly solutions that enable businesses to streamline their payment processes, optimise their financial operations, and grow with confidence.
We are recruiting to fill the position below:
Job Title: Site Reliability Engineer
Location: Lekki, Lagos
Employment Type: Full-time
Job Summary
- As a Site Reliability Engineer at Nomba, you will play a crucial role in bridging the gap between software development and IT operations, with a strong focus on solving IT operations problems using software engineering.
- You will be responsible for designing, implementing, and maintaining the infrastructure, tools, and processes required to support our development and deployment pipelines.
- You will react in real time to production incidents and work to contain and resolve them as quickly as possible.
- Your expertise in automation, cloud technologies, and continuous integration/continuous deployment (CI/CD) will ensure that our software is delivered efficiently, reliably, and at scale.
About the Role
- Implement and maintain highly available, scalable, and secure production systems, emphasising automation and Infrastructure as Code (IaC) principles.
- Collaborate with software development teams to influence the architecture and design of applications for better scalability, reliability, and performance.
- Develop and maintain monitoring, alerting, and logging solutions to proactively detect and resolve system issues.
- Respond to incidents and outages, conduct root cause analysis, and implement preventive measures to minimise future occurrences.
- Participate in on-call rotations and respond promptly to critical incidents.
- Continuously improve system performance through performance tuning, capacity planning, and load testing.
- Implement security best practices, ensuring that systems are compliant with industry standards and regulations.
- Automate routine operational tasks using scripting and programming languages.
- Work with cross-functional teams to define and document operational procedures and runbooks.
- Contribute to the improvement of the CI/CD pipelines to ensure seamless deployments.
- Keep abreast of industry trends, emerging technologies, and best practices in SRE and cloud infrastructure management.
About You
- Bachelor’s Degree in Computer Science, Engineering, or a related field (or equivalent practical experience).
- Proven experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role managing large-scale, highly available production systems.
- Solid experience with cloud platforms (e.g., AWS, Azure, GCP), including proficiency in provisioning and managing resources.
- Strong understanding of Linux/Unix systems and command-line utilities.
- Proficiency in at least one programming or scripting language (e.g., Python, Ruby, Bash, PowerShell).
- Experience with Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Knowledge of containerisation and orchestration technologies (e.g., Docker, Kubernetes).
- Familiarity with monitoring tools and concepts (e.g., Prometheus, Grafana, ELK stack).
- Understanding of networking protocols, load balancing, and firewalls.
- Strong problem-solving and troubleshooting skills, with a focus on root cause analysis.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Nice to Have:
- Relevant certifications in SRE, DevOps, or cloud technologies.
- Experience with databases and data management (e.g., SQL, NoSQL, caching systems).
- Knowledge of configuration management tools (e.g., Ansible, Puppet, Chef).
- Understanding of Agile methodologies and experience in Agile/Scrum environments.
- Familiarity with security practices and compliance frameworks.
Method of Application
Interested and qualified candidates should:
Click here to apply online