Create a well-informed cloud strategy and regularly evaluate cloud applications, hardware, and software.
Generate the scripts and templates required for automatic provisioning of resources in public cloud infrastructure.
Perform internal security audits of the infrastructure, address findings from customer security teams, and act on real-time security alerts.
Monitor SaaS infrastructure availability and performance 24x7x365.
Ensure the highest uptime for customers in our SaaS environment.
Respond in real time to production environment issues.
Adhere to the incident escalation procedure according to incident severity.
Debug technical issues inside a complex stack involving virtualization, containers, microservices, etc.
Test and deploy new product releases to the staging and production environments.
Work with product, service and application owners to integrate and automate processes to reduce repetitive tasks.
Review and recommend improvements to operational processes and procedures utilizing automation.
Create a comprehensive set of automated performance alerts so that timely action can be taken.
Collaborate with the engineering teams to enable their applications to run on Cloud infrastructure.
Maintain good design and documentation of the infrastructure for the other members of the Cloud Operations team.
Maintain compliance with Security and Governance standards.
Possess a technical understanding of DR and BCP strategies that meet and align with defined SLAs, preferably with practical implementation experience.
Manage backup and restore practices, with periodic testing and reporting, to meet the agreed-upon RPO/RTO.
Participate in incident response and related tabletop exercises; contribute to runbooks; and maintain technical documentation, architecture references, change management, and reporting.
Stay aware of emerging vulnerabilities and cyber threats, and respond quickly to mitigate risks. Maintain and improve processes around regular patching, system hardening, malware controls, encryption, etc.
Experience with AWS services such as, but not limited to, CloudFront, EC2 Container Registry, ELB, KMS, Kinesis, Lambda, Redshift, Route 53, SES, SQS, SNS, S3, Glacier, and Athena.
Technical Skills Required:
Sound knowledge of Kubernetes and commonly used tools in the Kubernetes ecosystem, e.g. Helm, Calico, Flannel.
Sound knowledge of Amazon Web Services, including configuring services for availability and security.
Experience with networking, both on individual servers and in virtual network configurations in AWS VPCs.
Knowledge of Docker for the purposes of running containers in production environments.
Experience building, deploying and supporting infrastructure for web-scale applications.
Practitioner of infrastructure as code and continuous integration and deployment (CI/CD).
Experience with tools such as Terraform, SaltStack, Packer, and Jenkins.
Experience using AWS DevOps tooling or Azure DevOps, and Git for source control and versioning.
Awareness of industry best practices and of formal accreditations such as SOC 2, with the ability to make recommendations for practices and policies.
Qualification Required:
BE/B.Tech/MCA/M.Sc. from a premier institute, with 8-15 years of industry experience.
Knowledge of one or more of the following tools is an added advantage for this profile: Riemann, OpenTSDB, Grafana, Elasticsearch, Logstash, Kibana, Beats, Tensor, StatsD, Jenkins, Phabricator, Vault, Consul.
Knowledge of other container orchestration platforms.
Certification: Should have relevant cloud certifications such as AWS Certified Cloud Practitioner or AWS Certified Solutions Architect.