About the Role
We are seeking a highly skilled DevOps Engineer to design, implement, and manage the scalable and secure infrastructure for the AI Security Academy platform. The role focuses on setting up containerized environments, sandboxed labs, and cloud-based infrastructure for hosting simulation-based and advanced browser-based learning labs. The ideal candidate will ensure high availability, cost efficiency, and robust security for the platform.
Requirements
Infrastructure Design and Implementation
Design and manage a scalable, secure, and cost-effective infrastructure using cloud platforms (e.g., AWS, Azure, GCP).
Build and maintain containerized environments using Docker, and orchestrate them with Kubernetes or ECS/EKS.
Implement sandboxed environments for browser-based IDEs (e.g., JupyterHub, Theia) and advanced lab setups.
Automation and Configuration Management
Automate infrastructure provisioning using tools like Terraform or CloudFormation.
Manage configuration and deployments using Ansible, Chef, or Puppet.
Monitoring and Performance Optimization
Set up monitoring and logging tools (e.g., Prometheus, Grafana, AWS CloudWatch) for system performance and resource utilization.
Optimize compute, storage, and network resources to reduce costs without compromising performance.
Security and Compliance
Ensure the platform adheres to best practices in cloud security and data privacy (e.g., RBAC, MFA, encryption).
Implement network firewalls, access controls, and container isolation to secure sandboxed environments.
Conduct regular vulnerability assessments and implement security patches.
Collaboration and Support
Work closely with development, content, and AI security teams to deploy and maintain sandboxed labs.
Collaborate on infrastructure requirements for advanced AI/ML training tasks (e.g., model training, adversarial dataset generation).
Provide technical support to troubleshoot and resolve infrastructure-related issues.
Continuous Improvement
Research and implement innovative solutions to improve platform reliability, scalability, and cost-efficiency.
Develop CI/CD pipelines to streamline code and lab deployments.
Required Skills and Qualifications
Technical Skills
Cloud Platforms: Proficiency in AWS, Azure, or Google Cloud (with hands-on experience in managing compute, storage, and networking services).
Containerization and Orchestration: Strong expertise in Docker, Kubernetes, or ECS/EKS.
Infrastructure as Code (IaC): Experience with Terraform, CloudFormation, or similar tools.
Automation: Proficiency in tools like Ansible, Chef, or Puppet.
Monitoring and Logging: Familiarity with tools like Prometheus, Grafana, CloudWatch, or Elasticsearch.
Networking: Knowledge of firewalls, VPCs, and load balancers for secure infrastructure.
Version Control: Experience with Git and CI/CD tools like Jenkins, GitLab CI, or CircleCI.
AI/ML and Security Knowledge (Preferred but Not Required)
Familiarity with AI/ML frameworks like TensorFlow and PyTorch, and with lab tools like JupyterHub.
Understanding of sandbox security and container isolation techniques.
Education and Experience
Bachelor’s degree in Computer Science, Information Technology, or a related field.
3+ years of experience as a DevOps Engineer, Cloud Engineer, or in a similar role.
Experience in deploying and maintaining education platforms or sandboxed environments is a plus.
About the Company
At AMDCYBERSEC, we are revolutionizing cybersecurity education, with a specialized focus on AI security. As a trusted name in the cybersecurity domain, we are committed to equipping professionals, researchers, and enthusiasts with the tools, knowledge, and skills to address the rapidly evolving challenges posed by AI, Machine Learning (ML), and large language models (LLMs).