Summary: Maintain and optimize AI platform infrastructure.
Responsibilities:
- Monitor platform health, performance, and security.
- Troubleshoot issues and manage user access controls.
- Coordinate upgrades and backups.
Skills:
- Experience with Kubernetes, Linux, and cloud administration.
Key Process: AI Platform Maintenance & Support
- Inputs: User access requests, system alerts, upgrade schedules.
- Activities:
  - Monitor platform health and security.
  - Troubleshoot performance bottlenecks.
  - Manage backups and disaster recovery.
- Outputs: SLA reports, system uptime metrics, access logs.
- Stakeholders: Platform users, cybersecurity, IT teams.
- Tools: Kubernetes, Prometheus, Grafana.
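
As an illustration of the "system uptime metrics" output, the following is a minimal sketch of how uptime over a reporting window might be computed from recorded outages. The function name `uptime_percent`, the 30-day window, and the sample outage are hypothetical and not taken from any specific tool; in practice the outage data would likely come from the monitoring stack rather than a hand-written list.

```python
from datetime import datetime, timedelta


def uptime_percent(window_start, window_end, outages):
    """Return uptime as a percentage of the reporting window.

    outages: list of (start, end) datetime pairs, assumed non-overlapping.
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("window_end must be after window_start")
    down = 0.0
    for start, end in outages:
        # Clip each outage to the reporting window before counting it.
        s = max(start, window_start)
        e = min(end, window_end)
        if e > s:
            down += (e - s).total_seconds()
    return 100.0 * (window - down) / window


# Hypothetical 30-day window with a single 43 min 12 s outage.
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
outages = [(datetime(2024, 1, 10, 2, 0), datetime(2024, 1, 10, 2, 43, 12))]
pct = uptime_percent(start, end, outages)
print(f"{pct:.3f}%")
```

A figure computed this way could feed the SLA report directly; overlapping outages would need to be merged first, which this sketch does not attempt.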