Summary: Enhance IT operations through AI-driven automation, monitoring, and incident resolution.
Responsibilities:
- Implement AI tools for real-time system monitoring and anomaly detection.
- Automate incident response and root cause analysis using ML models.
- Collaborate with DevOps/MLOps teams to optimize infrastructure.
- Manage AIOps platforms (e.g., Splunk, Moogsoft).
Skills:
- Proficiency in Python, TensorFlow, or PyTorch.
- Experience with cloud platforms (AWS/GCP/Azure) and Kubernetes.
- Knowledge of ITIL processes and AI-driven analytics.
Key Process: Enhancing IT Operations with AI-Driven Automation
- Inputs: System logs, performance metrics, incident tickets.
- Activities:
  - Develop ML models for anomaly detection.
  - Automate root cause analysis and remediation.
  - Integrate AIOps tools with existing IT infrastructure.
- Outputs: Real-time dashboards, reduced downtime, automated incident resolution.
- Stakeholders: DevOps teams, IT support, CTO.
- Tools: Splunk, Moogsoft, Python, Kubernetes.
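
Example: The anomaly detection activity in the key process above can be illustrated with a minimal sketch. It does not use a trained ML model or any Splunk or Moogsoft API. It assumes a simple rolling z-score baseline over a performance metric, using only the Python standard library. The function name `detect_anomalies`, the `window` and `threshold` parameters, and the sample CPU readings are all hypothetical and chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of readings that deviate from the trailing-window mean
    by more than `threshold` sample standard deviations.

    Hypothetical helper for illustration; a statistical baseline, not an ML model.
    """
    history = deque(maxlen=window)  # holds the most recent `window` readings
    anomalies = []
    for i, value in enumerate(values):
        # Only score a reading once a full window of history is available.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip scoring when the window is flat (sigma == 0).
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Illustrative CPU utilisation readings in percent (made-up data).
cpu_percent = [41, 43, 42, 40, 44, 42, 43, 97, 42, 41]
print(detect_anomalies(cpu_percent))
```

In this sketch a flagged reading still enters the history window, so a spike temporarily widens the baseline for the readings that follow it. A production setup would more likely exclude flagged points or replace this baseline with a trained model.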