Summary: Build robust data pipelines for AI/ML initiatives.
Responsibilities:
- Develop ETL/ELT processes for structured/unstructured data.
- Manage data lakes/warehouses (Snowflake, Databricks).
- Ensure data quality and accessibility for model training.
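A minimal sketch of the data-quality responsibility above: a pre-training gate in PySpark that checks row counts, missing labels, and key uniqueness. The table and column names (feature_store.training_events, label, user_id, event_ts) and thresholds are illustrative assumptions, not a fixed schema.

```python
# Sketch of a pre-training data quality gate in PySpark.
# Table/column names and thresholds are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.read.table("feature_store.training_events")  # hypothetical table

# Basic completeness and uniqueness checks on key columns.
total = df.count()
null_labels = df.filter(F.col("label").isNull()).count()
dup_keys = total - df.dropDuplicates(["user_id", "event_ts"]).count()

assert total > 0, "empty training set"
assert null_labels / total < 0.01, "too many missing labels"
assert dup_keys == 0, "duplicate (user_id, event_ts) keys found"
```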
Skills:
- Expertise in Spark, Kafka, SQL, and dbt.
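The Spark and Kafka skills typically show up as streaming ingestion. Below is a minimal sketch of reading a Kafka topic with Spark Structured Streaming and landing it in the lake; the broker address, topic name, and output paths are assumptions for illustration.

```python
# Sketch of Kafka ingestion with Spark Structured Streaming.
# Broker, topic, and lake paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                      # hypothetical topic
    .load()
)

# Kafka delivers key/value as binary; cast to string before parsing downstream.
events = raw.select(
    F.col("key").cast("string"),
    F.col("value").cast("string"),
    "timestamp",
)

query = (
    events.writeStream.format("parquet")
    .option("path", "s3://lake/raw/events")             # hypothetical lake path
    .option("checkpointLocation", "s3://lake/chk/events")
    .start()
)
```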
Key Process: Data Pipeline Engineering for AI
- Inputs: Raw data (structured/unstructured), storage requirements.
- Activities:
  - Build scalable ETL pipelines.
  - Clean and preprocess data for model training.
  - Manage data versioning and lineage (sketched below).
- Outputs: Processed datasets, data catalogs, pipeline logs.
- Stakeholders: Data scientists, analysts, AI engineers.
- Tools: Apache Spark, Snowflake, dbt.
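A minimal end-to-end sketch of this process: ingest raw data, clean and preprocess it, write a versioned, partitioned dataset, and emit a lineage/log record. All paths, column names, and the run-id scheme are illustrative assumptions.

```python
# Sketch: ingest -> clean -> versioned write -> lineage/log record.
# Paths, column names, and run-id scheme are hypothetical.
import json
import uuid
from datetime import datetime, timezone

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ai-etl").getOrCreate()
run_id = uuid.uuid4().hex  # simple versioning handle for this pipeline run

# 1. Ingest raw, semi-structured input (hypothetical landing path).
raw = spark.read.json("s3://lake/landing/events/")

# 2. Clean and preprocess for model training: drop corrupt rows, normalize types.
clean = (
    raw.dropna(subset=["user_id", "event_ts"])
    .withColumn("event_ts", F.to_timestamp("event_ts"))
    .withColumn("event_date", F.to_date("event_ts"))
    .dropDuplicates(["user_id", "event_ts"])
)

# 3. Write the processed dataset, partitioned by date and tagged with the run id.
out_path = f"s3://lake/processed/events/run_id={run_id}"
clean.write.mode("overwrite").partitionBy("event_date").parquet(out_path)

# 4. Emit a lineage/log record that a data catalog or dashboard can index.
log_entry = {
    "run_id": run_id,
    "source": "s3://lake/landing/events/",
    "output": out_path,
    "rows_out": clean.count(),
    "finished_at": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(log_entry))
```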