Data Engineering: ETL/ELT pipeline design; ingestion & normalization of semi-structured CSVs; workflow orchestration with Airflow; GCS → BigQuery data loads; data modelling (star schema / dimensional modelling); dbt-style testing (schema + data quality checks); Docker/Compose; CI/CD basics (GitHub Actions); logging & validation.
Other: Machine learning for genomics (foundations and prototyping); documentation best practices; familiarity with clinical data standards (CDISC: SDTM/ADaM); basic SAS.
Soft Skills
Teamwork
Leadership (technical mentoring and task coordination)
Scientific communication (explaining complex analyses to non-technical stakeholders)