Data Science Program

The program focuses on maximizing the power of data for population and health research in Africa. It does so by creating platforms for Africa-led data sharing and data custody, and by applying state-of-the-art big data analytics and artificial intelligence to foster advances in health and wellbeing in Africa.

OVERVIEW

Our work in this area builds on advances in platform development to create robust data systems in which data are shared, governed, and analyzed with novel methods. The Data Science Program draws on internally and externally generated “big data” to explore patterns and generate predictions, using data science, artificial intelligence tools, and modeling approaches to inform population health.

Units and working groups 

  • Data Platforms and Systems. The team focuses on creating platforms and systems that support the data value chain. Current and planned platforms include:
    • Data Science and Sharing Platform (DASSA). This data-sharing platform offers interfaces that present stories on data sharing and information on legal policies and frameworks for data protection in various African countries, provides modules for data sharing, and collates data from various sources, including research datasets generated internally at the Center.
    • No-Code Machine Learning Platform. This platform offers codeless machine learning algorithms that can be easily deployed by researchers who are not necessarily data science professionals. The graphical user interface lets users work with research datasets uploaded to the platform and generates real-time predictive analytics with accompanying interpretations.
  • Data Governance. The data governance team works closely with the data synergy team to develop a data governance framework for APHRC. Additional work includes creating a curriculum covering data governance, data anonymization, privacy-preserving technologies, and responsible data use. The Data Science team collaborates with the RRCS to deliver the proposed training.
  • Data Harmonization and FAIR. A team of data documentationists and data scientists creates data pipelines for various use cases and supports on-premises and cloud-based data analysis through a federated approach. The team uses the Observational Medical Outcomes Partnership (OMOP) Common Data Model, a standardized data model for health data with internationally recognized vocabularies. The platform harmonizes data generated internally and externally through the Center’s partnership projects across Africa and beyond. In addition, metadata for APHRC research datasets is indexed and made machine searchable using tools such as Schema.org to increase visibility and allow global data sharing.
  • Data Analytics and Evaluation. A team of experienced data scientists, statisticians, and mathematical modelers provides analytical support for “big data”-driven projects. The team uses machine learning techniques and newer tools such as generative artificial intelligence to develop robust outputs that inform decision-making and impact lives through research.

INSPIRE Network

The Implementation Network for Sharing Population Information from Research Entities (INSPIRE) is hosted by the Data Science Program. INSPIRE was established in 2019 as a network of Health and Demographic Surveillance System (HDSS) sites in East Africa. The vision has since changed, and the network now hosts about 20 HDSS sites in Eastern Africa (Ethiopia, Kenya, Tanzania, Uganda), Western Africa (Senegal, Burkina Faso), and Southern Africa (Malawi). The INSPIRE secretariat provides:

  • An annual general meeting to discuss value addition and collaboration among HDSS sites
  • Periodic hybrid training in data harmonization for data managers at the respective sites
  • Promotion of federated data-sharing models for collaborative and joint analyses
  • Support in addressing recurrent challenges faced by HDSS sites, e.g. record linkage
  • A platform for joint grant applications across network members