DATA SCIENTISTS

DEADLINE: June 8, 2022
OPEN JOB

The African Population and Health Research Center (APHRC) is a leading Africa-based, African-led, international research institution headquartered in Nairobi, Kenya. APHRC conducts policy-relevant research on population, health, education, urbanization and related development issues in sub-Saharan Africa.

APHRC seeks to recruit two Data Scientists for the INSPIRE Platform for Evaluation and Analysis of COVID-19 Harmonized (PEACH) data project.

BACKGROUND

The PEACH data project is hosted within the Implementation Network for Sharing Population Information from Research Entities (INSPIRE) (https://aphrc.org/inspire/).

INSPIRE is building a generic model for health data from longitudinal population studies (LPS) using the OMOP (Observational Medical Outcomes Partnership) database. INSPIRE PEACH proposes to develop the key elements of a coordinated Pan-African COVID-19 data ecosystem. We will build a robust suite of data standards, technologies and diverse data integration methodologies, using the power of Artificial Intelligence and Data Science for analysis and oversight within a trusted governance and policy environment.

  • Development and training

On-the-job training will be provided by personnel within INSPIRE, both within the employing institution and from other institutions that are affiliated with INSPIRE. The role holder will attend meetings and workshops held by the designated studies they are working on. There may be opportunities for further studies in Data Science commissioned by the Network as resources allow, with potential for a PhD subject to satisfactory performance and funding.

  • Relationships

The post holder will report to the Project Lead or Project Members within INSPIRE in their institution, that is, APHRC in Kenya. S/he will work very closely, and on a day-to-day basis, with their counterpart(s) based at the Malawi Epidemiology and Intervention Research Unit (MEIRU) in Malawi (https://meiru.lshtm.ac.uk/). S/he will have routine interactions with other INSPIRE partners affiliated with the African NCD Longitudinal Data Alliance (ANDLA), the Analyzing Longitudinal Population-based HIV/AIDS data on Africa (ALPHA) network based at the London School of Hygiene & Tropical Medicine (LSHTM) in the United Kingdom, the South African Population Research Infrastructure Network (SAPRIN) in South Africa, and the Committee on Data of the International Science Council (CODATA) in France.

Duties/Responsibilities

The Data Scientist will be primarily responsible for defining the data specifications needed for COVID-19 data and metadata, including alternative data sources. S/he will be guided by population health knowledge gaps identified in the INSPIRE PEACH knowledge translation hub. Informed by cohort definitions, s/he will work with the Data Trackers to ensure the data they find is prepared to these specifications and will develop AI search programs for finding data that can populate these data specifications. S/he will extract, transform and load (ETL) the data and associated metadata, prepared to the data specifications defined by the INSPIRE Network, into the INSPIRE Common Data Model (CDM). The Data Scientist will work with INSPIRE staff (including the Data Trackers) to build “on-ramps” which transfer COVID-19 related data from agreed data specifications to the INSPIRE CDM. The Data Scientist will be expected to learn about the CDM using OMOP data under the direction of INSPIRE partners within the first six months of their employment.

Additionally, the Data Scientist, again working with the INSPIRE PEACH knowledge translation hub team, will develop cohort off-ramps for AI-infused population health research on top of the OMOP CDM. AI initiatives might include the construction of cohorts based on synthetic data generated from models trained with real data, as well as the conduct of simulated trials with population health “treatments” using synthetic and/or real data. These “treatments” would be in line with policy initiatives currently under review by Ministries of Health (MOHs). In this way, the Data Scientist would be responsible for developing, conducting and eventually analyzing the results of what are, more or less, “natural experiments”.

The Data Scientist will:

Data collection

  • Prepare the final list of the data specifications used by the INSPIRE network for COVID-19 data.
  • Perform data extraction work from source databases.
  • Perform data profiling and quality assessments for the gathered data.

Data processing and storage in database systems

  • Transform collected COVID-19 data into the INSPIRE PEACH data exchange protocols.
  • Ensure the quality of the data transformations and of the resulting data provided by Data Trackers for the data specifications.
  • Develop on-ramps for putting data into the OMOP CDM in consultation with INSPIRE partners.
  • Implement the on-ramps using the data from Data Trackers in Malawi and Kenya.
  • Develop cohort off-ramps from the OMOP CDM suitable for the conduct of natural experiments with “treatments” that take the form of previous and future public health interventions.
  • Write up the results of these experiments and place them in the INSPIRE PEACH knowledge base for vetting by the knowledge translation hub team and future publications.

Data cataloguing and sharing

  • Develop minimum metadata requirements to accompany the source data.
  • Manage and document the Common Data Model to ensure provenance of the data in the CDM.
  • Support the preparation of off-ramp data products, including the metadata required for sharing data.

Overarching

  • Ensure data standards are aligned with program and project priorities.
  • Take part in training and workshops organized by INSPIRE, both physically and virtually.
  • Under the direction of the INSPIRE team, engage with the training and mentoring of data staff of INSPIRE network members to ensure continuity of data and data provenance.
  • Prepare monthly progress reports on their work.
  • Inform and take directions from their line managers in INSPIRE to ensure continuity of data operations.
  • Liaise with the team managing the CDM, including INSPIRE Network Partners, to ensure their work fits within the scope of the INSPIRE CDM.
  • Attend meetings and workshops organized by INSPIRE, as required; the workshops may be around data management, upload, analysis, writing up and planning.
  • Provide administrative support across work-streams: handle meeting invitations, bookings, training venues and training materials, and support the organization of periodic meetings for INSPIRE.
  • Internalize the project work plan and anticipate administrative needs to support implementation and project work-streams. This will include working with partners to gather project requirements, maintaining a system for monitoring project activities, milestones and deliverables on a monthly basis, maintaining the INSPIRE learning platform, and providing support to partners using the platform as needed.
  • Prepare quarterly, intermediate and annual program status reports required for management and donors. These reports will reflect achievements made, challenges and solutions.
  • Establish and maintain technical contacts with other stakeholders and partners; lead on communication with INSPIRE members and respond to queries as needed; provide information to concerned parties on progress, problems and required changes; and document actions relating to the project’s implementation for the consideration of the team.
  • Provide administrative support for proposal development for continued funding of the INSPIRE activities.
  • Assist in completion of administrative forms and requests.

Qualifications, Skills, and Experience

The ideal candidate will have worked with health data (preferably longitudinal health data), have experience with health and demographic surveillance systems (HDSS) and be familiar with the data procedures of the INDEPTH Network (http://www.indepth-network.org/).

S/he will have excellent skills in data management and programming (relational databases). The team of Data Scientists (based in both Kenya and Malawi) is expected to write programmes for data extraction, transformation and loading (ETL) in a variety of languages. The ideal post holder will have the following:

  • Master’s degree in Statistics, Data Science, M&E, Econometrics, Software Engineering, Demographic Research, Information Systems or equivalent in a relevant area.
  • At least 3-5 years’ post-first-degree experience in data management for longitudinal medical research studies and in handling large datasets.
  • Knowledge of a programming language such as Python, Perl, R, Java or equivalent, and of ETL processes.
  • Experience with database servers, e.g., PostgreSQL, MySQL, SQL Server, Oracle or equivalent.
  • Experience querying databases using SQL.
  • Experience conducting and/or managing health/research projects.
  • Experience in the conduct and analysis of quantitative research.
  • Excellent communication (written and spoken) and interpersonal skills.
  • Strong organizational and program management skills.
  • Ability to take initiative and work both independently and in teams.
  • Fluent in English.

This position is classified under Nationally Recruited Positions (NRP), Grade V in our scales. The appointment will be for a one-year period, renewable subject to satisfactory performance and funding.

Interested candidates are encouraged to apply through our recruitment portal https://aphrc.org/vacancies/ by June 08, 2022. Only shortlisted candidates will be contacted; shortlisted candidates will be required to have a Police Clearance Certificate. Cover letters should be addressed to:

The Human Resources Officer

African Population and Health Research Center, Inc

APHRC Campus, Manga Close, off Kirawa Road, Kitisuru

P.O. Box 10787-GPO, Nairobi

Website: www.aphrc.org

APHRC is an equal opportunity employer and is committed to the protection of vulnerable persons.