James Black

PhD (Cantab)

Roche / Genentech


I’m James Black, director for Roche’s late-stage Insights codebase and business lead for our clinical and RWD scientific computing environments. Previously, I studied at the MRC Epidemiology Unit and Jesus College, University of Cambridge, worked at the London School of Hygiene and Tropical Medicine, and completed my earlier degrees and schooling in New Zealand.

  • Data to insight flows
  • Data Science and Informatics interface
  • Pragmatic data science
  • Reproducibility
  • Real World Data
  • Leading technical teams
  • PhD in Medical Science, 2015

    Cambridge University

  • MPhil in Epidemiology, 2012

    Cambridge University

  • Masters in Public Health, 2010

    Otago University

  • BBioMedSci in Infection & Immunity, 2008

    Otago University


My open-source commit activity on github.com.

Google Analytics data on site visits over the last ~6 months.



Insights Engineering People and Product Family Lead
Roche / Genentech
Apr 2021 – Present Basel, Switzerland

Responsibilities include:

  • Leading a dedicated team of data scientists, developers and engineers that builds data products for insights in R, SAS, Python and other languages
  • Acting as business lead for the development of our Scientific Computing Environment
  • Promoting a collaborative codebase, and developing tools, frameworks and mechanisms to support around 1,000 data scientists productionising their code
  • Serving as a core member of an effort, sponsored by Roche’s Corporate Executive Committee, to evaluate our platform and tools across Roche Pharma
  • Promoting, facilitating and tracking our internal open source code base
  • Representing Pharma Development in Roche’s Inner and Open Source office
Associate Director, Personalised Healthcare Analytics
Roche / Genentech
Aug 2018 – Mar 2021 Basel, Switzerland

Responsibilities include:

  • Leading a team of 10 data scientists and 4 engineers
  • Partnering with the collaborations group to better understand real world data
  • Leading an engineering team responsible for enterprise-level dashboards and tools used by >700 internal users
  • Driving the insight engineering roadmap, partnering closely with informatics
  • Managing data scientists working across the organisation to deliver real world evidence to benefit patients
Data Scientist, Personalised Healthcare Analytics
Roche / Genentech
Oct 2015 – Jul 2018 Basel, Switzerland
Delivering on molecule needs and supporting development of more robust data science workflows.
Grad Student
MRC Epidemiology Unit | Jesus College, University of Cambridge
Oct 2011 – Sep 2015 Cambridge, England
Completed an MPhil in Epidemiology followed by a PhD exploring individualised care in populations with screen-detected diabetes, with a focus on preventing tertiary outcomes like cardiovascular disease. During my PhD I also taught, including teaching MPhil students and running my own one-month course on medical science at a summer school held at Jesus College.
Research Assistant
Health Services Research Centre, Victoria University
Jan 2010 – Aug 2010 Wellington, New Zealand
I analysed the relationship between health-related quality of life and waiting lists for elective surgery. I approached this by using a multi-level mixed effects model to assess the ability of the health system to minimise the burden the waiting list population represents. Because I completed this project before the end of my contract, the work was expanded to explore whether different health trajectories exist after referral, using a group-based approach to finite mixture modelling.
Undergrad | Grad Student
School of Medicine, University of Otago
Jan 2008 – Apr 2010 Wellington, New Zealand
Completed a BBioMedSci in Infection and Immunity, and a thesis-only Masters in Public Health. My thesis was an individual patient data meta-analysis of patients' health-related quality of life after injury. It aimed to provide a descriptive benchmark of quality of life norms after injury and has since been included as a data source for injury estimates in the GBD-2010 project.

Recent Posts

Smart Home 2.0
Tracking our attempts to set up our new home

Recent Publications

(2021). Case study in the development of a framework for quality and reproducibility in inner-sourced packages and self-service analytic dashboards to accelerate common study types. American Medical Informatics Association Annual Conference.


(2020). A comparison of sampling methods to select real-world relapsed/refractory (R/R) diffuse large-cell B-cell lymphoma (DLBCL) patients with multiple eligible index dates. Pharmacoepidemiology and Drug Safety.

(2018). Process mining for exploring treatment patterns in Chronic Lymphocytic Leukemia (CLL) in a real world oncology database. 34th International Conference on Pharmacoepidemiology and Therapeutic Risk Management.


(2018). Validation of Clinical vs Algorithmic Definition of Line of Therapy in Multiple Myeloma. 34th International Conference on Pharmacoepidemiology and Therapeutic Risk Management.


(2016). Health Status in Patients with Severe Haemophilia A According to Treatment Regimen and Age. Value in Health.



AWS Cloud Practitioner Essentials
Executive Data Science Capstone
Data Science in Real Life
Building a Data Science Team
Managing Data Analysis
A Crash Course in Data Science
Diversity and inclusion in the workplace
Design and Interpretation of Clinical Trials
DevOps Culture and Mindset
Agile with Atlassian Jira
How to manage a remote team
IT Accelerate - Innovation in IT
2-week onsite executive education course in Pittsburgh, run by the Tepper School of Business at Carnegie Mellon University, with a 2-month remote project.
Intro to Python for Data Science
Data visualisation with ggplot (part 1)
Data manipulation in R with dplyr
Writing Functions in R
Machine Learning Toolbox
R Programming
Diabetes - a Global Challenge
Diabetes. Diagnosis, Treatment, and Opportunities
Maps and Geospatial Revolution