Designing & Implementing Data Pipelines for Scientific Discovery


Location
London, Bristol, Manchester, Newcastle, Edinburgh.
Cohort Size
Maximum 15 participants per cohort.
Course Fees
The standard course fee is £400.

A reduced rate of £200 is available for current PhD and MSc students at select UK universities and research groups. Please contact us to find out if your group/institution is eligible.

To help you decide with confidence, we allow full payment to be made after the first session of the course.
Duration
12 hours spread over two days. Please see below for the course schedule.
Dates
The course runs weekly in London and biweekly in other locations.
Level
Minimal programming and mathematics background required.
Introducing our unique approach to data pipelines, summarised in our philosophy: “Pipelines as Artefacts.”
This course offers a fresh perspective on one of the most overlooked yet foundational parts of modern research: the data pipeline.
Developed through years of teaching and collaborating with researchers at top-tier institutions like the University of Cambridge and Imperial College London, as well as leading industrial labs, this approach encourages researchers to treat pipelines not as disposable code, but as lasting, inspectable, and shareable research outputs in their own right.

We show you how designing thoughtful, well-structured pipelines can do far more than clean data and prepare it for analysis: they can spark new research directions, uncover hidden patterns, and turn your workflows into publishable, fundable contributions.

In an academic landscape increasingly shaped by open science and reproducibility, this approach not only strengthens the rigour of your work but also enhances your academic profile, positioning you to publish tools as well as findings.

The course is designed to be field-agnostic: whether you're working in bioinformatics, economics, or digital humanities, the principles and practices we teach are widely applicable and immediately actionable. Finely calibrated through years of delivery, the course benefits researchers at all levels, including those with little or no prior programming experience.

We also prepare you to engage with the wider research economy, equipping you with skills that make your work more attractive to non-traditional funders outside academia, like tech companies, innovation labs, and public/private digital initiatives.

As a further benefit, well-architected pipelines lay the groundwork for applying machine learning and AI to your research, opening doors to automation, modelling, and new collaborations you may not have considered before.

A key highlight of the course is “Your Data in Focus: Expert Consultation”, an interactive group session where you’ll have the chance to bring your own datasets and research challenges to discuss with all three instructors. This collaborative dialogue helps you directly apply the course concepts to your work, making the training immediately relevant and impactful.

This is more than a technical course: it’s a rethinking of how you approach data, discovery, research impact, and even your career trajectory as a researcher.
Register interest
Course structure
Day 1
10:00-12:00
Lecture: Introduction to data pipelines: what are they and why are they important?

What do academics get wrong about pipelines? How can we bridge the gap between the best practices adopted in leading industrial labs and academic workflows?
12:00-13:00
Lab: Case studies of high-profile pipelines from various fields and industries (cohort-specific)
13:00-14:00
Break
14:00-16:00
Lecture: How to design and structure data pipelines to maximise scientific discovery.

How to think of pipelines as artefacts, how to approach pipeline design as a scientific and technical craft, and a walkthrough of the key components, choices, and trade-offs involved in crafting pipelines.

16:00-17:00
Lab: Examples of crafting pipelines from various fields, highlighting key decisions (cohort-specific)

Day 2

10:00-12:00
Lecture: How to publish data pipelines

What does it mean to "publish" a data pipeline? What are the different "levels" of publishing pipelines, and how can each level be strategically aligned with different goals, including academic recognition, funding impact, commercialisation, and long-term research value?
12:00-13:00
Lab: Examples of publishing workflows (cohort-specific)

13:00-14:00
Break
14:00-17:00
Your Data in Focus: Expert Consultation

In this session, participants hold an open discussion with the instructors about their own data and how to apply what they have learnt in the course to their own research.
Meet the instructors
Ahmad is an experienced educator and AI practitioner. He has held senior research and teaching roles at institutions including the University of Cambridge, Imperial College London, and the LSE.

His expertise combines deep technical skill in machine learning with a strong focus on research impact and effective communication across disciplines.
Steffen is an AI practitioner with a track record of applying machine learning across sectors, from FTSE 100 companies to startups and academic research.

He has held roles such as Senior AI Engineer at Rolls-Royce and has recently focused on applying his knowledge in the AI startup ecosystem. Alongside his applied work, he teaches and consults on AI across universities and industry.
Jan is an expert in big data, scientific computing, and AI, with senior roles across industry and academia. He has led large-scale projects at organisations such as Royal Mail and Citibank's Innovation Lab, where he serves as Senior Vice President and Lead Software Engineer.

Known for bridging research and real-world application, Jan has lectured widely on AI, and has mentored over 100 students by integrating industry best practices into academic training.
Register your interest
FAQs
Who is this course for?
I’m not sure if this course is right for me, can I speak to someone?
Is the course in-person? Can I attend remotely?
How do I enrol in the course?
What do participants receive upon completion?
What is the minimum level of programming required to participate in the course?
Do I need to be familiar with machine learning to participate?