Designing and implementing data pipelines for scientific discovery

Location
London, Bristol, Manchester, Newcastle, Edinburgh.
Cohort Size
Maximum 15 participants per cohort.
Course Fees
The standard course fee is £400.

A reduced rate of £200 is available for current PhD and MSc students at select UK universities and research groups. Please contact us to find out if your group/institution is eligible.

To help you decide with confidence, we allow full payment to be made after the first session of the course.
Duration
12 hours spread over 2 days. Please see below for course schedule.
Dates
The course runs weekly in London and every two weeks in other locations.
Level
Minimal programming and mathematics background required.
Certificate
An example course certificate can be viewed via this link.
Introducing our unique approach to data pipelines, summarised in our philosophy: “Pipelines as Artefacts”.
This course offers a fresh perspective on one of the most overlooked yet foundational parts of modern research: the data pipeline.
Data pipelines are often undervalued in academic research. Even researchers with programming experience may see them as little more than tools for cleaning or preparing data. For those newer to data-intensive work, pipelines may go entirely unrecognised, built on-the-fly without clear structure or long-term value.

This stands in stark contrast to their role in industry and leading technology research labs, where data pipelines are treated as foundational infrastructure and viewed as strategic assets, central to delivering consistent, high-quality research at scale.

Well-crafted pipelines can spark new research ideas, enable collaboration across disciplines, and even serve as the foundation for entire ecosystems of research tools and platforms. Not least, they boost visibility, credibility, and open up opportunities for industrial careers.

However, the skills to design, craft, and implement them are rarely taught well or coherently in academic settings. As a result, many researchers are left to piece together best practices informally, often through years of trial, error, and word of mouth.

Developed through years of collaboration with researchers at leading institutions, including the University of Cambridge and LSE, as well as top industrial labs, this course helps scientists bridge the gap between academic research practices and industrial best practices, distilling knowledge and strategies for designing high-quality data pipelines and using them to accelerate scientific discovery.

Finely tuned through extensive delivery and many iterations, this course is field-agnostic and designed to benefit researchers at all levels. Whether you’re just starting out with data workflows or refining your established practices, it provides structured, relevant and concise guidance delivered in a fun, anecdotal style by seasoned data pipeline experts.

The course also equips you with skills that make your work more attractive to non-traditional funders outside academia, such as tech companies, innovation labs, and public/private digital initiatives, and can help you shape a niche academic consultancy profile.

One of the course highlights is “Your Data in Focus: Expert Consultation,” a supervisory group session where you bring your datasets and research questions to discuss with the instructors. This guided dialogue serves as a mini-supervision meeting, providing tailored advice to help you directly apply what you’ve learned to your research.

This is more than a technical course: it’s a rethinking of how you approach data, discovery, research impact, and even your career trajectory as a researcher.
Register interest

Endorsements
“I got in touch with LON.AI to help me navigate a new Python codebase that I inherited from a research team, applying advanced deep learning techniques to an econometric dataset.

Even as an experienced researcher and lecturer myself, LON.AI's training still managed to provide me with excellent advice that not only improved the design, efficiency and correctness of my code but much more importantly also sparked new research ideas to pursue.

It’s hard to find experts who not only understand software engineering and cutting-edge AI, but who also appreciate the unique needs and constraints of academic research - and can communicate these complex ideas clearly across disciplines.

That’s why I strongly recommend the training offered by LON.AI to any researcher or PhD student curious about using AI in their research.”
Dr Malvina Marchese
Associate Professor in Finance
Bayes Business School

Director,
International Institute of Forecasters
“We have been working with LON.AI since 2020 to help us supervise students working on their Applied Research Projects with industrial partners, and so far they have helped us supervise more than 200 students on cutting-edge research projects related to LLMs, medical technology, advanced business analytics and quantum computing.

LON.AI’s efforts have definitely played an important role in making our programme one of the top-ranked programmes in the UK and enhanced the employability of our cohorts with top-tier clients.

They have consistently provided students with outstanding technical guidance on software engineering techniques and on using frontier AI models to extract insight from their datasets.

I am sure that if you get a chance to talk with Ahmad, Steffen or Jan, you will see that they are best in class when it comes to educating the next generation of scientists on using AI.”
Prof Vali Asimit
Programme Director
City, University of London
Meet the instructors
Jan is an expert in big data, scientific computing, and AI, with senior roles across industry and academia. He has led large-scale projects at organisations such as Royal Mail and Citi Bank’s Innovation Lab, where he serves as Senior Vice President.

Known for bridging research and real-world application, Jan has lectured widely on AI, and has mentored over 100 students by integrating industry best practices into academic training.
Ahmad is an experienced educator and AI practitioner. He has held senior research and teaching roles at institutions including the University of Cambridge, Imperial College London and LSE.

His expertise combines deep technical skill in machine learning with a strong focus on research impact and effective communication across disciplines.
Steffen is an AI practitioner with a track record of applying machine learning across sectors - from FTSE 100 companies to startups and academic research.

He has held roles such as Senior AI Engineer at Rolls-Royce and has recently focused on applying his knowledge in the AI startup ecosystem. Alongside his applied work, he teaches and consults on AI across universities and industry.
Course structure
Day 1
10:00-12:00
Lecture: Introduction to Data Pipelines
What do academics often get wrong about pipelines, and what are the common blind spots in how they approach, develop, and market them? How can you bridge the gap between best practices used in leading industrial labs and typical academic workflows? And how can you get more mileage out of your pipelines to boost visibility, collaboration, and momentum, wherever you are in your academic career?
12:00-13:00
Lab: Case studies of high-profile pipelines from a range of fields and industries, selected to align with the cohort’s research interests.
13:00-14:00
Break
14:00-16:00
Lecture: How to design and structure data pipelines to maximise scientific discovery: The "craft" of pipeline design.

We’ll explore how to think of pipelines as artefacts, treating pipeline design as both a scientific and technical craft, and cover the essential components, choices, and trade-offs involved. Crucially, the session focuses on coupling pipelines with the discovery process: moving beyond simple data preprocessing to using pipelines as a way to inspire new hypotheses and directions, allowing the pipeline to lead your research rather than merely support it.
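As a purely illustrative sketch (not course material), the idea of a pipeline as a deliberate, reusable artefact rather than an ad-hoc script can be pictured as composing small, named stages. All names here (`Pipeline`, `drop_missing`, `normalise`) are hypothetical examples, not an API taught on the course:

```python
# Hypothetical sketch: a pipeline as a composable artefact of named stages,
# rather than a one-off cleaning script. Names are illustrative only.
from typing import Callable, List, Optional

Stage = Callable[[list], list]


class Pipeline:
    """Compose stages into a single, documented, reusable object."""

    def __init__(self, *stages: Stage) -> None:
        self.stages = list(stages)

    def run(self, data: list) -> list:
        # Feed the output of each stage into the next.
        for stage in self.stages:
            data = stage(data)
        return data


def drop_missing(xs: List[Optional[float]]) -> List[float]:
    # Remove placeholder values standing in for missing readings.
    return [x for x in xs if x is not None]


def normalise(xs: List[float]) -> List[float]:
    # Rescale values to the [0, 1] range; a stand-in for real preprocessing.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


pipeline = Pipeline(drop_missing, normalise)
print(pipeline.run([3.0, None, 1.0, 2.0]))  # [1.0, 0.0, 0.5]
```

Because each stage is a named, testable unit, the pipeline itself becomes something you can document, version, share, and extend, which is one way of reading the “pipelines as artefacts” framing.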
16:00-17:00
Lab: In-depth exploration of how different pipeline design choices impact discovery, using real-world examples tailored to the cohort’s research interests.

Day 2

10:00-12:00
Lecture: How to Publish Data Pipelines

What does it mean to "publish" a data pipeline? What are the different "levels" of publishing pipelines, and how can you strategically align each level with various goals, including: academic recognition, funding impact, commercialisation, and long-term research value? We’ll also explore how to effectively market your pipeline to reach wider audiences and foster both vertical collaboration within your field and horizontal collaboration across disciplines.
12:00-13:00
Lab: Publishing workflows (cohort-specific examples) featuring demos and hands-on exercises covering the process end-to-end.
13:00-14:00
Break
14:00-17:00
Your data in focus: expert consultation

In this session, participants hold an open discussion with the instructors about their own data and how to apply what they have learnt in the course to their own research.
Register your interest
FAQs
Who is this course for?
I’m not sure if this course is right for me, can I speak to someone?
Is the course in-person? Can I attend remotely?
How do I enrol in the course?
What do participants receive upon completion?
What is the minimum level of programming requirement to participate in the course?
Do I need to be familiar with machine learning to participate?