Designing & Implementing Data Pipelines for Scientific Discovery

Location
London, Bristol, Manchester, Newcastle, Edinburgh
Duration
12 hours spread over 2 days. Please see below for the course schedule.
Dates
The course runs once every two weeks in each location.
Level
Minimal programming and mathematics background required.
Course Fees
Standard course fee is £400. A reduced rate of £300 is available for current PhD and MSc students.

To help you decide with confidence, we allow full payment to be made after the first session of the course.

How can researchers design and implement data pipelines for scientific research?
Our Data Pipelines for Science School helps scientists learn how to correctly, efficiently and robustly prepare their datasets for machine learning in their scientific projects.
Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI?

Machine learning is an important tool for researchers across disciplines. Scientists today have access to more data, from a greater range of sources and at greater speed than ever before, along with new opportunities to extract insights from this data using AI. But before deploying AI, researchers must have a data pipeline that transforms their data into a state suitable for the machine learning algorithms being used.

These pipelines are important independent research outputs, as they enable others to easily inspect, reproduce, refine or extend a scientist’s work. However, implementing data pipelines presents numerous software challenges that can be difficult to resolve, or even to identify, for scientists without significant expertise in software engineering.

Such challenges include: How do I ensure the correctness of my pipeline? How do I structure my pipeline so that others can easily reuse and extend it? How do I ensure my pipeline is robust enough to handle different types and volumes of data? How do I document and publish my pipeline?
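To give a flavour of the kind of questions the course works through, here is a minimal sketch of a small pipeline built from reusable steps with a basic correctness check. It is illustrative only, not course material: the pandas-based approach, the function names and the input file are assumptions made purely for this example.

# Illustrative sketch only: a tiny pipeline composed of small, reusable steps,
# with a simple validation stage that fails loudly if the output looks wrong.
import pandas as pd


def load_measurements(path: str) -> pd.DataFrame:
    """Read raw measurements from a CSV file."""
    return pd.read_csv(path)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values and remove duplicate records."""
    return df.dropna().drop_duplicates()


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Check basic expectations about the cleaned data before it is used."""
    assert not df.empty, "Pipeline produced an empty dataset"
    assert df.notna().all().all(), "Unexpected missing values after cleaning"
    return df


def run_pipeline(path: str) -> pd.DataFrame:
    """Compose the steps so the whole pipeline can be rerun, reused and extended."""
    return validate(clean(load_measurements(path)))


if __name__ == "__main__":
    features = run_pipeline("measurements.csv")  # hypothetical input file
    print(features.describe())

Structuring a pipeline as small, composable functions like this is one way to make it easier for others to inspect, rerun and extend; the course labs explore these ideas hands-on in Python.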

LON.AI’s Data Pipelines for Science School helps scientists overcome such data pipeline challenges by equipping them with the latest best-practice software techniques. It consists of a blend of lectures and labs, with a focus on discussing general principles and case studies during the lectures, and on hands-on exercises in Python during the labs. Participants also have the opportunity to discuss and share data pipeline issues encountered in their own research with the course instructor and cohort, and to relate them to the course content.
Register interest
Course structure
Day 01
01 Lecture
Introduction to data pipelines
01 Lab
Automating pipelines in Python
02 Lecture
Introduction to data pipelines
02 Lab
Automating pipelines in Python

Day 02

03 Lecture
Publishing data pipelines
03 Lab
Publishing workflow
04 Lecture
End-to-end example
04 Lab
Pipeline deployment using cloud computing
Meet the instructors
Ahmad is a seasoned data scientist, educator, and machine learning engineer with a diverse background spanning academia, industry, and public sector projects.
Steffen is an AI practitioner and educator with an extensive track record of applying AI in various domains across academia, industry, and startups.
Jan is an expert in Big Data, Scientific Computing, and Artificial Intelligence, with a career spanning both industry and academia.
Register your interest
FAQs
Who is this course for?
How will this course help my research?
What is the minimum level of programming experience required to participate in the course?
Do you cover any programming languages other than Python?
Do I need to be familiar with machine learning to participate?