Designing & Implementing Data Pipelines for Scientific Discovery

Location
London, Bristol, Manchester, Newcastle, Edinburgh
Duration
12 hours spread over 2 days. Please see below for the course schedule.
Dates
The course runs once every two weeks in each location.
Level
Minimal programming and mathematics background required.
Course Fees
Standard course fee is £400. A reduced rate of £300 is available for current PhD and MSc students.

To help you decide with confidence, we allow full payment to be made after the first session of the course.

How can researchers design and implement data pipelines for scientific research?
Our Data Pipelines for Science School helps scientists learn how to correctly, efficiently and robustly prepare their datasets for machine learning in their scientific projects.
Well-curated and managed data is central to the effective use of AI, in science and elsewhere. How can scientists build the data pipelines they need to accelerate their research with AI?

Machine learning is an important tool for researchers across disciplines. Scientists today have access to more data, from a greater range of sources and at greater speed than ever before, along with new opportunities to extract insights from this data using AI. But before deploying AI, researchers must have a data pipeline that transforms their data into a state suitable for the machine learning algorithms being used.

These pipelines are important independent research outputs, as they enable others to easily inspect, reproduce, refine or extend a scientist’s work. However, implementing data pipelines presents numerous software challenges that can be difficult to resolve, or even to identify, for scientists without significant expertise in software engineering.

Such challenges include: How do I ensure the correctness of my pipeline? How do I structure my pipeline so that others can easily reuse and extend it? How do I ensure my pipeline is robust enough to handle different types and volumes of data? How do I document and publish my pipeline?
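To give a flavour of the kind of questions the course works through, here is a minimal sketch of a small pipeline built from reusable steps with a basic correctness check. It is illustrative only, not course material: the pandas-based approach, the function names and the input file are assumptions made purely for this example.

# Illustrative sketch only: a tiny pipeline composed of small, reusable steps,
# with a simple validation stage that fails loudly if the output looks wrong.
import pandas as pd


def load_measurements(path: str) -> pd.DataFrame:
    """Read raw measurements from a CSV file."""
    return pd.read_csv(path)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing values and remove duplicate records."""
    return df.dropna().drop_duplicates()


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Check basic expectations about the cleaned data before it is used."""
    assert not df.empty, "Pipeline produced an empty dataset"
    assert df.notna().all().all(), "Unexpected missing values after cleaning"
    return df


def run_pipeline(path: str) -> pd.DataFrame:
    """Compose the steps so the whole pipeline can be rerun, reused and extended."""
    return validate(clean(load_measurements(path)))


if __name__ == "__main__":
    features = run_pipeline("measurements.csv")  # hypothetical input file
    print(features.describe())

Structuring a pipeline as small, composable functions like this is one way to make it easier for others to inspect, rerun and extend; the course labs explore these ideas hands-on in Python.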

LON.AI’s Data Pipelines for Science School helps scientists overcome such data pipeline challenges by equipping them with the latest best-practice software techniques. It consists of a blend of lectures and labs, with a focus on discussing general principles and case studies during the lectures, and on hands-on exercises in Python during the labs. Participants also have the opportunity to discuss and share data pipeline issues encountered in their own research with the course instructor and cohort, and to relate them to the course content.
Register interest
Course structure
Day 01
01 Lecture
Introduction to data pipelines
01 Lab
Automating pipelines in Python
02 Lecture
Introduction to data pipelines
02 Lab
Automating pipelines in Python

Day 02

03 Lecture
Publishing data pipelines
03 Lab
Publishing workflow
04 Lecture
End-to-end example
04 Lab
Pipeline deployment using cloud computing
Meet the instructors
Ahmad is a seasoned data scientist, educator, and machine learning engineer with a diverse background spanning academia, industry, and public sector projects.
Steffen is an AI practitioner and educator with an extensive track record of applying AI in various domains across academia, industry, and startups.
Jan is an expert in Big Data, Scientific Computing, and Artificial Intelligence, with a career spanning both industry and academia.
Register your interest
FAQs
Who is this course for?
How will this course help my research?
What is the minimum level of programming experience required to participate in the course?
Do you cover any programming languages other than Python?
Do I need to be familiar with machine learning to participate?