Apache Airflow for Data Science: Automating Machine Learning Pipelines Training Course
Apache Airflow is an open-source platform for orchestrating workflows and automating complex data pipelines.
This instructor-led, live training (online or onsite) is aimed at intermediate-level participants who wish to automate and manage machine learning workflows, including model training, validation, and deployment using Apache Airflow.
By the end of this training, participants will be able to:
- Set up Apache Airflow for machine learning workflow orchestration.
- Automate data preprocessing, model training, and validation tasks.
- Integrate Airflow with machine learning frameworks and tools.
- Deploy machine learning models using automated pipelines.
- Monitor and optimize machine learning workflows in production.
Format of the Course
- Interactive lecture and discussion.
- Lots of exercises and practice.
- Hands-on implementation in a live-lab environment.
Course Customization Options
- To request a customized training for this course, please contact us to arrange.
Course Outline
Introduction to Apache Airflow for Machine Learning
- Overview of Apache Airflow and its relevance to data science
- Key features for automating machine learning workflows
- Setting up Airflow for data science projects
Building Machine Learning Pipelines with Airflow
- Designing DAGs for end-to-end ML workflows
- Using operators for data ingestion, preprocessing, and feature engineering
- Scheduling and managing pipeline dependencies
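The dependency idea behind a DAG can be sketched in plain Python before Airflow is involved. This is a minimal illustration using the standard-library `graphlib` module, not Airflow's own API; the task names and the `execution_order` helper are hypothetical. In Airflow, the same ordering is usually declared between tasks with the `>>` operator.

```python
from graphlib import TopologicalSorter

# Hypothetical ML pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "feature_engineering": {"preprocess"},
    "train": {"feature_engineering"},
    "validate": {"train"},
}


def execution_order(dependencies):
    """Return one valid run order; graphlib raises CycleError if there is a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PIPELINE)
print(order)
```

Because every task here has exactly one upstream task, only one order is valid. With branching pipelines several orders can be valid, which is what lets a scheduler run independent tasks in parallel.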
Model Training and Validation
- Automating model training tasks with Airflow
- Integrating Airflow with ML frameworks (e.g., TensorFlow, PyTorch)
- Validating models and storing evaluation metrics
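A validation step of this kind might look like the sketch below, written as plain Python functions that a pipeline task could call. The function names, metric names, thresholds, and metric values are all illustrative assumptions, not results from a real model or part of any framework's API.

```python
import json
import tempfile
from pathlib import Path


def validate_model(metrics, thresholds):
    """Return the names of metrics that fall below their minimum threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if metrics.get(name, float("-inf")) < minimum
    ]


def store_metrics(metrics, path):
    """Write evaluation metrics to a JSON file so later runs can compare them."""
    Path(path).write_text(json.dumps(metrics, indent=2, sort_keys=True))


# Illustrative values only.
metrics = {"accuracy": 0.91, "f1": 0.84}
thresholds = {"accuracy": 0.90, "f1": 0.85}

failed = validate_model(metrics, thresholds)

metrics_path = Path(tempfile.mkdtemp()) / "metrics.json"
store_metrics(metrics, metrics_path)
reloaded = json.loads(metrics_path.read_text())
```

Inside an orchestrated pipeline, a non-empty `failed` list would typically be turned into a raised exception so that the validation task fails and deployment does not proceed.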
Model Deployment and Monitoring
- Deploying machine learning models using automated pipelines
- Monitoring deployed models with Airflow tasks
- Handling retraining and model updates
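One simple way to decide when retraining is due is to compare recent evaluation scores against the score recorded at deployment. The sketch below assumes a single higher-is-better metric; the `should_retrain` helper and the default tolerance are hypothetical choices, not a recommended policy.

```python
from statistics import mean


def should_retrain(baseline_score, recent_scores, tolerance=0.02):
    """Return True when the mean recent score is more than `tolerance` below the baseline."""
    if not recent_scores:
        # No monitoring data yet, so there is no evidence of degradation.
        return False
    return baseline_score - mean(recent_scores) > tolerance


stable = should_retrain(0.90, [0.89, 0.90, 0.91])
degraded = should_retrain(0.90, [0.85, 0.86])
```

A scheduled monitoring task could evaluate a check like this and, when it returns `True`, trigger the training pipeline again.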
Advanced Customization and Integration
- Developing custom operators for ML-specific tasks
- Integrating Airflow with cloud platforms and ML services
- Extending Airflow workflows with plugins and sensors
Optimizing and Scaling ML Pipelines
- Improving workflow performance for large-scale data
- Scaling Airflow deployments with Celery and Kubernetes
- Best practices for production-grade ML workflows
Case Studies and Practical Applications
- Real-world examples of ML automation using Airflow
- Hands-on exercise: Building an end-to-end ML pipeline
- Discussion of challenges and solutions in ML workflow management
Summary and Next Steps
Requirements
- Familiarity with machine learning workflows and concepts
- Basic understanding of Apache Airflow, including DAGs and operators
- Proficiency in Python programming
Audience
- Data scientists
- Machine learning engineers
- AI developers
Open Training Courses require 5+ participants.
Related Courses
AdaBoost Python for Machine Learning
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and software engineers who wish to use AdaBoost to build boosting algorithms for machine learning with Python.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start building machine learning models with AdaBoost.
- Understand the ensemble learning approach and how to implement adaptive boosting.
- Learn how to build AdaBoost models to boost machine learning algorithms in Python.
- Use hyperparameter tuning to increase the accuracy and performance of AdaBoost models.
Anaconda Ecosystem for Data Scientists
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists who wish to use the Anaconda ecosystem to capture, manage, and deploy packages and data analysis workflows in a single platform.
By the end of this training, participants will be able to:
- Install and configure Anaconda components and libraries.
- Understand the core concepts, features, and benefits of Anaconda.
- Manage packages, environments, and channels using Anaconda Navigator.
- Use Conda, R, and Python packages for data science and machine learning.
- Get to know some practical use cases and techniques for managing multiple data environments.
AutoML with Auto-Keras
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists as well as less technical persons who wish to use Auto-Keras to automate the process of selecting and optimizing a machine learning model.
By the end of this training, participants will be able to:
- Automate the process of training highly efficient machine learning models.
- Automatically search for the best parameters for deep learning models.
- Build highly accurate machine learning models.
- Use the power of machine learning to solve real-world business problems.
AutoML
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at technical persons with a background in machine learning who wish to optimize the machine learning models used for detecting complex patterns in big data.
By the end of this training, participants will be able to:
- Install and evaluate various open source AutoML tools (H2O AutoML, auto-sklearn, TPOT, TensorFlow, PyTorch, Auto-Keras, Auto-WEKA, etc.).
- Train high quality machine learning models.
- Efficiently solve different types of supervised machine learning problems.
- Write just the necessary code to initiate the automated machine learning process.
Creating Custom Chatbots with Google AutoML
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at participants with varying levels of expertise who wish to leverage Google's AutoML platform to build customized chatbots for various applications.
By the end of this training, participants will be able to:
- Understand the fundamentals of chatbot development.
- Navigate the Google Cloud Platform and access AutoML.
- Prepare data for training chatbot models.
- Train and evaluate custom chatbot models using AutoML.
- Deploy and integrate chatbots into various platforms and channels.
- Monitor and optimize chatbot performance over time.
DataRobot
7 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and data analysts who wish to automate, evaluate, and manage predictive models using DataRobot's machine learning capabilities.
By the end of this training, participants will be able to:
- Load datasets in DataRobot to analyze, assess, and quality check data.
- Build and train models to identify important variables and meet prediction targets.
- Interpret models to create valuable insights that are useful in making business decisions.
- Monitor and manage models to maintain an optimized prediction performance.
Data Mining with Weka
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at beginner to intermediate-level data analysts and data scientists who wish to use Weka to perform data mining tasks.
By the end of this training, participants will be able to:
- Install and configure Weka.
- Understand the Weka environment and workbench.
- Perform data mining tasks using Weka.
Google Cloud AutoML
7 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists, data analysts, and developers who wish to explore AutoML products and features to create and deploy custom ML training models with minimal effort.
By the end of this training, participants will be able to:
- Explore the AutoML product line to implement different services for various data types.
- Prepare and label datasets to create custom ML models.
- Train and manage models to produce accurate and fair machine learning models.
- Make predictions using trained models to meet business objectives and needs.
Kaggle
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and developers who wish to learn and build their careers in Data Science using Kaggle.
By the end of this training, participants will be able to:
- Learn about data science and machine learning.
- Explore data analytics.
- Learn about Kaggle and how it works.
Machine Learning for Mobile Apps using Google’s ML Kit
14 Hours
This instructor-led, live training (online or onsite) is aimed at developers who wish to use Google’s ML Kit to build machine learning models that are optimized for processing on mobile devices.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start developing machine learning features for mobile apps.
- Integrate new machine learning technologies into Android and iOS apps using the ML Kit APIs.
- Enhance and optimize existing apps using the ML Kit SDK for on-device processing and deployment.
Accelerating Python Pandas Workflows with Modin
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and developers who wish to use Modin to build and implement parallel computations with Pandas for faster data analysis.
By the end of this training, participants will be able to:
- Set up the necessary environment to start developing Pandas workflows at scale with Modin.
- Understand the features, architecture, and advantages of Modin.
- Know the differences between Modin, Dask, and Ray.
- Perform Pandas operations faster with Modin.
- Implement the entire Pandas API and functions.
Machine Learning with Random Forest
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and software engineers who wish to use Random Forest to build machine learning algorithms for large datasets.
By the end of this training, participants will be able to:
- Set up the necessary development environment to start building machine learning models with Random Forest.
- Understand the advantages of Random Forest and how to implement it to resolve classification and regression problems.
- Learn how to handle large datasets and interpret multiple decision trees in Random Forest.
- Evaluate and optimize machine learning model performance by tuning the hyperparameters.
Advanced Analytics with RapidMiner
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at intermediate-level data analysts who wish to learn how to use RapidMiner to estimate and project values and utilize analytical tools for time series forecasting.
By the end of this training, participants will be able to:
- Learn to apply the CRISP-DM methodology, select appropriate machine learning algorithms, and enhance model construction and performance.
- Use RapidMiner to estimate and project values, and utilize analytical tools for time series forecasting.
RapidMiner for Machine Learning and Predictive Analytics
14 Hours
RapidMiner is an open source data science software platform for rapid application prototyping and development. It includes an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.
In this instructor-led, live training, participants will learn how to use RapidMiner Studio for data preparation, machine learning, and predictive model deployment.
By the end of this training, participants will be able to:
- Install and configure RapidMiner
- Prepare and visualize data with RapidMiner
- Validate machine learning models
- Mash up data and create predictive models
- Operationalize predictive analytics within a business process
- Troubleshoot and optimize RapidMiner
Audience
- Data scientists
- Engineers
- Developers
Format of the Course
- Part lecture, part discussion, exercises and heavy hands-on practice
Note
- To request a customized training for this course, please contact us to arrange.
GPU Data Science with NVIDIA RAPIDS
14 Hours
This instructor-led, live training in Slovakia (online or onsite) is aimed at data scientists and developers who wish to use RAPIDS to build GPU-accelerated data pipelines, workflows, and visualizations, applying machine learning algorithms, such as XGBoost, cuML, etc.
By the end of this training, participants will be able to:
- Set up the necessary development environment to build data models with NVIDIA RAPIDS.
- Understand the features, components, and advantages of RAPIDS.
- Leverage GPUs to accelerate end-to-end data and analytics pipelines.
- Implement GPU-accelerated data preparation and ETL with cuDF and Apache Arrow.
- Learn how to perform machine learning tasks with XGBoost and cuML algorithms.
- Build data visualizations and execute graph analysis with cuXfilter and cuGraph.