Machine Learning: From Foundations to Large Language Models

Description
This course provides a comprehensive introduction to modern machine learning, covering the mathematical foundations, core architectures, and cutting-edge applications of neural networks. Students will explore the evolution from classical perceptrons to state-of-the-art Large Language Models (LLMs), understanding both theoretical principles and practical implementations. The course includes hands-on exercises using PyTorch and Google Colab, through which students develop practical skills in implementing, training, fine-tuning, and optimizing deep learning models, as well as in evaluating their performance on real datasets. By the end of the course, students will understand how contemporary AI systems such as GPT, BERT, and vision transformers work, and will be equipped to design, train, and deploy machine learning solutions for diverse applications.

Intended audience
This course is intended for students and professionals with a technical background who wish to gain a rigorous introduction to modern machine learning and deep learning. It is suitable for computer science and engineering students, as well as software developers and data scientists seeking to understand the foundations of contemporary AI models.

Topics

Introduction and History: From deductive reasoning to inductive learning; the biological inspiration for neural networks.
Artificial Neural Networks (ANN): Input, hidden, and output layers; fully connected feed-forward topologies (a minimal example appears in the sketches after this list).
Mathematical Components: Weights, biases, and activation functions (Sigmoid, Tanh, ReLU).
Matrix Representation: Expressing layers as matrix operations and tensors.
The Learning Process: Forward pass, loss computation (MSE, Cross-Entropy), and the backward pass (Backpropagation).
Gradient Descent: Iterative optimization, learning rates, and mini-batch training; see the training-loop sketch after this list.
Model Configuration: Balancing depth vs. width; managing underfitting and overfitting.
Enhancements: Scaling, normalization (Batch Norm), and regularization (Dropout).
Convolutional Fundamentals: Kernels, filters, feature maps, and spatial resolution.
CNN Layers: Pooling (Max/Average), strided convolutions, and padding strategies; see the CNN sketch after this list.
Advanced Architectures: Residual Networks (ResNet) and the solution to vanishing gradients; Bottleneck blocks.
Object Detection: Two-stage detectors (the R-CNN family, e.g., Faster R-CNN) vs. single-stage detectors (YOLO, SSD).
Multi-scale Features: Feature Pyramid Networks (FPN) and Path Aggregation Networks (PAN).
Transfer Learning: Feature extraction vs. fine-tuning using pretrained models (e.g., ImageNet); see the transfer-learning sketch after this list.
Efficiency Techniques: Cross Stage Partial Networks (CSP) for optimized gradient flow and reduced computation.
Data Augmentation: Techniques to improve generalization through random transformations.
Recurrent Neural Networks (RNN): Handling sequential data, hidden states, and Backpropagation Through Time (BPTT).
Long-Term Dependencies: The LSTM architecture and gating mechanisms (Forget, Input, Output gates); see the LSTM sketch after this list.
The Transformer Revolution: Attention mechanisms, parallel sequence processing, and the foundation of GPT/LLMs.
Transformers: Self-attention, multi-head attention, positional encoding, and transformer blocks; see the self-attention sketch after this list.
Large Language Models: Decoder-only models, the training pipeline, pretraining, supervised fine-tuning (SFT), RLHF, DPO, and parameter-efficient fine-tuning with LoRA; see the LoRA sketch after this list.
Reasoning and Tool-Augmented AI: Chain-of-thought, reasoning models, external tools and agents, RAG and memory systems, multimodal models.
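
Example code sketches

The short PyTorch sketches below illustrate selected topics from the list above. They are simplified illustrations rather than course materials: every layer size, class count, and hyperparameter is an assumption chosen for demonstration.

A minimal fully connected feed-forward network, showing how input, hidden, and output layers with weights, biases, and activation functions reduce to matrix operations on tensors (all sizes are illustrative):

    import torch
    import torch.nn as nn

    # A small fully connected feed-forward network: an input layer,
    # one hidden layer with a ReLU activation, and an output layer.
    # Each nn.Linear holds a weight matrix and a bias vector, so a
    # forward pass is a chain of matrix operations.
    model = nn.Sequential(
        nn.Linear(784, 128),  # weight matrix: 128x784, bias vector: 128
        nn.ReLU(),            # alternatives: nn.Sigmoid(), nn.Tanh()
        nn.Linear(128, 10),
    )

    x = torch.randn(32, 784)  # a mini-batch of 32 inputs as a 2-D tensor
    logits = model(x)         # output tensor of shape (32, 10)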
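
A sketch of one training epoch: forward pass, cross-entropy loss, backward pass (backpropagation), and a mini-batch gradient-descent update, using synthetic data so the loop is self-contained:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
    criterion = nn.CrossEntropyLoss()                        # loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # learning rate

    X = torch.randn(256, 20)         # synthetic inputs
    y = torch.randint(0, 3, (256,))  # synthetic class labels

    for i in range(0, len(X), 32):           # mini-batches of 32
        xb, yb = X[i:i + 32], y[i:i + 32]
        optimizer.zero_grad()                # clear old gradients
        loss = criterion(model(xb), yb)      # forward pass + loss
        loss.backward()                      # backward pass (backpropagation)
        optimizer.step()                     # gradient-descent update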
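
A small CNN sketch combining 3x3 kernels, padding, a strided convolution, max pooling, batch normalization, and dropout (the channel counts and image size are made up):

    import torch
    import torch.nn as nn

    cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 16 filters -> 16 feature maps
        nn.BatchNorm2d(16),                          # normalization
        nn.ReLU(),
        nn.MaxPool2d(2),                             # halves spatial resolution
        nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # strided convolution
        nn.ReLU(),
        nn.Flatten(),
        nn.Dropout(0.5),                             # regularization
        nn.Linear(32 * 8 * 8, 10),                   # 32x32 input -> 8x8 maps here
    )

    x = torch.randn(8, 3, 32, 32)  # batch of 8 RGB images, 32x32 pixels
    logits = cnn(x)                # shape: (8, 10)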
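
A transfer-learning sketch in the feature-extraction style: freeze an ImageNet-pretrained ResNet-18 backbone and train only a new head (assumes torchvision 0.13+ for the weights API; the 5-class head is hypothetical):

    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():
        p.requires_grad = False  # freeze all pretrained weights
    backbone.fc = nn.Linear(backbone.fc.in_features, 5)  # new trainable head

    # For fine-tuning instead, leave requires_grad=True on some or all
    # backbone layers, typically with a smaller learning rate.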
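
An LSTM sketch over a batch of sequences; the hidden state h and cell state c are what the forget, input, and output gates update at every time step (sizes are illustrative):

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
    x = torch.randn(4, 10, 16)  # 4 sequences, 10 time steps, 16 features each
    outputs, (h, c) = lstm(x)   # outputs: (4, 10, 32); h and c: (1, 4, 32)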
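
A single-head self-attention sketch implementing Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the projection matrices here are random stand-ins for learned parameters:

    import math
    import torch
    import torch.nn.functional as F

    def self_attention(x, wq, wk, wv):
        q, k, v = x @ wq, x @ wk, x @ wv     # project tokens to queries/keys/values
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = F.softmax(scores, dim=-1)  # each token attends to all tokens
        return weights @ v                   # weighted sum of values

    d = 64
    x = torch.randn(10, d)  # 10 tokens, embedding dimension d
    wq, wk, wv = (torch.randn(d, d) for _ in range(3))
    out = self_attention(x, wq, wk, wv)  # shape: (10, 64)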
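
A minimal sketch of the LoRA idea behind parameter-efficient fine-tuning: freeze a pretrained linear layer W and learn only a low-rank update BA, so the effective weight is W + (alpha/r) * BA (the rank and scaling values are illustrative):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, linear, r=8, alpha=16):
            super().__init__()
            self.linear = linear
            self.linear.weight.requires_grad = False  # freeze pretrained W
            self.linear.bias.requires_grad = False    # freeze pretrained bias
            self.A = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(linear.out_features, r))  # BA starts at 0
            self.scale = alpha / r

        def forward(self, x):
            return self.linear(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(512, 512))
    out = layer(torch.randn(2, 512))  # only A and B receive gradients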
