Sela

Data Engineering

Description
Get hands-on experience with designing and building data processing systems on Google Cloud. This course uses lectures, demos, and hands-on labs to show you how to design data processing systems, build end-to-end data pipelines, analyze data, and implement machine learning. This course covers structured, unstructured, and streaming data.
Intended audience
This class is intended for developers who are responsible for extracting, loading, transforming, cleaning, and validating data; designing pipelines and architectures for data processing; integrating analytics and machine learning capabilities into data pipelines; and querying datasets, visualizing query results, and creating reports.

Topics

Explore the role of a data engineer
Analyze data engineering challenges
Introduction to BigQuery
Data lakes and data warehouses
Transactional databases versus data warehouses
Partner effectively with other data teams
Manage data access and governance
Build production-ready pipelines
Review Google Cloud customer case study
Introduction to data lakes
Data storage and ETL options on Google Cloud
Building a data lake using Cloud Storage
Securing Cloud Storage
Storing all sorts of data types
Cloud SQL as a relational data lake
The modern data warehouse
Introduction to BigQuery
Getting started with BigQuery
Loading data
Exploring schemas
Schema design
Nested and repeated fields
Optimizing with partitioning and clustering
EL, ELT, ETL
Quality considerations
How to carry out operations in BigQuery
Shortcomings
ETL to solve data quality issues
The Hadoop ecosystem
Run Hadoop on Dataproc
Cloud Storage instead of HDFS
Optimize Dataproc
Introduction to Dataflow
Why customers value Dataflow
Dataflow pipelines
Aggregating with GroupByKey and Combine
Side inputs and windows
Dataflow templates
Dataflow SQL
Building batch data pipelines visually with Cloud Data Fusion
Components
UI overview
Building a pipeline
Exploring data using Wrangler
Orchestrating work between Google Cloud services with Cloud Composer
Apache Airflow environment
DAGs and operators
Workflow scheduling
Monitoring and logging
Process streaming data
Introduction to Pub/Sub
Pub/Sub push versus pull
Publishing with Pub/Sub code
Streaming data challenges
Dataflow windowing
Streaming into BigQuery and visualizing results
High-throughput streaming with Cloud Bigtable
Optimizing Cloud Bigtable performance
Analytic window functions
Use WITH clauses
GIS functions
Performance considerations
What is AI?
From ad-hoc data analysis to data-driven decisions
Options for ML models on Google Cloud
Unstructured data is hard
ML APIs for enriching data
What’s a notebook?
BigQuery magic and ties to Pandas
Ways to do ML on Google Cloud
Vertex AI Pipelines
AI Hub
BigQuery ML for quick model building
Supported models
Why AutoML?
AutoML Vision
AutoML NLP
AutoML Tables
