Sunday, December 26, 2021

What is ETL - Kapil Sharma (Class-102/S2021)

ETL: Extract, Transform and Load

It's a generic process of extracting data from one or more systems and loading it into a data warehouse or database after performing some intermediate transformations.

In simple terms, data is extracted from different sources such as websites, IoT devices, mobile apps, web apps, ATMs, PoS terminals, SaaS, PaaS, BaaS and IaaS platforms, databases, GPS units, or any other electronic device capable of generating or collecting data. These sources deliver their data in different formats such as CSV, SQL, JSON, etc.
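As a rough illustration of extracting from sources that use different formats, here is a minimal sketch using only the Python standard library. The sample payloads, the field names (device_id, reading) and the helper function names are hypothetical and chosen only for this example; real sources would be files, APIs, or database connections.

```python
import csv
import io
import json

# Hypothetical sample payloads standing in for two different sources.
CSV_EXPORT = "device_id,reading\natm-01,120.5\npos-07,89.0\n"
JSON_EXPORT = '[{"device_id": "gps-03", "reading": 42.25}]'


def extract_csv(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def extract_all():
    """Bring records from both sources into one common shape."""
    records = []
    for row in extract_csv(CSV_EXPORT) + extract_json(JSON_EXPORT):
        # CSV values arrive as strings, JSON values as numbers,
        # so the reading is converted to float in both cases.
        records.append(
            {"device_id": row["device_id"], "reading": float(row["reading"])}
        )
    return records


if __name__ == "__main__":
    for record in extract_all():
        print(record)
```

Once every source is reduced to the same record shape, the later transform and load steps do not need to know where each record came from.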

ETL is also used to migrate data reliably from one database to another.

To derive value from the collected data, it has to be processed further, and this process is called ETL (Extract, Transform and Load). The processed data is then loaded into the data warehouse, where it can later be analyzed at a granular level using BI tools.
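The three stages can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions: the raw sales data, the table name sales, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a point-of-sale system.
RAW_SALES = """store,amount,currency
delhi, 100.50 ,inr
Delhi,200,INR
mumbai,,INR
Mumbai,50.25,inr
"""


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and normalize the rest."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (row["store"].strip().title(), float(amount), row["currency"].strip().upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (store TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform and load in order."""
    load(transform(extract(RAW_SALES)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
    run_pipeline(conn)
    # The kind of aggregate query a BI tool might issue.
    query = "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
    for store, total in conn.execute(query):
        print(store, total)
```

Keeping extract, transform and load as separate functions mirrors the three stages, so each one can be changed or tested without touching the others.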




What is Apache Spark - Kapil Sharma (Class-101/S2021)

Apache Spark: 

Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. 

In simple terms, it's a tool that can be used for ETL.