Added Data Engineering Resources

pull/17/head
Nawin Raj Kumar 2023-05-16 11:39:42 +05:30
parent 8935f5b3b9
commit a685aa1e64
2 changed files with 36 additions and 20 deletions

@ -47,26 +47,6 @@ _Navaneeth Malingan_
- [PyImageSearch](https://www.pyimagesearch.com/start-here/)
- [5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python](https://www.mrdbourke.com/5-beginner-friendly-steps-to-learn-machine-learning/)
## Data Engineering
### Batch Processing
- [Understanding Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/understanding-data-engineering)
- [Introduction to Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/introduction-to-data-engineering)
- [Apache Spark Tutorial (used for Large Scale Data Processing using SQL commands)](https://spark.apache.org/docs/latest/sql-getting-started.html)
- [Test your knowledge using ProjectPro](https://www.projectpro.io/article/big-data-interview-questions-/773)
### Stream Processing
- [Introduction to Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [Apache Flink Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/try-flink/datastream/)
- [Stream Processing Quiz](https://chauff.github.io/documents/bdp-quiz/streaming.html)
### Data Pipelines and Integration
- [Building Data Engineering Pipelines in Python](https://app.datacamp.com/learn/courses/building-data-engineering-pipelines-in-python)
- ["What is Data Integration?" by Talend](https://www.talend.com/resources/what-is-data-integration/)
- [Data Cleaning Challenge: Handling missing values](https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values/notebook)
## Intro to ML

@ -0,0 +1,36 @@
# Data Engineering Resources
Data engineering is the practice of **designing, building, and managing the infrastructure** and systems required to **collect, store, process, and analyze data**. Data engineers play a crucial role in the data lifecycle, ensuring that data is available, accessible, and reliable for data-driven applications and decision-making.

---
## Here are some key resources for Data Engineering
---
### Batch Processing
**Batch processing** is a data processing technique in which data is collected over a period of time and then processed as a group, or batch, rather than in real time or immediately upon arrival. To understand the basics of data engineering, see these resources:
- [Understanding Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/understanding-data-engineering)
- [Introduction to Data Engineering by Datacamp](https://app.datacamp.com/learn/courses/introduction-to-data-engineering)
- [Apache Spark Tutorial (used for Large Scale Data Processing using SQL commands)](https://spark.apache.org/docs/latest/sql-getting-started.html)
- [Test your knowledge using ProjectPro](https://www.projectpro.io/article/big-data-interview-questions-/773)
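
As a rough illustration of the batch idea described above, the following minimal sketch groups already-collected records into fixed-size batches and computes one result per batch. The `orders` data, the `batched` and `total_amount` helpers, and the batch size are all hypothetical names chosen for this example; real batch jobs typically run on a framework such as Apache Spark.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Yield successive lists holding at most batch_size records."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def total_amount(batch: list) -> float:
    """Process one whole batch at once: sum the 'amount' field."""
    return sum(record["amount"] for record in batch)


# Hypothetical records that accumulated over some period of time.
orders = [{"order_id": i, "amount": 10.0 * i} for i in range(1, 8)]

# Nothing is processed until the data has been collected; then each
# group of three orders is handled together.
batch_totals = [total_amount(batch) for batch in batched(orders, batch_size=3)]
print(batch_totals)  # [60.0, 150.0, 70.0]
```

The last batch is smaller than the others because only seven records were collected; a real job would make the same kind of decision about partial batches.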
### Stream Processing
**Stream processing** is a method of data processing that continuously processes and analyzes data in real time, as it is generated or received. It handles data in motion, allowing immediate insights and actions based on the streaming data. Here are some resources to refer to:
- [Introduction to Apache Kafka Streams](https://kafka.apache.org/documentation/streams/)
- [Apache Flink Documentation](https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/try-flink/datastream/)
- [Stream Processing Quiz](https://chauff.github.io/documents/bdp-quiz/streaming.html)
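
To contrast with batch processing, here is a minimal sketch of the streaming idea: each event is handled as soon as it arrives, and a small amount of running state is updated along the way. The `sensor_stream` generator is a hypothetical stand-in for an unbounded source such as a Kafka topic, and `running_average` is an illustrative helper, not part of any library.

```python
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[float]:
    """Hypothetical stand-in for an unbounded source of readings."""
    yield from [21.0, 22.0, 23.0, 30.0]


def running_average(events: Iterable[float]) -> Iterator[float]:
    """Emit an updated average after every event, without storing the stream."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count


# A result is available after each event rather than after the whole dataset.
averages = list(running_average(sensor_stream()))
print(averages)  # [21.0, 21.5, 22.0, 24.0]
```

Only the count and the total are kept in memory, which is what lets this pattern work on streams that never end.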
### Data Pipelines and Integration
**Data pipelines and integration** are critical components of data engineering: they move, transform, and combine data from various sources into a destination for further processing, analysis, or storage. They ensure that data flows reliably across different systems, enabling efficient data management and use. See these resources:
- [Building Data Engineering Pipelines in Python](https://app.datacamp.com/learn/courses/building-data-engineering-pipelines-in-python)
- ["What is Data Integration?" by Talend](https://www.talend.com/resources/what-is-data-integration/)
- [Data Cleaning Challenge: Handling missing values](https://www.kaggle.com/code/rtatman/data-cleaning-challenge-handling-missing-values/notebook)
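
The movement, transformation, and loading steps described above can be sketched as a tiny extract-transform-load pipeline using only the Python standard library. Everything here is an assumption made for illustration: the `RAW_CSV` sample, the `people` table, and the choice to drop rows with a missing age (imputing a value is an equally valid strategy).

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has a missing age.
RAW_CSV = """name,age
alice,36
bob,
carol,54
"""


def extract(text: str) -> list:
    """Read CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Drop rows with a missing age and cast the rest to integers."""
    return [(row["name"], int(row["age"])) for row in rows if row["age"]]


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Write the cleaned rows into the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory destination for the sketch
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, age FROM people ORDER BY name").fetchall()
print(loaded)  # [('alice', 36), ('carol', 54)]
```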
---
Data engineering requires knowledge of programming languages (such as Python, Java, or Scala), database systems, big data technologies, cloud platforms, data modeling, and data warehousing concepts. Data engineers also need to keep up with the evolving landscape of data technologies and best practices to ensure efficient and effective data management.