Level 1
<aside>
🛠 Data Engineering
</aside>
- [ ] Small datasets
- [ ] Tabular data
- [ ] Simple data sources (CSV, Excel, JSON)
- [ ] Simple preprocessing steps
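
As a rough illustration of the items above, the following is a minimal sketch of loading a small tabular dataset from CSV and applying one simple preprocessing step (dropping rows with a missing value and casting a column to a number). It uses only the Python standard library. The column names, the sample data, and the helper names `load_rows` and `preprocess` are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; in practice this would be read from a file on disk.
RAW_CSV = """name,age,city
Ada,36,London
Grace,,New York
Linus,28,Helsinki
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Drop rows with a missing age and convert the remaining ages to int."""
    cleaned = []
    for row in rows:
        if not row["age"]:
            continue  # simplest possible strategy: skip incomplete rows
        cleaned.append({**row, "age": int(row["age"])})
    return cleaned


rows = preprocess(load_rows(RAW_CSV))
print(rows)
```

Dropping incomplete rows is only one possible choice; filling in a default or an average is another common option for small datasets.
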
<aside>
📊 Data Analysis
</aside>
- [ ] Data Visualization Libraries
- [ ] Statistics and Probability Fundamentals
- [ ] Basic Exploratory Data Analysis
<aside>
🤖 Machine Learning
</aside>
- [ ] Simple Regression
- [ ] Simple Classification
- [ ] Evaluation Metrics
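
To make the "Evaluation Metrics" item more concrete, here is a small sketch that computes accuracy, precision, and recall for a binary classifier from scratch. The labels and predictions are invented for illustration, and the function names `confusion_counts` and `classification_metrics` are hypothetical; libraries such as scikit-learn provide equivalent, more complete implementations.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall as a dict (0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up ground truth and predictions for six samples.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can be misleading when one class dominates, which is why precision and recall are usually reported alongside it.
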
<aside>
👩‍💻 Software Engineering
</aside>
- [ ] Jupyter Notebooks
- [ ] Code versioning (git)
- [ ] Terminal navigation
- [ ] Virtual environments & dependencies
Level 2
<aside>
🛠 Data Engineering
</aside>
- [ ] Medium-sized datasets
- [ ] Time-Series data
- [ ] Text data (NLP)
- [ ] Relational Databases (SQL)
- [ ] Preprocessing Pipelines
<aside>
📊 Data Analysis
</aside>
- [ ] Chart Design Principles
- [ ] Advanced Exploratory Data Analysis
- [ ] Data Storytelling
<aside>
🤖 Machine Learning
</aside>
- [ ] Advanced Regression
- [ ] Advanced Classification
- [ ] Time-Series-specific models
- [ ] Text-specific (NLP) models
- [ ] Clustering
<aside>
👩‍💻 Software Engineering
</aside>
- [ ] Advanced IDEs (PyCharm, VS Code, etc.)
- [ ] Unit Testing
- [ ] CI/CD Pipelines
Level 3
<aside>
🛠 Data Engineering
</aside>
- [ ] Large datasets
- [ ] Image data
- [ ] Audio data
- [ ] Imbalanced datasets
- [ ] Collect / generate own data (Web Scraping, APIs)
- [ ] Non-Relational Databases (NoSQL)
<aside>
📊 Data Analysis
</aside>
- [ ] Dashboard Design Principles
- [ ] BI Tools (Tableau, Power BI, etc.)
<aside>
🤖 Machine Learning
</aside>
- [ ] AutoML
- [ ] Model tuning
- [ ] Feature Importance
- [ ] Experiment tracking
- [ ] Neural networks
<aside>
👩‍💻 Software Engineering
</aside>
- [ ] Cloud provider environment (AWS, GCP, Azure)
- [ ] Model Deployment (Docker)
- [ ] Model lifecycle management (MLflow, Metaflow, etc.)