<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data |</title><link>https://dhruvsaikia.com/tags/data/</link><atom:link href="https://dhruvsaikia.com/tags/data/index.xml" rel="self" type="application/rss+xml"/><description>Data</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 15 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://dhruvsaikia.com/media/icon_hu_f7f9cb5c139bd8fc.png</url><title>Data</title><link>https://dhruvsaikia.com/tags/data/</link></image><item><title>Global Stability &amp; Risk Forecasting (GDELT)</title><link>https://dhruvsaikia.com/projects/gdelt-project/</link><pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/gdelt-project/</guid><description>&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Analyzed subsets of a 2PB dataset to identify global risk trends and support stability forecasting.&lt;/li&gt;
&lt;li&gt;Developed and tuned a Random Forest model and validated it against historical logs.&lt;/li&gt;
&lt;li&gt;Supported automated anomaly detection and data-driven recommendations.&lt;/li&gt;
&lt;/ul&gt;
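&lt;p&gt;A minimal sketch of the modeling step, assuming scikit-learn and synthetic stand-ins for GDELT-derived features (the feature names, labels, and split below are illustrative, not the project’s actual schema):&lt;/p&gt;

```python
# Hedged sketch: Random Forest risk model on synthetic stand-ins for
# GDELT-derived features (event volume, mean tone, conflict share).
# These features and labels are illustrative, not the real schema.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Toy features per country-month.
X = rng.normal(size=(500, 3))
# Toy label: "elevated risk" when event volume and conflict share are both high.
y = ((X[:, 0] + X[:, 2]) > 1.0).astype(int)

# Hold out the most recent rows to mimic validation against historical logs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {acc:.2f}")
```

&lt;p&gt;Holding out the most recent rows rather than shuffling mirrors the idea of validating against historical logs.&lt;/p&gt;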
&lt;h2 id="tools--methods"&gt;Tools &amp;amp; methods&lt;/h2&gt;
&lt;p&gt;Python, SQL, Random Forest, statistical validation, anomaly detection&lt;/p&gt;
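&lt;p&gt;The anomaly-detection piece can be illustrated with a simple z-score rule; the 3-sigma threshold and the toy series are assumptions, not the project’s actual detector:&lt;/p&gt;

```python
# Hedged sketch: flag anomalies in an event-count series via z-scores.
# The 3-sigma threshold is a common default, not necessarily the one used here.
from statistics import mean, stdev

def zscore_anomalies(series, threshold=3.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs((x - mu) / sigma) > threshold]

# Toy daily event counts with one injected spike at the end.
counts = [100, 98, 102, 101, 99, 100, 97, 103, 101, 99,
          100, 102, 98, 101, 99, 100, 103, 97, 102, 240]
print(zscore_anomalies(counts))  # → [19]
```

&lt;p&gt;On GDELT-scale data a rule like this would typically run per country or event-type series rather than globally.&lt;/p&gt;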
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A combined forecasting and anomaly-detection workflow designed to turn petabyte-scale event data into stakeholder-ready insights.&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>TransLink: Transit Insights (ETL + Dashboards)</title><link>https://dhruvsaikia.com/projects/translink-transit/</link><pubDate>Mon, 15 Dec 2025 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/translink-transit/</guid><description>&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Architected scalable ETL pipelines using PySpark + Spark SQL to ingest 23 GB of logs into a Data Lake.&lt;/li&gt;
&lt;li&gt;Implemented transformations across bronze/silver/gold layers to ensure consistency and data quality.&lt;/li&gt;
&lt;li&gt;Built interactive Tableau and Power BI dashboards for KPI tracking and performance trends.&lt;/li&gt;
&lt;/ul&gt;
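&lt;p&gt;The medallion flow can be sketched in plain Python; the real pipeline uses PySpark DataFrames, and the field names and cleaning rules below are illustrative:&lt;/p&gt;

```python
# Pure-Python stand-in for the bronze/silver/gold flow. The actual pipeline
# runs on PySpark DataFrames; fields and rules here are hypothetical.
from collections import defaultdict

# Bronze: raw log records as ingested, duplicates and nulls included.
bronze = [
    {"route": "99", "delay_min": "4", "day": "Mon"},
    {"route": "99", "delay_min": "4", "day": "Mon"},   # duplicate
    {"route": "25", "delay_min": None, "day": "Tue"},  # bad record
    {"route": "25", "delay_min": "7", "day": "Tue"},
]

# Silver: deduplicate, drop invalid rows, cast types.
seen = set()
silver = []
for row in bronze:
    key = (row["route"], row["delay_min"], row["day"])
    if row["delay_min"] is not None and key not in seen:
        seen.add(key)
        silver.append({**row, "delay_min": int(row["delay_min"])})

# Gold: aggregate into a KPI table (average delay per route).
totals = defaultdict(lambda: [0, 0])
for row in silver:
    totals[row["route"]][0] += row["delay_min"]
    totals[row["route"]][1] += 1
gold = {route: s / n for route, (s, n) in totals.items()}
print(gold)  # → {'99': 4.0, '25': 7.0}
```

&lt;p&gt;The same shape carries over to Spark: each layer reads only from the previous one, which keeps lineage auditable and quality checks localized.&lt;/p&gt;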
&lt;h2 id="tools--methods"&gt;Tools &amp;amp; methods&lt;/h2&gt;
&lt;p&gt;PySpark, Spark SQL, Data Lake, Medallion Architecture, Tableau, Power BI&lt;/p&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A clean pipeline-to-dashboard workflow that enables business-facing KPI visibility and faster decision-making.&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>Steam-200k Recommender System (Implicit ALS)</title><link>https://dhruvsaikia.com/projects/steam200k-recommender/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/steam200k-recommender/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;This project builds a Top-N game recommendation tool using the Steam-200k dataset. It models user-game interactions and recommends games based on similarity between player profiles.&lt;/p&gt;
&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Implemented a recommender pipeline using implicit-feedback ALS (Alternating Least Squares).&lt;/li&gt;
&lt;li&gt;Transformed gameplay behavior into user preference signals to create user profiles.&lt;/li&gt;
&lt;li&gt;Generated Top-N recommendations by identifying players with similar profiles and surfacing games they play that the user hasn’t seen.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat play behavior as implicit feedback rather than explicit ratings.&lt;/li&gt;
&lt;li&gt;Learn latent factors for users and games using ALS.&lt;/li&gt;
&lt;li&gt;Recommend games with the highest predicted relevance for each user.&lt;/li&gt;
&lt;/ul&gt;
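&lt;p&gt;The approach above can be sketched as a minimal NumPy implementation of implicit-feedback ALS (Hu, Koren and Volinsky, 2008); the hyperparameters and toy playtime matrix are illustrative, and a production run would more likely use a library such as &lt;code&gt;implicit&lt;/code&gt;:&lt;/p&gt;

```python
# Minimal implicit-feedback ALS in NumPy. Hyperparameters and the toy
# playtime matrix are illustrative, not tuned values from the project.
import numpy as np

def als_implicit(R, factors=4, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Factorize an implicit-feedback matrix R (users x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)          # binary preference
    C = 1.0 + alpha * R                # confidence weights
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    I = reg * np.eye(factors)
    for _ in range(iters):
        # Alternate: solve for users with items fixed, then vice versa.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

def top_n(X, Y, R, user, n=2):
    """Rank unseen items for a user by predicted relevance."""
    scores = X[user] @ Y.T
    scores[R[user] > 0] = -np.inf      # hide games already played
    return np.argsort(-scores)[:n]

# Toy hours-played matrix: 4 users x 5 games.
R = np.array([[5.0, 0, 3.0, 0, 0],
              [4.0, 0, 4.0, 1.0, 0],
              [0, 2.0, 0, 3.0, 4.0],
              [0, 3.0, 0, 0, 5.0]])
X, Y = als_implicit(R)
print(top_n(X, Y, R, user=0))
```

&lt;p&gt;Real implementations exploit sparsity instead of forming a dense confidence diagonal per user; the version above trades speed for clarity.&lt;/p&gt;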
&lt;h2 id="tools--tech"&gt;Tools &amp;amp; tech&lt;/h2&gt;
&lt;p&gt;Python, implicit ALS, matrix factorization, data preprocessing, evaluation/validation&lt;/p&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A working recommendation tool that produces personalized Top-N suggestions from large-scale interaction data.&lt;/p&gt;
&lt;hr&gt;</description></item></channel></rss>