<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data |</title><link>https://dhruvsaikia.com/tags/data/</link><atom:link href="https://dhruvsaikia.com/tags/data/index.xml" rel="self" type="application/rss+xml"/><description>Data</description><generator>HugoBlox Kit (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Thu, 15 Jan 2026 00:00:00 +0000</lastBuildDate><image><url>https://dhruvsaikia.com/media/icon_hu_f7f9cb5c139bd8fc.png</url><title>Data</title><link>https://dhruvsaikia.com/tags/data/</link></image><item><title>Global Stability &amp; Risk Forecasting (GDELT)</title><link>https://dhruvsaikia.com/projects/gdelt-project/</link><pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/gdelt-project/</guid><description>&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Analyzed subsets of a 2PB dataset to identify global risk trends and support stability forecasting.&lt;/li&gt;
&lt;li&gt;Developed and tuned a Random Forest model and validated it against historical logs.&lt;/li&gt;
&lt;li&gt;Supported automated anomaly detection and data-driven recommendations.&lt;/li&gt;
&lt;/ul&gt;
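&lt;p&gt;A minimal sketch of the modeling step, assuming scikit-learn and synthetic stand-ins for GDELT-derived features (the feature names, labels, and split below are illustrative, not the project’s actual schema):&lt;/p&gt;

```python
# Hedged sketch: Random Forest risk model on synthetic stand-ins for
# GDELT-derived features (event volume, mean tone, conflict share).
# These features and labels are illustrative, not the real schema.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

# Toy features per country-month.
X = rng.normal(size=(500, 3))
# Toy label: "elevated risk" when event volume and conflict share are both high.
y = ((X[:, 0] + X[:, 2]) > 1.0).astype(int)

# Hold out the most recent rows to mimic validation against historical logs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = RandomForestClassifier(n_estimators=200, max_depth=6, random_state=0)
model.fit(X_train, y_train)

acc = accuracy_score(y_test, model.predict(X_test))
print(f"holdout accuracy: {acc:.2f}")
```

&lt;p&gt;Holding out the most recent rows rather than shuffling mirrors the idea of validating against historical logs.&lt;/p&gt;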
&lt;h2 id="tools--methods"&gt;Tools &amp;amp; methods&lt;/h2&gt;
&lt;p&gt;Python, SQL, Random Forest, statistical validation, anomaly detection&lt;/p&gt;
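&lt;p&gt;The anomaly-detection piece can be illustrated with a simple z-score rule; the 3-sigma threshold and the toy series are assumptions, not the project’s actual detector:&lt;/p&gt;

```python
# Hedged sketch: flag anomalies in an event-count series via z-scores.
# The 3-sigma threshold is a common default, not necessarily the one used here.
from statistics import mean, stdev

def zscore_anomalies(series, threshold=3.0):
    """Return indices whose z-score magnitude exceeds the threshold."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs((x - mu) / sigma) > threshold]

# Toy daily event counts with one injected spike at the end.
counts = [100, 98, 102, 101, 99, 100, 97, 103, 101, 99,
          100, 102, 98, 101, 99, 100, 103, 97, 102, 240]
print(zscore_anomalies(counts))  # → [19]
```

&lt;p&gt;On GDELT-scale data a rule like this would typically run per country or event-type series rather than globally.&lt;/p&gt;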
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A combined forecasting and anomaly-detection workflow designed to turn petabyte-scale event data into stakeholder-ready insights.&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>TransLink: Transit Insights (ETL + Dashboards)</title><link>https://dhruvsaikia.com/projects/translink-transit/</link><pubDate>Mon, 15 Dec 2025 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/translink-transit/</guid><description>&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Architected scalable ETL pipelines using PySpark + Spark SQL to ingest 23 GB of logs into a Data Lake.&lt;/li&gt;
&lt;li&gt;Implemented transformations across bronze/silver/gold layers to ensure consistency and data quality.&lt;/li&gt;
&lt;li&gt;Built interactive Tableau and Power BI dashboards for KPI tracking and performance trends.&lt;/li&gt;
&lt;/ul&gt;
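&lt;p&gt;The medallion flow can be sketched in plain Python; the real pipeline uses PySpark DataFrames, and the field names and cleaning rules below are illustrative:&lt;/p&gt;

```python
# Pure-Python stand-in for the bronze/silver/gold flow. The actual pipeline
# runs on PySpark DataFrames; fields and rules here are hypothetical.
from collections import defaultdict

# Bronze: raw log records as ingested, duplicates and nulls included.
bronze = [
    {"route": "99", "delay_min": "4", "day": "Mon"},
    {"route": "99", "delay_min": "4", "day": "Mon"},   # duplicate
    {"route": "25", "delay_min": None, "day": "Tue"},  # bad record
    {"route": "25", "delay_min": "7", "day": "Tue"},
]

# Silver: deduplicate, drop invalid rows, cast types.
seen = set()
silver = []
for row in bronze:
    key = (row["route"], row["delay_min"], row["day"])
    if row["delay_min"] is not None and key not in seen:
        seen.add(key)
        silver.append({**row, "delay_min": int(row["delay_min"])})

# Gold: aggregate into a KPI table (average delay per route).
totals = defaultdict(lambda: [0, 0])
for row in silver:
    totals[row["route"]][0] += row["delay_min"]
    totals[row["route"]][1] += 1
gold = {route: s / n for route, (s, n) in totals.items()}
print(gold)  # → {'99': 4.0, '25': 7.0}
```

&lt;p&gt;The same shape carries over to Spark: each layer reads only from the previous one, which keeps lineage auditable and quality checks localized.&lt;/p&gt;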
&lt;h2 id="tools--methods"&gt;Tools &amp;amp; methods&lt;/h2&gt;
&lt;p&gt;PySpark, Spark SQL, Data Lake, Medallion Architecture, Tableau, Power BI&lt;/p&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A clean pipeline-to-dashboard workflow that enables business-facing KPI visibility and faster decision-making.&lt;/p&gt;
&lt;hr&gt;</description></item><item><title>Steam-200k Recommender System (Implicit ALS)</title><link>https://dhruvsaikia.com/projects/steam200k-recommender/</link><pubDate>Sat, 15 Nov 2025 00:00:00 +0000</pubDate><guid>https://dhruvsaikia.com/projects/steam200k-recommender/</guid><description>&lt;h2 id="overview"&gt;Overview&lt;/h2&gt;
&lt;p&gt;This project builds a Top-N game recommendation tool using the Steam-200k dataset. It models user-game interactions and recommends games based on similarity between player profiles.&lt;/p&gt;
&lt;h2 id="what-i-built"&gt;What I built&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Implemented a recommender pipeline using implicit-feedback ALS (Alternating Least Squares).&lt;/li&gt;
&lt;li&gt;Transformed gameplay behavior into user preference signals to create user profiles.&lt;/li&gt;
&lt;li&gt;Generated Top-N recommendations by identifying players with similar profiles and surfacing games they play that the user hasn’t seen.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="approach"&gt;Approach&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Treat play behavior as implicit feedback rather than explicit ratings.&lt;/li&gt;
&lt;li&gt;Learn latent factors for users and games using ALS.&lt;/li&gt;
&lt;li&gt;Recommend games with the highest predicted relevance for each user.&lt;/li&gt;
&lt;/ul&gt;
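&lt;p&gt;The approach above can be sketched as a minimal NumPy implementation of implicit-feedback ALS (Hu, Koren and Volinsky, 2008); the hyperparameters and toy playtime matrix are illustrative, and a production run would more likely use a library such as &lt;code&gt;implicit&lt;/code&gt;:&lt;/p&gt;

```python
# Minimal implicit-feedback ALS in NumPy. Hyperparameters and the toy
# playtime matrix are illustrative, not tuned values from the project.
import numpy as np

def als_implicit(R, factors=4, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Factorize an implicit-feedback matrix R (users x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)          # binary preference
    C = 1.0 + alpha * R                # confidence weights
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    I = reg * np.eye(factors)
    for _ in range(iters):
        # Alternate: solve for users with items fixed, then vice versa.
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + I, Y.T @ Cu @ P[u])
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + I, X.T @ Ci @ P[:, i])
    return X, Y

def top_n(X, Y, R, user, n=2):
    """Rank unseen items for a user by predicted relevance."""
    scores = X[user] @ Y.T
    scores[R[user] > 0] = -np.inf      # hide games already played
    return np.argsort(-scores)[:n]

# Toy hours-played matrix: 4 users x 5 games.
R = np.array([[5.0, 0, 3.0, 0, 0],
              [4.0, 0, 4.0, 1.0, 0],
              [0, 2.0, 0, 3.0, 4.0],
              [0, 3.0, 0, 0, 5.0]])
X, Y = als_implicit(R)
print(top_n(X, Y, R, user=0))
```

&lt;p&gt;Real implementations exploit sparsity instead of forming a dense confidence diagonal per user; the version above trades speed for clarity.&lt;/p&gt;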
&lt;h2 id="tools--tech"&gt;Tools &amp;amp; tech&lt;/h2&gt;
&lt;p&gt;Python, implicit ALS, matrix factorization, data preprocessing, evaluation/validation&lt;/p&gt;
&lt;h2 id="outcome"&gt;Outcome&lt;/h2&gt;
&lt;p&gt;A working recommendation tool that produces personalized Top-N suggestions from large-scale interaction data.&lt;/p&gt;
&lt;hr&gt;</description></item></channel></rss>