Hi, I'm Pius Mutuma

I am an entry-level data scientist with 2 years of experience in data analysis, visualization, and reporting. I am skilled in Python, R, Excel, Tableau, Power BI, SQL, and machine learning. I've completed projects including a Weekly Engagement Tracker, Sendy ETA Prediction, a Vehicle Price Predictor, and Citi Bike NYC Data Analysis & ETL. My experience includes roles at SunCulture, Fiverr (freelance), and Kenya Revenue Authority. I hold a B.Sc. in Mathematics and Computer Science from Machakos University, with certifications in Data Analysis from DataCamp, Google, and Microsoft.

Citi Bike NYC ETL

Developed and implemented an ETL (Extract, Transform, Load) pipeline for Citi Bike NYC data, processing information from February 2021 to the present. The project involved extracting raw data through web scraping, transforming it by cleaning, preprocessing, and calculating derived metrics, and finally loading it into a PostgreSQL database. This structured approach enabled efficient analysis through SQL queries, exploring ride frequency patterns, trip duration statistics, popular station rankings, and peak riding times.
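
To illustrate the transform step, here is a minimal sketch, not the project's actual code. It assumes each raw trip arrives as a dictionary with ride_id, started_at, ended_at, and start_station_name fields and a "YYYY-MM-DD HH:MM:SS" timestamp format; the function name transform_trip and the output field names are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout of the raw trip files (an assumption, not confirmed by the project).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def transform_trip(raw):
    """Clean one raw trip record and add derived metrics.

    Returns None when the record cannot be used (missing or malformed
    timestamps, or a non-positive duration).
    """
    try:
        started = datetime.strptime(raw["started_at"], TIME_FORMAT)
        ended = datetime.strptime(raw["ended_at"], TIME_FORMAT)
    except (KeyError, ValueError, TypeError):
        return None

    duration_min = (ended - started).total_seconds() / 60
    if duration_min <= 0:
        return None

    return {
        "ride_id": (raw.get("ride_id") or "").strip(),
        # Empty station names become None so they load as NULL in the database.
        "start_station": (raw.get("start_station_name") or "").strip() or None,
        "started_at": started,
        "duration_min": round(duration_min, 2),  # derived metric: trip duration
        "start_hour": started.hour,              # derived metric: supports peak-time queries
    }
```

Records that come back as None can be skipped before the load step, so only clean rows reach the PostgreSQL table.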

Sendy ETA Prediction

Developed a machine learning model using Python to predict the Estimated Time of Arrival (ETA) for orders placed on the Sendy platform, aiming to enhance customer satisfaction and optimize delivery operations. The project involved comprehensive data exploration, preprocessing, and feature engineering using multiple datasets, including order details, rider metrics, and external weather data. Utilized advanced data handling techniques to manage missing values, standardize formats, and create derived features.
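
One common way to manage missing values is median imputation. The sketch below shows that idea under stated assumptions: rows are plain dictionaries, missing values are stored as None, and the helper name impute_median and the temperature column are hypothetical. It is an illustration of the technique, not the method used in the project.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the column median.

    If no value is observed for the column, the rows are returned unchanged.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    if not observed:
        return [dict(r) for r in rows]

    fill = median(observed)
    return [
        {**r, column: fill if r.get(column) is None else r[column]}
        for r in rows
    ]
```

For example, calling impute_median(orders, "temperature") fills gaps in a weather column before model training while leaving the original list untouched.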

Phoenix Analytics Community Survey

Using survey data collected by the group lead, I analyzed salary trends across various roles in the job market. The main questions were how monthly gross salaries are distributed across different roles, how salary variance by gender changes across levels of experience, and how benefits differ by industry.

Vehicle Price Predictor

Developed a Streamlit-based web application for predicting current and future vehicle prices using a trained XGBoost model. The application allows users to input vehicle details to receive price predictions and calculate future value and usage costs by considering factors like depreciation and fuel consumption. The project involved model development using features such as engine type, mileage, and year, as well as the implementation of a user-friendly interface. Technical aspects include virtual environment setup, package management (Streamlit, pandas, scikit-learn), and deployment to a public platform.
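
The future value and usage cost calculations can be sketched as follows. This is a minimal illustration that assumes a constant annual depreciation rate compounded yearly and a fixed fuel price; the function names, parameters, and formulas are assumptions and may differ from what the application uses.

```python
def future_value(current_price, annual_depreciation_rate, years):
    """Estimate a vehicle's value after `years`, assuming constant compound depreciation."""
    if not 0 <= annual_depreciation_rate < 1:
        raise ValueError("annual_depreciation_rate must be in [0, 1)")
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_price * (1 - annual_depreciation_rate) ** years


def fuel_cost(distance_km, km_per_litre, price_per_litre):
    """Estimate the fuel cost of driving `distance_km` at a given fuel efficiency."""
    if km_per_litre <= 0:
        raise ValueError("km_per_litre must be positive")
    return distance_km / km_per_litre * price_per_litre
```

In the app, the predicted current price from the model would be passed in as current_price, with the depreciation rate and fuel figures taken from user input.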

Reach out to me