Masum Rumi

2020 - Present

Data Scientist/Lead Data Engineer

Developed and implemented statistical analysis, data mining, and other aspects of data analytics to collect, explore, and extract insights from structured and unstructured data using SQL, Spark, and Python.
Automated manual reports using SQL, business intelligence tools, and Python, reducing report completion time.
Built a data warehousing solution with custom ETL scripts in Python to extract and house data from multiple sources.
Developed and maintained tools and pipelines for automating big data processes using SQL.
Constructed end-to-end custom pipelines in object-oriented Python, with classes that inherit from and implement the Scikit-Learn API.

2018 - 2020

Data Scientist

Developed and implemented predictive modeling, statistical analysis, machine learning, data mining and other aspects of data analytics to collect, explore and extract insights from structured and unstructured data using Spark and Python.
Built regression, classification, cluster analysis, segmentation, and time series models, and ran A/B tests, to improve overall model performance without compromising model quality.
Built a data warehousing solution with custom ETL scripts in Python to extract and house data from multiple sources.
Developed and maintained tools and pipelines for automating big data processes using PySpark and SQL.

2017 - 2018

Data Science Instructor

Taught fundamentals of data science, probability and statistics, and Python programming concepts to new cohorts.
Updated and modified existing lessons to help students better understand data science and machine learning topics.
Automated Tableau dashboards that log data, evaluate key metrics, and accelerate report creation.
Developed and maintained tools and pipelines for automating big data processes using PySpark and SQL.

2017

Data Science Immersive Fellow

Learned and implemented the fundamentals of data science and machine learning using Python.
Picked up advanced concepts like deep learning, neural networks and image recognition.
Automated Tableau dashboards that log data, evaluate key metrics, and accelerate report creation.
Developed and maintained tools and pipelines for automating data processes using MySQL and pandas.

2015 - 2017

Data Analyst

Participated in all stages of ETL development and data pipeline construction, including data mining, data collection, data cleaning, model development, validation, and visualization; performed gap analysis and data identification.
Translated business requirements into ETL processes for the applications.
Explored and manipulated datasets from SQL Server and Hadoop to build cases for product development.
Created interactive reports and visualizations for technical documentation using advanced techniques in Tableau.
Mapped cross-functional reporting requirements, monitored data trends, and performed statistical analysis with pandas.

My Projects

Titanic: Survival Prediction

This project is for all aspiring data scientists to learn the fundamentals of statistical analysis and classification models.

This kernel presents a detailed statistical analysis of the Titanic data set along with machine learning models. I am super excited to share my first kernel with the Kaggle community, and I think my data science journey can take off from this community. As I go on in this journey and learn new topics, I will incorporate them with updates…

House Pricing: Advanced Regression Techniques

This project uses detailed statistical analysis and machine learning algorithms to predict house prices in the Boston dataset.

This kernel is the “regression sibling” of my classification kernel. As the name suggests, it walks through a detailed analysis of most regression algorithms. It also uses many charts and images to make things easier for readers to understand.

Web Scraping: IMDb Movie Ranking Prediction

This project uses Python's BeautifulSoup library to scrape the IMDb website and predicts movie rankings using machine learning algorithms.

The idea of this project was to collect as much information as possible about each movie with Python's BeautifulSoup library and then use machine learning algorithms to predict its ranking. Whether you are a movie nerd or not, you will definitely find some interesting gems in there. I hope you enjoy reading it.

DonorsChoose: A Visual Data Analysis

This project uses pandas and Plotly to do an extensive visual data analysis for DonorsChoose.

Founded in 2000 by a Bronx history teacher, DonorsChoose.org has raised $685 million for America’s classrooms. Teachers at three-quarters of all the public schools in the U.S. have come to DonorsChoose.org to request what their students need, making DonorsChoose.org the leading platform for supporting public education.

Kiva: Loans that Change Lives

This project uses pandas and Plotly to do an extensive visual data analysis for Kiva.


Data Scientist | Big Data Engineer

Education

M.A. in Computer Science (2020-)

Data Science Immersive (2017)