Alex Chung’s Portfolio

Data Science Portfolio

1. Malaria Detection using Convolutional Neural Network

In 2019, there were a reported 229 million cases of malaria, causing 400,000 malaria-related deaths. An automated system for early, accurate malaria detection would save countless hours and the associated costs of manual classification. We built a convolutional neural network (CNN) model to classify blood cell images as infected or uninfected.

Key takeaways:

  • On unseen data, our final model identified infected blood cell samples with 93% recall and 97% precision.
  • The model performed the classification in a fraction of the time manual detection methods would require.
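Recall and precision scores like those reported above are typically computed with scikit-learn. A minimal sketch, using hypothetical labels rather than the project's actual predictions:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = infected cell, 0 = uninfected cell
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

# Recall: fraction of truly infected cells the model caught
recall = recall_score(y_true, y_pred)
# Precision: fraction of cells flagged as infected that really were
precision = precision_score(y_true, y_pred)
print(f"recall={recall:.2f}, precision={precision:.2f}")  # → recall=0.83, precision=0.83
```

For malaria screening, recall is the metric to watch: a false negative (a missed infection) is far costlier than a false positive.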

2. Predicting Term Deposit Conversion

Project Goal: Our task was to analyze data from the direct marketing efforts of a European banking institution and provide data-driven insights and recommendations. The dataset contains 40,000 observations.

Key takeaways:

  • Identified key features that have high impact on campaign success and provided recommendations.
  • Built a machine learning model that correctly predicted which customers would respond positively to the marketing campaign with a 99% recall score, saving the client a potential 2,000 hours of call time.
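A recall-focused classifier of this kind can be sketched with scikit-learn. This is an illustrative outline on synthetic data, not the project's actual pipeline; the features, model choice, and class balance here are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank's campaign data (hypothetical features;
# conversions are the rare positive class, as in most marketing datasets)
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.88], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" pushes the model toward higher recall at some cost
# in precision -- sensible when missing a likely converter wastes call time
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_tr, y_tr)
rec = recall_score(y_te, clf.predict(X_te))
print(f"held-out recall: {rec:.2f}")
```

The design choice worth noting is optimizing for recall rather than accuracy: calling a few extra customers is cheap, while skipping a would-be converter loses the sale.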

3. Predicting Trending YouTube Videos

Project Goal: Analyze a dataset scraped from YouTube’s Trending page to find common characteristics of trending videos and to predict which videos will trend.

Our Approach:

  • Data wrangling and exploratory data analysis (EDA) to understand our data
  • Feature engineering along the way
  • Visualizing our data
  • Applying NLP to video descriptions to find the most-viewed topics and videos
  • Cross-validating our model
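The cross-validation step above can be sketched with scikit-learn's `cross_val_score`. The dataset and model below are stand-ins, not the project's actual YouTube features:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# load_iris is a placeholder; the project's features came from the YouTube data
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0)

# 5-fold cross-validation: each fold is held out once while the model trains
# on the other four, guarding against an optimistic single train/test split
scores = cross_val_score(model, X, y, cv=5)
print(f"fold accuracies: {scores.round(2)}, mean: {scores.mean():.2f}")
```

Reporting the mean and spread across folds gives a more honest estimate of generalization than a single held-out score.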

4. Finding Topic Clusters through Natural Language Processing of COVID-19 Tweets

2020 was a unique year: the coronavirus put the world in lockdown, forcing us to adjust to “new normals”. Rarely have people all over the world gone into self-seclusion for an entire year as we did. Our goal was to work with Twitter feeds scraped between March 30th and April 30th, 2020, and explore some questions in this worldwide laboratory.

Our Process:

  • Prepping our data through exploratory data analysis and data wrangling
  • Cleaning, lemmatizing, and tokenizing the text
  • Vectorizing the processed text and starting the natural language processing using scikit-learn and spaCy
  • Using KMeans clustering to discover topical clusters
  • Finally, extracting important topic keywords using Latent Dirichlet Allocation