Claire Hester

Hi there! I am a Data Scientist with 10 years of experience in data analytics, process improvement, and account management. I am an investigative, data-driven problem solver and I bring an astute global awareness to data storytelling.

I recently completed a Data Science bootcamp to round out my analytics and machine learning toolkit, and had the opportunity to work with cutting edge techniques in natural language processing, computer vision, time series analysis, and more.


Projects

Face Mask Detection


The worldwide COVID-19 pandemic has led to the widespread wearing of face masks to limit the spread of the disease. For my Data Science Immersive capstone project, I created an object detection model that can identify whether a person is wearing a mask correctly, incorrectly, or not at all. I took a two-step approach: first, use a face detector model to locate all of the faces in an image or video; second, classify each face with a face mask classifier built on MobileNetV2.

For face detection, I used two different models. MTCNN was more accurate and detected more faces. I also used OpenCV's built-in DNN face detector; though this model was less accurate, it ran at 4 times the frames per second of MTCNN. The face mask classifier was built on a MobileNetV2 base with a custom head layer. It achieved 94% accuracy after 25 epochs, with a loss of 0.19. The "With Mask" category had the most training samples and a recall of 99%.
To learn more about this project, click here.
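
The two-step approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `fake_detector` and `fake_classifier` are hypothetical stand-ins for the face detector (MTCNN or the OpenCV DNN) and the MobileNetV2-based classifier, and the image is a plain nested list so the sketch runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Detection:
    box: Box
    label: str


def crop(image: Image, box: Box) -> Image:
    """Return the region of the image covered by the box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]


def detect_masks(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    classify_mask: Callable[[Image], str],
) -> List[Detection]:
    """Step 1: locate faces. Step 2: classify each cropped face."""
    return [
        Detection(box, classify_mask(crop(image, box)))
        for box in detect_faces(image)
    ]


# Hypothetical stand-ins so the sketch runs end to end.
def fake_detector(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 0, 2, 2)]


def fake_classifier(face: Image) -> str:
    # Placeholder rule only: bright crops count as "with_mask".
    # A real classifier would also return an "incorrect" class.
    mean = sum(sum(row) for row in face) / (len(face) * len(face[0]))
    return "with_mask" if mean > 127 else "without_mask"


image = [
    [200, 210, 10, 20],
    [220, 230, 30, 40],
]
results = detect_masks(image, fake_detector, fake_classifier)
for detection in results:
    print(detection.box, detection.label)
```

Keeping the detector and the classifier as separate, swappable functions is what makes it straightforward to trade accuracy for speed by changing only the face detection step.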

California Wildfires: Measuring the Economic Impact


Catastrophic wildfires in California have sweeping impacts: environmental costs, poor air quality, carbon emissions, homes lost, large numbers of people displaced, and suppression costs, to name a few. Our goal was to build a convincing argument for redirecting funds to catastrophic wildfire prevention. We forecast three factors over the next 10 years: acres burned, fire suppression costs, and structures destroyed.

Using an ensemble time series model, we predicted that nearly 13 million acres will burn, leading to 61,000 structures destroyed and fire suppression costs of almost $40 billion. Currently, 10 million acres of forest are considered "high risk." At a forest restoration cost of $2,350 per acre, we believe that allocating funds to forest restoration would save $16 billion in public funds overall over the next 10 years. To learn more about this project, click here.
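
One way to read the figures above is as a simple cost comparison. The short calculation below assumes that the savings are the projected suppression cost minus the cost of restoring all high-risk acres; the write-up does not spell out the exact formula, so treat this as an illustration of the arithmetic rather than the project's method.

```python
# Figures quoted in the text above.
high_risk_acres = 10_000_000
restoration_cost_per_acre = 2_350            # dollars
projected_suppression_cost = 40_000_000_000  # "almost $40 billion", rounded up

restoration_cost = high_risk_acres * restoration_cost_per_acre

# Assumption: savings = suppression cost avoided - restoration spending.
estimated_savings = projected_suppression_cost - restoration_cost

print(f"Restoration cost: ${restoration_cost / 1e9:.1f} billion")
print(f"Estimated savings: ${estimated_savings / 1e9:.1f} billion")
```

Under these assumptions, restoration costs $23.5 billion and the difference is $16.5 billion. Because the suppression forecast is slightly under $40 billion, this is in line with the roughly $16 billion quoted.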

Reddit Classifier using Natural Language Processing


Reddit calls itself the front page of the internet, and one very popular use of Reddit is asking questions. The goal was to build a classifier that can determine which subreddit a post is from, given the text of the post. I used the Pushshift API to scrape 60,000 posts from the Legal Advice and No Stupid Questions subreddits. I cleaned the data to handle removed posts, missing values, and duplicates, as well as special characters and text features.

I used VADER sentiment analysis and word frequencies by subreddit to explore key differences. CountVectorizer was used to convert the text of each post into numerical word counts. I tested several types of classification models and used grid search and pipelines to tune hyperparameters. Logistic regression performed best, with a 94% test accuracy score. To learn more about this project, click here.
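
To show what turning words into numerical values means in practice, here is a minimal bag-of-words sketch in plain Python. It is a simplified stand-in for a count vectorizer, not scikit-learn's implementation (its tokenization rules differ), and the example posts are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(posts: List[str]) -> Dict[str, int]:
    """Map each distinct word to a column index, in alphabetical order."""
    words = sorted({word for post in posts for word in tokenize(post)})
    return {word: index for index, word in enumerate(words)}


def vectorize(post: str, vocabulary: Dict[str, int]) -> List[int]:
    """Count how often each vocabulary word appears in the post."""
    counts = Counter(tokenize(post))
    return [counts.get(word, 0) for word in vocabulary]


# Made-up example posts, one in the style of each subreddit.
posts = [
    "Can my landlord keep my deposit?",
    "Why is the sky blue?",
]
vocabulary = build_vocabulary(posts)
vectors = [vectorize(post, vocabulary) for post in posts]

print(list(vocabulary))
for vector in vectors:
    print(vector)
```

Each post becomes a row of word counts with one column per vocabulary word, which is the kind of input a classifier such as logistic regression can be trained on.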


Experience

Data Analyst

Select Rehabilitation

Use Python and SQL to build complex client-facing reports. Built a process for extracting text from scanned PDF documents, replacing a manual data entry process. Took initiative to learn LaTeX document creation and taught a team workshop, unlocking the ability to automatically build and format complex PDF reports.

April 2021 - present

Data Science Fellow

General Assembly

480-hour immersive program applying data collection and cleaning, analysis, modeling, data visualization, and machine/deep learning techniques to solve real-world data problems. Used Natural Language Processing to analyze and classify 60,000 Reddit posts. Completed client project forecasting the cost of California wildfires using time series analysis and Tableau. Capstone project utilizing computer vision and neural nets to build a face mask detection model.

August 2020 - November 2020

Associate Planner

Marine Layer

Managed inventory allocation for 45+ stores to maximize sales. Maintained inventory accuracy for all retail stores and warehouses across our ERP and POS systems. Led initiative to ensure data integrity and improve system accuracy through investigation, testing, and working directly with the Director of Finance.

April 2019 - April 2020

North America Wholesale Account Manager & Corporate Sales Representative

Timbuk2 Designs

Managed 40 accounts including Zappos, Backcountry.com, and The Sports Basement. Created and executed growth and revenue strategies alongside the Head of Global Sales. Developed merchandising and marketing plans aligned with company initiatives. Streamlined and executed logistics for the biannual Global Sales Meeting and seasonal trade shows. Managed development of top growth accounts through outreach, research, and custom-tailored product recommendations.

April 2016 - April 2019

Education

General Assembly

Data Science Immersive
480+ hour immersive Data Science program
August-November 2020

Colorado College

Bachelor of Arts
Majored in Mathematical Economics
Class of 2012

Skills

Programming Languages & Libraries
  • Python
  • Pandas
  • LaTeX
  • scikit-learn
  • NumPy
  • TensorFlow/Keras
  • SQL
  • HTML
  • Git

Data Science & Machine Learning Skills
  • Data Collection and Cleaning
  • Computer Vision
  • Natural Language Processing
  • Tableau
  • Data Visualization
  • Neural Networks
  • Web Scraping

Interests

Apart from being a data scientist, I spend most of my time outdoors. I love taking advantage of the natural beauty of Northern California whether hiking, biking, skiing, or hanging out at the beach. I have run two marathons, and I look forward to a time when it is safe to participate in races again!

I also love to travel and experience other foods and cultures. Highlights have included exploring the mountains and glaciers of Patagonia, discovering murals and artwork throughout Valparaiso, and tasting my way through the pintxo bars in San Sebastian.