Ryan Clark

Data Scientist at Harvard Medical School and HHMI.

About Me

Data Scientist with a B.S. in Computer Science and a minor in Finance from Boston College, specializing in turning large, complex datasets into actionable insights. At Harvard Medical School and HHMI, I design and deploy scalable data pipelines, build interactive dashboards, and apply statistical modeling and machine learning to high-dimensional datasets. With a strong foundation in computer science and hands-on experience with datasets of billions of records, I take a data-first approach to complex, data-intensive problems.

Projects

Large-Scale Data Engineering & Pattern Mining (FoxP3)

Co-first author: designed a leakage-safe, terabyte-scale ETL and pattern-mining pipeline (1B+ rows) to surface stable motifs and interaction rules, hardened with cross-dataset checks and FDR-controlled statistics.
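FDR control over many candidate motifs is commonly done with the Benjamini-Hochberg step-up procedure. The sketch below illustrates that standard procedure, assuming it is the correction meant here; it is not the project's code, and the helper name `benjamini_hochberg` and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject flag per hypothesis using the Benjamini-Hochberg step-up rule.

    Hypothetical helper for illustration: controls the false discovery rate at
    `alpha` when the tests are independent (or positively dependent).
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * alpha (step-up: keep scanning).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for six candidate motifs.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Because the rule steps up from the largest qualifying rank, a p-value that misses its own cutoff can still be rejected if a larger-ranked p-value passes.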

MarketMotif AI

Adapted genomics-style motif mining to financial markets to flag volatility regimes early; time-aware cross-validation and threshold calibration reduced false positives under a fixed alert budget, producing actionable risk signals.
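One plausible reading of "time-aware CV" and "threshold calibration under a fixed alert budget" is sketched below: expanding-window splits that never train on future data, and a threshold fit on past scores so that at most `budget` alerts fire. Both helpers (`expanding_window_splits`, `threshold_for_budget`) and the scores are hypothetical illustrations, not the MarketMotif implementation.

```python
from typing import Iterator, Sequence, Tuple


def expanding_window_splits(
    n: int, n_folds: int, min_train: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges in which every test index follows every train index.

    Hypothetical helper: the training window only grows forward in time, so no
    future observation leaks into training. Leftover tail observations are unused.
    """
    fold_size = (n - min_train) // n_folds
    if fold_size <= 0:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        yield range(0, train_end), range(train_end, train_end + fold_size)


def threshold_for_budget(scores: Sequence[float], budget: int) -> float:
    """Return the smallest threshold t such that at most `budget` scores satisfy score > t.

    Ties at the boundary are resolved conservatively, so the alert count never exceeds the budget.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(scores, reverse=True)
    if budget >= len(ranked):
        return float("-inf")  # every score fits within the budget
    # Only scores strictly above the (budget + 1)-th highest score fire.
    return ranked[budget]


if __name__ == "__main__":
    # Hypothetical daily risk scores produced by an upstream model.
    scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.6, 0.7, 0.3, 0.85, 0.15]
    for train, test in expanding_window_splits(len(scores), n_folds=2, min_train=6):
        # Calibrate on past scores only, then apply to the unseen window that follows.
        thr = threshold_for_budget([scores[i] for i in train], budget=2)
        alerts = [i for i in test if scores[i] > thr]
        print(f"train ends at {train.stop - 1}: threshold={thr:.2f}, alerts at {alerts}")
```

Since the threshold is fit on the training window, the alert count on the following test window is only approximately controlled; rescaling the budget to the test window's length is one design choice among several.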

SignalFrame Library

Built a Python toolkit for fast interval-signal analytics (BigWig/BED): vectorized merges/intersections and windowed stats turn hours-long feature jobs into minutes with predictable latency.
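The merge and intersect operations on BED-style intervals can be illustrated with a plain-Python sweep over sorted intervals. This is a minimal sketch assuming 0-based, half-open coordinates as in BED; it is not SignalFrame's vectorized implementation, and `merge_intervals` and `intersect_intervals` are hypothetical names.

```python
from typing import List, Tuple

# BED-style interval: (chrom, start, end), 0-based half-open [start, end).
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping segments of two interval sets via a two-pointer sweep."""
    a, b = merge_intervals(a), merge_intervals(b)
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Both lists are sorted by chromosome name; advance the one that is behind.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # The interval that ends first cannot overlap anything further in the other list.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out


if __name__ == "__main__":
    # Hypothetical peak and gene intervals.
    peaks = [("chr1", 0, 10), ("chr1", 5, 20), ("chr2", 0, 5)]
    genes = [("chr1", 15, 30), ("chr2", 3, 8), ("chr3", 0, 1)]
    print(merge_intervals(peaks))
    print(intersect_intervals(peaks, genes))
```

Sorting once and sweeping keeps each intersection linear in the number of intervals after the sort; a vectorized library would replace the Python loop with array operations.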

Genre & Sentiment Analysis of Song Lyrics

Used NLTK, scikit-learn, and DistilBERT to classify 150,000 songs by genre and analyze lyrical sentiment, then combined the models' outputs to uncover relationships between genre and emotional tone.

Tomato Leaf Disease Detection with CNNs

Developed clustering and CNN-based models to classify PlantVillage tomato leaf images as healthy or diseased. Focused on minimizing false negatives to aid early disease detection.
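Favoring fewer false negatives usually means lowering the decision threshold until a target recall on diseased leaves is met. The sketch below assumes a model that outputs a "diseased" score per image and labels where 1 means diseased; `threshold_for_recall` and the example values are hypothetical, not the project's code.

```python
import math
from typing import Sequence


def threshold_for_recall(
    scores: Sequence[float], labels: Sequence[int], target_recall: float
) -> float:
    """Return the highest threshold whose recall on diseased images meets `target_recall`.

    Hypothetical helper: an image is predicted "diseased" when score >= threshold,
    and labels use 1 for diseased, 0 for healthy.
    """
    if not 0.0 < target_recall <= 1.0:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    if not positives:
        raise ValueError("need at least one diseased example")
    # Number of diseased images that must be caught; the epsilon guards float round-off.
    needed = max(1, math.ceil(target_recall * len(positives) - 1e-9))
    # Thresholding at the needed-th highest diseased score catches at least `needed` of them.
    return positives[needed - 1]


if __name__ == "__main__":
    # Hypothetical validation scores and labels.
    scores = [0.95, 0.7, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0]
    thr = threshold_for_recall(scores, labels, target_recall=0.75)
    missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
    print(f"threshold={thr}, false negatives={missed}")
```

Choosing the threshold on a validation split rather than the test set keeps the reported false-negative rate an honest estimate.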

fMRI Dementia Detection with Deep Learning

Developed a multi-model CNN pipeline (ResNet50, VGG16, Inception-V3, AlexNet) to classify dementia stages using fMRI scans. Achieved 93% accuracy with ResNet50 and visualized important brain regions using Grad-CAM.
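Grad-CAM builds its heatmap by weighting each channel of a convolutional layer's activations with the spatial average of the class-score gradient, summing the weighted maps, and applying a ReLU. The pure-Python sketch below shows only that combination step; it assumes the activations and gradients were already extracted from the network (for example with framework hooks), and `grad_cam` is a hypothetical helper rather than the project's code.

```python
from typing import List

# A single channel's H x W map, stored as nested lists.
Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Combine per-channel activations and gradients into a normalized Grad-CAM heatmap.

    activations[k] is channel k's H x W feature map from the chosen conv layer, and
    gradients[k] is the gradient of the target class score with respect to it.
    """
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weight alpha_k: global average of channel k's gradient map.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of activation maps, then ReLU to keep only positive evidence.
    cam = [
        [max(0.0, sum(a * act[i][j] for a, act in zip(weights, activations))) for j in range(w)]
        for i in range(h)
    ]
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


if __name__ == "__main__":
    # Hypothetical 2-channel, 2 x 2 activations and gradients.
    acts = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
    grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
    print(grad_cam(acts, grads))
```

In practice the resulting low-resolution map is upsampled to the scan's resolution before it is overlaid on the image.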

Skills & Technologies

Python
AWS
SQL
ML & AI
R
Tableau
Git
Pandas & NumPy
TensorFlow & PyTorch
Linux

Get In Touch