Sareh Soltani Nejad

Data Scientist / Machine Learning Researcher

BrainsCAN

About

I’m a Data Scientist and Machine Learning Specialist with over 5 years of hands-on experience developing scalable ML solutions and extracting insights from complex, high-dimensional datasets. I am passionate about creating impactful solutions, especially in the healthcare and finance sectors, and I am always keen to learn and grow in areas that merge ML and real-world applications.

I earned my M.Sc. in Computer Science at Western University, where my thesis developed a weakly supervised anomaly detection model for surveillance videos using a Two-Stream I3D Convolutional Network. I completed my B.Sc. in Computer Engineering at Amirkabir University, focusing on real-time tracking systems.

My research interests include ML fundamentals, Computer Vision, Biomedical Computing, and Natural Language Processing. On the applied side, I’m interested in ML problems at scale: modelling, training, deployment, and evaluation. I am currently looking for my next job opportunity in machine learning.

Personal email: sarehsoltani.inbox@gmail.com
Work email: ssolta7@uwo.ca

Interests

Machine Learning
Applied ML in Healthcare
Computer Vision
Biomedical Computing
Natural Language Processing (NLP)

Education

MSc in Computer Science
Western University, Canada
BSc in Computer Engineering
Amirkabir University

Experience

Data Scientist - AI Engineer

BrainsCAN

February 2024 – Present London, ON

Collaborated on the large-scale OMMABA project to explore the impact of music perception on brain functionality and develop a multimodal dataset integrating behavioral, EEG, and fMRI data from 60 participants.
Developed a robust data preprocessing pipeline to enhance data quality, consistency, and usability for downstream analysis.
Achieved 94% accuracy in ECG arrhythmia classification by developing deep learning models (1D CNNs, RNNs) and traditional ML algorithms (SVM, XGBoost), with statistical insights into model performance.
Deployed a production-ready pipeline via Dockerized FastAPI, enabling scalable, real-time arrhythmia detection.
Annotated immune cell types from 20K single-cell RNA-seq blood samples using generative variational autoencoders (VAE) and scGPT (transformer-based model), leading to improved accuracy and interpretability.
Tools: Pandas, Scikit-learn, TensorFlow, HuggingFace, Scanpy, SciPy, Seaborn, Optuna, MLflow, FastAPI, Docker, AWS

Machine Learning Engineer - Technical Facilitator

Vector Institute for AI

September 2023 – December 2023 Toronto, Canada

Facilitated two cohorts of the Anomaly Detection Bootcamp as part of the Vector ML Experts team.
Deployed tailored ML solutions to address anomaly detection use cases for 12 companies by collaborating with stakeholders.
Developed an ML-based fraud detection framework using ensemble methods (LightGBM), boosting model accuracy by 27% and reducing false positives by 15%.
Implemented a DL-based TabNet model for financial fraud detection, achieving 88% accuracy on transaction data.
Built a scalable video anomaly detection framework using multiple instance ranking, reaching an AUC of 85%.
Tools: Docker, GCP, Pandas, SQL, PySpark, Scikit-learn, PyTorch, Streamlit, T-Test, Matplotlib, Tableau, Wandb, Git, Slurm

Data Scientist Intern

Vector Institute for AI

May 2023 – August 2023 Toronto, Canada

Led an Anomaly Detection Workshop for 50+ professionals, delivering hands-on training on advanced techniques.
Delivered a reference fraud detection demo using a credit card dataset, achieving 93% AUC with an AutoEncoder.
Conducted an analysis for a pharma company, integrating lab data from 1,000 patients to assess the impact of its drug on BMI and blood pressure, and applying statistical, subgroup, and outlier analyses with external data to optimize clinical trial design.
Leveraged supervised models (Random Forest, XGBoost) to estimate treatment effects, predicting a 12% reduction in BMI and a 3% reduction in blood pressure.

Machine Learning Research Engineer

Western University

September 2021 – August 2023 London, Canada

Designed a novel weakly-supervised video-anomaly detection system built on a two-stream I3D ConvNet.
Built a data pipeline to process 1TB+ of video data from the UCF-Crime benchmark, leveraging Multiple Instance Learning.
Automated extraction of appearance (RGB) and motion (optical-flow) embeddings through parallel two-stream I3D encoders.
Devised a late-fusion strategy that improved accuracy by 20%, achieving an 85% AUC and surpassing published baselines.
Tools: Python, PyTorch, OpenCV, Weights & Biases, Matplotlib, Git

Machine Learning Engineer

IPM & Sharif Brain Center

July 2020 – July 2021

Applied NER and topic modeling to extract structured insights from unstructured EHR doctors’ notes, streamlining patient records and reducing manual chart review time by 30%.
Used large language models like BERT for clinical treatment categorization.
Developed 3D medical imaging visualizations from CBCT data, improving diagnostic accuracy for 50 patients.
Tools: Python, Pandas, LLM, HuggingFace, BERT, PyTorch, NLTK, spaCy

Projects

Immune Cell Type Annotation with scGPT

Built an end-to-end deep learning pipeline to classify immune cell types from scRNA-seq blood samples during respiratory infection. Applied scGPT transformers for feature embedding and accurate cell-type annotation, achieving high-resolution mapping of immune subpopulations.

Recent Publications

DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection

Our increasingly connected world continues to face an ever-growing …

Caroline Strickland, Muhammad Zakar, Chandrika Saha, Sareh Soltani Nejad, Noshin Tasnim, Daniel J. Lizotte, Anwar Haque

Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network

The widespread implementation of urban surveillance systems has necessitated more sophisticated techniques for anomaly detection to …

Sareh Soltani Nejad, Anwar Haque
