The DataMine Research Group

Deriving actionable insights from real-world data


Our research focuses on understanding machine learning models and real-world data through quantitative metrics and analyses. We work on interdisciplinary research projects in the biomedical, life, health and social sciences and in computer science education. Below are the ongoing projects in our research group.

Data Valuation in Machine Learning

Data-driven modeling and machine learning are rapidly changing the process of scientific discovery and the development of solutions to real-world problems. We are interested in understanding and quantifying the value of the real-world data used to train such models. We are applying data valuation techniques to improve virtual screening of ligands and to acquire better fitness tracking data from IoT cycling devices. Funding: 2020 WFU Pilot Research Grant.

Skills needed: Python programming, machine learning, game theory, high performance computing, computer vision.

Retrieval and Extraction of Pediatric Information

To better understand the safety and efficacy of approved drugs in pediatric populations, we are building a PediatricDB portal powered by state-of-the-art natural language processing and machine learning. We are developing text classifiers and hybrid topic modeling approaches to accurately screen millions of unstructured biomedical texts and extract drug-patient relations.

Skills needed: web development, JavaScript and Python programming, natural language processing.

CRISPR Genome Editing

We are developing recurrent neural networks to predict targets for CRISPR-Cas9 genome editing in mammalian genomes. Additionally, we are building a web-based application powered by deep learning to predict CRISPR arrays in newly sequenced bacterial and archaeal genomes.

Skills needed: bioinformatics, web development, JavaScript and Python programming, deep learning, cloud computing.

Context-aware Deep Learning

Single-cell RNA-sequencing is increasingly used in biomedical domains. Computational analyses of these high-dimensional, zero-inflated datasets focus mostly on gene expression. We are developing a software package, NLPSeq, that supports the analysis of gene expression data together with clinical annotations. Funding: 2019 WFU Biomedical Informatics Pilot Research Grant.

Skills needed: bioinformatics, biomedical informatics, deep learning, high performance computing, cloud computing.

Educational Data Mining

Meta-research is the study of how scientific research is organized, produced and communicated. Its overall aim is to contribute to the scientific ecosystem by identifying gaps in knowledge as well as in transparency, rigor and reproducibility. We are particularly interested in understanding the trends and interconnections between CS education research and scholarly works. To that end, we are developing open-source computational approaches to studying the education literature at a large scale. Funding: 2019 NCWIT Academic Alliance Seed Award; 2020 WFU Leadership and Character Course Development Grant.

Skills needed: natural language processing, machine learning, high performance computing.


Current Group Members

Natalia Khuri, Principal Investigator

Sarah Parsons, Staff Research Scientist

Sarah is developing a multi-stage data valuation method for machine learning and hybrid methods for generative topic modeling.

Sapan Bhandari, Graduate Research Assistant

Sapan is developing hierarchical deep learning models for cell type prediction from heterogeneous single-cell RNA-sequencing data.

Joshua Mannion, Graduate Research Assistant

Joshua is building a data-driven predictor of target sites for CRISPR-Cas9 genome editing.

Reyna Wu, CS Honors Research

Reyna is evaluating the utility of generative adversarial networks (GANs) for dimensionality reduction of single-cell RNA-sequencing data.

Jasmine Xu, CS Honors Research

Jasmine is studying how coalitional game theory can be used to improve data acquisition from fitness tracking IoT devices.

Nathan Whitener, Undergraduate Researcher

Nathan is developing computational approaches for the discovery of novel biomarkers of cancer immunotherapies.

Past Members

Han Bao, Undergraduate Researcher (2019-2020)
Andrew Greene, Early-College Undergraduate Research and URECA Undergraduate Scholar (2019-2020)
Andrew Knox, Intern (Summer 2020)
Tianen Liu, CS Honors Project (2019-2020)
Caitlyn Marsac, URECA Undergraduate Scholar (Summer 2020)
Esteban Murillo Burford, MSCS Thesis (2019-2020)
Jackson Shapiro, CS Honors Project (2019-2020)
Xiaochen Wang, CS Honors Project (Fall 2020)
Tian Yun, CS Honors Project and Undergraduate Researcher (2019-2020)

Get in touch if you are interested in collaborations, have project ideas, or want to discuss our research.