The DataMine Research Group

Deriving actionable insights from real-world data


Our research focuses on understanding machine learning models and real-world data through quantitative metrics and analyses. We work on interdisciplinary and cross-disciplinary research projects in the biomedical, life, health, and social sciences, and in Computer Science education. Our ongoing projects are described below.

Data Valuation in Machine Learning

Data-driven modeling and machine learning are rapidly changing how scientific discoveries are made and how solutions to real-world problems are developed. We are interested in understanding and quantifying the value of the real-world data used to train such models. We are applying data valuation techniques to improve the virtual screening of ligands and to acquire better fitness-tracking data from IoT cycling devices. Funding: 2020 WFU Pilot Research Grant.

Skills needed: Python programming, machine learning, game theory, high performance computing, computer vision.

Retrieval and Extraction of Pediatric Information

To better understand the safety and efficacy of approved drugs in pediatric populations, we are building the PediatricDB portal, powered by state-of-the-art natural language processing and machine learning. We are developing text classifiers and hybrid topic modeling approaches to accurately screen millions of unstructured biomedical texts and extract drug-patient relations.

Skills needed: web development, JavaScript and Python programming, natural language processing.

CRISPR Genome Editing

We are developing recurrent neural networks to predict targets for CRISPR-Cas9 genome editing in mammalian genomes. We are also building a web-based application, powered by deep learning, that predicts CRISPR arrays in newly sequenced bacterial and archaeal genomes.

Skills needed: bioinformatics, web development, JavaScript and Python programming, deep learning, cloud computing.

Context-aware Deep Learning

Single-cell RNA sequencing is increasingly used in biomedical domains. Computational analyses of these high-dimensional, zero-inflated datasets focus mostly on gene expression alone. We are developing a software package, NLPSeq, that enables the joint analysis of gene expression data and clinical annotations. Funding: 2019 WFU Biomedical Informatics Pilot Research Grant.

Skills needed: bioinformatics, biomedical informatics, deep learning, high performance computing, cloud computing.

Educational Data Mining

Meta-research is the study of how scientific research is organized, produced, and communicated. Its overall aim is to strengthen the scientific ecosystem by identifying gaps in knowledge as well as in transparency, rigor, and reproducibility. We are particularly interested in understanding the trends in CS education research and its interconnections with other scholarly work. To this end, we are developing open-source computational approaches for studying the education literature at a large scale. Funding: 2019 NCWIT Academic Alliance Seed Award; 2020 WFU Leadership and Character Course Development Grant.

Skills needed: natural language processing, machine learning, high performance computing.


Current Group Members

Natalia Khuri, Principal Investigator

Nathan Whitener, Undergraduate Researcher

Nathan is developing computational approaches for the discovery of novel biomarkers of cancer immunotherapies.

Ria Xia, URECA-X Undergraduate Researcher

Ria is developing methods for the analysis of citation trends in neurosurgical publications.

Shelton Zhang, Undergraduate Researcher

Shelton is interested in applying data-valuation techniques in data mining and artificial intelligence.

Past Members

Han Bao, Undergraduate Researcher (2019-2020)
Sapan Bhandari, MSCS Thesis and Summer Research Assistantship (2020-2021)
Andrew Greene, Early-College Undergraduate Research and URECA Undergraduate Scholar (2019-2020)
Andrew Knox, Undergraduate Researcher (Summer 2020)
Tianen Liu, CS Honors Project (2019-2020)
Caitlyn Marsac, URECA Undergraduate Scholar (Summer 2020)
Joshua Mannion, MSCS Thesis (2020-2021)
Esteban Murillo Burford, MSCS Thesis (2019-2020)
Sarah Parsons, Staff Research Scientist (2020-2021)
Jackson Shapiro, CS Honors Project (2019-2020)
Xiaochen Wang, CS Honors Project (Fall 2020)
Reyna Wu, CS Honors Project and Undergraduate Researcher (2020-2021)
Jasmine Xu, CS Honors Project (Spring 2021)
Tian Yun, CS Honors Project and Undergraduate Researcher (2019-2020)

Get in touch if you are interested in collaborating, have project ideas, or want to discuss our research.