Jeremy Joseph Yang, Ph.D.
Senior Research Scientist
UNM SoM, DoIM, Translational Informatics Division
Research scientist in the Translational Informatics Division, UNM School of Medicine, Department of Internal Medicine, focused on biomolecular and biomedical data science, and computational and informatics methodology. Projects include
Illuminating the Druggable Genome (IDG)
Common Fund Data Ecosystem
. Past projects include
, and screening informatics support for the
. See our
Public Web Apps
, several of which I develop and maintain.
Ph.D., Informatics (Dissertation:
"Evidence evaluation in biomedical knowledge graphs for pharmaceutical discovery"
, 2022), advised by
Prof. David Wild
, who leads the
Integrative Data Science Lab (IDSL)
Crisis Technologies Innovation Lab
Indiana University School of Informatics, Computing and Engineering
. Current project:
. Also contributor to
, IU spin-off company founded by
Knowledge graph analytics platform with LINCS and IDG for Parkinson's disease target illumination
, Yang et al., BMC Bioinformatics, 2022.
Getting Started with the IDG KMC Datasets and Tools
, Kropiwnicki et al., Current Protocols, 2022.
TIGA: Target illumination GWAS analytics
, Yang et al., Bioinformatics, 2021.
A machine learning platform to estimate anti-SARS-CoV-2 activities
, Govinda et al., Nature Machine Intelligence, 2021.
Analyzing knowledge entities about COVID-19 using entitymetrics
, Yu et al., Scientometrics, vol. 126, 2021.
TCRD and Pharos 2021: mining the human proteome for disease biology
, T. Sheils et al., Nucleic Acids Research, 2020.
DrugCentral 2021 supports drug discovery and repositioning
, S. Avram et al., Nucleic Acids Research, 2020.
How to Illuminate the Druggable Genome Using Pharos
, T. Sheils et al., Curr Prot Bioinfo, 2020.
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
, Zheng Gao et al., BMC Bioinformatics, 2019.
DrugCentral 2018: an update
, O. Ursu et al., Nucleic Acids Research, 2018.
Unexplored therapeutic opportunities in the human genome
, T.I. Oprea et al., Nature Reviews Drug Disc, 2018.
"PIBAS FedSPARQL: a web-based platform for integration and exploration of bioinformatics datasets"
, Djokic-Petrovic et al., J Biomed Semantics, 2017.
Formalizing drug indications on the road to therapeutic intent
, SJ Nelson et al., J Am Med Inform Assoc, 2017.
TIN-X: Target Importance and Novelty Explorer
, D.C. Cannon et al., Bioinformatics, 2017.
Badapple: promiscuity patterns from noisy evidence
, J.J. Yang et al., J. Cheminfo., 2016.
Novel Phenotypic Outcomes Identified for a Public Collection of Approved Drugs from a Publicly Accessible Panel of Assays
, J. Lee et al., PLoS ONE, 2015.
BioAssay Research Database (BARD): chemical biology and probe-development enabled by structured metadata and result types
, E.A. Howe et al., Nucleic Acids Res, 2015.
The CARLSBAD Database: A Confederated Database of Chemical Bioactivities
, S.L. Mathias et al., Database, 2013.
Associating Drugs, Targets and Clinical Outcomes into an Integrated Network Affords a New Platform for Computer-Aided Drug Repurposing
, T.I. Oprea et al., J. Mol. Info., 2011.
Analysis and hit filtering of a very large library of compounds screened against Mycobacterium tuberculosis
, S. Ekins et al., Mol. BioSyst., 2010.
Knowledge graph analytics platform combining LINCS and IDG for drug target illumination
, presented at the ISMB-ECCB Bioinformatics Open Source Conference (BOSC2021), July 25-30, 2021.
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
UNM Tech Days
, June 4, 2020.
Illuminating the Druggable Genome with Knowledge Engineering and Machine Learning
14th Annual NMBIST Symposium
, March 14-15, 2019.
Bibliological data science and drug discovery
, ACS National Meeting in Philadelphia, Aug 21, 2016.
The Language Diversity of Computing
, UNM Biomedical Info Seminar Series, Oct 15, 2015.
Molecular scaffolds are special and useful guides for discovery
, ACS National Meeting, Sept. 8, 2013, Indianapolis, IN.
The BADAPPLE promiscuity plugin for BARD: Evidence-based promiscuity scores
, ACS National Meeting, Sept. 9, 2013, Indianapolis, IN.
How am I supposed to organize a protein database when I can't even organize my address book?
, CINF Flash session, ACS National Meeting, March 25, 2012, San Diego, CA.
Cheminformatics Software Development Case Studies
, guest lecture given via webcast to SOIC I571 "Chemical Information Technology" class on Oct 24, 2011.
Applications in Biocomputing
, UNM Cyberinfrastructure Day, April 22, 2010.
TIGA: Target Illumination GWAS Analytics
, IDG Annual Meeting, Feb 9-11, 2021.
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses
, IDG F2F Meeting, Arlington, VA, February 2020.
TIN-X v2: modernized architecture with REST API for sustainability & interoperability
, IDG Annual Meeting, Arlington, VA, March 2019.
Exfiles: Sex-Specific Gene Expression Profiles Explorer
, 2018, scientific use case for NIH Data Commons Pilot Project Consortium (DCPPC).
Badapple: promiscuity patterns from noisy evidence
(poster for UNM Staff Research Expo, Jan 27, 2017).
Open Phenotypic Drug Discovery Resource
, Open PHACTS: Linking life science data, Feb 18-19, 2016, Vienna, Austria.
Development of a Screening Informatics System at the UNM Center for Molecular Discovery
, ACS National Meeting, March 26, 2012, San Diego, CA.
CARLSBAD: Confederated Annotated Research Libraries of Small-molecule Biological Activity Data
, OpenEye CUP meeting, Santa Fe, NM, 2012.
UNM Division of Biocomputing public web applications: Computational tools for cheminformatics and molecular discovery
, ChemAxon US User Group Meeting, Boston, September 13-15, 2010.
RMSD: routine measure stirs doubts
, 230th National ACS meeting, Washington DC, 2005.
Canonicalized systematic nomenclature in cheminformatics
, 229th National ACS Meeting, San Diego, 2005.
Independent Study in Biomedical Data Science
Project templates available in bioinformatics, cheminformatics, systems biology and other areas.
Offered beginning Fall 2021 as BIOM505 (Biomedical Sciences Special Topics: "Ind Study Biomed Data Sci"). For more information:
ISBDS Course Home Page
UNM:BIOMED_505, "Introduction to Biocomputing". Bioinformatics, cheminformatics, and computational structure- and ligand-based drug discovery informatics via online community resources and methods. This online graduate course was introduced by Prof. Tudor Oprea in 2007 and offered through 2019. I served as course coordinator 2008-2019, with responsibilities including student advisement and instruction, curricular maintenance and updates, and software support.
IU:INFO_I-590, "Topics in Informatics: Data Science for Drug Discovery, Health and Translational Medicine"
: This innovative data science course was introduced by Prof. Wild in 2013 and updated for 2017 by Prof. Joanne Luciano, assisted by JT Wolohan and myself as associate instructor.
IU:INFO_I-590, "Applied Data Science"
: The classic data science workflow is the framework for understanding how to apply data wrangling, semantics, machine learning and other skills in realistic scenarios. Developed by Prof. Joanne Luciano in 2017, assisted by JT Wolohan, Kaicheng Yang, and myself as associate instructor.
IU:INFO_I-590, "Real World Data Science". Developed by Prof. Wild, Prof. Luciano, and industry partner Sara Bigelow, in collaboration with Lilly, using de-identified clinical trials datasets and the analysis tools KNIME and Tableau. Associate instructor, spring 2018.
Some publicly available software:
) - Hierarchical molecular scaffold analysis.
) - Python package to access biomedical APIs.
Some data science videos
Some visualizations via RPubs