“Special ISP AI-Forum: Making Multi-Omics Data ML-Ready”
Tuesday, April 27 | 3 – 4 p.m.
Dr. Abhishek Jha, CEO and Co-Founder, Elucidata
Bio: Dr. Abhishek Jha was an early member of the platform team at Agios Pharmaceuticals, where he supported multiple drug discovery programs, two of which produced FDA-approved drugs. This was an incredibly humbling and rewarding experience; among other things, it inspired him to become an entrepreneur. His days at Agios shaped his view of the enduring and deeply personal impact that science and technology have on the lives of patients and their families. He believes that the biotechnology and pharmaceutical industry of the future will be dramatically different, defined primarily by its approach to the large, complex data that a scientist has at her disposal. As a founder of Elucidata, he is committed to building a biotech company for the future that will transform drug discovery by integrating different forms of biomedical data. He received his academic training at UChicago and MIT.
“Automated Question Generation to Support Parents During Dialogic Reading”
Friday, April 23 | 1 – 1:30 p.m.
Arun Balajiee and Lekshmi Narayanan, Graduate Students, Intelligent Systems Program
Abstract: Dialogic reading is a shared reading activity in which parents support children's reading comprehension by engaging them in conversation about the text. Parents can ask different types of questions (Concrete, Abstract, or Relational) as conversation prompts that draw the child into dialogue during reading. However, parents may not always ask the right questions, and they can be supported by a system that suggests sample questions. In our work, we present a model that generates questions for dialogic reading in EMBRACE, an embodied reading comprehension program for dual-language learners that uses an Intelligent Tutoring System (ITS). Our work is inspired by a state-of-the-art template-based question generation system that uses rules to extract words from the source text and from EMBRACE ITS metadata. Our model annotates questions for dialogic reading using dependency parsing, and it implements question templates with Part-of-Speech (POS) tags that are filled with words from the source text on individual story pages in EMBRACE. We show that our model generates a sufficient and approximately equal number of Concrete, Abstract, and Relational questions for every page in EMBRACE, and that it annotates questions as efficiently as human crowd-sourced annotation, suggesting a cost-efficient automated support system for parents during dialogic reading.
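The template-filling step described above can be sketched in a few lines. This is an illustration under stated assumptions, not the presenters' system: the three templates are hypothetical, and the tokens are tagged by hand rather than by a POS tagger or dependency parser. Each placeholder names a POS tag plus the index of its occurrence on the page, and a template is skipped when the page lacks a word with the required tag.

```python
from collections import defaultdict

# Hypothetical templates, one per question type. A placeholder such as NOUN1
# means "the second word on the page tagged NOUN".
TEMPLATES = {
    "Concrete": "Who {VERB0} the {NOUN1}?",
    "Abstract": "Why do you think the {NOUN0} {VERB0} the {NOUN1}?",
    "Relational": "Have you ever seen a {NOUN1} like the one on this page?",
}


def collect_slots(tagged_tokens):
    """Map 'TAG<i>' to the i-th word carrying that POS tag (lowercased)."""
    counts = defaultdict(int)
    slots = {}
    for word, tag in tagged_tokens:
        slots[f"{tag}{counts[tag]}"] = word.lower()
        counts[tag] += 1
    return slots


def generate_questions(tagged_tokens, templates=TEMPLATES):
    """Fill every template whose slots are available; return {type: question}."""
    slots = collect_slots(tagged_tokens)
    questions = {}
    for qtype, template in templates.items():
        try:
            questions[qtype] = template.format(**slots)
        except KeyError:
            # The page has no word with a tag this template needs.
            continue
    return questions


# One hand-tagged story page (tags assumed, not produced by a real tagger).
page = [("The", "DET"), ("farmer", "NOUN"), ("feeds", "VERB"),
        ("the", "DET"), ("pig", "NOUN"), (".", "PUNCT")]
for qtype, question in generate_questions(page).items():
    print(f"{qtype}: {question}")
```

Generating roughly equal numbers of each type then amounts to keeping several templates per type and counting how many fill successfully on each page.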
“Machine Learning Predictions for Neurologically Injured Patients in the Pediatric Intensive Care Unit: Promises and Challenges”
Friday, April 23 | 12:30 – 1 p.m.
Neil Munjal, Graduate Student, Intelligent Systems Program
Abstract: Acute neurological injury is the most common cause of death in critically ill children admitted to the pediatric intensive care unit (PICU). Improvements in the quality of care delivered in PICUs have led to low mortality and shifted attention to morbidity outcomes, including survival without new neurological morbidity and long-term neurodevelopment. Unfortunately, there remains a paucity of data describing this population. We describe our initial use of Machine Learning models to exploit a general PICU dataset to study this population of interest. This approach brings many new challenges that must be handled appropriately when studying Electronic Health Record (EHR) datasets. Beyond predictive modeling, another area with little evidence in the PICU is the identification of definitive causal relationships. The Randomized Controlled Trial (RCT) is the gold-standard method for determining causal relationships, yet RCTs have proven difficult to perform successfully in this setting for numerous reasons. We will show an initial proof-of-concept exploration of using causal discovery to "re-demonstrate" a known causal association: hyperthermia causing secondary injury in brain-injured patients. By reproducing this relationship using strictly observational data, we hope to demonstrate both the promise and the significant challenges of this underexplored technique.
“Biomarker Discovery for Early Detection of Beryllium Exposure Diseases”
Friday, April 16 | 1 – 1:30 p.m.
Tushar Kansal, Graduate Student, Intelligent Systems Program
Abstract: Beryllium is an essential metal used in various industries such as electronics, aerospace, and mineral extraction. It is extremely useful because it is lightweight, has good electrical and thermal conductivity, and is non-magnetic, among other properties. Despite these advantages, higher cumulative exposure to Beryllium can cause cancer in humans. Even lower exposure levels can result in diseases such as Beryllium Sensitization (BeS) and Chronic Beryllium Disease (CBD), with BeS being a precursor to CBD. Early detection of BeS and CBD is crucial, since symptoms often appear only at a late stage and CBD is currently incurable. Therefore, in this collaborative research, we apply machine learning to discover biomarkers in proteomic data from cohorts with varying levels of environmental and occupational exposure to Beryllium. In our preliminary analysis, we performed two computational experiments on a dataset containing proteomic markers from blood samples of 64 human subjects. In the first experiment, we learned classifiers to distinguish non-exposed subjects from those exposed to some level of Beryllium, regardless of whether they developed disease. This resulted in approximately 85% accuracy over 5-fold cross-validation using various classification algorithms. In the second experiment, we classified the subjects in the exposed groups by outcome: Normal (no disease), BeS, or CBD. This experiment resulted in 87% accuracy over 5-fold cross-validation, with logistic regression as the best preliminary classifier. The study also identified specific proteins that were consistently found in both experiments and played a role in predicting disease. The classification results and the proteins identified here are promising for further analysis and for the development of screening methods that will aid in the early detection of Beryllium exposure diseases.
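To make "accuracy over 5-fold cross-validation" concrete, here is a minimal, dependency-free sketch of the procedure with a from-scratch logistic regression. It is not the study's pipeline: the data are synthetic (64 hypothetical subjects, three hypothetical protein levels, only the first of which differs between groups), the folds are shuffled but not stratified, and the learning rate and epoch count are arbitrary assumptions. Each subject is held out exactly once, and the reported figure is the mean of the five held-out accuracies.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the log loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(model, x):
    w, b = model
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


def k_fold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds; every subject is tested exactly once."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = train_logreg([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Synthetic stand-in: 64 subjects, 3 hypothetical protein levels; only the
# first feature differs between the (hypothetical) exposed and non-exposed groups.
rng = random.Random(42)
X, y = [], []
for i in range(64):
    label = i % 2
    X.append([rng.gauss(4.0 * label, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

print(f"5-fold accuracy: {k_fold_accuracy(X, y):.2f}")
```

With only about 13 subjects per fold, the per-fold accuracies are noisy. If biomarker (feature) selection is part of the pipeline, it has to be repeated inside each training fold; selecting proteins on the full dataset first would make the held-out accuracy optimistically biased.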
“Explaining Natural Product Drug Interactions with Biomedical Knowledge Graphs”
Friday, April 16 | 12:30 – 1 p.m.
Sanya Taneja, Graduate Student, Intelligent Systems Program
Abstract: The use of natural products such as green tea has increased in the US over the past few decades. While these products are not intended to replace conventional medicines, concomitant intake with prescription medicines is common. Natural products, however, can interact with conventional medicines and lead to adverse events in certain cases. While there is a plethora of research on drug-drug interactions for conventional drugs, similar attention has not been given to natural product-drug interactions (NPDIs). With the growing use of natural products, it is important to understand the mechanisms underlying their interactions with other chemical substances and to address safety concerns related to their use in order to prevent adverse interactions. Biomedical knowledge graphs can be used effectively here to investigate biological mechanisms using existing curated knowledge. This talk focuses on developing a heterogeneous knowledge graph from biomedical data sources and machine reading to find mechanistic explanations of NPDIs. It further explores the use of graph representation learning (embeddings) for knowledge graph completion and inference. We show the utility of knowledge graphs for this task through case studies of natural products and share preliminary results and future directions.
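Knowledge graph completion with embeddings scores candidate triples (head, relation, tail) and ranks the missing element. The talk does not name a specific embedding model, so as one widely used example this sketch uses the TransE scoring function, f(h, r, t) = -‖h + r - t‖. The entity and relation names and the 2-D vectors are hypothetical and hand-set; in practice the vectors are learned from the graph's known triples.

```python
import math

# Hypothetical, hand-set 2-D embeddings (a trained model would learn these).
ENTITY = {
    "product_P": (0.0, 0.0),
    "enzyme_E1": (1.0, 0.0),
    "drug_D": (1.0, 1.0),
    "enzyme_E2": (0.0, 1.0),
}
RELATION = {"inhibits": (1.0, 0.0)}


def transe_score(head, relation, tail):
    """TransE plausibility -||h + r - t||_2; higher means more plausible."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank candidate tails for the query (head, relation, ?) by TransE score."""
    candidates = [e for e in ENTITY if e != head]
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


print(rank_tails("product_P", "inhibits"))
```

A top-ranked triple that is absent from the graph is a candidate new edge; a mechanistic explanation would then be a path through such edges linking a natural product to a drug.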
“Characterizing the Hidden Layer Representation Impact of FGSM Adversarial Attacks”
Friday, April 9 | 1 – 1:30 p.m.
Daniel Steinberg, Graduate Student, ISP
Abstract: Neural network research has progressed for many decades. Recent advancements in computing hardware, coupled with a proliferation of data, have led to widespread adoption of deep learning for solving an assortment of machine learning problems. Along with this success, it has been shown that deep learning models can be tricked by adversarial examples: input instances that are carefully crafted to appear unaltered while causing incorrect model output.
Much of the existing work on adversarial examples seeks to develop new attack methods and/or defense techniques. Our work is motivated by trying to understand how adversarial attacks—generated with the fast gradient sign method (FGSM) in particular—effect change on the hidden layer representations of their target networks. A further understanding of adversarial examples could provide insights on the inner workings of neural networks, including how their functioning for computer vision differs from human visual processing. We explore the layer-by-layer consequence of adversarial perturbations and characterize the layers by how their representations reveal perturbed inputs. We show how combining representations from multiple models impacts the performance of discriminative models trained to identify adversarial perturbations.
“Score-Based Causal Discovery”
Friday, April 9 | 12:30 – 1 p.m.
Bryan Andrews, Graduate Student, ISP
Abstract: With the current abundance and variety of passively collected data, methods for discovering cause-and-effect relationships from non-experimental data have the potential to make a tremendous impact. In this talk, we will explore this task from the perspective of graph-based independence models, better known as graphical Markov models; a Bayesian network is a well-known example. We express causal discovery in the familiar framework of model selection with a score (objective) function. In recent work, we developed a method to apply this approach to models capable of representing latent confounding. To illustrate the application of this causal discovery method on real data, we will discuss the results of applying it to an environmental and clinical data set in order to investigate cause-and-effect relationships between air pollutants and heart and lung disease.
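As a concrete instance of "model selection with a score", the sketch below computes the Bayesian Information Criterion (BIC) of a linear-Gaussian DAG and compares a few candidate graphs on synthetic data. BIC is a common choice but is assumed here, not taken from the talk; the sketch covers plain DAGs only (no latent confounders, unlike the method described above) and scores a fixed list of graphs instead of searching. The score decomposes into one regression per node on its parents.

```python
import math
import random


def residual_variance(data, child, parents):
    """MLE residual variance of child regressed on parents (with intercept)."""
    n = len(data[child])
    cols = [[1.0] * n] + [data[p] for p in parents]
    k = len(cols)
    # Normal equations (A^T A) beta = A^T y, solved by Gauss-Jordan elimination.
    ata = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    aty = [sum(ci[t] * data[child][t] for t in range(n)) for ci in cols]
    m = [row[:] + [rhs] for row, rhs in zip(ata, aty)]
    for i in range(k):
        piv = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[piv] = m[piv], m[i]
        for r in range(k):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [a - f * c for a, c in zip(m[r], m[i])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    resid = [data[child][t] - sum(bj * cj[t] for bj, cj in zip(beta, cols)) for t in range(n)]
    return sum(e * e for e in resid) / n


def bic(data, dag):
    """Decomposable BIC of a linear-Gaussian DAG given as {node: [parents]}."""
    n = len(next(iter(data.values())))
    score = 0.0
    for child, parents in dag.items():
        var = residual_variance(data, child, parents)
        loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1)
        n_params = len(parents) + 2  # coefficients + intercept + noise variance
        score += loglik - 0.5 * n_params * math.log(n)
    return score


# Synthetic data from the chain X -> Y -> Z (coefficients are arbitrary).
rng = random.Random(0)
n = 500
X = [rng.gauss(0, 1) for _ in range(n)]
Y = [2 * x + rng.gauss(0, 1) for x in X]
Z = [-1.5 * yv + rng.gauss(0, 1) for yv in Y]
data = {"X": X, "Y": Y, "Z": Z}

candidates = {
    "chain X->Y->Z": {"X": [], "Y": ["X"], "Z": ["Y"]},
    "reverse Z->Y->X": {"X": ["Y"], "Y": ["Z"], "Z": []},
    "collider X->Y<-Z": {"X": [], "Y": ["X", "Z"], "Z": []},
    "empty": {"X": [], "Y": [], "Z": []},
}
for name, dag in candidates.items():
    print(f"{name:18s} BIC = {bic(data, dag):.1f}")
```

The true chain and its reversal receive the same score because they are Markov equivalent, while the collider, which implies that X and Z are independent, scores much worse. This is why a score alone identifies a graph only up to its equivalence class.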
“Two Brains are Better than One: User Control in Adaptive Information Access”
Friday, March 26 | 12:30 – 1:30 p.m.
Peter Brusilovsky, Professor, School of Computing and Information
Abstract: In recent years, the use of Artificial Intelligence (AI) technologies has expanded to many areas where they directly affect the lives of many people. AI-based approaches advise human decision makers on who should be released on bail, whether it is a good time to discharge a patient from the hospital, and whether a specific student is at risk of failing a course. Such extensive use of AI in decision making has brought a range of potential problems that have been studied extensively over the last few years. Recognition of these problems motivated a rapid rise of research on “human-centered AI”, which attempts to address and minimize the negative effects of using AI technologies. Among the ideas of human-centered AI is user control: engaging users in shaping AI decision making to prevent possible errors and biases. In my talk I will focus on the application of user control in one popular area of AI application, adaptive information access. Adaptive information access systems, such as personalized search and recommender systems, attempt to model their users to help them find the most relevant information. Yet user modeling and personalization mechanisms might not always work as expected, resulting in errors, biases, and suboptimal behavior. Combining the decision power of AI with the user's ability to guide and control it brings together the strengths of artificial and human intelligence and could lead to better results. In my talk, I will review several projects focused on user control in adaptive information access systems and discuss the benefits and challenges of this approach.
“Multi-Task Learning to Incorporate Clinical Knowledge into Deep Learning for Breast Cancer Diagnosis”
Friday, March 12 | 1 – 1:30 p.m.
Giacomo Nebbia, Graduate Student, ISP
Abstract: Deep learning models are traditionally trained in a purely data-driven way: the information used for training usually comes from a single source, the training data. In this work, we investigate how to supply additional clinical knowledge associated with the training data. Our goal is to train deep learning models for breast cancer diagnosis using mammogram images. Along with the main classification task (clinically proven cancer vs. negative/benign cases), we design two auxiliary tasks, each capturing a form of additional knowledge that facilitates the main task. Specifically, one auxiliary task classifies images according to radiologist-assigned BI-RADS diagnostic scores, and the other classifies images by BI-RADS breast density category. We customize a Multi-Task Learning model to jointly perform the three tasks (the main task and the two auxiliary tasks). We test four deep learning architectures (CBR-Tiny, ResNet18, GoogleNet, and DenseNet) and investigate the benefit of incorporating such knowledge both for ImageNet-pretrained models and for randomly initialized models. We run experiments on an internal dataset of screening full-field digital mammography images, totaling 1,380 images (341 cancer and 1,039 negative or benign). Our results show that adding clinical knowledge, conveyed through the two auxiliary tasks, to the training process improves the performance of the target task of breast cancer diagnosis, highlighting the benefit of incorporating clinical knowledge into data-driven learning to enhance deep learning model training.
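A common way to train one model on a main task plus auxiliary tasks, assumed here rather than taken from the work, is to attach one classification head per task to a shared backbone and minimize a weighted sum of the per-task cross-entropy losses. The sketch shows only that loss combination; the logits, the class counts per head, and the task weights are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def multitask_loss(outputs, targets, weights):
    """Weighted sum of per-task cross-entropies.

    outputs: {task: logits from that task's head}; in a real model every head
             would sit on top of one shared backbone.
    targets: {task: integer class label}
    weights: {task: loss weight}
    """
    return sum(weights[t] * cross_entropy(outputs[t], targets[t]) for t in outputs)


# Hypothetical head outputs for one mammogram; class counts are illustrative.
outputs = {
    "cancer": [0.3, 1.2],              # main task: negative/benign vs. cancer
    "birads": [0.1, 0.4, 1.5, -0.2],   # auxiliary: BI-RADS score (grouped)
    "density": [0.0, 0.8, 0.2, -0.5],  # auxiliary: breast density category
}
targets = {"cancer": 1, "birads": 2, "density": 1}
weights = {"cancer": 1.0, "birads": 0.5, "density": 0.5}  # hypothetical weights
print(f"total loss: {multitask_loss(outputs, targets, weights):.4f}")
```

Because the auxiliary losses backpropagate into the same shared backbone, they shape the features used by the main cancer head; the auxiliary weights control how strongly.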
“Box-Adapt: Domain-Adaptive Medical Image Segmentation Using Bounding Box Supervision”
Friday, March 12 | 12:30 – 1 p.m.
Yanwu Xu, Graduate Student, ISP
Abstract: Deep learning has achieved remarkable success in medical image segmentation, but it usually requires a large number of images labeled with fine-grained segmentation masks, and annotating these masks can be very expensive and time-consuming. Therefore, recent methods use unsupervised domain adaptation (UDA) to transfer information from labeled data in other datasets (source domains) to a new dataset (target domain). However, because labels are absent in the target domain, UDA methods perform much worse than fully supervised methods. In this paper, we propose a weakly supervised domain adaptation setting in which new datasets can be partially labeled with bounding boxes, which are easier and cheaper to obtain than segmentation masks. Accordingly, we propose a new weakly supervised domain adaptation method called Box-Adapt, which fully exploits the fine-grained segmentation masks in the source domain and the weak bounding boxes in the target domain. Box-Adapt is a two-stage method that first performs joint training on the source and target domains and then conducts self-training with pseudo-labels of the target domain. We demonstrate the effectiveness of our method on the liver segmentation task.
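The second stage of Box-Adapt self-trains on target-domain pseudo-labels, but the abstract does not give the pseudo-labelling rule. The following is therefore a hypothetical rule, meant only to show how bounding-box supervision and model confidence can combine: pixels outside the annotated box are labeled background (assuming the box encloses the whole organ), confident pixels inside the box become pseudo-labels, and uncertain ones are excluded from the loss. The threshold `tau`, the `IGNORE` value, and the toy probability map are assumptions.

```python
IGNORE = -1  # hypothetical marker for pixels left out of the self-training loss


def box_pseudo_labels(probs, box, tau=0.9):
    """Hypothetical pseudo-labelling rule for one target-domain image.

    probs: 2-D list of foreground probabilities from the stage-1 model.
    box:   (row0, col0, row1, col1) with inclusive start and exclusive end.
    """
    r0, c0, r1, c1 = box
    labels = []
    for r, row in enumerate(probs):
        out = []
        for c, p in enumerate(row):
            inside = r0 <= r < r1 and c0 <= c < c1
            if not inside:
                out.append(0)        # outside the box: background by assumption
            elif p >= tau:
                out.append(1)        # confident foreground inside the box
            elif p <= 1 - tau:
                out.append(0)        # confident background inside the box
            else:
                out.append(IGNORE)   # uncertain: excluded from the loss
        labels.append(out)
    return labels


probs = [
    [0.95, 0.20, 0.10, 0.00],
    [0.30, 0.97, 0.92, 0.10],
    [0.10, 0.50, 0.04, 0.05],
    [0.00, 0.10, 0.20, 0.00],
]
for row in box_pseudo_labels(probs, box=(1, 1, 3, 3)):
    print(row)
```

Note that the confident prediction in the top-left corner (0.95) is overridden because it lies outside the box; this is the kind of false positive that weak box labels can suppress.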
“Predictive Cell-Specific Gene Regulatory Models”
Friday, February 19 | 12:30 – 1 p.m.
Hatice Ulku Osmanbeyoglu, Assistant Professor, Department of Biomedical Informatics
Abstract: The development and function of specialized cell types are dependent on the interplay between complex signaling and transcriptional programs. We present SPaRTAN (Single-cell Proteomic and RNA based Transcription factor Activity Network), a computational method to link surface proteins to transcription factors (TFs) by exploiting cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) datasets with cis-regulatory information. SPaRTAN is applied to peripheral blood mononuclear cells (PBMCs) and tumor tissue datasets to demonstrate its utility in predicting cell-specific TF activity and their coupling to signaling pathways. To validate SPaRTAN-derived predictions, we perform flow cytometry analyses in peripheral blood from healthy donors and confirm the context-specific differential activity of TFs associated with surface proteins. SPaRTAN provides critical biological insights into the signal-regulated TFs that underlie key developmental or differentiation transitions and activation states of cells (e.g. within the immune system).
Bio: Hatice Ülkü Osmanbeyoğlu is an Assistant Professor in the Department of Biomedical Informatics and the UPMC Hillman Cancer Center at the University of Pittsburgh Medical School. Her research focuses on developing data-driven computational approaches to understand disease mechanisms and assist in the development of personalized anticancer treatments. Previously, she was a postdoctoral research associate at Memorial Sloan Kettering Cancer Center (MSKCC). She obtained her Ph.D. in Biomedical Informatics from the University of Pittsburgh and holds an M.S. in Electrical and Computer Engineering from Carnegie Mellon University and an M.S. in Bioengineering from the University of Pittsburgh. She received her B.S. in Computer Engineering from Northeastern University (Summa Cum Laude). She is a recipient of the NIH NCI Pathway to Independence Award, the Memorial Sloan Kettering Postdoctoral Research Award, and the Innovation in Cancer Informatics Award.
“CAASI Grief to Action Initiative: Co-Designing Platforms with BLM Community Partners”
Friday, February 5 | 1 – 1:30 p.m.
Sera Linardi, Associate Professor, Graduate School of Public and International Affairs
Abstract: Grief to Action (G2A) is an interdisciplinary working group at GSPIA dedicated to supporting and integrating existing efforts to fight systemic racism. Started by the Center for Analytical Approaches to Social Innovation (http://CAASI.pitt.edu) in June 2020, the group has nearly 100 volunteers, including community members and members of the academic community at Pitt and beyond (undergraduates, master's and PhD students, staff, and faculty), who use their diverse skills to co-design and build web platforms that increase access to data and bring communities together. We will illustrate our process by walking through our two projects: 1) a tool to compare police union contracts and police misconduct complaint processes across cities, and 2) a gaming platform to encourage engagement with Pittsburgh Black-owned businesses.
“Semi-Supervised Knowledge Graph Cultivation for Contract Segment Classification”
Friday, February 5 | 12:30 – 1 p.m.
Mengdi Wang, Graduate Student, ISP
Abstract: Written contracts are formal legal agreements between parties and a fundamental framework for the exchange of goods or services. Contract segment classification is an important problem in legal contract analysis, enabling useful insights such as the extraction of rights and obligations. Most existing work has only applied bag-of-words (BoW) machine learning classifiers (e.g., SVMs) to small datasets of a few hundred sentences, owing to the scarcity of such annotated data. In this paper, however, we propose an end-to-end semi-supervised generative model that leverages a legal knowledge graph and unlabeled data. The proposed model classifies contract clauses while simultaneously activating a sub-graph of the knowledge graph and expanding the knowledge graph. Experimental results on a real-world contract dataset show that our proposed model improves on state-of-the-art baselines.
“Clinician Focused Machine Learning”
Friday, January 22 | 1 – 1:30 p.m.
Harry Hochheiser, Associate Professor, Intelligent Systems Program and Department of Biomedical Informatics
Abstract: Despite the huge interest in using machine learning techniques to improve health care, many models and systems perform well in early stages yet fail when put into practice. These unsuccessful efforts demonstrate the importance of centering the goals and workflow of the clinicians who must weigh computerized predictions and classifications alongside their own training and experience. Our efforts combine the exploration of novel problems in machine learning with qualitative inquiry into clinician information needs, preferences, and workflow, to inform the design of novel tools and information presentation approaches aimed at successfully building machine learning into clinical practice. Example projects on predicting patient outcomes, adaptive electronic medical records capable of highlighting high-value information, and fine-grained disease subphenotyping will be used to illustrate our approach.
“Neurobiology of Language”
Friday, January 22 | 12:30 – 1 p.m.
Steven Small, Dean and Margareta Moller Distinguished Professor, School of Behavioral and Brain Sciences, UT Dallas
Abstract: The advent of brain imaging has for the first time permitted the physiological investigation of human language. We define “neurobiology of language” as the biological implementation and linking relations for representations and processes necessary and sufficient for production and understanding of speech and language in context. Biological disciplines that are highly relevant to the neurobiology of language include the anatomy and physiology of the human brain, the network connectivity of the brain, and the multiple roles of different brain areas. Importantly, the neurobiology of language is defined as a subfield of neuroscience that shares its primary assumptions, methods, and questions. By way of explanation, whereas psychology is the scientific study of the human mind and its functions, especially those affecting behavior in particular contexts, and linguistics is the scientific study of language and its structure, including the study of morphology, syntax, phonetics, and semantics, neurobiology is the study of the biology of the nervous system. In this talk, I will review some perspectives on this new paradigm.