I'm advised by Prof. Dan Jurafsky. I'm a member of the Stanford NLP Group and Stanford AI Lab. I'm interested in generative models for text representation, discourse coherence, domain adaptation, relation extraction, knowledge base construction, and mental health applications.
I've also worked with David Grangier at Google Brain and Kelvin Guu at Google AI Research, as well as Alon Halevy, Prof. Chris Ré, and Prof. Keith Winstein, on various topics in AI and systems.
Palo Alto, CA 94306 US
my full name AT stanford.edu
Full list on Google Scholar
On the Complementarity of Data Selection and Fine Tuning for Domain Adaptation [PDF]
Dan Iter, David Grangier
The Trade-offs of Domain Adaptation for Neural Language Models [PDF]
David Grangier, Dan Iter
Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference [PDF]
William Held, Dan Iter, Dan Jurafsky
Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models [PDF]
Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky
Entity Attribute Relation Extraction with Attribute-Aware Embeddings [PDF]
Dan Iter, Xiao Yu, Fangtao Li
EMNLP - DeeLIO 2020
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers [PDF]
Sadjad Fouladi, Francisco Romero, Dan Iter, Qian Li, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, and Keith Winstein
USENIX ATC 2019
FrameIt: Ontology Discovery for Noisy User-Generated Text [PDF]
Dan Iter, Alon Halevy, Wang-Chiew Tan
EMNLP - W-NUT 2018
Automatic Detection of Incoherent Speech for Diagnosing Schizophrenia [PDF]
Dan Iter, Jong H. Yoon and Dan Jurafsky
NAACL HLT - CLPsych 2018
A Thunk to Remember: make -j1000 (and other jobs) on functions-as-a-service infrastructure [PDF]
Sadjad Fouladi, Dan Iter, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, and Keith Winstein
(Under review, 2017)
Flipper: A Systematic Approach to Debugging Training Sets [PDF]
Paroma Varma, Dan Iter, Christopher De Sa, Christopher Ré
Socratic Learning: Empowering the Generative Model [PDF]
Paroma Varma, Rose Yu, Dan Iter, Christopher De Sa, Christopher Ré
NIPS - FILM 2016
Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs [PDF]
Stefan Hadjis, Ce Zhang, Ioannis Mitliagkas, Dan Iter, Christopher Ré
CoRR - arXiv:1606.04487
Identifying Content for Planned Events Across Social Media Sites [PDF]
Hila Becker, Dan Iter, Mor Naaman, Luis Gravano
Automatic Identification and Presentation of Twitter Content for Planned Events [PDF]
Hila Becker, Feiyang Chen, Dan Iter, Mor Naaman, Luis Gravano
ICWSM - Weblogs and Social Media 2011
Schizophrenia is a mental disorder which afflicts an estimated 0.7% of adults worldwide (Saha et al., 2005). It affects many areas of mental function, often evident from incoherent speech. Diagnosing schizophrenia relies on subjective judgments resulting in disagreements even among trained clinicians. Recent studies have proposed the use of natural language processing for diagnosis by drawing on automatically extracted linguistic features, and particularly the use of discourse coherence. Here, we present the first benchmark comparison of previously proposed coherence models for detecting symptoms of schizophrenia and evaluate their performance on a new dataset of recorded interviews between subjects and clinicians. We also present two improved coherence metrics based on modern sentence embedding techniques that outperform the previous methods on our dataset. Finally, we propose a novel computational model for reference incoherence based on ambiguous pronoun usage and show that it is a highly predictive feature on our data. While the number of subjects is limited in this pilot study, our results suggest new directions for diagnosing common symptoms of schizophrenia.
CLPsych Workshop paper - NAACL 2018
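As a rough illustration of what an embedding-based coherence metric can look like, the sketch below scores a transcript by the mean cosine similarity between adjacent sentence embeddings. This is a generic first-order coherence measure and is an assumption for illustration only, not the metrics proposed in the paper; the function names `cosine` and `adjacent_coherence` are hypothetical, and the embeddings are assumed to come from any sentence encoder.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjacent_coherence(embeddings: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity between each sentence embedding and the next one.

    Hypothetical helper: higher values mean neighbouring sentences are more similar.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two sentence embeddings")
    sims = [cosine(embeddings[i], embeddings[i + 1])
            for i in range(len(embeddings) - 1)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Toy 2-d "embeddings": the first two sentences agree, the third drifts away.
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(adjacent_coherence(toy))
```

A lower score on a transcript would suggest more abrupt topic shifts between consecutive sentences, under the assumption that the chosen encoder places related sentences close together.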
Modern machine learning techniques, such as deep learning, often use discriminative models that require large amounts of labeled data. An alternative approach is to use a generative model, which leverages heuristics from domain experts to train on unlabeled data. Domain experts often prefer to use generative models because they "tell a story" about their data. Unfortunately, generative models are typically less accurate than discriminative models. Several recent approaches combine both types of model to exploit their strengths. In this setting, a misspecified generative model can hurt the performance of subsequent discriminative training. To address this issue, we propose a framework called Socratic learning that automatically uses information from the discriminative model to correct generative model misspecification. Furthermore, this process provides users with interpretable feedback about how to improve their generative model. We evaluate Socratic learning on real-world relation extraction tasks and observe an immediate improvement in classification accuracy that could otherwise require several weeks of effort by domain experts.
Workshop paper
A blog post about how to implement data programming in TensorFlow.
Blog post
We study the factors affecting training time in multi-device deep learning systems. Given a specification of a convolutional neural network, our goal is to minimize the time to train this model on a cluster of commodity CPUs and GPUs. We first focus on the single-node setting and show that by using standard batching and data-parallel techniques, throughput can be improved by at least 5.5x over state-of-the-art systems on CPUs. This ensures an end-to-end training speed directly proportional to the throughput of a device regardless of its underlying hardware, allowing each node in the cluster to be treated as a black box. Our second contribution is a theoretical and empirical study of the tradeoffs affecting end-to-end training time in a multiple-device setting. We identify the degree of asynchronous parallelization as a key factor affecting both hardware and statistical efficiency. We see that asynchrony can be viewed as introducing a momentum term. Our results imply that tuning momentum is critical in asynchronous parallel configurations, and suggest that published results that have not been fully tuned might report suboptimal performance for some configurations. For our third contribution, we use our novel understanding of the interaction between system and optimization dynamics to provide an efficient hyperparameter optimizer. Our optimizer involves a predictive model for the total time to convergence and selects an allocation of resources to minimize that time. We demonstrate that the most popular distributed deep learning systems fall within our tradeoff space, but do not optimize within the space. By doing this optimization, our prototype runs 1.9x to 12x faster than the fastest state-of-the-art systems.
arXiv paper
A blog post describing Socratic learning.
Blog post
In collaboration with Prof. Bailis at Stanford University, we are trying to design the future of mobile sensor systems. Here's a preview:
We have seen a massive proliferation of autonomous, mobile sensing agents in recent years, and this growth promises to continue into the next decade. With the rapid commoditization of these devices, autonomous sensors will likely become the next major platform for big data analytics and application development. In this paper, we introduce EAGLE, the first exploration of building control into a modern data system. EAGLE provides a platform for users to query the physical world through virtualized sensing, abstracting away an underlying network of data-collecting agents. EAGLE also explores the challenges of building a platform for application development on fleets of mobile sensors. As a prototype of our proposed system, we implemented a mobile sensing agent composed of a programmable Roomba vacuum cleaner and a video-enabled smartphone. The robot is able to move autonomously, collect data that it aggregates to a central server, and respond to commands from a high-level user interface.
Tracking an unknown number of targets given noisy measurements from multiple sensors is critical to autonomous driving. Rao-Blackwellized particle filtering is well suited to this problem. Monte Carlo sampling is used to determine whether measurements are valid, and if so, which targets they originate from. This breaks the problem into single target tracking sub-problems that are solved in closed form (e.g. with Kalman filtering). We compare the performance of a traditional Kalman filter with that of a recurrent neural network for single target tracking. We show that LSTMs outperform Kalman filtering for single target prediction by 2x. We also present a unique model for training two dependent LSTMs to output a Gaussian distribution for a single target prediction to be used as input to multi-target tracking. We evaluate the end to end performance of an LSTM and a Kalman filter for simultaneous multiple target tracking. In the end to end pipeline, LSTMs do not provide a significant improvement.
GitHub Repo
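For readers unfamiliar with the closed-form single-target step mentioned above, here is a minimal sketch of a scalar Kalman filter. It assumes a one-dimensional random-walk state with known process noise `q` and measurement noise `r`; the function name `kalman_1d` and all parameters are hypothetical and are not taken from the project's repository.

```python
from typing import List, Sequence


def kalman_1d(measurements: Sequence[float], q: float, r: float,
              x0: float = 0.0, p0: float = 1.0) -> List[float]:
    """Filter noisy scalar measurements under a random-walk state model.

    q: process noise variance, r: measurement noise variance (must be > 0),
    x0/p0: initial state estimate and its variance. Returns the estimate
    after each measurement.
    """
    if r <= 0.0:
        raise ValueError("measurement noise variance r must be positive")
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    # Noisy readings of a target sitting near position 1.0.
    print(kalman_1d([1.1, 0.9, 1.05, 0.95], q=0.0, r=1.0))
```

In a Rao-Blackwellized tracker, a filter of this kind would be run per target once sampling has decided which measurement belongs to which target; the sketch covers only that inner step.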
Using various machine learning techniques to automatically detect shapes in physical chromosome structures.
Academic projects
User-contributed Web data contains rich and diverse information about a variety of events in the physical world, such as shows, festivals, conferences and more. This information ranges from known event features (e.g., title, time, location) posted on event aggregation platforms (e.g., Last.fm events, EventBrite, Facebook events) to discussions and reactions related to events shared on different social media sites (e.g., Twitter, YouTube, Flickr). In this paper, we focus on the challenge of automatically identifying user-contributed content for events that are planned and, therefore, known in advance, across different social media sites. We mine event aggregation platforms to extract event features, which are often noisy or missing. We use these features to develop query formulation strategies for retrieving content associated with an event on different social media sites. Further, we explore ways in which event content identified on one social media site can be used to retrieve additional relevant event content on other social media sites. We apply our strategies to a large set of user-contributed events, and analyze their effectiveness in retrieving relevant event content from Twitter, YouTube, and Flickr.
Undergraduate research
As machine learning methods gain popularity across different fields, acquiring labeled training datasets has become the primary bottleneck in the machine learning pipeline. Recently, generative models have been used to create and label large amounts of training data, albeit noisily. The output of these generative models is then used to train a discriminative model of choice, such as logistic regression or a complex neural network. However, any errors in the generative model can propagate to the subsequent model being trained. Unfortunately, these generative models are not easily interpretable and are therefore difficult to debug for users. To address this, we present our vision for Flipper, a framework that presents users with high-level information about why their training set is inaccurate and informs their decisions as they improve their generative model manually. We present potential tools within the Flipper framework, inspired by observing biomedical experts working with generative models, which allow users to analyze the errors in their training data in a systematic fashion. Finally, we discuss a prototype of Flipper and report results of a user study where users create a training set for a classification task and improve the discriminative model’s accuracy by 2.4 points in less than an hour with feedback from Flipper.
Workshop Paper
With the proliferation of natural language interfaces on mobile devices and in home personal assistants such as Siri and Alexa, many services and data are becoming available through transcription from a speech recognition system. One major risk factor in this trend is that a malicious adversary may attack this system without the primary user noticing. One way to accomplish this is to use adversarial examples that are perceived one way by a human, but transcribed differently by the Automatic Speech Recognition (ASR) system. For example, a recording may sound like “hello” to the human ear but be transcribed as “goodbye” by the ASR system. Recent work has shown that adversarial examples can be created for convolutional neural networks to fool vision recognition systems. We show that similar methods can be applied to neural ASR systems. We show successful results for two methods of generating adversarial examples where we fool a high-quality ASR system but the difference in the audio is imperceptible to the human ear. We also present a method for converting the adversarial MFCC features back into audio.
Class Project
Tracking an unknown number of targets given noisy measurements from multiple sensors is critical to autonomous driving. Rao-Blackwellized particle filtering is well suited to this problem. Monte Carlo sampling is used to determine whether measurements are valid, and if so, which targets they originate from. This breaks the problem into single target tracking sub-problems that are solved in closed form (e.g. with Kalman filtering). We compare the performance of a traditional Kalman filter with that of a recurrent neural network for single target tracking. We show that LSTMs outperform Kalman filtering for single target prediction by 2x. We also present a unique model for training two dependent LSTMs to output a Gaussian distribution for a single target prediction to be used as input to multi-target tracking. We evaluate the end to end performance of an LSTM and a Kalman filter for simultaneous multiple target tracking. In the end to end pipeline, LSTMs do not provide a significant improvement.
Class Project
Deep neural networks are in vogue for text classification. The lack of interpretability and computational cost associated with deep architectures has led to renewed interest in effective baseline models. In this paper, we review several popular baseline models which strike a balance between traditional and neural approaches, and propose improvements by combining their key contributions. In particular, we study gradient-tuned word embeddings, modeling n-grams, and generative sentence representation methods. We evaluate our methods by comparing end performance and training time on sentiment analysis and topic classification tasks. By combining techniques of popular baseline models into a single shallow architecture, we outperformed the individual models on all tasks, were competitive with traditional and deep approaches, and maintained fast training times.
Photography
US Patent No.: US 2021/0004438 A1 – Date of Patent: Jan 7, 2021
US Patent
PhD in Computer Science • Present
I focus on artificial intelligence and natural language processing. I'm advised by Prof. Dan Jurafsky. My research interests are pretrained language models, discourse coherence, attribute extraction, and knowledge base construction.
B.S. in Computer Science • May 2011
I graduated summa cum laude. I did research on information retrieval in social data with Prof. Luis Gravano and Hila Becker. My focus was in systems. I was also a member of Bacchanal.
Research Intern / Student Researcher • June 2020 - Present
Host: David Grangier
Analysis of data selection methods for domain adaptation in machine translation and language modeling. Training transformer-based NMT models using contrastive methods and fine-tuned BERT domain classifiers to select the data most similar to a small target domain.
Research Intern • June 2019 - September 2019
Host: Kelvin Guu
Coherence objectives for language modeling - learning sentence representations by training transformer language models to understand sentence ordering and discourse distance.
Research Intern • June 2018 - September 2018
Entity attribute extraction - a special case of relation extraction where attributes describe searchable aspects of the entity.
Research Intern • June 2017 - August 2017
Advisors: Alon Halevy and Wang-Chiew Tan
FrameIt: A system to quickly build framings and SRLs for exploring large text corpora. Currently building an ontology of happy moments in HappyDB.
Software Engineer (R&D) • June 2015 - December 2015
I helped design a new highly parallel processor architecture that offers GPU performance on an x86 instruction set. I profiled our design on a number of common machine learning workloads.
Software Engineer • June 2012 - May 2015
As the 9th member of this startup, I was involved from our first prototype to our second major product. I worked across the stack: I implemented kernel drivers for high-performance distributed caching for datacenters, wrote MVC business logic for system management, and even built our lightweight cross-platform installer.
Our opponents maintain that we are confronted with insurmountable ... obstacles, but that may be said of the smallest obstacle if one has no desire to surmount it.
Theodor Herzl