Below is a sample list of publications.

For a complete list, please visit the following links: Google Scholar / Semantic Scholar / ACL Anthology / DBLP.
  1. Adapting Language Models for Non-Parallel Author-Stylized Rewriting
    Bakhtiyar Syed, Gaurav Verma, Balaji Vasan Srinivasan, Anandhavelu N and Vasudeva Varma

    In 34th AAAI Conference on Artificial Intelligence, February 2020, New York, USA. AAAI 2020.

    Given the recent progress in language modeling using Transformer-based neural models and an active interest in generating stylized text, we present an approach to leverage the generalization capabilities of a language model to rewrite an input text in a target author's style. Our proposed approach adapts a pre-trained language model to generate author-stylized text by fine-tuning on the author-specific corpus using a denoising autoencoder (DAE) loss in a cascaded encoder-decoder framework. Optimizing over DAE loss allows our model to learn the nuances of an author's style without relying on parallel data, which has been a severe limitation of the previous related works in this space. To evaluate the efficacy of our approach, we propose a linguistically-motivated framework to quantify stylistic alignment of the generated text to the target author at lexical, syntactic and surface levels. The evaluation framework is both interpretable as it leads to several insights about the model, and self-contained as it does not rely on external classifiers, e.g. sentiment or formality classifiers. Qualitative and quantitative assessment indicates that the proposed approach rewrites the input text with better alignment to the target style while preserving the original content better than state-of-the-art baselines.
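The DAE objective described above trains the model to reconstruct a clean sentence from a corrupted version of itself, which is what removes the need for parallel data. A minimal sketch of such a corruption function, assuming a token-dropout-plus-local-shuffle noise scheme (the exact noise model and the helper name `add_noise` are illustrative, not the paper's specification):

```python
import random

def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token sequence for denoising-autoencoder training.

    The model is then trained to reconstruct the clean sequence from
    this noisy input, so no parallel style-transfer corpus is needed.
    """
    rng = rng or random.Random(0)
    # Randomly drop tokens with probability drop_prob.
    kept = [t for t in tokens if rng.random() > drop_prob]
    if not kept:  # never return an empty sequence
        kept = list(tokens)
    # Lightly shuffle: each token's sort key stays within
    # `shuffle_window` of its original index, so tokens move only locally.
    keys = [i + rng.uniform(0, shuffle_window) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda p: p[0])]
```

At training time, the encoder-decoder would receive `add_noise(sentence)` as input and the original `sentence` as the reconstruction target.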


  2. Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations
    Pinkesh Badjatiya, Manish Gupta and Vasudeva Varma

    In The Web Conference, San Francisco, USA (WWW-2019)

    With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for structured datasets, but we aim at bias mitigation from unstructured text data. In this paper, we make two important contributions. First, we systematically design methods to quantify the bias for any model and propose algorithms for identifying the set of words which the model stereotypes. Second, we propose novel methods leveraging knowledge-based generalizations for bias-free learning. Knowledge-based generalizations provide an effective way to encode knowledge because the abstractions they yield not only generalize content but also facilitate retraction of information from the hate speech detection classifier, thereby reducing the imbalance. We experiment with multiple knowledge generalization policies and analyze their effect on general performance and in mitigating bias. Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset (WikiDetox) of size ∼96k and a Twitter dataset of size ∼24k, show that the use of knowledge-based generalizations results in better performance by forcing the classifier to learn from generalized content. Our methods utilize existing knowledge-bases and can easily be extended to other tasks.
  3. MVAE: Multimodal Variational Autoencoder for Fake News Detection
    Dhruv Khattar, Jaipal Singh Goud, Manish Gupta and Vasudeva Varma

    In The Web Conference, San Francisco, USA (WWW-2019)

    In recent times, fake news and misinformation have had a disruptive and adverse impact on our lives. Given the prominence of microblogging networks as a source of news for most individuals, fake news now spreads at a faster pace and has a more profound impact than ever before. This makes detection of fake news an extremely important challenge. Fake news articles, just like genuine news articles, leverage multimedia content to manipulate user opinions but spread misinformation. A shortcoming of the current approaches for the detection of fake news is their inability to learn a shared representation of multimodal (textual + visual) information. We propose an end-to-end network, Multimodal Variational Autoencoder (MVAE), which uses a bimodal variational autoencoder coupled with a binary classifier for the task of fake news detection. The model consists of three main components, an encoder, a decoder and a fake news detector module. The variational autoencoder is capable of learning probabilistic latent variable models by optimizing a bound on the marginal likelihood of the observed data. The fake news detector then utilizes the multimodal representations obtained from the bimodal variational autoencoder to classify posts as fake or not. We conduct extensive experiments on two standard fake news datasets collected from popular microblogging websites: Weibo and Twitter. The experimental results show that across the two datasets, on average our model outperforms state-of-the-art methods by margins as large as ∼6% in accuracy and ∼5% in F1 scores.
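The training objective described above combines three signals: reconstructing the multimodal input, keeping the latent distribution close to a prior, and classifying the post. A rough sketch under stated assumptions (a squared-error reconstruction term standing in for the likelihood bound, a diagonal-Gaussian latent with a standard-normal prior, and the helper name `vae_losses` — all illustrative, not the paper's exact formulation):

```python
import math

def vae_losses(mu, logvar, x, x_recon, is_fake, p_fake):
    """Loss terms for a bimodal VAE with a fake-news classifier head.

    mu, logvar : latent Gaussian parameters from the shared encoder
    x, x_recon : concatenated text+image features and their reconstruction
    is_fake    : gold label (0 or 1); p_fake : classifier probability
    """
    # Reconstruction term (squared error stands in for the likelihood bound).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # KL divergence between q(z|x) = N(mu, exp(logvar)) and the N(0, I) prior.
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    # Binary cross-entropy for the fake-news detector head.
    bce = -(is_fake * math.log(p_fake) + (1 - is_fake) * math.log(1 - p_fake))
    return recon + kl + bce
```

Minimizing the first two terms trains the shared multimodal representation; the third term makes that representation discriminative for fake-news classification.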


  4. Unity in Diversity: Learning Distributed Heterogeneous Sentence Representation for Extractive Summarization
    Abhishek Singh, Manish Gupta and Vasudeva Varma

    In 32nd AAAI Conference on Artificial Intelligence, February 2018, New Orleans, USA. AAAI 2018.

    Automated multi-document extractive text summarization is a widely studied research problem in the field of natural language understanding. Such extractive mechanisms compute in some form the worthiness of a sentence to be included into the summary. While the conventional approaches rely on human-crafted document-independent features to generate a summary, we develop a novel data-driven summarization system called HNet, which exploits the various semantic and compositional aspects latent in a sentence to capture document-independent features. The network learns sentence representations such that salient sentences are closer in the vector space than non-salient sentences. This semantic and compositional feature vector is then concatenated with the document-dependent features for sentence ranking. Experiments on the DUC benchmark datasets (DUC-2001, DUC-2002 and DUC-2004) indicate that our model shows a significant performance gain of around 1.5-2 points in terms of ROUGE score compared with the state-of-the-art baselines.


  5. SSAS: Semantic Similarity for Abstractive Summarization
    Raghuram Vadapalli, Litton J Kurisinkel, Manish Gupta, and Vasudeva Varma

    In 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan (IJCNLP-2017)

    Ideally, a metric evaluating an abstractive system summary should represent the extent to which the system-generated summary approximates the semantic inference conceived by the reader using a human-written reference summary. Most of the previous approaches relied upon word or syntactic sub-sequence overlap to evaluate system-generated summaries. Such metrics cannot evaluate the summary at the semantic inference level. Through this work we introduce the metric of Semantic Similarity for Abstractive Summarization (SSAS), which leverages natural language inference and paraphrasing techniques to frame a novel approach to evaluate system summaries at the semantic inference level. SSAS is based upon a weighted composition of quantities representing the level of agreement, contradiction, topical neutrality, paraphrasing, and optionally ROUGE score between a system-generated and a human-written summary.
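The weighted composition the abstract describes can be sketched in a few lines. The weight vector, the sign convention (contradiction penalized), and the function name `ssas_score` are all assumptions for illustration; the paper learns its own composition:

```python
def ssas_score(entail, contradict, neutral, paraphrase, rouge, w):
    """Weighted composition of NLI, paraphrase, and ROUGE signals.

    Each argument is a score in [0, 1]; `w` holds five weights.
    Contradiction is negated so that it penalizes the final score.
    """
    signals = [entail, -contradict, neutral, paraphrase, rouge]
    return sum(wi * si for wi, si in zip(w, signals))
```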
  6. Towards Sub-Word Level Compositions for Sentiment Analysis of Hindi-English Code Mixed Text
    Aditya Joshi, Prabhu Ameya Pandurang, Manish Shrivastava and Vasudeva Varma

    In 26th International Conference on Computational Linguistics, Osaka, Japan (COLING-2016)

    Sentiment analysis (SA) using code-mixed data from social media has several applications in opinion mining, ranging from customer satisfaction to social campaign analysis in multilingual societies. Advances in this area are impeded by the lack of a suitable annotated dataset. We introduce a Hindi-English (Hi-En) code-mixed dataset for sentiment analysis and perform an empirical analysis comparing the suitability and performance of various state-of-the-art SA methods on social media. In this paper, we introduce learning sub-word level representations in an LSTM (Subword-LSTM) architecture instead of character-level or word-level representations. This linguistic prior in our architecture enables us to learn information about the sentiment value of important morphemes. As our experiments show, this also works well on highly noisy text containing misspellings, which is reflected in the morpheme-level feature maps learned by our model. We hypothesize that encoding this linguistic prior in the Subword-LSTM architecture leads to the superior performance. Our system attains an accuracy 4-5% greater than traditional approaches on our dataset, and also outperforms the available system for sentiment analysis in Hi-En code-mixed text by 18%.
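One simple way to think about sub-word units is character n-grams over a word. This sketch is only illustrative (the paper learns sub-word compositions inside the network rather than extracting fixed n-grams; `char_ngrams` and its boundary markers are hypothetical):

```python
def char_ngrams(word, n=3):
    """Character n-grams with boundary markers as stand-in sub-word units.

    For code-mixed, misspelling-heavy text, overlapping n-grams let a
    model recognize a sentiment-bearing morpheme even inside a variant
    spelling of the word that contains it.
    """
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

For example, the Hindi word "bahut" ("very") shares most of its n-grams with the common variant spelling "bahot", which is exactly the robustness to noise the abstract describes.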
  7. Interpretation of Semantic Tweet Representations
    Ganesh Jawahar, Manish Gupta and Vasudeva Varma

    In ASONAM 2017, Sydney, Australia

    Research in analysis of microblogging platforms is experiencing a renewed surge, with a large number of works applying representation learning models to applications like sentiment analysis, semantic textual similarity computation, hashtag prediction, etc. Although the performance of the representation learning models has been better than the traditional baselines for such tasks, little is known about the elementary properties of a tweet encoded within these representations, or why particular representations work better for certain tasks. Our work presented here constitutes the first step in opening the black-box of vector embeddings for tweets. Traditional feature engineering methods for high-level applications have exploited various elementary properties of tweets. We believe that a tweet representation is effective for an application because it meticulously encodes the application-specific elementary properties of tweets. To understand the elementary properties encoded in a tweet representation, we evaluate the representations on how accurately they can model each of those properties, such as tweet length, presence of particular words, hashtags, mentions, capitalization, etc. Our systematic, extensive study of nine supervised and four unsupervised tweet representations against the eight most popular textual and five most popular social elementary properties reveals that Bi-directional LSTMs (BLSTMs) and Skip-Thought Vectors (STV) best encode the textual and social properties of tweets, respectively. FastText is the best model for low-resource settings, providing very little degradation with reduction in embedding size. Finally, we draw interesting insights by correlating the model performance obtained for elementary property prediction tasks with the high-level downstream applications.


  8. TweetGrep: Weakly Supervised Joint Retrieval and Sentiment Analysis of Topical Tweets
    Satarupa Guha, Tanmoy Chakraborty, Samik Datta, Mohit Kumar, and Vasudeva Varma

    In ICWSM 2016, Cologne, Germany

    An overwhelming amount of data is generated every day on social media, encompassing a wide spectrum of topics. With almost every business decision depending on customer opinion, mining of social media data needs to be quick and easy. For a data analyst to keep up with the agility and the scale of the data, it is impossible to bank on fully supervised techniques to mine topics and their associated sentiments from social media. Motivated by this, we propose a weakly supervised approach (named TweetGrep) that lets the data analyst easily define a topic with a few keywords and adapt a generic sentiment classifier to that topic by jointly modeling topics and sentiments using label regularization. Experiments with diverse datasets show that TweetGrep beats the state-of-the-art models for both the tasks of retrieving topical tweets and analyzing the sentiment of the tweets (average improvement of 4.97% and 6.91% respectively in terms of area under the curve). Further, we show that TweetGrep can also be adopted in a novel task of hashtag disambiguation, which significantly outperforms the baseline methods.


  9. Timespent-based Models for Predicting User Retention
    Kushal S. Dave, Vishal Vaingankar, Sumanth Kolar and Vasudeva Varma

    In WWW 2013, Rio De Janeiro, Brazil

    Content discovery is fast becoming the preferred tool for user engagement on the web. Discovery allows users to get educated and entertained about their topics of interest. StumbleUpon is the largest personalized content discovery engine on the Web, delivering more than 1 billion personalized recommendations per month. As a recommendation system, one of the primary metrics we track is whether the user returns (retention) to use the product after their initial experience (session) with StumbleUpon. In this paper, we attempt to address the problem of predicting user retention based on the user's previous sessions. The paper first explores the different user and content features that are helpful in predicting user retention. This involved mapping the user and the user's recommendations (stumbles) into a descriptive feature space such as the time spent by the user, the number of stumbles, and content features of the recommendations. To model the diversity in user behaviour, we also generated normalized features that account for the user's speed of stumbling. Using these features, we built a decision tree classifier to predict retention. We find that a model that uses both the user and content features achieves higher prediction accuracy than a model that uses the two feature sets separately. Further, we used an information-theoretic analysis to find a subset of recommendations that are most indicative of user retention. A classifier trained on this subset of recommendations achieves the highest prediction accuracy. This indicates that not every recommendation seen by the user is predictive of whether the user will be retained; instead, a subset of the most informative recommendations is more useful in predicting retention.


  10. Dynamic Energy-Efficient Data Placement and Cluster Reconfiguration Algorithm for MapReduce Framework
    Nitesh Maheshwari, Radheshyam Nanduri, and Vasudeva Varma

    In Future Generation Computer Systems 28 (1), 119-127

    With the recent emergence of cloud computing based services on the Internet, MapReduce and distributed file systems like HDFS have emerged as the paradigm of choice for developing large-scale data-intensive applications. Given the scale at which these applications are deployed, minimizing power consumption of these clusters can significantly cut down operational costs and reduce their carbon footprint, thereby increasing the utility from a provider's point of view. This paper addresses energy conservation for clusters of nodes that run MapReduce jobs. Our algorithm dynamically reconfigures the cluster based on the current workload and turns cluster nodes on or off when the average cluster utilization rises above or falls below administrator-specified thresholds, respectively. We evaluate our algorithm using the GridSim toolkit, and our results show that the proposed algorithm achieves an energy reduction of 33% under average workloads and up to 54% under low workloads.
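The threshold rule the abstract describes can be sketched as a simple scaling decision. The threshold values and the one-node-at-a-time step are assumptions (the paper leaves thresholds to the administrator), and `reconfigure` is an illustrative name:

```python
def reconfigure(active_nodes, total_nodes, utilization, low=0.3, high=0.8):
    """Decide how many nodes should stay on, given average cluster utilization.

    Nodes are turned on when utilization rises above `high` and turned
    off when it falls below `low`; otherwise the cluster is left alone.
    """
    if utilization > high and active_nodes < total_nodes:
        return active_nodes + 1  # scale up: avoid overload
    if utilization < low and active_nodes > 1:
        return active_nodes - 1  # scale down: save energy
    return active_nodes
```

A controller would call this periodically with the measured average utilization, migrating data off a node before powering it down.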


  11. Modeling Action Cascades in Social Networks
    Kushal S. Dave, Rushi Bhatt, and Vasudeva Varma

    In International Conference on Weblogs and Social Media, Barcelona, Spain. (ICWSM 2011)

    The central idea in designing various marketing strategies for online social networks is to identify the influencers in the network. These influential individuals induce "word-of-mouth" effects in the network. They are responsible for triggering long cascades of influence that convince their peers to perform a similar action (buying a product, for instance). Targeting these influencers usually leads to a vast spread of the information across the network, so it is important to identify such individuals. One way to measure an individual's influencing capability on their peers is by their reach for a certain action. We formulate identifying the influencers in a network as a problem of predicting the average depth of cascades an individual can trigger. We first empirically identify factors that play a crucial role in triggering long cascades. Based on this analysis, we build a model for predicting the cascades triggered by a user for an action. The model uses features like the influencing capabilities of the user and their friends, the influencing capabilities of the particular action, and other user and network characteristics. Experiments show that the model effectively improves the predictions over several baselines.