The Journal of Web Science > Vol 1 > Issue 1

ACQUA: Automated Community-based Question Answering through the Discretisation of Shallow Linguistic Features

George Gkotsis, Knowledge Media Institute, The Open University, UK, george.gkotsis@open.ac.uk
Maria Liakata, Department of Computer Science, University of Warwick, UK, m.liakata@warwick.ac.uk
Karen Stepanyan, London School of Business and Management, UK, Karen.Stepanyan@lsbm.ac.uk
John Domingue, Knowledge Media Institute, The Open University, UK, john.domingue@open.ac.uk
 
Suggested Citation
George Gkotsis, Maria Liakata, Karen Stepanyan and John Domingue (2015), "ACQUA: Automated Community-based Question Answering through the Discretisation of Shallow Linguistic Features", The Journal of Web Science: Vol. 1: No. 1, pp 1-15. http://dx.doi.org/10.1561/106.00000001

Published: 24 Jun 2015
© 2015 G. Gkotsis, M. Liakata, C. Pedrinaci, K. Stepanyan, and J. Domingue
 

Open Access

This is published under the terms of CC BY-NC-ND 2.0.

In this article:
1. Introduction
2. Related Work
3. StackExchange Dataset
4. Features for Best Answer Prediction
5. Evaluation: Best Answer Prediction
6. ACQUA
7. Discussion
8. Conclusions
References

Abstract

This paper addresses the problem of determining the best answer in Community-based Question Answering (CQA) websites by focussing on the content of the answers. We present ACQUA (http://acqua.kmi.open.ac.uk), a system that can be installed as a plugin on the majority of browsers and that offers seamless and accurate prediction of the answer to be accepted. Previous research on this topic relies on community feedback on the answers, which involves rating either users (e.g., reputation) or answers (e.g., scores manually assigned to answers). We propose a new technique that instead leverages the content/textual features of answers, and the gain in performance is achieved by rendering the values of these features into a discretised form. Our approach delivers better results than related linguistics-based solutions and matches rating-based approaches. We also show that our technique delivers equally good results in real-time settings, where information such as user ratings and answer scores is not always readily available. We ran an evaluation on 21 StackExchange websites covering around 4 million questions and more than 8 million answers. We obtain 84% average precision and 70% recall, which shows that our technique is robust, effective, and widely applicable.
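To make the idea of discretising a shallow feature concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state which features or which discretisation scheme are used, so the sketch assumes word count as the feature and per-question ranking as the scheme. The names `word_count` and `discretise_by_rank` and the sample answers are hypothetical.

```python
from typing import List


def word_count(text: str) -> int:
    """Assumed shallow feature: number of whitespace-separated tokens."""
    return len(text.split())


def discretise_by_rank(values: List[float]) -> List[int]:
    """Replace each raw value with its rank among the values given.

    Rank 1 is the largest value; tied values share a rank (dense ranking).
    This per-question ranking is an assumed scheme, chosen for illustration.
    """
    distinct = sorted(set(values), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return [rank_of[value] for value in values]


# Hypothetical answers posted to a single question.
answers = [
    "Use a set.",
    "You can convert the list to a set and back, which removes duplicates.",
    "Try a loop that checks membership before appending each item.",
]

raw = [word_count(answer) for answer in answers]  # continuous feature values
ranks = discretise_by_rank(raw)                   # discretised feature values

print(raw)    # [3, 13, 10]
print(ranks)  # [3, 1, 2]
```

Under these assumptions, the discretised value describes an answer relative to the other answers to the same question, so it may be easier to compare across questions whose answers differ widely in raw length.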

DOI:10.1561/106.00000001