This paper introduces a new corpus, QA-It, for the classification of non-referential it. Our dataset is unique in a sense that it is annotated on question answer pairs collected from multiple genres, useful for developing advanced QA systems. Our annotation scheme makes clear distinctions between 4 types of it, providing guidelines for many erroneous cases. Several statistical models are built for the classification of it, showing encouraging results. To the best of our knowledge, this is the first time that such a corpus is created for question answering.
Anthology | Paper | Presentation | BibTeX