StatNLP

Multilingual Geoquery

A multilingual dataset for Geoquery. Each instance is a sentence annotated with its meaning representation. The corpora in Chinese, Indonesian, Farsi and Swedish were originally released with the paper "Semantic Parsing with Neural Hybrid Trees".

MalwareTextDB

The dataset, in various formats (see the readme for more details), can be found here: MalwareTextDB-1.0.zip (5.5 MB download, 20 MB unzipped). The dataset was originally published in the paper "MalwareTextDB: A Database for Annotated Malware Articles".

Multilingual ATIS

A new multilingual version of the ATIS corpus. The dataset was originally published in the paper "Neural Architectures for Multilingual Semantic Parsing".

NP-annotated SMS dataset

Thanks to Alexander Binder, Jie Yang, Dinh Quang Thinh, as well as 64 undergraduate students for their help in creating the annotations for the NUS SMS Corpus. The annotation guidelines given to the students are also available.

Chinese Address dataset

The dataset and annotation guidelines have been uploaded to GitHub. Thanks to Ali Damo Academy for the Chinese address corpus.

Taobao and Youku NER Dataset

The dataset and annotation guidelines have been uploaded to GitHub. Thanks to Ali Damo Academy for the annotations.