NEWS 2018: The Seventh Named Entities Workshop

T-EnCh, B-ChEn and T-EnVi Datasets

Institute for Infocomm Research License Agreement

To use the Xinhua transliteration datasets (T-EnCh and B-ChEn) and the VNU-HCMUS English-Vietnamese transliteration dataset (T-EnVi), you must agree to the following conditions:
1. I will use these datasets for non-commercial research only.
2. I will not re-distribute these datasets.
3. I will explicitly acknowledge the use of the Xinhua (T-EnCh and B-ChEn) datasets by citing the following articles whenever the datasets are mentioned in publications:
* Xinhua News Agency. “Chinese transliteration of foreign personal names”. The Commercial Press, 1992.
* Haizhou Li, Min Zhang and Jian Su. “A joint source-channel model for machine transliteration”. In ACL’04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 159, 2004.
4. I will explicitly acknowledge the use of the VNU-HCMUS (T-EnVi) dataset by following the two guidelines below whenever the dataset is mentioned in publications:
4.a. Acknowledging that the data is provided by the “Artificial Intelligence Laboratory (AILab) at the Ho Chi Minh City University of Science (VNU-HCMUS)”.
4.b. Citing the following articles:
* Nam X. Cao, Nhut M. Pham and Quan H. Vu. “Comparative analysis of transliteration techniques based on statistical machine translation and joint-sequence model”. In Proceedings of the 2010 Symposium on Information and Communication Technology, pages 59-63. ACM.
* Hoang Gia Ngo, Nancy F. Chen, Nguyen Binh Minh, Bin Ma and Haizhou Li. “Phonology-Augmented Statistical Transliteration for Low-Resource Languages”. In Interspeech, 2015.

Request T-EnCh, B-ChEn and T-EnVi Datasets

By requesting these datasets, you explicitly agree to the above terms and conditions set by the Institute for Infocomm Research.