An authorship attribution for Serbian
2012
Conference paper (Published version)
Abstract
Authorship attribution is the problem of identifying the author of an anonymous or disputed text, given a closed set of candidate authors. Because natural languages are rich and individuality can be expressed in writing in numerous ways, the task draws on all sources of language knowledge: lexis, syntax, semantics, orthography, etc. Impressive results of n-gram based algorithms have been reported in many papers for many languages. The goal of our research was to test whether this group of algorithms works equally well on Serbian and, if so, to calculate the optimal values for the parameters appearing in the algorithms. We also wanted to test whether a syllable-based word decomposition, which is closer to how humans decompose words than n-grams are, can be useful in authorship attribution. Our results confirm the good performance of the n-gram based approach (accuracy up to 96%) and show the potential usefulness of the syllable-based approach (accuracy from 81% to 89%).
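As an illustration of what an n-gram based approach can look like, the following is a minimal sketch of profile-based attribution with character n-grams. It is not taken from the paper: the function names, the relative-difference distance, and the default values of `n` and `size` (the kind of parameters the abstract says were tuned) are assumptions chosen for the example.

```python
from collections import Counter


def ngram_profile(text, n=3, size=500):
    """Return the `size` most frequent character n-grams with relative frequencies."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:  # text shorter than n characters
        return {}
    return {g: c / total for g, c in grams.most_common(size)}


def dissimilarity(p1, p2):
    """Sum of squared relative frequency differences (0 means identical profiles)."""
    score = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile
        score += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return score


def attribute(unknown_text, candidates, n=3, size=500):
    """Pick the candidate (author -> known text) whose profile is closest."""
    unknown = ngram_profile(unknown_text, n, size)
    return min(
        candidates,
        key=lambda author: dissimilarity(
            unknown, ngram_profile(candidates[author], n, size)
        ),
    )
```

Here `candidates` is a hypothetical dictionary mapping each author in the closed set to a sample of that author's known writing. A syllable-based variant would replace the character n-grams with syllables, which requires a language-specific syllabifier that is not sketched here.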
Keywords:
Syllables / N-grams / Classification / Authorship attribution
Source:
CEUR Workshop Proceedings, 2012, 920, 109-112
Funding / projects:
- Serbian Language and Its Resources: Theory, Description and Applications (RS-MESTD-Basic Research (BR or ON)-178006)
Institution/group
Filološki fakultet / Faculty of Philology
TY - CONF
AU - Zečević, A.
AU - Utvić, Miloš
PY - 2012
UR - https://repff.fil.bg.ac.rs/handle/123456789/685
C3 - CEUR Workshop Proceedings
T1 - An authorship attribution for Serbian
EP - 112
SP - 109
VL - 920
UR - conv_2092
ER -
@conference{
  author = "Zečević, A. and Utvić, Miloš",
  year = "2012",
  journal = "CEUR Workshop Proceedings",
  title = "An authorship attribution for Serbian",
  pages = "109-112",
  volume = "920",
  url = "conv_2092"
}
Zečević, A., & Utvić, M. (2012). An authorship attribution for Serbian. In CEUR Workshop Proceedings, 920, 109-112. conv_2092
Zečević A, Utvić M. An authorship attribution for Serbian. In CEUR Workshop Proceedings. 2012;920:109-112. conv_2092
Zečević, A., Utvić, Miloš, "An authorship attribution for Serbian" in CEUR Workshop Proceedings, 920 (2012):109-112, conv_2092