
		<paper>
			<loc>https://jjcit.org/paper/178</loc>
			<title>COTA 2.0: AN AUTOMATIC CORRECTOR OF TUNISIAN ARABIC SOCIAL MEDIA TEXTS</title>
			<doi>10.5455/jjcit.71-1655499240</doi>
			<authors>Asma Mekki,Inès Zribi,Mariem Ellouze,Lamia Hadrich Belguith*</authors>
			<keywords>Orthographic normalization,Tunisian Arabic,COTA Orthography system,CODA-TA</keywords>
			<citation>5</citation>
			<views>5515</views>
			<downloads>1604</downloads>
			<received_date>17-Jun.-2022</received_date>
			<revised_date>7-Sep.-2022 and 18-Oct.-2022</revised_date>
			<accepted_date>10-Nov.-2022</accepted_date>
			<abstract>In written text, orthographic noise is a common concern for NLP, especially when processing social-network comments and raw documents. For Tunisian Arabic (TA), this is mainly due to its orthographic conventions and morphological ambiguity.
We propose to automatically normalize social-media dialect corpora by following CODA-TA, the Conventional Orthography for TA. The existing system developed for TA, «COTA Orthography 1.0», cannot handle all forms of TA. Therefore, we propose to extend its rules and lexicons to address the peculiarities of social-media dialect.
For certain words, the COTA Orthography 1.0 system offers the user several possible corrections. In the new version, we therefore incorporate a trigram language model to automatically select the right correction. Our results show that the system can reduce transcription errors by 95.72%.</abstract>
		</paper>


