OCA Corpus

Resource type
Corpora
Description

OCA is an Arabic corpus of movie reviews. This corpus has been generated from comments in Arabic obtained from different web pages shown in the following table:

Name Webpage Vote system Positive Negative
Cinema Al Rasid http://cinema.al-rasid.com/ 10 36 1
Film Reader http://filmreader.blogspot.com/ 5 0 92
Hot Movie Reviews http://hotmoviews.blogspot.com 5 45 4
Elcinema http://www.elcinema.com 10 0 56
Grind House http://grindh.com 10 38 0
Mzyondubai http://www.mzyondubai.com 10 0 15
Aflamee http://aflamee.com 5 0 1
Grind Film http://grindfilm.blogspot.com/ 10 0 8
Cinema Gate http://www.cingate.net Bad/Good 0 1
Emad Ozery Blog http://emadozery.blogspot.com 10 0 1
Fil Fan http://www.filfan.com 5 81 20
Sport4Ever http://sport4ever.maktoob.com 10 0 1
DVD4ArabPos http://dvd4arab.maktoob.com 10 11 0
Gamraii http://www.gamraii.com 10 39 0
Shadows and Phantoms http://shadowsandphantoms.blogspot.com 10 0 50
    Total 250 250

Some statistics of OCA corpus: This corpus was generated in October 2010 Some statistics on it are shown in the following table:

  Negative Positive
Total documents 250 250
Total tokens 94,556 121,392
Average tokens on each comment 378 485
Total sentences 4,881 3,137
Average sentences on each comment 20 13

Rushdi-Saleh, M., Martín-Valdivia, M. T., Alfonso Ureña-López, L. & Perea-Ortega, J. M. (2011). OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology.
http://dx.doi.org/10.1002/asi.21598

For any questions on the corpus sends an email to Mohammed Saleh or José M. Perea

 

Resource files