OCA Corpus

Resource type:

Corpora

Description:

OCA (Opinion Corpus for Arabic) is a corpus of movie reviews written in Arabic. It was built from reviews collected from the web pages listed in the following table:

Name                  Webpage                                 Vote system  Positive  Negative
Cinema Al Rasid       http://cinema.al-rasid.com/             10           36        1
Film Reader           http://filmreader.blogspot.com/         5            0         92
Hot Movie Reviews     http://hotmoviews.blogspot.com          5            45        4
Elcinema              http://www.elcinema.com                 10           0         56
Grind House           http://grindh.com                       10           38        0
Mzyondubai            http://www.mzyondubai.com               10           0         15
Aflamee               http://aflamee.com                      5            0         1
Grind Film            http://grindfilm.blogspot.com/          10           0         8
Cinema Gate           http://www.cingate.net                  Bad/Good     0         1
Emad Ozery Blog       http://emadozery.blogspot.com           10           0         1
Fil Fan               http://www.filfan.com                   5            81        20
Sport4Ever            http://sport4ever.maktoob.com           10           0         1
DVD4ArabPos           http://dvd4arab.maktoob.com             10           11        0
Gamraii               http://www.gamraii.com                  10           39        0
Shadows and Phantoms  http://shadowsandphantoms.blogspot.com  10           0         50
Total                                                                      250       250

Statistics of the OCA corpus: The corpus was generated in October 2010. Some statistics on it are shown in the following table:

                              Negative  Positive
Total documents               250       250
Total tokens                  94,556    121,392
Average tokens per review     378       485
Total sentences               4,881     3,137
Average sentences per review  20        13
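
The per-review averages follow directly from the totals in the table above. The short Python sketch below recomputes them; the dictionary layout and the function name per_review_averages are illustrative choices, not part of the corpus distribution.

```python
# Totals copied from the statistics table above.
TOTALS = {
    "negative": {"documents": 250, "tokens": 94_556, "sentences": 4_881},
    "positive": {"documents": 250, "tokens": 121_392, "sentences": 3_137},
}


def per_review_averages(stats):
    """Return (tokens per review, sentences per review) for one class."""
    docs = stats["documents"]
    return stats["tokens"] / docs, stats["sentences"] / docs


for label, stats in TOTALS.items():
    avg_tokens, avg_sentences = per_review_averages(stats)
    print(f"{label}: {avg_tokens:.1f} tokens, {avg_sentences:.1f} sentences per review")
```

The unrounded values are about 378.2 and 485.6 tokens, and about 19.5 and 12.5 sentences, for the negative and positive classes respectively, so the figures in the table appear to be whole-number approximations of these.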

Rushdi-Saleh, M., Martín-Valdivia, M. T., Ureña-López, L. A., & Perea-Ortega, J. M. (2011). OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology.
http://dx.doi.org/10.1002/asi.21598

For any questions about the corpus, send an email to Mohammed Saleh or José M. Perea.

Resource files:

OCA-corpus.zip