Resource type
Corpora
Description
OCA (Opinion Corpus for Arabic) is an Arabic corpus of movie reviews. It was built from Arabic-language reviews collected from the web pages listed in the following table:
Name | Webpage | Rating scale | Positive reviews | Negative reviews |
---|---|---|---|---|
Cinema Al Rasid | http://cinema.al-rasid.com/ | 10 | 36 | 1 |
Film Reader | http://filmreader.blogspot.com/ | 5 | 0 | 92 |
Hot Movie Reviews | http://hotmoviews.blogspot.com | 5 | 45 | 4 |
Elcinema | http://www.elcinema.com | 10 | 0 | 56 |
Grind House | http://grindh.com | 10 | 38 | 0 |
Mzyondubai | http://www.mzyondubai.com | 10 | 0 | 15 |
Aflamee | http://aflamee.com | 5 | 0 | 1 |
Grind Film | http://grindfilm.blogspot.com/ | 10 | 0 | 8 |
Cinema Gate | http://www.cingate.net | Bad/Good | 0 | 1 |
Emad Ozery Blog | http://emadozery.blogspot.com | 10 | 0 | 1 |
Fil Fan | http://www.filfan.com | 5 | 81 | 20 |
Sport4Ever | http://sport4ever.maktoob.com | 10 | 0 | 1 |
DVD4ArabPos | http://dvd4arab.maktoob.com | 10 | 11 | 0 |
Gamraii | http://www.gamraii.com | 10 | 39 | 0 |
Shadows and Phantoms | http://shadowsandphantoms.blogspot.com | 10 | 0 | 50 |
Total | | | 250 | 250 |
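As a quick consistency check, the per-source counts in the table should add up to the 250 positive and 250 negative reviews in the Total row. The sketch below simply copies the table's counts into a dictionary (the names `SOURCES` and `totals` are illustrative, not part of the corpus distribution) and sums each column.

```python
# Per-source (positive, negative) review counts, copied from the OCA source table.
# The dictionary layout and names are illustrative only.
SOURCES = {
    "Cinema Al Rasid": (36, 1),
    "Film Reader": (0, 92),
    "Hot Movie Reviews": (45, 4),
    "Elcinema": (0, 56),
    "Grind House": (38, 0),
    "Mzyondubai": (0, 15),
    "Aflamee": (0, 1),
    "Grind Film": (0, 8),
    "Cinema Gate": (0, 1),
    "Emad Ozery Blog": (0, 1),
    "Fil Fan": (81, 20),
    "Sport4Ever": (0, 1),
    "DVD4ArabPos": (11, 0),
    "Gamraii": (39, 0),
    "Shadows and Phantoms": (0, 50),
}


def totals(sources):
    """Sum the positive and negative review counts over all sources."""
    pos = sum(p for p, _ in sources.values())
    neg = sum(n for _, n in sources.values())
    return pos, neg


if __name__ == "__main__":
    pos, neg = totals(SOURCES)
    # Prints: positive=250 negative=250 total=500
    print(f"positive={pos} negative={neg} total={pos + neg}")
```

The result matches the Total row: 250 positive and 250 negative reviews, 500 in all, drawn from 15 sources.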
Statistics of the OCA corpus: the corpus was generated in October 2010. Some statistics on it are shown in the following table:
Statistic | Negative | Positive |
---|---|---|
Total documents | 250 | 250 |
Total tokens | 94,556 | 121,392 |
Average tokens per comment | 378 | 485 |
Total sentences | 4,881 | 3,137 |
Average sentences per comment | 20 | 13 |
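The per-comment averages are the corresponding totals divided by the 250 comments in each class. The following minimal sketch recomputes them from the totals in the table. The names `DOCS`, `TOKENS`, `SENTENCES`, and `average` are illustrative only, and the values are printed to two decimals rather than in the table's whole-number form.

```python
# Totals per class, copied from the OCA statistics table.
DOCS = {"negative": 250, "positive": 250}
TOKENS = {"negative": 94_556, "positive": 121_392}
SENTENCES = {"negative": 4_881, "positive": 3_137}


def average(total, docs):
    """Mean count per comment: total count divided by number of comments."""
    return total / docs


if __name__ == "__main__":
    for label in ("negative", "positive"):
        tok = average(TOKENS[label], DOCS[label])
        sent = average(SENTENCES[label], DOCS[label])
        print(f"{label}: {tok:.2f} tokens/comment, {sent:.2f} sentences/comment")
    # Prints:
    # negative: 378.22 tokens/comment, 19.52 sentences/comment
    # positive: 485.57 tokens/comment, 12.55 sentences/comment
```

On average, positive comments are longer in tokens but contain fewer sentences than negative ones, so their sentences are longer.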
Rushdi-Saleh, M., Martín-Valdivia, M. T., Ureña-López, L. A. & Perea-Ortega, J. M. (2011). OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology.
http://dx.doi.org/10.1002/asi.21598
For any questions about the corpus, send an email to Mohammed Saleh or José M. Perea.
Resource files