Sophia's Portfolio

Sentiment Analysis on IMDB Movie Reviews🍿

Project Summary:

1) Data Preparation: Tokenization, Stopword Removal & Stemming

2) Text Vectorization

3) Text Classification
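Step 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's exact code. It assumes a regular-expression tokenizer (NLTK's RegexpTokenizer) and NLTK's Porter stemmer. The project doesn't say which stopword list it used, so this version swaps in scikit-learn's built-in ENGLISH_STOP_WORDS, which runs without downloading NLTK corpora. The function name preprocess is hypothetical.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Assumption: tokens are runs of word characters; punctuation is dropped.
tokenizer = RegexpTokenizer(r"\w+")
stemmer = PorterStemmer()


def preprocess(review: str) -> list[str]:
    """Hypothetical helper: tokenize, remove stopwords, then stem one review."""
    tokens = tokenizer.tokenize(review.lower())
    # Assumption: scikit-learn's English stopword list stands in for NLTK's.
    kept = [t for t in tokens if t not in ENGLISH_STOP_WORDS]
    # Porter stemming reduces words to a common root, e.g. "movie" -> "movi".
    return [stemmer.stem(t) for t in kept]


print(preprocess("Such a great movie!"))  # ['great', 'movi']
```

General-purpose stopword lists like this one include negations such as "not". A negation can flip a review's sentiment, so for this task it may be worth keeping those words.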

Project Details:

The goal of this project is to use Natural Language Processing (NLP) to teach a computer to interpret human language, specifically to classify the sentiment of IMDB movie reviews.

Before the movie reviews can be used for NLP tasks, the raw text has to be prepared through tokenization, stopword removal and stemming. Next, the resulting tokens are transformed into numerical vectors. After vectorizing the review column, we split the data into training and test sets. Finally, we train a text classification model to predict each review's sentiment.
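The vectorization, split and classification steps can be sketched with scikit-learn as below. The six reviews are hypothetical toy data, not IMDB reviews. The choice of CountVectorizer (bag-of-words counts) and MultinomialNB (multinomial Naive Bayes) is also an assumption, because the project text doesn't name the vectorizer or the classifier. For brevity, the vectorizer handles tokenization and stopword removal itself, and stemming is skipped.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy reviews standing in for the IMDB "review" column.
reviews = [
    "a great movie",
    "great acting and a wonderful story",
    "loved it",
    "a terrible movie",
    "terrible acting and a boring story",
    "hated it",
]
labels = ["positive"] * 3 + ["negative"] * 3

# Text vectorization: each review becomes a sparse vector of word counts.
# CountVectorizer tokenizes and drops English stopwords internally.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)  # one row per review

# Hold out 2 reviews for testing; stratify keeps one of each class.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=2, stratify=labels, random_state=0
)

# Text classification: fit on the training split, score on the test split.
clf = MultinomialNB()
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"vocabulary: {sorted(vectorizer.vocabulary_)}")
print(f"test accuracy: {accuracy:.2f}")
```

Fitting the vectorizer before splitting follows the order described above, but it lets words that occur only in the test set into the vocabulary. On real data, fitting the vectorizer on the training split only and calling transform on the test split gives a cleaner accuracy estimate.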

Because the model is trained on movie reviews, we can test it by passing in a new review as input. For example, entering the text "such a great movie!" returns ["positive"].
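To classify a raw sentence like "such a great movie!", the new text has to go through the same vectorizer that was fitted on the training data. One way to guarantee that is to chain the vectorizer and the classifier in a scikit-learn pipeline. The sketch below does this under the same assumptions as before: the reviews are hypothetical toy data, and CountVectorizer plus MultinomialNB stand in for the project's unnamed vectorizer and model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews, not the IMDB data.
reviews = [
    "a great movie",
    "great acting and a wonderful story",
    "loved it",
    "a terrible movie",
    "terrible acting and a boring story",
    "hated it",
]
labels = ["positive"] * 3 + ["negative"] * 3

# The pipeline vectorizes raw text and then classifies it, so predict()
# can take plain strings and reuse the vocabulary learned during fit().
model = make_pipeline(CountVectorizer(stop_words="english"), MultinomialNB())
model.fit(reviews, labels)

prediction = model.predict(["such a great movie!"])
print(prediction.tolist())  # ['positive'] on this toy data
```

Bundling both steps this way also means a single object can be saved and reloaded later without risking a mismatch between the vectorizer and the classifier.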

Python code (using NumPy, pandas, Seaborn, Matplotlib, scikit-learn, NLTK)