IMDB Data Analysis using ANN
Table of Contents
- Business Problem
- Data Collection and Pre-processing
- Loading the Datasets
- Reversing the Index to Word
- Example for Enumeration
- Vectorization-Converting Text into Numerical Representation
- Converting the Inputs to Float Type
- Defining the Model
- Neurons, Input Layer and Activation Function
- Splitting the Data into Training and Validation
- Plotting Validation Scores to Visualise the Performance of the Model
- Fine Tuning the Model to Avoid Overfitting
Business Problem
The Internet Movie Database (IMDB) review dataset can be used to train a model that classifies a movie review as positive or negative.
Data Collection and Pre-processing
from keras.datasets import imdb
Loading the Datasets
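(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000) # Keep only the 10,000 most frequent words
train_data[0] # First training review, encoded as a list of word indices
train_labels[0] # Its label: 0 = negative, 1 = positive
max([max(sequence) for sequence in train_data]) # Largest word index; 9999 because of num_words=10000
word_index = imdb.get_word_index() # Dictionary mapping each word to an integer index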
Reversing the Index to Word
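reverse_word_index = dict(
    [(value, key) for (key, value) in word_index.items()]) # Map integer indices back to words
decoded_review = ' '.join(
    [reverse_word_index.get(i - 3, '?') for i in train_data[0]]) # Offset by 3: indices 0, 1 and 2 are reserved for padding, start of sequence and unknown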
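To see why the lookup subtracts 3, here is a minimal sketch with a made-up four-word vocabulary. The names `toy_word_index`, `toy_reverse_index` and `toy_review` are hypothetical and are not taken from the real IMDB index; the sketch only assumes the default Keras encoding, where indices 0, 1 and 2 are reserved and every real word is stored as its rank plus 3.

```python
# Hypothetical vocabulary in the same word -> rank format as imdb.get_word_index()
toy_word_index = {"the": 1, "movie": 2, "was": 3, "great": 4}

# Flip it to rank -> word, as done for reverse_word_index
toy_reverse_index = {rank: word for word, rank in toy_word_index.items()}

# Hypothetical encoded review: 1 marks the start, 2 stands for an unknown word,
# and every real word is stored as its rank + 3
toy_review = [1, 4, 5, 6, 7, 2]

# Subtract 3 before the lookup; reserved indices fall back to '?'
toy_decoded = " ".join(toy_reverse_index.get(i - 3, "?") for i in toy_review)
print(toy_decoded)  # ? the movie was great ?
```

The two question marks come from the reserved start and unknown markers, which have no entry in the vocabulary.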
Example for Enumeration
my_list = ['a','b','c','d']
for x, value in enumerate(my_list,1):
    print(x,value)
import numpy as np # loading numpy
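As a small illustration of the second argument to `enumerate`, the sketch below compares the default start of 0 with a start of 1. The variable names `zero_based` and `one_based` are only for this example.

```python
my_list = ['a', 'b', 'c', 'd']

# Default start: positions begin at 0, which is what array row indexing needs
zero_based = list(enumerate(my_list))

# start=1 gives human-friendly numbering, as in the loop above
one_based = list(enumerate(my_list, 1))

print(zero_based)  # [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
print(one_based)   # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```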
Vectorization-Converting Text into Numerical Representation
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension)) # All-zero matrix of shape (number of reviews, dimension)
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1. # Set the columns of the words present in review i to 1
    return results
x_train = vectorize_sequences(train_data) # Multi-hot encode the training data
x_test = vectorize_sequences(test_data) # Multi-hot encode the test data
x_train[0] # Encoded form of the first training review
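The encoding is easier to follow on a tiny input. The sketch below applies the same logic to two made-up sequences with a dimension of 5 instead of 10000; `vectorize_toy` and `toy_sequences` are hypothetical names used only here.

```python
import numpy as np

def vectorize_toy(sequences, dimension):
    # One row per sequence, one column per possible word index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Fancy indexing sets every listed column of row i to 1
        results[i, sequence] = 1.0
    return results

# Two made-up "reviews"; the second one repeats index 1
toy_sequences = [[0, 2], [1, 1, 3]]
toy_matrix = vectorize_toy(toy_sequences, dimension=5)
print(toy_matrix)
# [[1. 0. 1. 0. 0.]
#  [0. 1. 0. 1. 0.]]
```

Note that the repeated index in the second sequence still produces a single 1, so this representation records which words occur but not how often or in what order.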
Converting the Inputs to Float Type
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
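A quick way to check what the conversion does is to run it on a few made-up labels. In this sketch `toy_labels` is a hypothetical stand-in for `train_labels`.

```python
import numpy as np

# Hypothetical integer labels standing in for train_labels
toy_labels = np.array([1, 0, 1])

# Same conversion as above: integers become 32-bit floats
toy_float = np.asarray(toy_labels).astype('float32')

print(toy_labels.dtype, '->', toy_float.dtype)
print(toy_float)
```

The values are unchanged; only the data type differs, so the labels are floats like the probabilities produced by the sigmoid output.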
Defining the Model
from keras import models # Importing the models module from keras
from keras import layers # Importing the layers module from keras
model = models.Sequential() # Defining the empty sequential model
model.add(layers.Dense(16, activation='relu', input_shape=(10000,))) # Adding a dense layer with 16 neurons, ReLU activation and a 10000-dimensional input
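What a dense layer with ReLU activation computes can be sketched in a few lines of NumPy: a matrix product, a bias, then the activation. The weights below are made-up numbers for a layer with 3 inputs and 2 neurons, not values from the trained model, and `dense_forward` is a hypothetical helper rather than a Keras function.

```python
import numpy as np

def relu(z):
    # ReLU keeps positive values and replaces negative ones with 0
    return np.maximum(z, 0.0)

def dense_forward(x, W, b):
    # x: (batch, inputs), W: (inputs, neurons), b: (neurons,)
    return relu(x @ W + b)

# One made-up sample with 3 features
x = np.array([[1.0, 0.0, 1.0]])

# Made-up weights and biases for 2 neurons
W = np.array([[0.5, -1.0],
              [0.3,  0.2],
              [-0.1, -0.5]])
b = np.array([0.1, 0.2])

out = dense_forward(x, W, b)
print(out)  # approximately [[0.5 0. ]]
```

The second neuron's pre-activation value is negative, so ReLU clips it to 0.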
Neurons, Input Layer and Activation Function
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
from keras import optimizers # Importing optimizers from keras
model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),loss='binary_crossentropy',metrics=['accuracy']) # Setting the optimizer, the loss function and the accuracy metric
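The sigmoid output and the binary cross-entropy loss can be illustrated without Keras. The sketch below is a plain-Python approximation under simple assumptions (mean over samples, clipping with a small `eps`), not the Keras implementation, and the predictions are made-up values.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

labels = [1.0, 0.0]        # one positive and one negative review
good_preds = [0.9, 0.1]    # made-up confident, correct predictions
bad_preds = [0.1, 0.9]     # made-up confident, wrong predictions

loss_good = binary_crossentropy(labels, good_preds)
loss_bad = binary_crossentropy(labels, bad_preds)
print(sigmoid(0.0))         # 0.5
print(loss_good < loss_bad) # True
```

Confident wrong predictions are penalised much more heavily than confident correct ones, which is why this loss suits a model that outputs probabilities.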
Splitting the Data into Training and Validation
x_val = x_train[:10000] # First 10,000 rows (0 to 9999) are held out for validation
partial_x_train = x_train[10000:] # Remaining rows are used for training
y_val = y_train[:10000] # Labels for rows 0 to 9999
partial_y_train = y_train[10000:] # Remaining labels, from row 10000 to the end
history = model.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512, validation_data=(x_val, y_val)) # Training on the partial training data and evaluating on the validation data after each epoch
history_dict = history.history # Dictionary of the loss and metric values recorded at each epoch
history_dict.keys()
Plotting Validation Scores to Visualise the Performance of the Model
import matplotlib.pyplot as plt
acc = history_dict['accuracy'] # Training accuracy values
val_acc = history_dict['val_accuracy'] # Validation accuracy values
loss = history_dict['loss'] # Training loss
val_loss = history_dict['val_loss'] # Validation loss
epochs = range(1, len(acc)+1) # Number of epochs
plt.plot(epochs, loss, 'bo', label='Training loss') # 'bo' draws blue dots
plt.plot(epochs, val_loss, 'b', label='Validation loss') # 'b' draws a solid blue line
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
plt.clf()
acc_values = history_dict['accuracy']
val_acc_values = history_dict['val_accuracy']
plt.plot(epochs, acc_values, 'bo', label='Training acc')
plt.plot(epochs, val_acc_values, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
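Besides reading the plots by eye, the epoch at which the validation loss is lowest can be picked out programmatically. The sketch below uses a hypothetical helper, `best_epoch`, and made-up loss values for illustration only; in practice the list would be `history_dict['val_loss']`.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_index + 1

# Made-up values for illustration only, not results from this model
toy_val_loss = [0.45, 0.35, 0.30, 0.31, 0.34]
print(best_epoch(toy_val_loss))  # 3
```

When the validation loss starts rising while the training loss keeps falling, the model is beginning to overfit, and training past that epoch is unlikely to help on unseen data.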
Fine Tuning the Model to Avoid Overfitting
model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['accuracy'])
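model.fit(x_train, y_train, epochs=4, batch_size=512) # Training for only 4 epochs, a manual form of early stopping, to limit overfitting
results = model.evaluate(x_test, y_test) # Returns the test loss and test accuracy
model.predict(x_test) # Predicted probabilities of a positive review on the test data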
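Because the last layer is a sigmoid, the predictions are probabilities rather than class labels. A common way to turn them into labels is to threshold at 0.5, sketched below on made-up values; `toy_probs` is a hypothetical stand-in for the array returned by `model.predict(x_test)`, and the 0.5 cut-off is an assumption, not something fixed by the model.

```python
import numpy as np

# Made-up probabilities with the same (n_samples, 1) shape as model.predict output
toy_probs = np.array([[0.92], [0.08], [0.51]])

# Above 0.5 counts as positive (1), otherwise negative (0)
toy_classes = (toy_probs > 0.5).astype('int32').ravel()
print(toy_classes)  # [1 0 1]
```

Values close to 0.5, such as the third one here, are predictions the model is unsure about.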