Aspect Based Sentiment Analysis

Data Science · Deep Learning

May 20, 2025  •  29 min read


Introduction

Ever found yourself in a product manager's shoes at a tech company, grappling with mountains of unstructured customer reviews after a new launch? A simple 'happy or sad' tally isn't enough; you want to pinpoint exactly what customers love or loathe. Standard sentiment analysis often falls short here: it gives you a high-level polarity score but misses the granular insights you truly need.

This is precisely where Aspect-Based Sentiment Analysis (ABSA) shines.

Think of Aspect-Based Sentiment Analysis as a precision tool: it lets you drill down into customer reviews to extract specific product or service aspects and then determine the sentiment tied to each one. For example, if you have a review that says "The battery life is great, but the camera quality is terrible," Aspect-Based Sentiment Analysis would allow you to identify "battery life" as an aspect with positive sentiment and "camera quality" as an aspect with negative sentiment.

Why is Aspect-Based Sentiment Analysis Important?

ABSA moves you beyond a generic 'happy or unhappy' score, empowering you to pinpoint exactly which features customers adore or despise. This granular insight is invaluable for making data-driven decisions, whether you're prioritizing product improvements, crafting marketing campaigns, or refining customer support strategies. By analyzing customer reviews over time, you can also track whether sentiment towards specific aspects is improving or declining, which in turn helps you make informed decisions about product development and marketing strategies.

Project Overview and Challenges

The core challenge I faced in this project stemmed from its inherently multi-stage nature. Typically, to achieve Aspect-Based Sentiment Analysis, you'd perform two distinct tasks: first, extracting the aspects from the reviews, and then determining the sentiment associated with each extracted aspect. This proved particularly challenging because, in real-world scenarios, explicit aspect labels are rarely available within raw review texts.

Tech Stack

The project is built in Python with PyTorch and the Hugging Face Transformers library, using a pre-trained bert-base-uncased model as the backbone.

Why Use BERT?

Traditional models look at words in isolation, but BERT understands the context of each word in a sentence. This helps it identify aspects and their sentiments more accurately, especially when the same word can mean different things in different contexts. Its pre-training on vast amounts of text data makes it incredibly powerful for a wide range of NLP tasks, including this complex multi-task ABSA problem.
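
As a quick illustration of this contextuality, here is a minimal sketch (not part of the project code) showing that the same word gets noticeably different embeddings in different sentences:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    # Return the contextual embedding of `word` within `sentence`.
    enc = tokenizer(sentence, return_tensors="pt")
    idx = enc.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        return bert(**enc).last_hidden_state[0, idx]

v1 = word_vector("the screen is too dim", "screen")
v2 = word_vector("they screen every visitor", "screen")
print(torch.cosine_similarity(v1, v2, dim=0))  # well below 1.0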

Initial Approach: Multi-Step Analysis

My initial approach involved a classic two-stage pipeline: first, I extracted aspects from reviews using one pre-trained BERT model, and then I employed a separate pre-trained BERT model to classify the sentiment of each identified aspect.

While this approach was effective, it came with notable drawbacks: significant processing time, a need for substantial manual optimization, and an increased project complexity owing to the deployment of two distinct models.

Moving Towards a Single-Step Approach

To overcome these limitations, I then explored a single-step approach. The goal was to use a single pre-trained BERT model to simultaneously extract aspects and their associated sentiment, offering a simpler and faster alternative to the multi-step methodology. This streamlined process, while requiring a more complex architecture and potentially more training data, promised significant efficiency gains.

Neural Network Pipeline

How Does the Model Work?

  1. The review text is split into tokens (words or subwords).
  2. Each token is converted to an ID and passed to BERT, a language model that understands context.
  3. BERT outputs a representation for each token.
  4. The first head (aspect extraction) predicts which tokens are part of an aspect (using BIO tags).
  5. For each detected aspect, the second head (sentiment classifier) predicts if the sentiment is positive, negative, or neutral.

Dataset

The dataset used in this project is the SemEval 2014 Task 4 dataset, which contains customer reviews of laptops. It is divided into training, validation, and test sets and is available on the SemEval website. The data is stored as a JSONL file, with each row holding a review together with its aspect and sentiment labels. Each label also carries an ENTITY#ATTRIBUTE category such as LAPTOP#DESIGN_FEATURES, and the sentiment polarities are "positive", "negative", and "neutral".

Below is the format of the training data.

{
    "text": "now i ' m really bummed that i have a very nice looking chromebook with a beautiful screen that is totally unusable .",
    "labels": [
        {"aspect": "chromebook", "opinion": "nice", "polarity": "positive", "category": "LAPTOP#DESIGN_FEATURES"},
        {"aspect": "chromebook", "opinion": "bummed", "polarity": "negative", "category": "LAPTOP#OPERATION_PERFORMANCE"},
        {"aspect": "chromebook", "opinion": "unusable", "polarity": "negative", "category": "LAPTOP#OPERATION_PERFORMANCE"},
        {"aspect": "screen", "opinion": "beautiful", "polarity": "positive", "category": "DISPLAY#OPERATION_PERFORMANCE"}
    ]
}
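
Loading this format is straightforward. A minimal sketch (the file name train.jsonl is an assumption, not the project's actual path):

import json

# Hypothetical path; adjust to wherever the SemEval JSONL file lives.
with open("train.jsonl") as f:
    reviews = [json.loads(line) for line in f]

print(reviews[0]["text"])
print(reviews[0]["labels"][0]["aspect"], reviews[0]["labels"][0]["polarity"])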

Dataset Preprocessing

Before feeding the data into our deep learning models, thorough preprocessing is crucial. This involved several key steps:

  1. Tokenization: Converting text into numerical tokens that the BERT model can understand. I used the BertTokenizer from Hugging Face for this (see the sketch after this list).
  2. Aspect Span Identification: Identifying the start and end positions of each aspect within the tokenized sentence.
  3. Sentiment Mapping: Converting sentiment labels (e.g., "positive", "negative") into numerical representations.
  4. Creating Masks: Generating aspect masks to highlight the relevant tokens for each aspect during training.
  5. Padding and Truncation: Ensuring all input sequences have a uniform length by padding shorter sequences and truncating longer ones.
  6. BIO Tagging: Assigning BIO tags to each token in the input sequence to indicate whether it is part of an aspect or not.
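
A minimal sketch of steps 1 and 5, assuming a maximum sequence length of 128 (which matches the padded example shown later):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

encoding = tokenizer(
    "The battery life is great",
    padding="max_length",  # pad every sequence to the same length
    truncation=True,       # cut off anything longer than max_length
    max_length=128,
    return_tensors="pt",
)
print(encoding["input_ids"].shape)        # torch.Size([1, 128])
print(encoding["attention_mask"][0, :8])  # 1 for real tokens, 0 for padding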

BIO Tagging Scheme

In many NLP tasks, including aspect extraction, named entity recognition, and part-of-speech tagging, a common tagging scheme is used to label tokens in a sequence. The BIO (Beginning, Inside, Outside) tagging scheme is one such method. It helps identify the boundaries of entities or aspects within a text. Once the tags are assigned, they are converted into numerical labels for model training.

Tag      Meaning
B-ASP    Beginning of an aspect
I-ASP    Inside an aspect
O        Outside any aspect

Example:

Text: "The battery life is great"
 
Tags: O B-ASP I-ASP O O
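
A minimal sketch of the tag-to-label conversion. The mapping below is my assumption rather than the project's exact code, but it is consistent with the extract_aspect_spans checks later in the article (1 for B-ASP, 2 for I-ASP):

label2id = {"O": 0, "B-ASP": 1, "I-ASP": 2}
id2label = {v: k for k, v in label2id.items()}

tags = ["O", "B-ASP", "I-ASP", "O", "O"]  # "The battery life is great"
labels = [label2id[t] for t in tags]      # [0, 1, 2, 0, 0]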

Here's an example of the final preprocessed data structure for a single review:

Tokens: ['[CLS]', 'now', 'i', "'", 'm', 'really', 'bum', '##med', 'that', 'i', 'have', 'a', 'very', 'nice', 'looking', 'chrome', '##book', 'with', 'a', 'beautiful', 'screen', 'that', 'is', 'totally', 'un', '##usa', '##ble', '.', '[SEP]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]', '[PAD]']
Input IDs: [101, 2085, 1045, 1005, 1049, 2428, 26352, 7583, 2008, 1045, 2031, 1037, 2200, 3835, 2559, 18546, 8654, 2007, 1037, 3376, 3898, 2008, 2003, 6135, 4895, 10383, 3468, 1012, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Attention Mask: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Aspect Spans: [[15, 16], [20, 20]]
Polarities: [1, 1]
Token Labels: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-ASP', 'I-ASP', 'O', 'O', 'O', 'B-ASP', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

Explanation of the output

Tokens: the review split into WordPiece tokens, wrapped in [CLS] and [SEP] and padded with [PAD] to the maximum sequence length.
Input IDs: the vocabulary ID of each token; padded positions are 0.
Attention Mask: 1 for real tokens and 0 for padding, so BERT ignores the padded positions.
Aspect Spans: the [start, end] token indices of each aspect (here "chrome ##book" and "screen").
Polarities: the numeric sentiment label for each aspect span.
Token Labels: the BIO tag assigned to each non-padding token.

The code for preprocessing the data can be viewed in the following notebook. Note that the notebook will not run as-is because the data is not available in the Google Colab environment.

Open In Colab

Multi Step Approach

Aspect-based sentiment analysis naturally decomposes into two steps: first extract the aspects from the reviews, then determine the sentiment associated with each extracted aspect. In my implementation of this classic two-stage pipeline, aspects were extracted from reviews using one pre-trained BERT model, and a separate pre-trained BERT model then classified the sentiment of each identified aspect. While effective, this methodology brought significant processing time, substantial manual optimization, and increased project complexity due to the use of two distinct models. That is why I moved towards a single-step approach.

Single Step Approach

A single pre-trained BERT model was used to simultaneously extract aspects and their associated sentiment, offering a simpler and faster alternative to multi-step approaches. This single-step method required a more complex architecture and more training data but streamlined the process. The following methods were employed to implement this approach.

General Model Architecture

Aspect Extraction Head:
Because I need to both extract the aspects and determine the sentiment associated with each one, I use a multi-head neural network. The first head extracts the aspects and is powered by a BERT backbone.

Sentiment Classification Head:
The second head determines the sentiment associated with each aspect. The output of the first head is used as the input to the second head, and the final output is a list of aspects with their associated sentiment polarities.

I tried 3 different methods to implement the sentiment classification head:

  1. Mean pooling of the aspect token embeddings
  2. Multi-head attention with the pooled aspect embedding as the query
  3. Simple attention combined with the CLS embedding

I will discuss each of these methods in detail in the following sections.

In the upcoming code snippets, I have used the mps (Metal Performance Shaders) device for GPU acceleration on Apple silicon. If you are on different hardware, change the device to cuda or cpu as required, as sketched below.
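
A small device-selection helper (a sketch; the notebooks themselves hard-code "mps"):

import torch

# Prefer Apple's MPS backend, then CUDA, then fall back to CPU.
device = (
    "mps" if torch.backends.mps.is_available()
    else "cuda" if torch.cuda.is_available()
    else "cpu"
)
print(device)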

Using Mean Pooling

Model Architecture

Below is the model architecture for the multi-head neural network using mean pooling. self.classifier is the Aspect Extraction Head and self.sentiment_classifier is the Sentiment Classification Head. The forward method takes the input IDs and attention mask and returns the logits (the output of the Aspect Extraction Head) and the sequence output (the contextualized embedding of each token in the input sequence).

import torch.nn as nn
from transformers import AutoModel

class AspectDetectionModel(nn.Module):
    def __init__(self):
        super(AspectDetectionModel, self).__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        # label2id maps BIO tags to integer labels (defined during preprocessing)
        self.classifier = nn.Linear(self.bert.config.hidden_size, len(label2id))  # Aspect Extraction Head
        self.sentiment_classifier = nn.Linear(self.bert.config.hidden_size, 3)  # Sentiment Classification Head

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = self.dropout(outputs.last_hidden_state)  # [B, L, H]
        logits = self.classifier(sequence_output)  # [B, L, num_labels]
        return logits, sequence_output

Below is the training loop for the model. Alongside the aspect (BIO tagging) loss, I also compute a sentiment loss: for every gold aspect span, the token embeddings of the span are mean-pooled and passed through the sentiment head.

# Training Loop Snippet
num_epochs = 20
 
for epoch in range(num_epochs):
    total_aspect_train_loss = 0
    total_sentiment_train_loss = 0
 
    model.train()
    total_train_loss = 0
 
    for batch in train_dataloader:
        input_ids = batch["input_ids"].to("mps")
        attention_mask = batch["attention_mask"].to("mps")
        labels = batch["labels"].to("mps")
        
        # logits contains the aspect extraction head output and sequence_output contains the contextualized embeddings
        logits, sequence_output = model(input_ids, attention_mask)
        aspect_loss = criterion(logits.view(-1, len(label2id)), labels.view(-1))
        total_aspect_train_loss += aspect_loss.item()
 
        sentiment_losses = []
        for i in range(len(input_ids)):
            for aspect_index, sentiment in zip(batch["aspects_index"][i], batch["aspects_sentiment"][i]):
                if aspect_index[1] >= sequence_output.size(1):
                    continue
                # Suppose the aspect span is ['chrome', '##book'] with BIO tags ['B-ASP', 'I-ASP'] and indices [15, 16]
                # Each token in the span has its own embedding in the sequence output,
                # so we take the mean of those embeddings:
                # aspect_index = [15, 16] means we average sequence_output[i, 15:17]
                pooled = sequence_output[i, aspect_index[0]:aspect_index[1]+1].mean(dim=0)
                sentiment_logits = model.sentiment_classifier(pooled.unsqueeze(0))
                sentiment_target = torch.tensor([sentiment], dtype=torch.long).to("mps")
                sentiment_loss = criterion(sentiment_logits.view(-1, 3), sentiment_target)
                sentiment_losses.append(sentiment_loss)
 
        if sentiment_losses:
            sentiment_loss = torch.stack(sentiment_losses).mean()
        else:
            sentiment_loss = torch.tensor(0.0).to("mps")
 
        total_sentiment_train_loss += sentiment_loss.item()
        total_loss = aspect_loss + sentiment_loss
 
        ...

Below is the evaluation loop for the model.

 
def extract_aspect_spans(pred_labels):
    """
    Extracts aspect spans from the predicted labels.
    Args:
        pred_labels (list): List of predicted labels (BIO tags).
    Returns:
        spans (list): List of aspect spans, where each span is a list of [start, end] indices.
    """
    spans = []
    i = 0
    while i < len(pred_labels):
        if (pred_labels[i] == 1):  # B-ASP
            start = i
            i += 1
            while i < len(pred_labels) and pred_labels[i] == 2:  # I-ASP
                i += 1
            end = i - 1
            spans.append([start, end])
        else:
            i += 1
    return spans
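
# Example usage with integer labels (0 = O, 1 = B-ASP, 2 = I-ASP):
# extract_aspect_spans([0, 1, 2, 0, 1]) -> [[1, 2], [4, 4]]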
 
# Validation Loop Snippet
 
total_aspect_val_loss = 0
total_sentiment_val_loss = 0
 
for batch in val_dataloader:
    input_ids = batch["input_ids"].to("mps")
    attention_mask = batch["attention_mask"].to("mps")
    labels = batch["labels"].to("mps")
 
    logits, sequence_output = model(input_ids, attention_mask)
    aspect_loss = criterion(logits.view(-1, len(label2id)), labels.view(-1))
    total_aspect_val_loss += aspect_loss.item()
 
    sentiment_losses = []
    preds = torch.argmax(logits, dim=2)
    for i in range(len(input_ids)):
        # During training I used the gold aspect spans from the training data.
        # Here I first extract spans from the predicted BIO tags, then compute
        # the sentiment loss only for predicted spans that match a gold span.
        aspects = extract_aspect_spans(preds[i].cpu().tolist())
        for aspect_index, sentiment in zip(batch["aspects_index"][i], batch["aspects_sentiment"][i]):
            if aspect_index in aspects and aspect_index[1] < sequence_output.size(1):
                pooled = sequence_output[i, aspect_index[0]:aspect_index[1]+1].mean(dim=0)
                sentiment_logits = model.sentiment_classifier(pooled.unsqueeze(0))
                sentiment_target = torch.tensor([sentiment], dtype=torch.long).to("mps")
                sentiment_loss = criterion(sentiment_logits.view(-1, 3), sentiment_target)
                sentiment_losses.append(sentiment_loss)
 
    if sentiment_losses:
        sentiment_loss = torch.stack(sentiment_losses).mean()
    else:
        sentiment_loss = torch.tensor(0.0).to("mps")
 
    total_sentiment_val_loss += sentiment_loss.item()

In the code above, the extract_aspect_spans function converts the predicted BIO tags into aspect spans, which are then matched against the gold spans before pooling.

Model Performance

Aspect Extraction Metrics

              precision    recall  f1-score   support
 
         ASP       0.76      0.83      0.79     11700
 
   micro avg       0.76      0.83      0.79     11700
   macro avg       0.76      0.83      0.79     11700
weighted avg       0.76      0.83      0.79     11700

Sentiment Classification Metrics

              precision    recall  f1-score   support
 
         neg       0.90      0.88      0.89      2329
         pos       0.94      0.97      0.96      6686
         neu       0.61      0.42      0.50       667
 
    accuracy                           0.91      9682
   macro avg       0.82      0.76      0.78      9682
weighted avg       0.91      0.91      0.91      9682

From the above report we can see that the model performed well on the validation set: aspects were extracted with solid precision and recall, and sentiment classification was strong for the positive and negative classes. The neutral class lagged behind (F1 of 0.50), likely because of its much smaller support.

Let us look at some examples of the model predictions.

Review

"the high prices you ' re going to pay is for the view not for the food ."

Output

BIO Tag    Word      Sentiment
O          the
O          high
B-ASP      prices    negative
O          you
O          '
O          re
O          going
O          to
O          pay
O          is
O          for
O          the
O          view
O          not
O          for
O          the
B-ASP      food      negative
O          .

Review

"This remote control car is fun, fast, and easy to handle—perfect for kids! The build quality is sturdy and it runs smoothly on different surfaces. Battery life is decent and the controls are very responsive. A great gift for kids!"

Output

BIO Tag    Word        Sentiment
O          this
B-ASP      remote      positive
I-ASP      control     positive
I-ASP      car         positive
O          is
O          fun
O          ,
O          fast
O          ,
O          and
O          easy
O          to
O          handle
O          —
O          perfect
O          for
O          kids
O          !
O          the
B-ASP      build       positive
I-ASP      quality     positive
O          is
O          sturdy
O          and
O          it
O          runs
O          smoothly
O          on
O          different
O          surfaces
O          .
B-ASP      battery     positive
I-ASP      life        positive
O          is
O          decent
O          and
O          the
B-ASP      controls    positive
O          are
O          very
O          responsive
O          .
O          a
O          great
O          gift
O          for
O          kids
O          !

Review

"Car quality is very nice but the controller sucks . The controller of this car do not works properly and the final in the controller do not rotate fully it only rotate like button"

Output

BIO Tag    Word          Sentiment
B-ASP      car           positive
I-ASP      quality       positive
O          is
O          very
O          nice
O          but
O          the
B-ASP      controller    negative
O          sucks
O          .
O          the
B-ASP      controller    negative
O          of
O          this
O          car
O          do
O          not
O          works
O          properly
O          and
O          the
O          final
O          in
O          the
B-ASP      controller    negative
O          do
O          not
O          rotate
O          fully
O          it
O          only
O          rotate
O          like
B-ASP      button        negative

The code is available in the following notebook. Note that the notebook will not run as-is because the data is not available in the Google Colab environment.

Open In Colab

Using MultiHead Attention

Mean pooling looks only at the aspect tokens themselves, which forces the model to squeeze all sentiment signal into those embeddings. To let the model explicitly attend to opinion words elsewhere in the sentence, I next tried multi-head attention. Here the pooled aspect embedding serves as the query in the attention mechanism, while the original token embeddings serve as both keys and values. The attention layer produces an attended output that emphasizes the information most relevant to the aspect.

Why MultiHead Attention?

Multi-head attention lets the model focus on different parts of the input sequence when making a prediction. Each attention head learns to attend to different positions, so together the heads can capture richer relationships between tokens.

In the context of aspect-based sentiment analysis, this means the model can weigh the sentence tokens differently for each aspect. For example, for the aspect term "battery life", the attention mechanism can highlight the opinion words that refer to the battery rather than to some other feature.
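
A minimal shape-check of PyTorch's nn.MultiheadAttention used this way (toy tensors, not the project code):

import torch
import torch.nn as nn

attention = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)

query = torch.randn(1, 1, 768)          # pooled aspect embedding as the query
key = value = torch.randn(1, 128, 768)  # token embeddings as keys and values

attended, weights = attention(query, key, value)
print(attended.shape)  # torch.Size([1, 1, 768]) - one vector per aspect
print(weights.shape)   # torch.Size([1, 1, 128]) - one weight per token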

Model Architecture

class SentimentClassifier(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_dim=hidden_size, num_heads=1, batch_first=True)  # a single head is used here; num_heads can be increased
        self.dropout = nn.Dropout(0.3)
        self.norm = nn.LayerNorm(hidden_size)
        self.classifier = nn.Linear(hidden_size, 3)  # Sentiment classes: pos, neg, neutral
 
    def forward(self, token_embeddings, aspect_mask):
        # token_embeddings: [B, L, H], aspect_mask: [B, L]
        aspect_mask = aspect_mask.unsqueeze(-1).expand_as(token_embeddings)  # [B, L, H]
         # mask out non-aspect tokens, average aspect token embeddings
        aspect_embeddings = token_embeddings * aspect_mask  # Zero out non-aspect tokens
        # take mean of aspect embeddings
        aspect_pooled = aspect_embeddings.sum(dim=1) / (aspect_mask.sum(dim=1) + 1e-8)  # [B, H]
        query = aspect_pooled.unsqueeze(1)  # [B, 1, H]
        key = value = token_embeddings  # [B, L, H]
 
        attended_output, attn_weights = self.attention(query, key, value)  # [B, 1, H]
        attended_output = self.dropout(attended_output)
        attended_output = self.norm(attended_output)
 
        logits = self.classifier(attended_output.squeeze(1))  # [B, 3]
        return logits, attn_weights 
 
 
class AspectDetectionModel(nn.Module):
    def __init__(self):
        super(AspectDetectionModel, self).__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        self.token_classifier = nn.Linear(self.bert.config.hidden_size, len(label2id))
        self.sentiment_classifier = SentimentClassifier(hidden_size=self.bert.config.hidden_size)
 
    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = self.dropout(outputs.last_hidden_state)  # [B, L, H]
        token_logits = self.token_classifier(sequence_output)  # For aspect term tagging (BIO)
        return token_logits, sequence_output

The forward method of SentimentClassifier takes the token embeddings and the aspect mask as input and returns the sentiment logits and attention weights. The sentiment logits are consumed inside the training loop to calculate the sentiment loss.

I generate an aspect mask for each aspect span. Its job is to zero out the non-aspect tokens; the remaining aspect embeddings are then summed and divided by the number of aspect tokens to give the pooled aspect embedding.

This pooled embedding builds the query matrix for the multi-head attention, while the key and value matrices are the token embeddings themselves. The attended output is then passed to a Linear layer to get the sentiment logits.
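
A toy example of the masked mean pooling step (small made-up shapes for readability):

import torch

token_embeddings = torch.randn(1, 6, 4)                 # [B, L, H] toy sizes
aspect_mask = torch.tensor([[0., 1., 1., 0., 0., 0.]])  # tokens 1-2 form the aspect

mask = aspect_mask.unsqueeze(-1)  # [B, L, 1]
pooled = (token_embeddings * mask).sum(dim=1) / (mask.sum(dim=1) + 1e-8)
print(pooled.shape)  # torch.Size([1, 4]) - the mean of the two aspect tokens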

Below is the training loop for the model.

num_epochs = 10
 
for epoch in range(num_epochs):
    total_aspect_train_loss = 0
    total_sentiment_train_loss = 0
 
    model.train()
    total_train_loss = 0
 
    for batch in train_dataloader:
        input_ids = batch["input_ids"].to("mps")
        attention_mask = batch["attention_mask"].to("mps")
        labels = batch["labels"].to("mps")
 
        token_logits, sequence_output = model(input_ids, attention_mask)
        aspect_loss = criterion(token_logits.view(-1, len(label2id)), labels.view(-1))
        total_aspect_train_loss += aspect_loss.item()
 
        sentiment_losses = []
        for i in range(len(input_ids)):
            for aspect_index, sentiment in zip(batch["aspects_index"][i], batch["aspects_sentiment"][i]):
                if aspect_index[1] >= sequence_output.size(1):
                    continue
                # build a zero tensor of the same size as input_ids
                aspect_mask = torch.zeros_like(input_ids, dtype=torch.float).to("mps")
                # set the aspect span to 1
                aspect_mask[i, aspect_index[0]:aspect_index[1]+1] = 1
                sentiment_logits, _ = model.sentiment_classifier(
                    sequence_output[i].unsqueeze(0), aspect_mask[i].unsqueeze(0)
                )
                sentiment_target = torch.tensor([sentiment], dtype=torch.long).to("mps")
                sentiment_loss = criterion(sentiment_logits, sentiment_target)
                sentiment_losses.append(sentiment_loss)
 
        if sentiment_losses:
            sentiment_loss = torch.stack(sentiment_losses).mean()
        else:
            sentiment_loss = torch.tensor(0.0).to("mps")
 
        total_sentiment_train_loss += sentiment_loss.item()
        total_loss = aspect_loss + sentiment_loss
 
        ...

The validation loop is similar to the previous method's, where the extract_aspect_spans function is used to extract the aspect spans from the predicted labels.

Model Performance

Below is the model performance on the validation set.

Aspect Extraction Metrics

              precision    recall  f1-score   support
 
         ASP       0.71      0.79      0.75     11700
 
   micro avg       0.71      0.79      0.75     11700
   macro avg       0.71      0.79      0.75     11700
weighted avg       0.71      0.79      0.75     11700

Sentiment Classification Metrics

              precision    recall  f1-score   support
 
         neg       0.84      0.83      0.84      2222
         pos       0.92      0.96      0.94      6279
         neu       0.57      0.33      0.42       635
 
    accuracy                           0.88      9136
   macro avg       0.78      0.71      0.73      9136
weighted avg       0.87      0.88      0.88      9136

From the above report we can see that this variant performed slightly worse than mean pooling: aspect extraction F1 dropped to 0.75 and sentiment accuracy to 0.88, with the neutral class again the weakest (F1 of 0.42).

Following are some examples of the model output. In the first example, notice that the model associated the aspects "prices", "view" and "food" with negative sentiment, but "view" should have been tagged as positive. I suspect the model was not able to detect the sarcasm in the review.

Example Review

"the high prices you ' re going to pay is for the view not for the food ."

Output

BIO Tag    Word      Sentiment
O          the
O          high
O          prices    negative
O          you
O          '
O          re
O          going
O          to
O          pay
O          is
O          for
O          the
B-ASP      view      negative
O          not
O          for
O          the
B-ASP      food      negative
O          .

Example Review

"Car quality is very nice but the controller sucks . The controller of this car do not works properly and the final in the controller do not rotate fully it only rotate like button"

Output

BIO Tag    Word          Sentiment
B-ASP      car           positive
I-ASP      quality       positive
O          is
O          very
O          nice
O          but
O          the
B-ASP      controller    negative
O          sucks
O          .
O          the
B-ASP      controller    negative
O          of
O          this
O          car
O          do
O          not
O          works
O          properly
O          and
O          the
O          final
O          in
O          the
O          controller
O          do
O          not
O          rotate
O          fully
O          it
O          only
O          rotate
O          like
O          button

Example Review

"This remote control car is fun, fast, and easy to handle—perfect for kids! The build quality is sturdy and it runs smoothly on different surfaces. Battery life is decent and the controls are very responsive. A great gift for kids!"

Output

BIO Tag    Word        Sentiment
O          this
B-ASP      remote      positive
I-ASP      control     positive
I-ASP      car         positive
O          is
O          fun
O          ,
O          fast
O          ,
O          and
O          easy
O          to
O          handle
O          —
O          perfect
O          for
O          kids
O          !
O          the
B-ASP      build       positive
I-ASP      quality     positive
O          is
O          sturdy
O          and
O          it
O          runs
O          smoothly
O          on
O          different
O          surfaces
O          .
B-ASP      battery     positive
I-ASP      life        positive
O          is
O          decent
O          and
O          the
B-ASP      controls    positive
O          are
O          very
O          responsive
O          .
O          a
O          great
O          gift
O          for
O          kids
O          !

The code is available in the following notebook. Note that the notebook will not run as-is because the data is not available in the Google Colab environment.

Open In Colab

Using Simple Attention and CLS Embedding

The multi-head attention module is fairly complex and needs a lot of data to train well, so I decided to try a simpler approach: the CLS token embedding combined with a simple attention mechanism to produce the sentiment logits.
Simple attention has far fewer weights to train than multi-head attention, which makes it easier to train and less prone to overfitting.
It is also easier to interpret, since it yields a single attention score per token, letting us see which tokens matter most for the sentiment classification. The parameter-count sketch below illustrates the difference in size.
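
A quick parameter count backs this up (a sketch assuming BERT's hidden size of 768):

import torch.nn as nn

multi_head = nn.MultiheadAttention(embed_dim=768, num_heads=1)
simple = nn.Linear(768, 1)  # one learned scoring vector, as in the model below

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(multi_head))  # ~2.4M (Q, K, V and output projections)
print(n_params(simple))      # 769 (768 weights + 1 bias)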

Model Architecture

class SentimentClassifier(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.dropout = nn.Dropout(0.3)
        self.norm = nn.LayerNorm(hidden_size*2)
        self.classifier = nn.Linear(hidden_size*2, 3)  # Sentiment classes: pos, neg, neutral
        self.attention_param = nn.Linear(hidden_size,1)
 
    def forward(self, token_embeddings, aspect_mask):

        cls_embedding = token_embeddings[:, 0, :]  # [B, H] embedding of the [CLS] token

        expanded_aspect_mask = aspect_mask.unsqueeze(-1).expand_as(token_embeddings)  # [B, L, H]
        aspect_embeddings = token_embeddings * expanded_aspect_mask  # Zero out non-aspect tokens

        attention_score = self.attention_param(aspect_embeddings).squeeze(-1)  # [B, L]
        attention_score = attention_score.masked_fill(aspect_mask == 0, -1e9)  # Mask non-aspect tokens
        attention_score = torch.softmax(attention_score, dim=1)  # [B, L]

        aspect_pooled = torch.bmm(attention_score.unsqueeze(1), token_embeddings).squeeze(1)  # [B, H]
        combined = torch.cat([aspect_pooled, cls_embedding], dim=1)  # [B, 2H]

        combined = self.dropout(combined)
        combined = self.norm(combined)  # [B, 2H]

        logits = self.classifier(combined)  # [B, 3]
        return logits, attention_score
 
 
class AspectDetectionModel(nn.Module):
    def __init__(self):
        super(AspectDetectionModel, self).__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(0.3)
        self.token_classifier = nn.Linear(self.bert.config.hidden_size, len(label2id))
        self.sentiment_classifier = SentimentClassifier(hidden_size=self.bert.config.hidden_size)
 
    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = self.dropout(outputs.last_hidden_state)  # [B, L, H]
 
        token_logits = self.token_classifier(sequence_output)  # For aspect term tagging (BIO)
        return token_logits, sequence_output

The forward method of SentimentClassifier processes the input as follows:

  1. It takes the [CLS] token embedding, which summarizes the whole sentence.
  2. It zeroes out the non-aspect tokens using the aspect mask.
  3. A learned linear layer scores each token; non-aspect positions are masked with -1e9 so that, after the softmax, their attention weight is effectively zero (see the sketch after this list).
  4. The attention weights form a weighted sum of the token embeddings, giving the pooled aspect embedding.
  5. The pooled aspect embedding is concatenated with the CLS embedding, then passed through dropout, layer normalization, and a linear classifier to produce the sentiment logits.
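
A toy illustration of the masking trick in step 3:

import torch

scores = torch.tensor([[0.5, 1.2, -0.3, 0.8]])  # raw scores for 4 tokens
aspect_mask = torch.tensor([[0., 1., 1., 0.]])  # only tokens 1-2 are aspect tokens

masked = scores.masked_fill(aspect_mask == 0, -1e9)
weights = torch.softmax(masked, dim=1)
print(weights)  # ~[0.00, 0.82, 0.18, 0.00] - non-aspect tokens get no weight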

Below is the training loop for the model.

num_epochs = 10
 
for epoch in range(num_epochs):
    total_aspect_train_loss = 0
    total_sentiment_train_loss = 0
    total_aspect_val_loss = 0
    total_sentiment_val_loss = 0
 
    model.train()
    total_train_loss = 0
 
    for batch in train_dataloader:
        input_ids = batch["input_ids"].to("mps")
        attention_mask = batch["attention_mask"].to("mps")
        labels = batch["labels"].to("mps")
 
        token_logits, sequence_output = model(input_ids, attention_mask)
        aspect_loss = criterion(token_logits.view(-1, len(label2id)), labels.view(-1))
        total_aspect_train_loss += aspect_loss.item()
 
        sentiment_losses = []
        for i in range(len(input_ids)):
            for aspect_index, sentiment in zip(batch["aspects_index"][i], batch["aspects_sentiment"][i]):
                if aspect_index[1] >= sequence_output.size(1):
                    continue
                aspect_mask = torch.zeros_like(input_ids, dtype=torch.float).to("mps")
                aspect_mask[i, aspect_index[0]:aspect_index[1]+1] = 1
                sentiment_logits, _ = model.sentiment_classifier(
                    sequence_output[i].unsqueeze(0), aspect_mask[i].unsqueeze(0)
                )
                sentiment_target = torch.tensor([sentiment], dtype=torch.long).to("mps")
                sentiment_loss = criterion(sentiment_logits, sentiment_target)
                sentiment_losses.append(sentiment_loss)
 
        if sentiment_losses:
            sentiment_loss = torch.stack(sentiment_losses).mean()
        else:
            sentiment_loss = torch.tensor(0.0).to("mps")
 
        total_sentiment_train_loss += sentiment_loss.item()
        total_loss = aspect_loss + sentiment_loss
 
        ...

The validation loop is similar to the previous method's, where the extract_aspect_spans function is used to extract the aspect spans from the predicted labels.

Model Performance

Below is the model performance on the validation set.

Aspect Extraction Metrics

              precision    recall  f1-score   support
 
         ASP       0.75      0.82      0.79     14040
 
   micro avg       0.75      0.82      0.79     14040
   macro avg       0.75      0.82      0.79     14040
weighted avg       0.75      0.82      0.79     14040

Sentiment Classification Metrics

              precision    recall  f1-score   support
 
         neg       0.88      0.88      0.88      2793
         pos       0.93      0.97      0.95      7891
         neu       0.59      0.40      0.48       782
 
    accuracy                           0.91     11466
   macro avg       0.80      0.75      0.77     11466
weighted avg       0.90      0.91      0.90     11466

Of the three variants, this model using simple attention and the CLS embedding gave the best balance of metrics and qualitative output. Below is its output on some sample reviews.

Example Review

"the high prices you ' re going to pay is for the view not for the food ."

Output

BIO Tag    Word      Sentiment
O          the
O          high
B-ASP      prices    negative
O          you
O          '
O          re
O          going
O          to
O          pay
O          is
O          for
O          the
O          view
O          not
O          for
O          the
B-ASP      food      negative
O          .

Example Review

"This remote control car is fun, fast, and easy to handle—perfect for kids! The build quality is sturdy and it runs smoothly on different surfaces. Battery life is decent and the controls are very responsive. A great gift for kids!"

Output

BIO Tag    Word        Sentiment
O          this
B-ASP      remote      positive
I-ASP      control     positive
I-ASP      car         positive
O          is
O          fun
O          ,
O          fast
O          ,
O          and
O          easy
O          to
O          handle
O          —
O          perfect
O          for
O          kids
O          !
O          the
B-ASP      build       positive
I-ASP      quality     positive
O          is
O          sturdy
O          and
O          it
O          runs
O          smoothly
O          on
O          different
O          surfaces
O          .
B-ASP      battery     positive
I-ASP      life        positive
O          is
O          decent
O          and
O          the
B-ASP      controls    positive
O          are
O          very
O          responsive
O          .
O          a
O          great
O          gift
O          for
O          kids
O          !

Example Review

"Car quality is very nice but the controller sucks . The controller of this car do not works properly and the final in the controller do not rotate fully it only rotate like button"

Output

BIO Tag    Word          Sentiment
B-ASP      car           positive
I-ASP      quality       positive
O          is
O          very
O          nice
O          but
O          the
B-ASP      controller    negative
O          sucks
O          .
O          the
B-ASP      controller    negative
I-ASP      of            negative
O          this
O          car
O          do
O          not
O          works
O          properly
O          and
O          the
O          final
O          in
O          the
B-ASP      controller    negative
O          do
O          not
O          rotate
O          fully
O          it
O          only
O          rotate
O          like
O          button

The code is available in the following notebook. Note that the notebook will not run as-is because the data is not available in the Google Colab environment.

Open In Colab

Combined Summary

Below is the combined summary of the model performance on the validation set.

Model                    Aspect Extraction F1   Sentiment F1 [Neg, Pos, Neu]   Sentiment Accuracy
Mean Pooling             0.79                   0.89, 0.96, 0.50               0.91
Multi-head Attention     0.75                   0.84, 0.94, 0.42               0.88
Simple Attention + CLS   0.79                   0.88, 0.95, 0.48               0.91

From the above table we can see that all three techniques performed well on aspect extraction, with F1 scores between 0.75 and 0.79. The sentiment classification task showed more variability: the Mean Pooling method scored best on the positive and negative classes, and the neutral class remained the hardest across the board.

GitHub Repository

You can find the complete code used for this article on my GitHub profile. Below is the link to the repository.
https://github.com/gautamnaik1994/Aspect-Based-Sentiment-Analysis

Real World Hurdles in ABSA

Even with the advancements of deep learning and sophisticated models like BERT, Aspect-Based Sentiment Analysis presents its own set of challenges:

  1. Implicit aspects: real-world reviews rarely name aspects explicitly, and labelled aspect data is scarce.
  2. Sarcasm and irony: as seen in the "high prices ... for the view" example, models can misread sentiment when it is expressed indirectly.
  3. Class imbalance: neutral opinions were consistently the hardest to classify in every variant, largely because of their small support in the training data.

Future Scope

Conclusion

This article explored multiple strategies for Aspect-Based Sentiment Analysis, focusing on practical implementation and model performance. The project served as a valuable learning experience in NLP and deep learning, particularly in designing multi-head neural architectures and leveraging transformer models with PyTorch. It also strengthened my skills in evaluating complex deep learning workflows. I hope you found the discussion insightful. Feel free to reach out with any questions or feedback.
