
Customer Sentiment Analysis plays a crucial role in understanding user feedback, improving services, and enhancing customer satisfaction. The architecture depicted in the diagram provides a systematic approach to collecting, processing, analyzing, and visualizing customer sentiments using machine learning models and natural language processing (NLP) techniques. The entire process is divided into multiple stages, from data collection to final insights generation, ensuring that organizations can make data-driven decisions.
1. Data Collection from Multiple Sources
The sentiment analysis process begins with the collection of data from three key sources:
Website: Customer satisfaction ratings and reviews that express sentiments about the platform.
Surveys/Feedback: Study material and tutor rating responses that provide insights into learning effectiveness and tutor performance.
Support Applications: After-support ratings and video quality assessments that help evaluate customer service interactions and content clarity.
These sources generate a mix of structured (ratings) and unstructured (text feedback) data, forming the foundation for sentiment analysis. The data is stored in a centralized system (depicted as a cloud database) that consolidates feedback from different sources for further processing.
2. Data Preprocessing Using Databricks & PySpark
Since customer feedback is often noisy and unstructured, data preprocessing is a crucial step to ensure high-quality, meaningful input for analysis. Databricks with PySpark is used to process large datasets efficiently. The preprocessing steps include:
Removing Noise: Eliminating unnecessary elements like HTML tags, special characters, and emojis that do not contribute to sentiment understanding.
Handling Missing Data: Addressing missing values in customer reviews by either imputing meaningful values or removing incomplete records.
Standardizing Text: Ensuring uniform formatting to prevent inconsistencies in language variations.
Removing Stopwords: Common words like "the," "is," and "and" are removed since they do not provide contextual meaning.
Removing Punctuation: Eliminating punctuation marks that can interfere with text tokenization and NLP processing.
Converting to Lowercase: Standardizing all text to lowercase to prevent duplicate words due to casing differences.
Stemming and Lemmatization: Reducing words to their root form (e.g., "running" → "run") to enhance text consistency and improve model accuracy.
This preprocessing stage cleans and structures the raw data, preparing it for sentiment classification.
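As a minimal sketch, the cleaning steps above might look like the following in PySpark. The column name, input path, and regular expressions are assumptions for illustration, not the actual pipeline's schema; stemming and lemmatization would typically be added as a UDF using a library such as NLTK or spaCy.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import Tokenizer, StopWordsRemover

spark = SparkSession.builder.appName("feedback-preprocessing").getOrCreate()

# Hypothetical input: one row per piece of feedback; the column name and
# storage path are placeholders.
reviews = spark.read.parquet("s3://feedback/raw/")

cleaned = (
    reviews
    .dropna(subset=["feedback_text"])                                 # handle missing data
    .withColumn("text", F.lower(F.col("feedback_text")))              # convert to lowercase
    .withColumn("text", F.regexp_replace("text", r"<[^>]+>", " "))    # strip HTML tags
    .withColumn("text", F.regexp_replace("text", r"[^a-z\s]", " "))   # drop punctuation, digits, emojis
    .withColumn("text", F.regexp_replace("text", r"\s+", " "))        # collapse extra whitespace
)

# Tokenize and remove common English stopwords
tokenizer = Tokenizer(inputCol="text", outputCol="tokens")
remover = StopWordsRemover(inputCol="tokens", outputCol="tokens_clean")
cleaned = remover.transform(tokenizer.transform(cleaned))
```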
3. Sentiment Labeling Using TextBlob
Once the data is preprocessed, TextBlob, a Python NLP library, is used for sentiment labeling. TextBlob assigns a sentiment polarity score to each piece of feedback, which is then categorized as:
Positive
Neutral
Negative
The result is a labeled dataset containing both the cleaned text and its corresponding sentiment score. This labeled data serves as the ground truth for training machine learning models.
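A minimal sketch of this labeling step is shown below; the polarity thresholds are assumptions, since the exact cutoffs used in the architecture are not specified.

```python
from textblob import TextBlob

def label_sentiment(text: str) -> str:
    """Map TextBlob's polarity score (-1.0 to 1.0) to a sentiment label."""
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0.05:        # assumed threshold for positive
        return "positive"
    if polarity < -0.05:       # assumed threshold for negative
        return "negative"
    return "neutral"

print(label_sentiment("The tutor explained everything clearly"))                 # positive
print(label_sentiment("Video quality was terrible and support never replied"))   # negative
```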
4. Feature Engineering for Machine Learning Models
To convert textual feedback into a format that machine learning models can understand, feature engineering is applied. The architecture incorporates multiple feature extraction techniques, including:
TF-IDF (Term Frequency-Inverse Document Frequency): Assigns weight to words based on their importance in a document relative to the entire dataset.
Word2Vec: Converts words into numerical vectors based on their context in the sentence.
Bag of Words (BoW): Represents text as a frequency matrix, counting word occurrences in each review.
HFE (Hybrid Feature Extraction): A combination of different techniques to enhance feature representation.
These transformations convert the raw text data into numerical representations, making it suitable for machine learning models.
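A brief sketch of how these representations could be built with scikit-learn and gensim follows; the toy reviews are illustrative only, and the Word2Vec hyperparameters are assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

docs = [
    "great tutor very clear explanations",
    "poor video quality and slow support",
    "study material was helpful",
]

# Bag of Words: raw word counts per review
bow = CountVectorizer().fit_transform(docs)

# TF-IDF: counts re-weighted by how distinctive each word is across reviews
tfidf = TfidfVectorizer().fit_transform(docs)

# Word2Vec: dense vectors learned from word context; a whole review can be
# represented by averaging the vectors of its words
tokens = [d.split() for d in docs]
w2v = Word2Vec(tokens, vector_size=50, window=3, min_count=1, epochs=20)
doc_vec = np.mean([w2v.wv[t] for t in tokens[0]], axis=0)
```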
5. Topic Extraction Using Latent Dirichlet Allocation (LDA)
Beyond sentiment classification, it is also essential to understand common themes in customer feedback. LDA (Latent Dirichlet Allocation) is used for topic extraction, identifying underlying subjects in text reviews. For example, it may reveal topics like:
Tutor Quality
Course Content Issues
Technical Support Problems
Topic modeling helps businesses prioritize areas for improvement based on frequently discussed concerns.
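A short sketch of topic extraction with scikit-learn's LDA implementation is given below; the number of topics and the example reviews are assumptions and would be tuned on the real feedback corpus.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "tutor was unresponsive and classes felt rushed",
    "course content is outdated and confusing",
    "support team never fixed my video playback issue",
]

vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

# Fit LDA with an assumed number of topics
lda = LatentDirichletAllocation(n_components=3, random_state=42).fit(counts)

# Print the top words for each discovered topic
words = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top = [words[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top)}")
```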
6. Machine Learning Models for Sentiment Classification
After feature extraction, the dataset is divided into training and testing sets, and multiple machine learning models are trained to classify sentiments. The architecture utilizes four key models:
Naive Bayes: A probabilistic classifier commonly used for text classification.
Random Forest: An ensemble learning algorithm that combines multiple decision trees for robust predictions.
Logistic Regression: A statistical model well-suited for binary and multi-class classification tasks.
XGBoost: A gradient boosting algorithm that improves model accuracy and reduces overfitting.
Each model is trained on historical sentiment-labeled data and evaluated to determine the best-performing classifier.
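A condensed sketch of this training stage, assuming scikit-learn and the xgboost package, with toy data standing in for the labeled feedback produced by the earlier stages:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# Toy stand-in for the labeled, preprocessed feedback
docs = [
    "great tutor and clear notes", "loved the mock exams", "excellent support team",
    "material is okay", "average experience overall", "nothing special but fine",
    "terrible video quality", "tutor never replied", "content is outdated and confusing",
]
labels = [2, 2, 2, 1, 1, 1, 0, 0, 0]  # 0=negative, 1=neutral, 2=positive (assumed encoding)

X = TfidfVectorizer().fit_transform(docs)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.33, stratify=labels, random_state=42
)

models = {
    "naive_bayes": MultinomialNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "xgboost": XGBClassifier(eval_metric="mlogloss"),
}

# Fit each candidate model and compare held-out accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.2f}")
```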
7. Model Evaluation Metrics
To assess model performance, standard evaluation metrics are used:
Precision: The proportion of feedback predicted as positive that is actually positive.
Recall: The proportion of actually positive feedback that the model correctly identifies.
F1 Score: The harmonic mean of precision and recall, balancing both measures.
Accuracy: Overall correctness of sentiment classification.
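These metrics can be computed with scikit-learn; the labels below are illustrative stand-ins for test-set predictions from one of the classifiers above.

```python
from sklearn.metrics import accuracy_score, classification_report

y_test = [2, 0, 1, 2, 0, 1]   # illustrative true labels
y_pred = [2, 0, 1, 1, 0, 2]   # illustrative predictions

print("accuracy:", accuracy_score(y_test, y_pred))
# classification_report prints per-class precision, recall, and F1
print(classification_report(y_test, y_pred, target_names=["negative", "neutral", "positive"]))
```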
For better performance, BERT (Bidirectional Encoder Representations from Transformers) is used as an advanced NLP model. BERT significantly improves context understanding and sentiment prediction accuracy by analyzing words in relation to their surrounding text.
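As a rough sketch of transformer-based inference, the Hugging Face pipeline API can serve a pretrained model. The checkpoint below is a distilled BERT variant chosen purely for illustration; the actual model fine-tuned in this architecture is not specified.

```python
from transformers import pipeline

# Pretrained sentiment model from the Hugging Face hub (illustrative choice)
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The tutor explained the syllabus really well"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```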
8. Visualization & Actionable Insights
After sentiment classification, the insights are visualized using interactive dashboards, providing decision-makers with a clear understanding of customer sentiment trends. The visualizations include:
Actionable Recommendations
Enhancing tutor training based on sentiment trends.
Improving study material and support services.
Key Performance Indicator (KPI): Student Retention Rate (%).
Complaint Heatmap
Identifies the top complaint categories.
Helps prioritize course improvements.
Best vs Worst Rated Courses
Ranks courses based on sentiment and reviews.
KPI: Course Satisfaction Index.
Sentiment Trends Over Time
Tracks customer satisfaction changes over months.
KPI: Monthly Sentiment Change (%).
These insights help businesses optimize learning experiences, improve customer service, and enhance product offerings.
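As an illustration, a simple trend panel like the one described above could be drawn with pandas and matplotlib; all figures below are made-up placeholders, not actual results.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical monthly counts aggregated from the classified feedback
trends = pd.DataFrame({
    "month": pd.date_range("2023-01-01", periods=6, freq="MS"),
    "positive": [1100, 1150, 1080, 1200, 1180, 1220],
    "negative": [650, 640, 700, 660, 655, 645],
    "neutral":  [450, 460, 440, 470, 455, 465],
})

ax = trends.plot(x="month", y=["positive", "negative", "neutral"], marker="o")
ax.set_ylabel("Number of reviews")
ax.set_title("Sentiment Trends Over Time")
plt.tight_layout()
plt.show()
```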

The chart presents the sentiment trends of customer reviews over time, categorized into Positive (Red), Negative (Blue), and Neutral (Gray) sentiments. The x-axis represents the months from January 2023 to January 2024, while the y-axis indicates the number of reviews in each sentiment category.
Key Observations:
Positive Sentiments Dominated Throughout the Year
The positive sentiment (red line) consistently remained the highest among all three categories.
There were slight fluctuations, but overall, positive reviews remained above 1,000 per month.
Peaks can be observed at certain intervals, indicating periods of higher satisfaction and engagement.
Negative Sentiment Trends (Blue Line)
The negative sentiment line remained stable, fluctuating around 600-700 reviews per month.
It saw a minor increase in March and August, possibly indicating customer dissatisfaction spikes during these periods.
However, the trend did not show significant improvement or worsening.
Neutral Sentiment Trends (Gray Line)
The neutral sentiment remained the lowest throughout the year, averaging around 400-500 reviews per month.
The variations were minor, indicating a relatively stable segment of users who provided neutral feedback.
Sudden Drop in All Sentiments in January 2024
A drastic decline in all sentiment categories is observed at the start of 2024.
This could indicate:
A drop in customer engagement (fewer reviews collected).
Data collection issues (e.g., missing or incomplete data).
A major policy change or platform issue, reducing user feedback.
End of a campaign or service cycle, leading to fewer interactions.
Possible Business Insights & Recommendations:
Monitor fluctuations in positive sentiment trends: Identify the causes behind peaks in satisfaction and replicate the successful strategies.
Investigate reasons for occasional negative sentiment spikes: Perform a root cause analysis of the March and August increases to mitigate similar issues in the future.
Address the drastic decline in January 2024: Validate if this is a data anomaly or a genuine decline in customer engagement.
Enhance Customer Support & Engagement Strategies: Since negative feedback remains consistent, targeted improvements in customer service, product features, and issue resolution may help reduce negative reviews.

The horizontal bar chart represents the average sentiment scores for various courses, categorizing them into best-rated (positive sentiment) and worst-rated (negative sentiment). The x-axis shows the average sentiment score, with positive values indicating better sentiment and negative values indicating poorer sentiment. The y-axis lists different courses, with blue bars representing lower-rated courses and red bars representing higher-rated courses.
Key Observations:
Worst-Rated Courses (Negative Sentiment - Blue Bars)
P3: Advanced Financial Management (AFM) has the lowest sentiment score, making it the worst-rated course.
Other poorly rated courses include:
Financial Management (FM)
Taxation (TX)
Performance Management (PM)
Financial Accounting (FA)
These courses have negative sentiment scores, indicating dissatisfaction among students. Possible reasons could be:
Course difficulty and complexity.
Poor study material or inadequate explanations.
Low-quality tutor support or ineffective teaching methods.
Moderately Rated Courses (Neutral Sentiment - Light Shades)
Courses like Strategic Business Leader (SBL), Business and Technology (BT), and Advanced Audit and Assurance (AAA) are closer to the zero line, indicating mixed feedback from students.
These courses may need minor improvements in content delivery or teaching methods to shift toward a more positive sentiment.
Best-Rated Courses (Positive Sentiment - Red Bars)
The highest-rated course is P4: Advanced Performance Management (APM), with the most positive sentiment score.
Other well-rated courses include:
Audit and Assurance (AA)
Advanced Taxation (ATX)
Corporate and Business Law (LW)
Strategic Business Reporting (SBR)
The positive sentiment indicates that students find these courses well-structured, easy to understand, and beneficial.
These courses may have better study material, engaging instructors, and more effective support systems.
Business Insights & Recommendations:
Investigate the Issues in Poorly Rated Courses
Conduct surveys or analyze student feedback to understand why P3: Advanced Financial Management (AFM) and Financial Management (FM) received negative sentiment.
Improve study materials, tutor support, and exam preparation guidance for these courses.
Consider introducing additional resources like webinars, mentorship sessions, or simplified content.
Leverage Insights from Best-Rated Courses
Identify what makes P4: Advanced Performance Management (APM) and other top courses successful.
Replicate best practices in content structure, tutoring methods, and learning support for other courses.
Monitor Trends Over Time
Continuously track sentiment changes to ensure improvements in poorly rated courses.
Implement feedback-driven enhancements and observe if the sentiment scores improve over time.

The complaint heatmap shows the number of complaints recorded for each course, with darker shades indicating higher complaint volumes.
Key Observations:
Most Complained About Courses
The courses with the highest number of complaints include:
Performance Management (PM) – 598 complaints
Taxation (TX) – 596 complaints
F2: Management Accounting (MA) – 591 complaints
Audit and Assurance (AA) – 588 complaints
Financial Accounting (FA) – 588 complaints
These courses are represented in the darkest shades, indicating significant dissatisfaction among students.
Courses with Moderate Complaints
Courses such as:
Corporate and Business Law (LW) – 543 complaints
Strategic Business Leader (SBL) – 552 complaints
Advanced Audit and Assurance (AAA) – 557 complaints
These courses still receive a substantial number of complaints but are not the worst.
Least Complained About Course
P3: Advanced Financial Management (AFM) received the fewest complaints, at 499.
This course appears in the lightest shade, indicating relatively lower dissatisfaction compared to other courses.
Potential Reasons for Complaints:
Course Complexity: Courses like Performance Management (PM), Management Accounting (MA), and Taxation (TX) are known for high difficulty levels, leading to more complaints.
Lack of Study Resources: Courses with high complaints may lack adequate learning materials, instructor support, or structured content.
Exam Challenges: If students find exams too difficult or unfairly graded, it could lead to increased dissatisfaction.
Tutor Effectiveness: Courses with ineffective tutors or poor teaching methodologies could result in higher negative feedback.
Business Insights & Recommendations:
Investigate the Root Cause of Complaints in High-Risk Courses
Conduct student surveys to understand the primary reasons behind dissatisfaction in Performance Management (PM), Taxation (TX), and Management Accounting (MA).
Improve study material, exam preparation guidance, and tutor training.
Enhance Learning Support for the Most Complained About Courses
Offer additional online tutorials, mentorship sessions, or Q&A forums.
Provide simplified explanations or real-world examples to aid understanding.
Leverage Strengths from the Least Complained About Course
P3: Advanced Financial Management (AFM) received the fewest complaints.
Analyze what makes this course more effective and apply those practices to other courses.
Track Complaint Trends Over Time
Continuously monitor if complaints decrease after implementing changes.
If complaints persist, conduct further evaluations and feedback sessions.
Conclusion:
This analysis identifies high-risk courses based on complaint volume, helping institutions prioritize improvements. By addressing concerns in the most complained about courses, educators can enhance student satisfaction, improve course effectiveness, and reduce negative feedback over time.