Submission Category : Softwire Challenge

The challenge tackles how non-diverse bodies can be highlighted and held accountable, or how ones who are doing well can be spotlighted.

Introduction and Elevator Pitch :

Hey everyone, this is team Rose Spiders, comprising of Kavisha Gupta (User Interface expert), Siddesh R Ohri (Machine Learning enthusiast), and Srinath S Janani (Team Project Manager).

Over the course of this hackathon, we have mustered as much creativity and innovation as possible to create our project inspired to protect and strengthen diversity in professional domains - a flagging system to detect micro-aggression in emails.

This detection system leverages natural language processing (NLP) techniques to analyse text inputs. This identifies potential micro-aggressions contributing to negative sentiment.

Framing the Problem :

Upon viewing the different topics available, tackling diversity discrimination in professional settings particularly struck us. It was the most realistically practical topic with the most potential benefits, contributing to such a change would allow us to leave a positive impact on our world. Taking on such projects is what leads our planet a safer and inclusive place to live in.

But then the difficulty of the proposition caught us. The enforcement of diversity is a very subjective and interpersonal matter. How would one be able to translate this into logical code? And even once that is achieved, how would false positives and false negatives be minimised towards forming an optimal solution. As we discussed back and forth, each proposal was met with merits and drawbacks, causing long deliberations over a topic we could truly commit to while delivering an applicable solution.

It is hard to pinpoint all problems relating to diversity to one event. Discrimination is an issue that arises on many surfaces and many places. It would take many measures at once to truly enforce equality among people’s diverseness. Hence whatever topic we had to choose would only be able to cover a portion of the problem at hand.

In workplaces, it is equally important to uphold equality among people, and to reward based on skill and effort rather than privilege. Another issue is harassment and aggression shown towards minority communities and women. Such problems happen in a variety of ways, from verbal abuse to prejudice within writing, to positions withheld favourably to desired persons. There are a plethora of issues pertaining to diversity to handle within workspaces.

The other aspect of the problem is to trace where companies have been able to successfully enforce diversity and equality among employees. Collecting such data and settling appropriate awards also proves to be a challenge that does not have a simple solution. Analysing both sides of the project topic shows how challenging of a topic it is to cover and attempt to improve on. Such topics must be handled delicately and ensure a fair system of equality without any opportunity for misuse to ensure an open diversity.

Idea Explanation :

Our idea centers around detecting subtle discrimination in emails by analyzing text with a trained large language model. The system flags content in cases of microaggressions or harmful language patterns, helping to identify issues related to biases, such as racism, sexism, and other discriminatory behaviors. The system aims not only to notify users but also to gently prompt corrective behavior by providing feedback on potentially problematic language, fostering an opportunity for reflection and improvement in communication.

Currently, the system's core functionality focuses on toxicity and microaggression detection; however, we aim to enhance it with an expanded categorization and detailed feedback system. This planned upgrade would provide tailored feedback on specific categories of microaggression, making the system more informative and actionable.

To support proactive management, the system also compiles flagged data for HR teams to review and use in assessing workplace behavior trends. This report provides HR with insights into communication patterns and allows for early interventions if a recurring pattern of microaggressive language is detected.

Tech Stack :

Our project aimed to tackle micro-aggression detection by text-classification using Natural Language Processing (NLP). While using a ML model, we also aimed to combine this visually with a UI-based emailing system. Finally, a dashboard was also constructed to display the results of this language processing and aggression detection, by summarising potentially flagged misconduct of each user of this email system.

Large Language Model (LLM)

The first and main aspect of our project is the ML model, named as “Model.py”. This program executes the text-classification using a semantic model from Hugging Face transformer models called “AutoModelForSequenceClassification”/Roberta. The semantic analysis tokenizes the input, runs it through the semantic model and then puts a SoftMax on the final probability of micro-aggression. Upon training the model, the program then allows custom text to be tested and classified by the model based on the percentage of Toxicity and varying degrees of micro-aggression.

User Interface

The next aspect is the user interface to demonstrate the usage of the model. This is represented as a mailing system, equipped with a login feature, multiple email listing and individual email viewing dropdown. This is representative of a typical email system companies adopt and serves as a demonstration as to how our project can be implemented into existing systems adaptably. The tech-stack used for this UI design includes a mix of React.js and vanilla CSS.

Dashboard

The final aspect which combines the previous two, is a dashboard system which also uses React.js and vanilla CSS. It displays a summary of email users with counts of potential micro-aggressions that the system has picked up, displayed in a tight UI and summarised the information for the respective party that handles the disciplining of discriminatory actions.

Microaggression Detection AI System: Technical Overview

System Architecture

Web Server: Flask-based REST API with CORS enabled
Endpoint: /analyze for text analysis
Core Models:
Toxic-BERT for toxicity detection
RoBERTa-base for semantic analysis
spaCy for text processing

Model Implementation

1. Toxicity Classification

Uses Unitary/toxic-bert (Hugging Face)
Score range: 0-1
Threshold: 0.5 (configurable)

2. Semantic Analysis

RoBERTa-base with 3-label classification
Categories: Neutral, Positive, Negative
Max sequence length: 512 tokens

Processing Pipeline

# Core Analysis Function
def analyze_text(text):
    toxic_score = toxicity_analysis(text)
    semantic_result = semantic_analysis(text)
    return {
        "toxic_score": toxic_score,
        "is_microaggression": determine_result(toxic_score, semantic_result)
    }

Key Features

Real-time text analysis
Dual-model architecture
JSON API responses
Cross-origin support
PyTorch inference optimization

Limitations & Future Scope

Current: Binary classification
Planned: Context awareness, batch processing
Needed: Input validation, rate limiting

Dependencies

Flask, Flask-CORS
transformers, torch
spaCy (encoreweb_sm)

Time Usage :

Time	Activity
10:30 - 11:15	Announcement of the topics
11:15 - 12:15	Scavenger Hunt
12:15 - 15:30	Topic discussion and finalisation
15:30 - 16:45	Topic detailing and action plan
16:45 - 18:45	Coding of project
22:00 - 04:00	Continued coding and document construction
07:30 - 08:00	Review of progress and milestones reached
09:00 - 10:30	Working on video demo and presentation
10:30 - 12:00	Final edits and submission of project

Accomplishments :

Over the duration of the hackathon, we put our best effort to balance the enjoyment of the event with the work ethic for the competition. In the end, for what was a challenging and intriguing topic for is, managed to collaborate on a solution and implement it strategically within a decently fast amount of time. Especially as our first hackathon for each member, it is a proud achievement to have completed it with our given pace.

We were able to come up with a prototype solution to our satisfaction, while also detailing the various improvements applicable whenever the project can be continued. The teamwork and communication made this project especially enjoyable to communicate ideas with each other and give the best performance we can. We were able to comply to a lot of agile software development principles, one of them being the Minimal Viable Product (MVP).

Next Steps :

The prototype solution was designed to be feasibly constructable within the given time. It currently includes a ML model using NLP, which can detect multiple severities of micro-aggression, point out possible words causing it. This model is also equipped with a UI to view emails, as well as a dashboard of employees indicating their prior warnings based on the severity of the micro-aggressions.

Beyond this prototyped version, there are numerous areas to extend the project.

The text-analysis functionality can be extended to label the discrimination into specific categories.
A dynamic dashboard could be employed to store employee data and track their misconduct-messages in real time.
A database system can be properly established to store data more dynamically, with the options of equipping adding and deleting functions on the data.
Flagged text could potentially have a feedback system, where the user is given an explanation of why their statement is flagged for microaggression.
Additionally, a feedback system to report bugs and improvements can be implemented.

Most importantly, the AI model can be vastly improved over time using bigger training data and being more carefully labelled to improve the accuracy of the algorithm. This improvement would not only benefit this project but would also be highly beneficial to all projects using NLP and sentiment analysis.

Micro-Aggression Detection System (MADS)

Detection for an Integrated Future