10/27/2024
A Project Made By
Submitted for
Built At
Jump Start Hackathon
Hosted By
The challenge tackles how non-diverse bodies can be highlighted and held accountable, or how those that are doing well can be spotlighted.
Hey everyone, this is team Rose Spiders, comprising Kavisha Gupta (User Interface expert), Siddesh R Ohri (Machine Learning enthusiast), and Srinath S Janani (Team Project Manager).
Over the course of this hackathon, we mustered as much creativity and innovation as we could to build a project that protects and strengthens diversity in professional domains: a flagging system that detects micro-aggressions in emails.
This detection system uses natural language processing (NLP) techniques to analyse text inputs and identify potential micro-aggressions that contribute to negative sentiment.
When we looked through the available topics, tackling diversity discrimination in professional settings particularly struck us. It was the most practical topic with the most potential benefits, and contributing to such a change would allow us to leave a positive impact on our world. Taking on projects like this is what makes our planet a safer and more inclusive place to live in.
But then the difficulty of the proposition caught up with us. The enforcement of diversity is a very subjective and interpersonal matter. How would one translate this into logical code? And even once that is achieved, how would false positives and false negatives be minimised to reach an optimal solution? As we discussed back and forth, each proposal had its merits and drawbacks, leading to long deliberations over a topic we could truly commit to while still delivering an applicable solution.
It is hard to pin all problems relating to diversity on a single event. Discrimination is an issue that arises in many forms and in many places. It would take many measures at once to truly enforce equality among people of diverse backgrounds. Hence, whatever topic we chose would only be able to cover a portion of the problem at hand.
In workplaces, it is equally important to uphold equality among people and to reward based on skill and effort rather than privilege. Another issue is harassment and aggression shown towards minority communities and women. Such problems occur in a variety of ways, from verbal abuse, to prejudice in writing, to positions being withheld in favour of preferred candidates. There is a plethora of diversity issues to handle within workspaces.
The other aspect of the problem is tracing where companies have successfully enforced diversity and equality among employees. Collecting such data and deciding on appropriate awards is also a challenge without a simple solution. Analysing both sides of the project topic shows how challenging it is to cover and to improve on. Such topics must be handled delicately, with a fair system of equality that leaves no opportunity for misuse.
Our idea centers around detecting subtle discrimination in emails by analyzing text with a trained large language model. The system flags content in cases of microaggressions or harmful language patterns, helping to identify issues related to biases, such as racism, sexism, and other discriminatory behaviors. The system aims not only to notify users but also to gently prompt corrective behavior by providing feedback on potentially problematic language, fostering an opportunity for reflection and improvement in communication.
Currently, the system's core functionality focuses on toxicity and microaggression detection; however, we aim to enhance it with an expanded categorization and detailed feedback system. This planned upgrade would provide tailored feedback on specific categories of microaggression, making the system more informative and actionable.
To support proactive management, the system also compiles flagged data for HR teams to review and use in assessing workplace behavior trends. This report provides HR with insights into communication patterns and allows for early interventions if a recurring pattern of microaggressive language is detected.
Our project aimed to tackle micro-aggression detection through text classification using Natural Language Processing (NLP). Alongside the ML model, we also aimed to present this visually through a UI-based emailing system. Finally, we built a dashboard that displays the results of the language processing and aggression detection by summarising the potentially flagged misconduct of each user of the email system.
The first and main aspect of our project is the ML model, in a file named “Model.py”. This program performs the text classification using a RoBERTa model loaded through the Hugging Face Transformers class “AutoModelForSequenceClassification”. The semantic analysis tokenizes the input, runs it through the model, and then applies a softmax to the model's output to obtain the probability of micro-aggression. Once the model is trained, the program allows custom text to be tested and classified by the percentage of toxicity and by varying degrees of micro-aggression.
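The softmax step described above can be illustrated without the model itself. The following is a minimal sketch, assuming the classifier emits one raw score (logit) per label; the label names and the `classify` helper are hypothetical and not taken from “Model.py”.

```python
import math

# Hypothetical label set; the real model's labels are not given in this write-up.
LABELS = ["neutral", "microaggression"]


def softmax(logits):
    """Convert raw classifier scores (logits) into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow and does not change the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=LABELS):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


if __name__ == "__main__":
    # Made-up logits standing in for the model output on one email.
    label, prob = classify([0.2, 1.7])
    print(f"{label}: {prob:.1%}")
```

In a full pipeline, the logits would come from the sequence-classification model after tokenizing the email text; here they are supplied by hand to keep the example self-contained.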
The next aspect is the user interface that demonstrates how the model is used. It takes the form of a mailing system with a login feature, a list of emails, and a dropdown for viewing individual emails. This mirrors the kind of email system companies typically adopt and demonstrates how our project could be integrated into existing systems. The tech stack for this UI is a mix of React.js and vanilla CSS.
The final aspect, which combines the previous two, is a dashboard that also uses React.js and vanilla CSS. It displays a summary of email users with counts of the potential micro-aggressions the system has picked up, presented in a compact UI for whichever party handles the disciplining of discriminatory actions.
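The per-user counts shown on the dashboard could be produced by a small aggregation step like the one below. This is only a sketch under assumptions: the record fields `sender` and `is_microaggression` and the function `summarise_flags` are hypothetical, and the actual dashboard is built in React.js rather than Python.

```python
from collections import Counter


def summarise_flags(emails):
    """Count flagged emails per sender, most-flagged first (ties broken by name)."""
    counts = Counter(
        email["sender"] for email in emails if email["is_microaggression"]
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up analysis results for illustration only.
    analysed = [
        {"sender": "alice", "is_microaggression": True},
        {"sender": "bob", "is_microaggression": False},
        {"sender": "alice", "is_microaggression": True},
        {"sender": "bob", "is_microaggression": True},
        {"sender": "carol", "is_microaggression": False},
    ]
    for sender, count in summarise_flags(analysed):
        print(sender, count)
```

Senders with no flagged emails are left out of the summary, which keeps the review list short for whoever handles follow-up.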
/analyze for text analysis

    # Core Analysis Function
    def analyze_text(text):
        toxic_score = toxicity_analysis(text)
        semantic_result = semantic_analysis(text)
        return {
            "toxic_score": toxic_score,
            "is_microaggression": determine_result(toxic_score, semantic_result)
        }
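The `analyze_text` function above relies on three helpers that are not shown. The sketch below fills them in with simple stand-ins so the flow can be run end to end. The helper bodies, the word and phrase lists, and the `threshold` parameter are all assumptions made for illustration; the project itself uses trained models for these steps.

```python
# Toy lexicons, assumed for this example only (not the project's data).
TOXIC_WORDS = {"stupid", "idiot"}
SUBTLE_PHRASES = ("for a woman", "you people", "so articulate")


def toxicity_analysis(text):
    """Stand-in toxicity score: the fraction of words found in TOXIC_WORDS."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    return sum(1 for w in words if w in TOXIC_WORDS) / len(words)


def semantic_analysis(text):
    """Stand-in semantic check: the list of subtle phrases found in the text."""
    lowered = text.lower()
    return [phrase for phrase in SUBTLE_PHRASES if phrase in lowered]


def determine_result(toxic_score, semantic_result, threshold=0.2):
    """Flag the text if it is toxic enough or contains any subtle phrase."""
    return toxic_score >= threshold or bool(semantic_result)


def analyze_text(text):
    toxic_score = toxicity_analysis(text)
    semantic_result = semantic_analysis(text)
    return {
        "toxic_score": toxic_score,
        "is_microaggression": determine_result(toxic_score, semantic_result),
    }


if __name__ == "__main__":
    print(analyze_text("Thanks for the update."))
    print(analyze_text("You are surprisingly good at this for a woman."))
```

Combining the two signals with a simple "or" means a message with no overtly toxic words can still be flagged, which matches the goal of catching subtle language rather than only explicit abuse.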
Time | Activity |
---|---|
10:30 - 11:15 | Announcement of the topics |
11:15 - 12:15 | Scavenger Hunt |
12:15 - 15:30 | Topic discussion and finalisation |
15:30 - 16:45 | Topic detailing and action plan |
16:45 - 18:45 | Coding of project |
22:00 - 04:00 | Continued coding and document construction |
07:30 - 08:00 | Review of progress and milestones reached |
09:00 - 10:30 | Working on video demo and presentation |
10:30 - 12:00 | Final edits and submission of project |
Over the duration of the hackathon, we did our best to balance enjoying the event with the work ethic the competition demanded. In the end, on what was a challenging and intriguing topic for us, we managed to collaborate on a solution and implement it strategically in a reasonably short amount of time. As this was the first hackathon for each member, completing it at the pace we did is a proud achievement.
We were able to come up with a prototype solution to our satisfaction, while also detailing the various improvements that could be made whenever the project is continued. Strong teamwork and communication made the project especially enjoyable, letting us exchange ideas freely and perform at our best. We were also able to follow many agile software development principles, one of them being the delivery of a Minimum Viable Product (MVP).
The prototype solution was designed to be feasible to build within the given time. It currently includes an ML model using NLP, which can detect multiple severities of micro-aggression and point out the words likely causing it. The model is paired with a UI for viewing emails, as well as a dashboard of employees indicating their prior warnings based on the severity of the micro-aggressions.
Beyond this prototyped version, there are numerous areas to extend the project.
Most importantly, the AI model can be vastly improved over time by training it on a larger and more carefully labelled dataset, which would improve its accuracy. Such a dataset would benefit not only this project but also other projects that use NLP and sentiment analysis.