Found in Translation

Description

An intelligent Slack bot for navigating monolingual and multilingual messaging.

Problem

In a world full of information, it can be difficult to moderate and keep up with communication. Finding relevant messages and data is especially tedious because search functionality is often poor: if you forget the exact wording, finding a word or phrase can be quite difficult. It is doubly difficult on a multinational team where people speak different languages. Not only do you have to remember the exact phrase, you have to remember it in a different language!

Solution

To solve this, we've come up with our product: Found in Translation (FiT). FiT is an intelligent bot that provides not only semantic search but also sentiment analysis for users or servers. Semantic search means searching by meaning rather than strictly by the words given. For example, a semantic search for "food" might return "here are some restaurants near you", whereas a regular keyword search would yield something like "2005 food poisoning": quite different results.

Tech Stack

Our tech stack consists of a Slack Bolt / JS front end connected to a Python / Flask backend, which points to an Azure container as well as a Pinecone database. The backend also queries Cohere, specifically the Classify and Embed endpoints.

The actual code for the API is in the app.py file. We used Flask to develop an API that interfaces with our two Cohere models. The first is an embedding model (the multilingual model): we send in a list of chat messages and get back their embeddings. We then store the vectors in Pinecone and use dot-product similarity to find the messages most relevant to the user's query.
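The retrieval step described above can be sketched in plain Python. This is only an illustration, not the code in app.py: the three-dimensional vectors are invented stand-ins for real Cohere embeddings, and `dot` and `top_k` are hypothetical helpers that mimic what a dot-product index such as Pinecone does at query time.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query: Vector, index: List[Tuple[str, Vector]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored message against the query and return the k best."""
    scored = [(text, dot(query, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" with made-up values (assumption: a multilingual model
# would place the English and Spanish restaurant messages close together).
index = [
    ("here are some restaurants near you", [0.9, 0.1, 0.0]),
    ("aquí hay algunos restaurantes cerca de ti", [0.8, 0.2, 0.1]),
    ("the deploy failed again", [0.0, 0.1, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the query "food"
results = top_k(query_vec, index, k=2)

for text, score in results:
    print(f"{score:.2f}  {text}")
```

Because ranking depends only on the vectors, the Spanish message can rank next to the English one even though neither contains the word "food".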

The other model is a custom classification model trained on Google's GoEmotions dataset. We feed in a user's or chat's message history and get the resulting classifications, which are then processed and averaged in the Slack bot.
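The averaging step might look roughly like the sketch below. It is written in Python for consistency with the backend, although the text above places this step in the Slack bot, and it makes assumptions: `average_emotions` is a hypothetical helper, the per-message input is assumed to be a mapping from emotion label to confidence, and the scores are invented.

```python
from collections import defaultdict
from typing import Dict, List


def average_emotions(per_message: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each emotion's confidence across all messages.

    A label missing from a message counts as 0 for that message.
    """
    if not per_message:
        return {}
    totals: Dict[str, float] = defaultdict(float)
    for scores in per_message:
        for label, confidence in scores.items():
            totals[label] += confidence
    count = len(per_message)
    return {label: total / count for label, total in totals.items()}


# Invented classifier output for two messages from one user.
history = [
    {"joy": 0.75, "anger": 0.25},
    {"joy": 0.25, "anger": 0.75},
]

summary = average_emotions(history)
print(summary)
```

Averaging over the whole history smooths out single outlier messages, which is one plausible reason to summarize sentiment per user or per channel rather than per message.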

The clear.py file is used to manually clear the Pinecone vector database when a restart is needed. The API is dockerized and hosted on an Azure Container Instance.

Cohere Endpoints

We query two endpoints specifically: Embed and Classify. To solve our issue of multilingual search, we used Cohere's Embed endpoint to associate words with similar meanings in vector space, regardless of language. We also wanted to classify employee sentiment, so we used Classify to map words and phrases to specific emotions.

Inspiration

Some of us on our hackathon team work throughout the school year, and on the job we've found that our communication channels are often full of other languages (for us, it was Spanish). On top of that, we all used Slack for that communication! So we thought this issue could be tackled efficiently with one of Cohere's features, Embed, and implemented it.