Language Detection

Language detection is a necessary first step for businesses that serve multilingual user bases. Whether you are working with a single multilingual model or a multi-model environment, knowing the language of each request (for example, a query or other input) is essential to a good user experience.

The Detect Language API identifies the language of each text you provide. For each input, the API returns the following:

  • language_name - the full name of the language the input is in.
  • language_code - the ISO code of the language. For example, the language code for English is en.

Use Cases

Single Model Environment

Use a single multilingual model to handle both English and non-English queries. For monolingual retrieval with a multilingual model, identify the language of an incoming query and filter your results to those in the same language. In a cross-lingual retrieval setup, you can also specify which languages to keep in the results.
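The filtering step described above can be sketched as follows. This is a minimal illustration, not part of the API: the Document class, the filter_by_language helper, and the idea that each document stores a language code at index time are all assumptions made for the example, and the query language is hard-coded as a stand-in for the language_code returned by the Detect Language API.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    language_code: str  # assumption: stored alongside each document at index time


def filter_by_language(documents, allowed_codes):
    """Keep only the documents whose language code is in allowed_codes."""
    allowed = set(allowed_codes)
    return [doc for doc in documents if doc.language_code in allowed]


docs = [
    Document("Hello World", "en"),
    Document("Bonjour le monde", "fr"),
    Document("Hola mundo", "es"),
]

# Stand-in for the language_code detected for the incoming query.
query_language = "en"

# Monolingual retrieval: keep only results in the query's language.
monolingual = filter_by_language(docs, [query_language])

# Cross-lingual retrieval: keep an explicit set of languages instead.
cross_lingual = filter_by_language(docs, ["en", "fr"])
```

Keeping the allowed languages as an explicit list means the same helper covers both the monolingual and the cross-lingual case.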

Multi Model Environment

Use separate models for English and non-English embeddings. Identify the language of a query and route the request to the appropriate model for your setup. For example, if a query is identified as English, route it to our default English embed model; if not, route it to our multilingual embed model.
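The routing rule above can be sketched as a small function. The model names below are placeholders rather than real model identifiers, and choose_embed_model is a hypothetical helper. It takes a language code as input, so it can sit behind whatever call produces the detection result.

```python
# Placeholder names (assumptions); substitute the model identifiers you actually use.
ENGLISH_MODEL = "english-embed-model"
MULTILINGUAL_MODEL = "multilingual-embed-model"


def choose_embed_model(language_code: str) -> str:
    """Route English queries to the English model and everything else to the multilingual one."""
    if language_code == "en":
        return ENGLISH_MODEL
    return MULTILINGUAL_MODEL


# Stand-ins for language codes returned by language detection.
english_route = choose_embed_model("en")
french_route = choose_embed_model("fr")
```

Defaulting to the multilingual model for any non-English code matches the "if not" branch of the rule.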

Examples

Input

# Assumes co is an already initialized Cohere client.
response = co.detect_language(texts=["Hello World", "2023 will be my year"])

Response

results: [{language_code:"en", language_name:"English"},
{language_code:"en", language_name:"English"}]
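As a rough sketch of how the results might be consumed, the snippet below pairs each input text with its detected language. It works on plain dictionaries that mirror the response shown above, not on the SDK's actual response object, and it assumes the results come back in the same order as the inputs. The helper name pair_with_languages is hypothetical.

```python
def pair_with_languages(texts, results):
    """Pair each input text with its (language_code, language_name).

    Assumes results[i] describes texts[i].
    """
    if len(texts) != len(results):
        raise ValueError("expected one result per input text")
    return [
        (text, result["language_code"], result["language_name"])
        for text, result in zip(texts, results)
    ]


texts = ["Hello World", "2023 will be my year"]

# Plain-dict stand-in for the results shown in the response above.
results = [
    {"language_code": "en", "language_name": "English"},
    {"language_code": "en", "language_name": "English"},
]

pairs = pair_with_languages(texts, results)
for text, code, name in pairs:
    print(f"{text!r}: {name} ({code})")
```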

List of Supported Languages

Language Code  Language Name
af             Afrikaans
als            Albanian
am             Amharic
an             Aragonese
ar             Arabic
arz            Arabic (Egyptian)
as             Assamese
ast            Asturian
av             Avaric
az             Azerbaijani
azb            South Azerbaijani
ba             Bashkir
bar            Bavarian
bcl            Central Bikol
be             Belarusian
bg             Bulgarian
bh             Bihari
bn             Bengali
bo             Tibetan
bpy            Bishnupriya Manipuri
br             Breton
bs             Bosnian
bxr            Buryat
ca             Catalan
cbk            Chavacano
ce             Chechen
ceb            Cebuano
ckb            Central Kurdish
co             Corsican
cs             Czech
cv             Chuvash
cy             Welsh
da             Danish
de             German
diq            Zazaki
dsb            Lower Sorbian
dty            Doteli
dv             Dhivehi
el             Greek
eml            Emilian-Romagnol
en             English
eo             Esperanto
es             Spanish
et             Estonian
eu             Basque
fa             Persian
fi             Finnish
fr             French
frr            Frisian
fy             Western Frisian
ga             Irish
gd             Gaelic (Scotland)
gl             Galician
gn             Guarani
gom            Konkani
gu             Gujarati
gv             Manx
he             Hebrew
hi             Hindi
hif            Fiji Hindi
hr             Croatian
hsb            Upper Sorbian
ht             Haitian Creole
hu             Hungarian
hy             Armenian
ia             Interlingua
id             Indonesian
ie             Interlingue; Occidental
ilo            Iloko
io             Ido
is             Icelandic
it             Italian
ja             Japanese
jbo            Lojban
jv             Javanese
ka             Georgian
kk             Kazakh
km             Khmer
kn             Kannada
ko             Korean
krc            Karachay-Balkar
ku             Kurdish
kv             Komi
kw             Cornish
ky             Kyrgyz
la             Latin
lb             Luxembourgish
lez            Lezghian
li             Limburgan
lmo            Lombard
lo             Laothian
lrc            Northern Luri
lt             Lithuanian
lv             Latvian
mai            Maithili
mg             Malagasy
mhr            Meadow Mari
min            Minangkabau
mk             Macedonian
ml             Malayalam
mn             Mongolian
mr             Marathi
mrj            Hill Mari
ms             Malay
mt             Maltese
mwl            Mirandese
my             Burmese
myv            Erzya
mzn            Mazandarani
nah            Nahuatl
nap            Neapolitan
nds            Low German
ne             Nepali
new            Nepal Bhasa
nl             Dutch
nn             Norwegian Nynorsk
no             Norwegian
oc             Occitan (post 1500)
or             Oriya
os             Ossetian
pa             Punjabi
pam            Pampanga
pfl            Palatine German
pl             Polish
pms            Piedmontese
pnb            Pakistani Punjabi
ps             Pushto
pt             Portuguese
qu             Quechua
rm             Romansh
ro             Romanian
ru             Russian
rue            Rusyn
sa             Sanskrit
sah            Yakut
sc             Sardinian
scn            Sicilian
sco            Scots
sd             Sindhi
sh             Serbo-Croatian
si             Sinhalese
sk             Slovak
sl             Slovenian
so             Somali
sq             Albanian
sr             Serbian
su             Sundanese
sv             Swedish
sw             Swahili
ta             Tamil
te             Telugu
tg             Tajik
th             Thai
tk             Turkmen
tl             Tagalog
tr             Turkish
tt             Tatar
tyv            Tuvinian
ug             Uighur
uk             Ukrainian
ur             Urdu
uz             Uzbek
vec            Venetian
vep            Veps
vi             Vietnamese
vls            Vlaams
vo             Volapük
wa             Walloon
war            Waray
wuu            Wu Chinese
xal            Kalmyk
xmf            Mingrelian
yi             Yiddish
yo             Yoruba
yue            Yue Chinese
zh             Chinese