Language Detection
Language detection is a necessary first step for businesses that deal with multilingual user bases. Whether you are working with a single multi-lingual model or a multi model environment, understanding the language of a request (e.g. query, input) is paramount to a good user experience.
Co.detect_language is an endpoint gives the following information for a text input:
- The full name (
language_name
) of the language the input is in. - A
language_code
describing the language the input is in. For example, the ISO code for English isen
.
Use Cases
Single Model Environment
Use a single multilingual model to handle both english and non-english queries. Identify the language of an incoming query and filter the results of your results by matching languages for monolingual retrieval with a multilingual model - in addition, you can specify which languages you want to filter for in a cross-lingual retrieval setup.
Multi Model Environment
Use multiple models for English and non-English embeddings. Identify a query in its respective language and route the request to different models depending on your setup. For example, if a query is in identified as English, route it to our default English embedding model, if not, route it to our multilingual embedding model.
Examples
Input
response = co.detect_language(texts=["Hello World","2023 will be my year"])
Response
results: [{language_code:"en", language_name:"English"},
{language_code:"en", language_name:"English"}]
List of Supported Languages
Language Code | Language Name |
---|---|
af | Afrikaans |
als | Albanian |
am | Amharic |
an | Aragonese |
ar | Arabic |
arz | Arabic (Egyptian) |
as | Assamese |
ast | Asturian |
av | Avaric |
az | Azerbaijani |
azb | South Azerbaijani |
ba | Bashkir |
bar | Bavarian |
bcl | Central Bikol |
be | Belarusian |
bg | Bulgarian |
bh | Bihari |
bn | Bengali |
bo | Tibetan |
bpy | Bishnupriya Manipuri |
br | Breton |
bs | Bosnian |
bxr | Buryat |
ca | Catalan |
cbk | Chavacano |
ce | Chechen |
ceb | Cebuano |
ckb | Central Kurdish |
co | Corsican |
cs | Czech |
cv | Chuvash |
cy | Welsh |
da | Danish |
de | German |
diq | Zazaki |
dsb | Lower Sorbian |
dty | Doteli |
dv | Dhivehi |
el | Greek |
eml | Emilian-Romagnol |
en | English |
eo | Esperanto |
es | Spanish |
et | Estonian |
eu | Basque |
fa | Persian |
fi | Finnish |
fr | French |
frr | Frisian |
fy | Western Frisian |
ga | Irish |
gd | Gaelic (Scotland) |
gl | Galician |
gn | Guarani |
gom | Konkani |
gu | Gujarati |
gv | Manx |
he | Hebrew |
hi | Hindi |
hif | Fiji Hindi |
hr | Croatian |
hsb | Upper Sorbian |
ht | Haitian Creole |
hu | Hungarian |
hy | Armenian |
ia | Interlingua |
id | Indonesian |
ie | Interlingue; Occidental |
ilo | Iloko |
io | Ido |
is | Icelandic |
it | Italian |
ja | Japanese |
jbo | Lojban |
jv | Javanese |
ka | Georgian |
kk | Kazakh |
km | Khmer |
kn | Kannada |
ko | Korean |
krc | Karachay-Balkar |
ku | Kurdish |
kv | Komi |
kw | Cornish |
ky | Kyrgyz |
la | Latin |
lb | Luxembourgish |
lez | Lezghian |
li | Limburgan |
lmo | Lombard |
lo | Laothian |
lrc | Northern Luri |
lt | Lithuanian |
lv | Latvian |
mai | Maithili |
mg | Malagasy |
mhr | Meadow Mari |
min | Minangkabau |
mk | Macedonian |
ml | Malayalam |
mn | Mongolian |
mr | Marathi |
mrj | Hill Mari |
ms | Malay |
mt | Maltese |
mwl | Mirandese |
my | Burmese |
myv | Erzya |
mzn | Mazandarani |
nah | Nahuatl |
nap | Neapolitan |
nds | Low German |
ne | Nepali |
new | Nepal Bhasa |
nl | Dutch |
nn | Norwegian Nynorsk |
no | Norwegian |
oc | Occitan (post 1500) |
or | Oriya |
os | Ossetian |
pa | Punjabi |
pam | Pampanga |
pfl | Palatine German |
pl | Polish |
pms | Piedmontese |
pnb | Pakistani Punjabi |
ps | Pushto |
pt | Portuguese |
qu | Quechua |
rm | Romansh |
ro | Romanian |
ru | Russian |
rue | Rusyn |
sa | Sanskrit |
sah | Yakut |
sc | Sardinian |
scn | Sicilian |
sco | Scots |
sd | Sindhi |
sh | Serbo-Croatian |
si | Sinhalese |
sk | Slovak |
sl | Slovenian |
so | Somali |
sq | Albanian |
sr | Serbian |
su | Sundanese |
sv | Swedish |
sw | Swahili |
ta | Tamil |
te | Telugu |
tg | Tajik |
th | Thai |
tk | Turkmen |
tl | Tagalog |
tr | Turkish |
tt | Tatar |
tyv | Tuvinian |
ug | Uighur |
uk | Ukrainian |
ur | Urdu |
uz | Uzbek |
vec | Venetian |
vep | Veps |
vi | Vietnamese |
vls | Vlaams |
vo | Volapük |
wa | Walloon |
war | Waray |
wuu | Wu Chinese |
xal | Kalmyk |
xmf | Mingrelian |
yi | Yiddish |
yo | Yoruba |
yue | Yue Chinese |
zh | Chinese |
Updated 5 months ago