Wikipedia Semantic Search with Cohere + Weaviate

Wikipedia Semantic Search with Cohere + Weaviate

This is starter code that you can use to search 10 million vectors from Wikipedia embedded with Cohere's multilingual model and hosted as a Weaviate public dataset. This dataset contains 1M vectors in each of the Wikipedia sites in these languages: English, German, French, Spanish, Italian, Japanese, Arabic, Chinese (Simplified), Korean, Hindi [respective language codes: en, de, fr, es, it, ja, ar, zh, ko, hi]

import weaviate

cohere_api_key = ''

auth_config = weaviate.auth.AuthApiKey(api_key="76320a90-53d8-42bc-b41d-678647c6672e") 
client = weaviate.Client(
        "X-Cohere-Api-Key": cohere_api_key,

client.is_ready() #check if True

Let's now define the search function that queries our vector database. Optionally, we want the ability to filter by language.

def semantic_serch(query, results_lang=''):
    Query the vectors database and return the top results. 

        query: str
            The search query
        results_lang: str (optional)
            Retrieve results only in the specified language.
            The demo dataset has those languages:
            en, de, fr, es, it, ja, ar, zh, ko, hi

    nearText = {"concepts": [query]}
    properties = ["text", "title", "url", "views", "lang", "_additional {distance}"]

    # To filter by language
    if results_lang != '':
        where_filter = {
        "path": ["lang"],
        "operator": "Equal",
        "valueString": results_lang
        response = (
            .get("Articles", properties)
    # Search all languages
        response = (
            .get("Articles", properties)

    result = response['data']['Get']['Articles']

    return result

def print_result(result):
    """ Print results with colorful formatting """
    for item in result:
        print(f"\033[95m{item['title']} ({item['views']}) {item['_additional']['distance']}\033[0m")

We can now query the database with any query we want. In the background, Weaviate uses your Cohere API key to embed the query, then retrun the most relevant passages to the query.

query_result = semantic_serch("time travel plot twist")

The Adam Project (3000) -147.31755
Due to a safety feature preventing him from flying because of his injuries, Adam must bring along the younger Adam and use his DNA to enter his jet. They both are soon attacked by Maya Sorian, the leader of the dystopian world, and her assistant Christos, but are saved by Laura, who had faked her death and stayed off-grid in an unknown location. After surviving the attack and comparing notes, Laura and the Adams realize that after the invention of time travel by Louis Reed and his subsequent death, Sorian had monopolized the discovery. During her visit to 2018, Laura learned Sorian frequently came and advised her past self in order to secure her future wealth and power. To protect her secret, Sorian ordered Laura's death. Although Laura survived the assassination attempt, destruction of her time jet left her stranded in the past. The sudden arrival of Sorian's goons interrupts the reunion, and Laura fights off the attack long enough for the two Adams to escape to 2018.

Kang the Conqueror (2000) -146.57275
Nathaniel Richards, a 31st-century scholar and descendant of Reed Richards' time traveling father Nathaniel, becomes fascinated with history and discovers the time travel technology created by Victor von Doom, another possible ancestor of his. He then travels back in time to ancient Egypt aboard a Sphinx-shaped timeship and becomes the Pharaoh Rama-Tut, with plans to claim En Sabah Nur—the mutant destined to become Apocalypse—as his heir. Rama-Tut's rule is cut short when he is defeated by the time-displaced Fantastic Four. An embittered Nathaniel Richards travels forward to the 20th century where he meets Doctor Doom, whom he believes might be his ancestor. He later designs an armor based on Doom's and, calling himself the Scarlet Centurion, pits the Avengers team against alternate-reality counterparts. He plans to dispose of all of them, but the Avengers manage to force him from the timeline.

Back to the Future (3000) -146.4269
That night, Marty meets his eccentric scientist friend, Emmett "Doc" Brown, in the Twin Pines mall parking lot. Doc unveils a time machine built from a modified DeLorean, powered by plutonium he swindled from Libyan terrorists. After Doc inputs a destination time of November5, 1955—the day he first conceived his time travel invention—the terrorists arrive unexpectedly and gun Doc down. Marty flees in the DeLorean, inadvertently activating time travel when he reaches .

Time (2000) -146.41129
Time travel is the concept of moving backwards or forwards to different points in time, in a manner analogous to moving through space, and different from the normal "flow" of time to an earthbound observer. In this view, all points in time (including future times) "persist" in some way. Time travel has been a plot device in fiction since the 19th century. Travelling backwards or forwards in time has never been verified as a process, and doing so presents many theoretical problems and contradictive logic which to date have not been overcome. Any technological device, whether fictional or hypothetical, that is used to achieve time travel is known as a time machine.

In Time (2000) -145.93015
Will and Sylvia rob Weis' time banks, giving the time capsules to the needy. They soon realize they can't significantly change anything, as prices are raised faster to compensate for the extra time. Fortis' gang ambushes them intending to collect the reward for their capture, but Will kills Fortis and his gang. Will and Sylvia then decide to rob Weis' vault of a one-million year capsule. Leon chases them back to Dayton but fails to stop them from distributing the stolen time; Leon times out, having neglected to collect his day's salary. Will and Sylvia nearly time out themselves but survive by taking Leon's salary.

Filtering by language

If we're interested in results in only one language, we can specify it.

query_result = semantic_serch("time travel plot twist", results_lang='ja')

時空の旅人 (500) -144.16002

親殺しのパラドックス (700) -144.11864

タイムトラベル (1000) -143.70331
タイムパラドックスの矛盾を説明するため、タイムトラベル者による歴史の改変で時間軸が分岐し元の世界と並行した別の世界が生まれるとするパラレルワールドの概念がある。この概念を発展させ、タイムトラベル者の介在がなくとも歴史上の重要なポイントで世界が枝分かれしていると解釈する立場もある。この概念を大幅に作品に取り入れた最初期の小説に、可能性として存在する二つの歴史「ジョンバール」と「ギロンチ」の抗争を描いた、ジャック・ウィリアムスンの『航時軍団』(The Legion of Time、1938年)がある。

タイムトラベル (1000) -143.69884

タイムトラベル (1000) -143.61562
タイムパラドックス(Time Paradox / 時間の逆説)は、タイムトラベルに伴う矛盾や変化のことであり、物語のテーマとしてしばしば扱われる。具体的には、タイムトラベルした過去で現代(相対的未来)に存在する事象を改変した場合、その事象における過去と現代の存在や状況、因果関係の不一致という逆説が生じることに着目したものである。