Andrey Vasnetsov generall


Machine Learning Engineer Test Task: Text2Image Search

Objective

This test task aims to evaluate your grasp of fundamental machine learning and development concepts. Your task involves working with an image dataset to develop a system capable of searching for similar images based on textual queries.
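The core of such a system can be sketched as a nearest-neighbor search over embeddings. The minimal example below assumes that text queries and images have already been mapped into a shared vector space by some model; the vectors, file names, and the `search` helper are hypothetical placeholders invented for illustration, not part of the task. A real solution would likely swap the brute-force loop for a vector search engine.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(query_vector, image_vectors, top_k=3):
    """Return up to top_k (image_id, score) pairs, best match first."""
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in image_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical hand-written embeddings standing in for model output.
image_vectors = {
    "red_dress.jpg": [0.9, 0.1, 0.0],
    "blue_jeans.jpg": [0.1, 0.9, 0.1],
    "red_shoes.jpg": [0.7, 0.0, 0.6],
}

# Stands in for the embedding of a textual query such as "red clothes".
query_vector = [1.0, 0.0, 0.1]

results = search(query_vector, image_vectors, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

Cosine similarity is used here only because it is a common default for embedding comparison; the appropriate metric depends on how the chosen model was trained.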

Dataset

Please choose whatever image dataset you prefer, for example:

  • crawl some e-commerce website
collection::collection_manager::holders::segment_holder::SegmentHolder::appendable_segments::{{closure}} () at lib/collection/src/collection_manager/holders/segment_holder.rs:238
collection::collection_manager::holders::segment_holder::SegmentHolder::appendable_segments (self=0x7fcb18c430b8) at lib/collection/src/collection_manager/holders/segment_holder.rs:236
collection::collection_manager::holders::segment_holder::SegmentHolder::segment_flush_ordering (self=0x7fcb18c430b8) at lib/collection/src/collection_manager/holders/segment_holder.rs:460
collection::collection_manager::holders::segment_holder::SegmentHolder::flush_all (self=0x7fcb18c430b8, sync=false) at lib/collection/src/collection_manager/holders/segment_holder.rs:477
collection::update_handler::UpdateHandler::flush_segments (segments=...) at lib/collection/src/update_handler.rs:552
collection::update_handler::UpdateHandler::flush_worker::{{closure}} () at lib/collection/src/update_handler.rs:518
collection::collection_manager::holders::segment_
{
  "result": [
    {
      "payload": {
        "location": "html > body > div:nth-of-type(1) > section:nth-of-type(2) > div > div > div > article > h3:nth-of-type(4)",
        "sections": [
          "documentation",
          "documentation/quick_start"
        ],
        "tag": "h3",
@generall
generall / 10k_vector_search.py
Created May 20, 2022 21:05
Search 10k by 10k vectors fast
import asyncio
import time
from multiprocessing import Pool
import httpx
import numpy as np
from grpclib.client import Channel
from qdrant_client import QdrantClient
from qdrant_client.grpc import PointsStub, WithPayloadSelector
from qdrant_client.http.models import Distance, OptimizersConfigDiff, \
@generall
generall / cat_vectors.tsv
Last active July 3, 2021 22:10
categories_ru_config.json
# File: service.py
from fastapi import FastAPI
# neural_searcher.py is the file where NeuralSearcher is defined
from neural_searcher import NeuralSearcher
app = FastAPI()
# Create an instance of the neural searcher
from qdrant_client.http.models import Filter
...
city_of_interest = "Berlin"
# Define a filter for cities
city_filter = Filter(**{
    "must": [{
        "key": "city",  # We store city information in a field of the same name
# File: neural_searcher.py
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
import numpy as np
import json
fd = open('./startups.json')
# payload is now an iterator over startup data
payload = map(json.loads, fd)
# Here we load all vectors into memory; a numpy array is itself iterable.
# Another option would be to use mmap if we don't want to load all data into RAM
@generall
generall / shrink_embeddings.ipynb
Created April 27, 2019 22:32
Shrinking Fasttext embeddings