Model name: clip_local

About CLIP

CLIP (Contrastive Language-Image Pre-training) is a model that learns visual concepts from natural language supervision. Because it embeds images and text into a shared vector space, it can be used zero-shot for a wide range of vision and language tasks.

Read more about CLIP on OpenAI's website.

Supported aidb operations

  • encode_text
  • encode_text_batch
  • encode_image
  • encode_image_batch
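
For example, after a CLIP model has been created (see "Creating the default model" below), text can be encoded into embeddings. The calls below are a minimal sketch only: they assume the text operations are exposed as SQL functions that take the model name as their first argument. Check the aidb function reference for the exact signatures.

-- Sketch: assumes aidb.encode_text(model_name, text) and
-- aidb.encode_text_batch(model_name, text[]); verify against the aidb reference.
SELECT aidb.encode_text('my_clip_model', 'A photo of a cat');
SELECT aidb.encode_text_batch('my_clip_model', ARRAY['A photo of a cat', 'A photo of a dog']);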

Supported models

  • openai/clip-vit-base-patch32 (default)

Creating the default model

SELECT aidb.create_model('my_clip_model', 'clip_local');

There is only one supported model, the default openai/clip-vit-base-patch32, so you don't need to specify it in the configuration. No credentials are required for the CLIP model.
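
Once the model is created, it can also be used to encode images. The call below is a sketch only: it assumes an aidb.encode_image function that takes the model name and the raw image bytes, read here with PostgreSQL's pg_read_binary_file (which requires file read privileges on the server), and /tmp/cat.png is a placeholder path.

-- Sketch: assumes aidb.encode_image(model_name, image_bytes); /tmp/cat.png is a placeholder.
SELECT aidb.encode_image('my_clip_model', pg_read_binary_file('/tmp/cat.png'));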

Creating a specific model

There are no other model configurations available for the CLIP model.

Model configuration settings

The following configuration settings are available for CLIP models:

  • model - The CLIP model to use. The default, openai/clip-vit-base-patch32, is the only model available.
  • revision - The model revision to use. The default is refs/pr/15. This is a reference (a branch, tag, or commit) in the model's Hugging Face repository and determines which version of the model is downloaded, in this case the refs/pr/15 branch.
  • image_size - The input image size, in pixels. The default is 224.
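
For illustration, these settings can also be passed explicitly when creating the model. The following is a sketch that assumes aidb.create_model accepts a JSONB configuration as its third argument, as it does for other model providers; the values shown are the documented defaults, and 'my_configured_clip' is a placeholder name.

-- Sketch: assumes a JSONB config is accepted as the third argument;
-- the values below are the documented defaults.
SELECT aidb.create_model(
  'my_configured_clip',
  'clip_local',
  '{"model": "openai/clip-vit-base-patch32", "revision": "refs/pr/15", "image_size": 224}'::JSONB
);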

Model credentials

No credentials are required for the CLIP model.

