Update dependency transformers to v4.41.0 (!147) · Merge requests · Arbetsförmedlingen / devops / Machine Learning / Process Jobads With Gpt Sw3

renovate-token-rw-2 requested to merge renovate/transformers-4.x into main May 20, 2024

This MR contains the following updates:

Package	Update	Change
transformers	minor	`==4.40.2` -> `==4.41.0`

Release Notes

huggingface/transformers (transformers)

`v4.41.0`: Phi3, JetMoE, PaliGemma, VideoLlava, Falcon2, FalconVLM & GGUF support


New models

Phi3

The Phi-3 model was proposed in Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone by Microsoft.

TL;DR: Phi-3 introduces new RoPE scaling methods, which seem to scale fairly well. Phi-3-mini (3B) is available in two context-length variants: 4K and 128K tokens. It is the first model in its class to support a context window of up to 128K tokens, with little impact on quality.

Phi-3 by @gugarosa in https://github.com/huggingface/transformers/pull/30423
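Phi-3's long-context variants rely on rescaled rotary position embeddings (RoPE). The sketch below is a generic illustration only. It assumes a LongRoPE-style scheme in which each rotation frequency is divided by a per-dimension factor. The factor values and function names are hypothetical, and dimension pairs are interleaved for readability. This is not how transformers implements Phi-3's scaling.

```python
import math
from typing import List, Optional


def rope_inv_freq(dim: int, base: float = 10000.0,
                  rescale: Optional[List[float]] = None) -> List[float]:
    """Inverse frequency for each rotated pair of dimensions.

    `rescale` is a hypothetical per-pair factor list: dividing a frequency by a
    factor > 1 slows that pair's rotation so it covers longer positions.
    """
    half = dim // 2
    factors = rescale if rescale is not None else [1.0] * half
    return [1.0 / (factors[i] * base ** (2 * i / dim)) for i in range(half)]


def apply_rope(x: List[float], position: int, inv_freq: List[float]) -> List[float]:
    """Rotate each pair (x[2i], x[2i+1]) by the angle position * inv_freq[i]."""
    out = list(x)
    for i, freq in enumerate(inv_freq):
        angle = position * freq
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


plain = rope_inv_freq(4)                          # one frequency per pair of dims
stretched = rope_inv_freq(4, rescale=[1.0, 4.0])  # second pair rotates 4x slower
print(apply_rope([1.0, 0.0, 1.0, 0.0], position=10, inv_freq=stretched))
```

Because each step is a pure rotation, vector norms are preserved. Only the rotation speed per dimension pair changes with the rescale factors.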

JetMoE

JetMoE-8B is an 8B Mixture-of-Experts (MoE) language model developed by Yikang Shen and MyShell. The JetMoE project aims to provide an efficient language model with LLaMA2-level performance on a limited budget. To achieve this, JetMoE uses a sparsely activated architecture inspired by ModuleFormer. Each JetMoE block consists of two MoE layers: Mixture of Attention Heads and Mixture of MLP Experts. For each input token, only a subset of the experts is activated. This sparse activation scheme gives JetMoE much better training throughput than dense models of similar size: JetMoE-8B trains on around 100B tokens per day on a cluster of 96 H100 GPUs with a straightforward 3-way pipeline parallelism strategy.

Add JetMoE model by @yikangshen in https://github.com/huggingface/transformers/pull/30005
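The sparse activation described above can be illustrated with generic top-k gating. A router scores every expert, and only the k highest-scoring experts run. Their outputs are mixed with softmax weights renormalized over the selected experts. This minimal sketch uses toy scalar "experts". It is not JetMoE's actual router, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_gate(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the highest router logits.

    Returns (expert_index, weight) pairs; the weights are a softmax over the
    selected logits only, so they sum to 1.
    """
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in chosen)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Callable[[float], float]], k: int = 2) -> float:
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_gate(router_logits, k))


def expert_inc(v: float) -> float:
    return v + 1


def expert_double(v: float) -> float:
    return 2 * v


def expert_neg(v: float) -> float:
    return -v


def expert_square(v: float) -> float:
    return v * v


# Four toy experts; with k=2 only the two highest-scoring ones are evaluated.
experts = [expert_inc, expert_double, expert_neg, expert_square]
print(moe_forward(3.0, router_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2))
```

Compute per token scales with k rather than with the total number of experts. That is the source of the throughput advantage over dense models of similar size.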

PaliGemma

PaliGemma is a lightweight open vision-language model (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

More than 120 checkpoints have been released; see the PaliGemma collection on the Hugging Face Hub.

Add PaliGemma by @molbap in https://github.com/huggingface/transformers/pull/30814

VideoLlava

Video-LLaVA exhibits remarkable interactive capabilities between images and videos, despite the absence of image-video pairs in the dataset.

💡 Simple baseline, learning a united visual representation by alignment before projection: by binding unified visual representations to the language feature space, we enable an LLM to perform visual reasoning on both images and videos simultaneously.

🔥 High performance, complementary learning with video and image: extensive experiments demonstrate the complementarity of the modalities, showing significant superiority over models designed specifically for either images or videos.

Add Video Llava by @zucchini-nlp in https://github.com/huggingface/transformers/pull/29733

Falcon 2 and FalconVLM

Two new models from TII-UAE, which has published a blog post with more details. Falcon2 introduces a parallel MLP, and FalconVLM uses the LLaVA framework.

Support for Falcon2-11B by @Nilabhra in https://github.com/huggingface/transformers/pull/30771
Support arbitrary processor by @ArthurZucker in https://github.com/huggingface/transformers/pull/30875
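One common reading of "parallel MLP" is a block layout in which attention and the MLP both read the same normalized input. Their outputs are added to the residual together, instead of the MLP consuming the attention output. The toy sketch below contrasts the two layouts using scalar stand-in functions. `toy_attn`, `toy_mlp` and `toy_norm` are hypothetical placeholders, not Falcon2's layers.

```python
from typing import Callable

Fn = Callable[[float], float]


def sequential_block(x: float, attn: Fn, mlp: Fn, norm: Fn) -> float:
    """Pre-norm block where the MLP consumes the attention sublayer's output."""
    h = x + attn(norm(x))
    return h + mlp(norm(h))


def parallel_block(x: float, attn: Fn, mlp: Fn, norm: Fn) -> float:
    """Parallel block: attention and MLP read the same normalized input and are summed."""
    n = norm(x)
    return x + attn(n) + mlp(n)


def toy_attn(v: float) -> float:
    return 2 * v


def toy_mlp(v: float) -> float:
    return v + 1


def toy_norm(v: float) -> float:
    return v / 2


print(sequential_block(4.0, toy_attn, toy_mlp, toy_norm),
      parallel_block(4.0, toy_attn, toy_mlp, toy_norm))
```

In the parallel layout, both sublayers depend only on `norm(x)`, so they can be computed concurrently. This is the usual motivation for the design.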

GGUF `from_pretrained` support

You can now load most GGUF quants directly with transformers' `from_pretrained`, which converts them into a regular PyTorch model. The API is simple:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF"
filename = "tinyllama-1.1b-chat-v1.0.Q6_K.gguf"

# Both the tokenizer and the model are loaded from the same GGUF file in the repo
tokenizer = AutoTokenizer.from_pretrained(model_id, gguf_file=filename)
model = AutoModelForCausalLM.from_pretrained(model_id, gguf_file=filename)
```

We plan closer integration with the llama.cpp / GGML ecosystem in the future; see https://github.com/huggingface/transformers/issues/27712 for more details.

Loading GGUF files support by @LysandreJik in https://github.com/huggingface/transformers/pull/30391
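For context on what `gguf_file` points at: a GGUF file starts with a small fixed header. The sketch below parses that header with the standard library. It assumes the GGUF v2/v3 layout: the `GGUF` magic bytes, a little-endian uint32 version, then uint64 tensor and metadata key/value counts. It reads only the header and is unrelated to how transformers dequantizes the tensors. `read_gguf_header` is a hypothetical helper.

```python
import io
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Read the fixed-size GGUF header (assumes format version 2 or later)."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    # '<' = little-endian, no padding; I = uint32 version; Q, Q = uint64 counts
    version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
    return GGUFHeader(version, tensor_count, kv_count)


# Build a fake header in memory instead of downloading a real file.
fake = io.BytesIO(b"GGUF" + struct.pack("<IQQ", 3, 201, 24))
print(read_gguf_header(fake))
```

For a real file on disk, open it with `open(path, "rb")` and pass the file object to `read_gguf_header`.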

Quantization

New quant methods

This release adds support for two new quantization methods contributed by the community: HQQ and EETQ. Read more about how to quantize any transformers model with HQQ or EETQ in the dedicated section of the documentation.

Add HQQ quantization support by @mobicham in https://github.com/huggingface/transformers/pull/29637
[FEAT]: EETQ quantizer support by @dtlzhuangz in https://github.com/huggingface/transformers/pull/30262
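HQQ and EETQ differ in how they choose and store quantization parameters, but both map weights onto a small integer grid. The sketch below is a generic illustration only, not HQQ's optimizer or EETQ's kernels. It performs asymmetric min-max quantization of one group of weights to `nbits` and returns a scale and zero-point such that `w ≈ (q - zero) * scale`. HQQ, for instance, replaces this plain min-max choice with an optimization step. `quantize_group` and `dequantize_group` are hypothetical names.

```python
from typing import List, Tuple


def quantize_group(w: List[float], nbits: int = 4) -> Tuple[List[int], float, float]:
    """Asymmetric min-max quantization of one weight group.

    Returns integer codes in [0, 2**nbits - 1], a scale and a (float) zero-point.
    """
    qmax = 2 ** nbits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = -lo / scale  # maps the group minimum to code 0
    q = [min(qmax, max(0, round(x / scale + zero))) for x in w]
    return q, scale, zero


def dequantize_group(q: List[int], scale: float, zero: float) -> List[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [(v - zero) * scale for v in q]


weights = [-1.0, -0.25, 0.5, 1.0]
codes, scale, zero = quantize_group(weights, nbits=4)
print(codes, dequantize_group(codes, scale, zero))
```

Each weight is reconstructed to within half a quantization step (`scale / 2`). Smaller groups give tighter ranges and therefore smaller errors, at the cost of storing more scales and zero-points.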

`dequantize` API for bitsandbytes models

If you want to dequantize a model that has been loaded with bitsandbytes (e.g. to merge adapter weights), this is now possible through the `dequantize` API.

FEAT / Bitsandbytes: Add dequantize API for bitsandbytes quantized models by @younesbelkada in https://github.com/huggingface/transformers/pull/30806

API-wise, you can achieve that with the following:

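```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

model_id = "facebook/opt-125m"

# Load the model in 4-bit with bitsandbytes (this generally requires a CUDA GPU)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Replace the quantized layers with regular floating-point ones
model.dequantize()

# Put the inputs on the same device as the model
inputs = tokenizer("Hello my name is", return_tensors="pt").to(model.device)

out = model.generate(**inputs)
print(tokenizer.decode(out[0]))
```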
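The use case mentioned above, merging adapter weights, requires the base weights to be back in floating point. Assuming a LoRA adapter, merging adds the low-rank update `(alpha / r) * B @ A` to the dequantized weight. The plain-Python sketch below shows only that arithmetic. In practice, PEFT's `merge_and_unload()` performs the merge on tensors. `matmul` and `merge_lora` are hypothetical helpers.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A.

    w: dequantized base weight, shape (out, in)
    a: LoRA down-projection, shape (r, in)
    b: LoRA up-projection, shape (out, r)
    """
    r = len(a)
    delta = matmul(b, a)
    return [[w[i][j] + (alpha / r) * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


base = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 dequantized weight
lora_a = [[1.0, 2.0]]            # rank r = 1
lora_b = [[1.0], [0.0]]
print(merge_lora(base, lora_a, lora_b, alpha=2.0))
```

This addition cannot be folded into 4-bit codes directly. That is why the model has to be dequantized before the adapter is merged.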

Generation updates

Add Watermarking LogitsProcessor and WatermarkDetector by @zucchini-nlp in https://github.com/huggingface/transformers/pull/29676
Cache: Static cache as a standalone object by @gante in https://github.com/huggingface/transformers/pull/30476
Make Gemma work with torch.compile by @ydshieh in https://github.com/huggingface/transformers/pull/30775
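The watermarking processor builds on the "green list" idea. At each step, the previous token seeds a pseudo-random split of the vocabulary, and a bias is added to the logits of the "green" part. A detector later counts how many tokens landed in their green lists and computes a z-score. The sketch below illustrates the scheme in plain Python under simplifying assumptions: seeding via `random.Random`, a tiny vocabulary, and fixed `GAMMA`/`DELTA` values. It does not reproduce the hashing used by transformers' `WatermarkLogitsProcessor`, and all names in it are hypothetical.

```python
import math
import random
from typing import List, Set

GAMMA = 0.25  # fraction of the vocabulary placed on each green list (assumed value)
DELTA = 2.0   # bias added to green-list logits (assumed value)


def green_list(prev_token: int, vocab_size: int, gamma: float = GAMMA) -> Set[int]:
    """Pseudo-randomly choose int(gamma * vocab_size) token ids, seeded by the previous token."""
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits: List[float], prev_token: int) -> List[float]:
    """Add DELTA to every logit on this step's green list."""
    greens = green_list(prev_token, len(logits))
    return [x + DELTA if i in greens else x for i, x in enumerate(logits)]


def detection_z_score(tokens: List[int], vocab_size: int, gamma: float = GAMMA) -> float:
    """z = (hits - gamma * T) / sqrt(T * gamma * (1 - gamma)) over the T scored tokens."""
    t = len(tokens) - 1
    if t < 1:
        raise ValueError("need at least two tokens")
    hits = sum(tokens[i] in green_list(tokens[i - 1], vocab_size, gamma)
               for i in range(1, len(tokens)))
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


# Greedy "generation" from random logits, with the watermark applied at every step.
vocab = 50
rng = random.Random(0)
tokens = [0]
for _ in range(40):
    logits = [rng.gauss(0.0, 1.0) for _ in range(vocab)]
    biased = watermark_logits(logits, tokens[-1])
    tokens.append(max(range(vocab), key=biased.__getitem__))
print(detection_z_score(tokens, vocab))  # a large positive z suggests a watermark
```

Unwatermarked text hits its green lists only about `GAMMA` of the time, so its z-score stays near zero.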

SDPA support

[BERT] Add support for sdpa by @hackyon in https://github.com/huggingface/transformers/pull/28802
Add sdpa and fa2 the Wav2vec2 family. by @kamilakesbi in https://github.com/huggingface/transformers/pull/30121
add sdpa to ViT [follow up of #29325] by @hyenal in https://github.com/huggingface/transformers/pull/30555
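SDPA refers to PyTorch's fused `torch.nn.functional.scaled_dot_product_attention`, which computes `softmax(Q Kᵀ / sqrt(d)) V` in optimized kernels. The plain-Python sketch below shows that reference computation for a single head, without masking or dropout. It illustrates the math only and is not how PyTorch implements the fused kernels.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one head; q: (Lq, d), k: (Lk, d), v: (Lk, dv)."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Identical keys get equal weight, so the output is the mean of the value rows.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]],
                                   [[2.0, 0.0], [4.0, 2.0]]))
```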

Improved Object Detection

This release adds fine-tuning example scripts for object detection models.

Fix YOLOS image processor resizing by @qubvel in https://github.com/huggingface/transformers/pull/30436
Add examples for detection models finetuning by @qubvel in https://github.com/huggingface/transformers/pull/30422
Add installation of examples requirements in CI by @qubvel in https://github.com/huggingface/transformers/pull/30708
Update object detection guide by @qubvel in https://github.com/huggingface/transformers/pull/30683
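Detection image processors such as YOLOS and DETR typically resize images so that the shorter side reaches a target length while the longer side stays under a cap. The sketch below shows that computation. It assumes the common DETR-style defaults (`shortest_edge=800`, `longest_edge=1333`) and simple rounding, so the library's exact output sizes may differ by a pixel. `resize_keep_aspect` is a hypothetical helper.

```python
from typing import Tuple


def resize_keep_aspect(height: int, width: int,
                       shortest_edge: int = 800, longest_edge: int = 1333) -> Tuple[int, int]:
    """Scale (height, width) so the short side equals `shortest_edge`,
    unless that would push the long side past `longest_edge`."""
    short_side, long_side = min(height, width), max(height, width)
    scale = shortest_edge / short_side
    if long_side * scale > longest_edge:
        scale = longest_edge / long_side  # cap the long side instead
    return round(height * scale), round(width * scale)


print(resize_keep_aspect(480, 640))   # short side drives the scale
print(resize_keep_aspect(500, 2000))  # long side is capped
```

The aspect ratio is preserved in both cases. Only the binding constraint changes.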

Interpolation of embeddings for vision models

This release adds interpolation of position embeddings, which enables predictions from pretrained models on input images whose size differs from the one the model was originally trained on. Simply pass `interpolate_pos_encoding=True` when calling the model.

Added for: BLIP, BLIP 2, InstructBLIP, SigLIP, ViViT

```python
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

image = Image.open(requests.get("https://huggingface.co/hf-internal-testing/blip-test-image/resolve/main/demo.jpg", stream=True).raw)
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b",
    torch_dtype=torch.float16
).to("cuda")
# Resize to 500x500, a resolution the model was not trained on, and cast the
# pixel values to float16 to match the model weights
inputs = processor(images=image, size={"height": 500, "width": 500}, return_tensors="pt").to("cuda", torch.float16)

# Interpolate the position embeddings to the new input size while generating
predictions = model.generate(**inputs, interpolate_pos_encoding=True)

# Generated text: "a woman and dog on the beach"
generated_text = processor.batch_decode(predictions, skip_special_tokens=True)[0].strip()
```

Blip dynamic input resolution by @zafstojano in https://github.com/huggingface/transformers/pull/30722
Add dynamic resolution input/interpolate position embedding to SigLIP by @davidgxue in https://github.com/huggingface/transformers/pull/30719
Enable dynamic resolution for vivit by @jla524 in https://github.com/huggingface/transformers/pull/30630
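Conceptually, `interpolate_pos_encoding=True` resizes the model's learned grid of patch position embeddings to match the patch grid of the new input size. The sketch below illustrates the idea with bilinear interpolation (align-corners convention) over a grid of embedding vectors. The vision models in transformers typically perform this step with `torch.nn.functional.interpolate` in bicubic mode, so this is an illustration rather than the library's exact computation. `interpolate_grid` is a hypothetical helper.

```python
from typing import List

Grid = List[List[List[float]]]  # [rows][cols][embedding_dim]


def _lerp(a: List[float], b: List[float], t: float) -> List[float]:
    """Linear interpolation between two embedding vectors."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def interpolate_grid(grid: Grid, new_rows: int, new_cols: int) -> Grid:
    """Bilinearly resize a grid of embedding vectors (align_corners=True convention)."""
    rows, cols = len(grid), len(grid[0])

    def src(i: int, n_new: int, n_old: int) -> float:
        # Map an output index to a fractional source coordinate.
        return i * (n_old - 1) / (n_new - 1) if n_new > 1 else 0.0

    out: Grid = []
    for i in range(new_rows):
        y = src(i, new_rows, rows)
        y0 = int(y)
        y1 = min(y0 + 1, rows - 1)
        row = []
        for j in range(new_cols):
            x = src(j, new_cols, cols)
            x0 = int(x)
            x1 = min(x0 + 1, cols - 1)
            top = _lerp(grid[y0][x0], grid[y0][x1], x - x0)
            bottom = _lerp(grid[y1][x0], grid[y1][x1], x - x0)
            row.append(_lerp(top, bottom, y - y0))
        out.append(row)
    return out


# A 2x2 grid of 1-dimensional "embeddings" upsampled to 3x3.
small = [[[0.0], [1.0]], [[2.0], [3.0]]]
print(interpolate_grid(small, 3, 3))
```

Corner embeddings are kept as-is, and the new in-between positions blend their neighbours. This lets a model trained on one patch grid accept a larger or smaller one.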

🚨 might be breaking

🚨🚨🚨Deprecate evaluation_strategy to eval_strategy🚨🚨🚨 by @muellerzr in https://github.com/huggingface/transformers/pull/30190
🚨 Add training compatibility for Musicgen-like models by @ylacombe in https://github.com/huggingface/transformers/pull/29802
🚨 Update image_processing_vitmatte.py by @rb-synth in https://github.com/huggingface/transformers/pull/30566
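The first change above deprecates the `TrainingArguments` parameter `evaluation_strategy` in favor of `eval_strategy`. If training configs are stored as dictionaries (for example, loaded from YAML or JSON), a small migration step can rename the old key before the kwargs reach `TrainingArguments`. The helper below, `migrate_training_kwargs`, is a hypothetical sketch and not part of transformers.

```python
import warnings
from typing import Any, Dict

# Deprecated -> new keyword names; only the rename from these release notes is listed.
RENAMED_KWARGS = {"evaluation_strategy": "eval_strategy"}


def migrate_training_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `kwargs` with deprecated keys renamed.

    Raises ValueError if both the old and the new key are set to different values.
    """
    migrated = dict(kwargs)
    for old, new in RENAMED_KWARGS.items():
        if old in migrated:
            value = migrated.pop(old)
            if new in migrated and migrated[new] != value:
                raise ValueError(f"both {old!r} and {new!r} are set with different values")
            warnings.warn(f"{old!r} is deprecated; use {new!r}", FutureWarning)
            migrated[new] = value
    return migrated


print(migrate_training_kwargs({"output_dir": "out", "evaluation_strategy": "steps"}))
```

The result can then be unpacked into `TrainingArguments(**kwargs)`.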

Cleanups

Remove task guides auto-update in favor of links towards task pages by @LysandreJik in https://github.com/huggingface/transformers/pull/30429
Remove add-new-model in favor of add-new-model-like by @LysandreJik in https://github.com/huggingface/transformers/pull/30424
Remove mentions of models in the READMEs and link to the documentation page in which they are featured. by @LysandreJik in https://github.com/huggingface/transformers/pull/30420

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Enabled.

♻ Rebasing: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this MR and you won't be reminded about this update again.


This MR has been generated by Renovate Bot.
