Tutorial Series, Part 4 | Fine-Tuning the BLIP Model on a Custom Image Captioning Dataset
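In this tutorial, we show how to fine-tune the BLIP (Bootstrapping Language-Image Pretraining) model on the BitaHub platform. We use a sample dataset of football player images paired with caption text. By the end, you will know how to upload your own image-text dataset to BitaHub, load and preprocess the data, and fine-tune the model on the platform.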
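I. Environment Setup
Create a BitaHub project
1. Go to the BitaHub website, complete registration, and click the upper-right corner to enter the workbench.

2. Create a file system under File Storage.

The model and dataset needed for this training run can be downloaded from the BitaHub home page. Store them in the file system you just created.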


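3. Under "Model Development and Training", create a new development environment.
- Under "Storage Mount", add the model and the dataset, then select a platform image.
- Select the JupyterLab access method and the single-GPU 4090 plan.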


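Install dependencies
1. In the environment where you will run the code, install the required libraries with the following commands:
pip install git+https://github.com/huggingface/transformers.git@main
pip install -q datasets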
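Before moving on, it can help to confirm that the installation worked. The following is a minimal sketch, not part of the original tutorial: the helper name `installed_versions` is hypothetical, and the inclusion of `torch` in the list assumes the platform image already provides PyTorch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumption: torch ships with the platform image; the other two come from pip above.
for name, ver in installed_versions(["torch", "transformers", "datasets"]).items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

If any package is reported as not installed, rerun the corresponding pip command before continuing.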
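II. Loading the Dataset
1. Load the uploaded football player dataset with the datasets library. Running the following code displays the loading progress, and the dataset is ready once it completes.
from datasets import load_dataset

dataset = load_dataset("data/football-dataset", split="train")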
Generating train split: 100%|██████████| 6/6 [00:00<00:00, 852.93 examples/s]
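2. Make sure you understand the structure of the dataset: it must contain a text column that can be accessed with the key text. Let's fetch the caption of the example at index 1: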
dataset[1]["text"]
'Maradona after winning the 1986 FIFA World Cup with Argentina'
3. View the corresponding image.

4. To make the data suitable for training, create a custom dataset class that preprocesses each example:
from torch.utils.data import Dataset, DataLoader


class ImageCaptioningDataset(Dataset):
    def __init__(self, dataset, processor):
        self.dataset = dataset
        self.processor = processor

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        # encode the image and its caption; pad the caption to a fixed length
        encoding = self.processor(images=item["image"], text=item["text"], padding="max_length", return_tensors="pt")
        # remove batch dimension
        encoding = {k: v.squeeze() for k, v in encoding.items()}
        return encoding
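To prepare your own image-text dataset with the same `image` and `text` keys, one option is the `imagefolder` loader of the datasets library, which reads a `metadata.jsonl` file with a `file_name` column placed next to the images. The sketch below is an assumption about how you might organize your data, not a description of how the sample football dataset is stored; the helper `write_metadata` and the file name `maradona.jpg` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(captions, image_dir):
    """Write metadata.jsonl mapping each image file name to its caption.

    captions: dict of {image file name: caption text}
    image_dir: folder that holds (or will hold) the image files
    """
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = image_dir / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as f:
        for file_name, text in captions.items():
            record = {"file_name": file_name, "text": text}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return metadata_path


# Demo in a temporary folder; "maradona.jpg" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmp:
    path = write_metadata(
        {"maradona.jpg": "Maradona after winning the 1986 FIFA World Cup with Argentina"},
        tmp,
    )
    print(path.read_text(encoding="utf-8"))

# With the images copied into the same folder, loading would look like:
# from datasets import load_dataset
# dataset = load_dataset("imagefolder", data_dir="path/to/folder", split="train")
```

With this layout, each loaded example exposes the picture under `image` and the caption under `text`, which is what `ImageCaptioningDataset` expects.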
III. Initializing the Model and Processor
1. Load the pretrained BLIP model and processor with the transformers library:
from transformers import AutoProcessor, BlipForConditionalGeneration

processor = AutoProcessor.from_pretrained("model/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("model/blip-image-captioning-base")
2. Wrap the preprocessed dataset in a data loader so that training can read the data in batches:
train_dataset = ImageCaptioningDataset(dataset, processor)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
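As a quick sanity check on the batch size, the number of optimizer steps per epoch follows directly from the dataset size. The sketch below works through the arithmetic for this tutorial's numbers (6 examples, as reported by the loading progress bar, and a batch size of 2); the helper `steps_per_epoch` is a hypothetical name, and it assumes the DataLoader default of keeping the last partial batch.

```python
import math


def steps_per_epoch(num_examples, batch_size, drop_last=False):
    """Number of batches a DataLoader yields in one pass over the data."""
    if drop_last:
        # an incomplete final batch is discarded
        return num_examples // batch_size
    # an incomplete final batch is kept
    return math.ceil(num_examples / batch_size)


num_examples = 6  # "6/6" in the dataset loading progress bar
batch_size = 2    # value passed to DataLoader
epochs = 50       # value used by the training loop in this tutorial

steps = steps_per_epoch(num_examples, batch_size)
print("steps per epoch:", steps)        # 3
print("total steps:", steps * epochs)   # 150
```

This is why the training log in this tutorial shows three loss values for every epoch.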
IV. Training the Model
Set up what training needs (the optimizer, the device, and the number of epochs), then start training:
import torch

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.train()

for epoch in range(50):
    print("Epoch:", epoch)
    for idx, batch in enumerate(train_dataloader):
        input_ids = batch.pop("input_ids").to(device)
        pixel_values = batch.pop("pixel_values").to(device)
        # the caption tokens serve as both the text input and the labels
        outputs = model(input_ids=input_ids, pixel_values=pixel_values, labels=input_ids)
        loss = outputs.loss
        print("Loss:", loss.item())
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
Epoch: 0  Loss: 12.939854621887207  Loss: 10.245623588562012  Loss: 10.24632453918457
Epoch: 1  Loss: 10.172136306762695  Loss: 10.176426887512207  Loss: 10.143275260925293
Epoch: 2  Loss: 10.148330688476562  Loss: 10.101930618286133  Loss: 10.096418380737305
Epoch: 3  Loss: 10.054553031921387  Loss: 10.073493003845215  Loss: 10.094892501831055
Epoch: 4  Loss: 10.084649085998535  Loss: 10.016317367553711  Loss: 9.996145248413086
Epoch: 5  Loss: 9.995129585266113  Loss: 9.980039596557617  Loss: 9.956443786621094
Epoch: 6  Loss: 9.8993558883667  Loss: 9.824309349060059  Loss: 9.236822128295898
Epoch: 7  Loss: 8.876283645629883  Loss: 8.683889389038086  Loss: 8.422343254089355
Epoch: 8  Loss: 8.153888702392578  Loss: 7.872360706329346  Loss: 7.6936798095703125
Epoch: 9  Loss: 7.418920040130615  Loss: 7.207061290740967  Loss: 7.046998023986816
Epoch: 10  Loss: 6.877917289733887  Loss: 6.653289318084717  Loss: 6.513096332550049
Epoch: 11  Loss: 6.353802680969238  Loss: 6.164441108703613  Loss: 6.016610145568848
Epoch: 12  Loss: 5.855982303619385  Loss: 5.685638427734375  Loss: 5.55130672454834
Epoch: 13  Loss: 5.383485317230225  Loss: 5.235148906707764  Loss: 5.074439525604248
Epoch: 14  Loss: 4.904763698577881  Loss: 4.762197494506836  Loss: 4.599602222442627
Epoch: 15  Loss: 4.437991142272949  Loss: 4.267367362976074  Loss: 4.143704414367676
Epoch: 16  Loss: 3.9676566123962402  Loss: 3.799015760421753  Loss: 3.6743252277374268
Epoch: 17  Loss: 3.489795446395874  Loss: 3.3644168376922607  Loss: 3.195446252822876
Epoch: 18  Loss: 3.0441834926605225  Loss: 2.886849880218506  Loss: 2.752702236175537
Epoch: 19  Loss: 2.6108040809631348  Loss: 2.4503977298736572  Loss: 2.3077549934387207
Epoch: 20  Loss: 2.1754894256591797  Loss: 2.0409276485443115  Loss: 1.9023433923721313
Epoch: 21  Loss: 1.774215817451477  Loss: 1.6534215211868286  Loss: 1.5341780185699463
Epoch: 22  Loss: 1.4227075576782227  Loss: 1.3040554523468018  Loss: 1.2004470825195312
Epoch: 23  Loss: 1.1041473150253296  Loss: 1.0089735984802246  Loss: 0.9248499870300293
Epoch: 24  Loss: 0.842706561088562  Loss: 0.7683207988739014  Loss: 0.7005384564399719
Epoch: 25  Loss: 0.6362196803092957  Loss: 0.5797610282897949  Loss: 0.5299125909805298
Epoch: 26  Loss: 0.48054414987564087  Loss: 0.4395519495010376  Loss: 0.4030042588710785
Epoch: 27  Loss: 0.3674047887325287  Loss: 0.33880168199539185  Loss: 0.3099591135978699
Epoch: 28  Loss: 0.28744447231292725  Loss: 0.2636030316352844  Loss: 0.2445455640554428
Epoch: 29  Loss: 0.22669453918933868  Loss: 0.2116294801235199  Loss: 0.19875739514827728
Epoch: 30  Loss: 0.1853506714105606  Loss: 0.1732858568429947  Loss: 0.16410362720489502
Epoch: 31  Loss: 0.15450723469257355  Loss: 0.14666743576526642  Loss: 0.13762235641479492
Epoch: 32  Loss: 0.13125456869602203  Loss: 0.1254701167345047  Loss: 0.11936748027801514
Epoch: 33  Loss: 0.11371345818042755  Loss: 0.10930757969617844  Loss: 0.10532978922128677
Epoch: 34  Loss: 0.10060680657625198  Loss: 0.09693031013011932  Loss: 0.09384751319885254
Epoch: 35  Loss: 0.09049879014492035  Loss: 0.08732312172651291  Loss: 0.08435408025979996
Epoch: 36  Loss: 0.08175667375326157  Loss: 0.07963433861732483  Loss: 0.07721687853336334
Epoch: 37  Loss: 0.07500782608985901  Loss: 0.07290299236774445  Loss: 0.07139705121517181
Epoch: 38  Loss: 0.06948820501565933  Loss: 0.0678931176662445  Loss: 0.06586962938308716
Epoch: 39  Loss: 0.06447815150022507  Loss: 0.06333400309085846  Loss: 0.061826009303331375
Epoch: 40  Loss: 0.0602596215903759  Loss: 0.05946327745914459  Loss: 0.05823744460940361
Epoch: 41  Loss: 0.05697426572442055  Loss: 0.056047819554805756  Loss: 0.05480460077524185
Epoch: 42  Loss: 0.05381720885634422  Loss: 0.05305255204439163  Loss: 0.05202964320778847
Epoch: 43  Loss: 0.05123036354780197  Loss: 0.05013291910290718  Loss: 0.04960477724671364
Epoch: 44  Loss: 0.04879220575094223  Loss: 0.047930758446455  Loss: 0.04713277146220207
Epoch: 45  Loss: 0.046472690999507904  Loss: 0.045729801058769226  Loss: 0.04521225765347481
Epoch: 46  Loss: 0.044310249388217926  Loss: 0.043927837163209915  Loss: 0.04331314191222191
Epoch: 47  Loss: 0.04280324652791023  Loss: 0.042053334414958954  Loss: 0.041331253945827484
Epoch: 48  Loss: 0.04100647568702698  Loss: 0.04039805382490158  Loss: 0.03982575610280037
Epoch: 49  Loss: 0.03945675864815712  Loss: 0.0388018935918808  Loss: 0.038384564220905304
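Printing every batch loss gets noisy over 50 epochs. One option is to also track the mean loss per epoch, which gives a single number to compare from epoch to epoch. The class below is a minimal sketch of that idea; `EpochLossTracker` is a hypothetical helper, not part of transformers or PyTorch, and the demo feeds it the three epoch-0 losses from the log.

```python
class EpochLossTracker:
    """Accumulates per-batch losses and records the mean for each epoch."""

    def __init__(self):
        self.history = []   # one mean loss per finished epoch
        self._current = []  # batch losses of the epoch in progress

    def update(self, loss_value):
        """Record one batch loss, e.g. tracker.update(loss.item())."""
        self._current.append(float(loss_value))

    def end_epoch(self):
        """Close the current epoch and return its mean loss."""
        if not self._current:
            raise ValueError("no losses recorded for this epoch")
        mean_loss = sum(self._current) / len(self._current)
        self.history.append(mean_loss)
        self._current = []
        return mean_loss


# Demo with the three batch losses logged for epoch 0.
tracker = EpochLossTracker()
for value in (12.939854621887207, 10.245623588562012, 10.24632453918457):
    tracker.update(value)
print("mean loss, epoch 0:", tracker.end_epoch())
```

In the training loop, this would mean calling `tracker.update(loss.item())` after each batch and `tracker.end_epoch()` once the inner loop finishes.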
V. Inference
Take a sample image from the dataset, encode it with the processor, and generate a caption:
example = dataset[1]
image = example["image"]

# encode the image only; no text is passed at inference time
inputs = processor(images=image, return_tensors="pt").to(device)
pixel_values = inputs.pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
generated_caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_caption)
maradona after winning the 1986 fifa world cup with argentina
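This is the caption BLIP generates. Apart from letter case, it matches the caption stored for this sample, which shows that the model has learned the image-text pairs in the training set. Because this image was itself part of the training data, the result does not on its own show how the model performs on unseen images.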
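To compare a generated caption with its reference automatically, the difference in letter case has to be ignored, since the model outputs lowercase text. The sketch below shows one simple way to do that with an exact match after normalization; `normalize_caption` and `captions_match` are hypothetical helpers, and exact matching is only a rough check, not a standard captioning metric.

```python
def normalize_caption(text):
    """Lowercase the caption and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def captions_match(generated, reference):
    """True if the two captions are identical after normalization."""
    return normalize_caption(generated) == normalize_caption(reference)


# The two strings shown in this tutorial for the sample at index 1.
reference = "Maradona after winning the 1986 FIFA World Cup with Argentina"
generated = "maradona after winning the 1986 fifa world cup with argentina"
print(captions_match(generated, reference))  # True
```

Running the same check over images that were held out of training would give a more meaningful picture of how well the fine-tuned model generalizes.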
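This tutorial showed how to fine-tune the BLIP model on the BitaHub platform for image caption generation.
More models and tutorials from the BitaHub community are on the way. Stay tuned!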