    squashing full commits for public repo · 0c77d444
    Steffen Roßkopf authored
    some cleaning + provide saved_captions for best version (+60 squashed commits)
    
    Squashed commit:
    
    [a8e8b43a] few final deletions of unnecessary stuff
    
    [a2414099] cleaning all over the place
    
    [56c1bb31] first iteration readme
    
    [e280078c] added script to save only ~240 visual news images used in our dataset
    
    [8c2d2152] FOLDER_DATASETS now in config file
    
    [43d1367d] last few experiments for thesis
    
    [8ec7ecb6] looking at naive NER once again and saving captions for different values of K
    
    [3ca75e79] dataset now finally ready
    
    [5b4ea12c] saving changed captions in json to save time + finetuning dataset examples
    
    [34334936] back to normal random assignments of falsified entries
    
    [c22f41fb] add naive ner to say we checked that in thesis
    
    [b1c86058] final dataset ready :)
    
    [5010066b] min score now in expand_dataset + quality control vars now also in config
    
    [d9b91918] finally caption1 always pristine!!!
    
    [712fdf8b] CaptionFrom to proper string values + no falsified indices for google queries needed, since we skip them in google_inverse_search
    
    [9265493f] only downloading pristine, even indices in inverse_annotations + adding min, max and some info to return_all_int_dirs, so user has some more control
    
    [383c9278] fixed bugs in picture savings: 1) show all boxes now, not just two, 2) second changed caption also shows up under picture
    
    [d38fdb2e] relative newsclipings path in dataset + adding google search timestamp to dataset
    
    [ace10e12] SET and SPLIT now in one place (config file) + rm spacy, just use sbert nearest TAG + some cleaning of comments and old code
    
    [7ab7ffbb] scripts for cosmos incl. saving both inferences with altered captions
    
    [36c0a55e] adding different enum methods for NER replacement of labels (bert + flair NER) + new random seed for falsified newsclipings examples
    
    [685f04e0] wip friday, not bad results for now with altering captions
    
    [f37316c3] some improvements in expand_dataset (include no_text and penalize their scores, min_selection > 1 to save) + person_split dataset first look
    
    [a079ce41] wip, mainly presentation and examples and trying to understand mdetr outputs and problems + fixed position of thresholds in alter_captions case
    
    [c518b253] added "is_falsified: {actual_context}"-text to saved images
    
    [f4f4f10d] some cleaning + introducing enums for specific strings + using dataset to save mdetr inference + adding demo dataset creation
    
    [9b7ef633] two options now to compose dataset from webresults, clip (alt) and now sbert wk between caption web and original caption (+1 squashed commit)
    
    [d2615a04] included code to make sure we have balanced dataset in write_dataset
    
    [387e467d] new dataset now in one json, starting to use it, todo: 50-50 split
    
    [9db96bb3] good progress on writing news dataset to file in json, but still have to think about composition of things in dataset, e.g. make sure 50% pristine?? (+1 squashed commit)
    
    [5b33336d] some progress in comparing captions from the web
    
    [b3919538] seems like billing needed, what to do?
    
    [4361d5c6] looking at search scripts, does not look bad, guess I can work with that
    
    [ce9bd096] first look at open domain data collection scripts
    
    [30c270a9] refactoring, mainly the whole file system, more order there now (+2 squashed commits)
    
    [325f9f40] refactoring done: replacement now without bugs [multiple spanning labels] and inside dict
    
    [2578e2c9] could be worse, progress towards refactoring the replacement of parts of captions
    
    [ab5dca06] start new approach, tuple (img, cap1, cap2) again like in cosmos
    
    [7bdda8d7] test if naively comparing bigger or smaller clip id-pairs works well (big Q: are we allowed to query them?) (+1 squashed commit)
    
    [d8b04c35] still confused about the task, but less so than before
    
    [beef0db5] first look at json
    
    [cfb9ddcf] first commit newsclipings
    
    (cherry picked from commit 86f7dc0f983be70cd17f69521fc08ce927808001)
    
    [5a9f4359] evaluation before meeting
    
    [0d9dee94] basic eval if sim is moving in right direction
    
    [4c917512] idea runs (still a bug if label leaves sth out in the middle) + still need to evaluate if this is useful
    
    [1308121b] fixing small bug with start_position (in special case added twice)
    
    [8538d214] progress on trying to find exact label span in caption. but not yet correct / bugs appear
    
    [230931d6] ICE II, utility function to find closest tag in sbert-space
    
    [f2efeea3] ICE commit, little break
    
    [800167a0] finally some progress
    
    [eeb7dae7] some random tries, progress mediocre
    
    [facb8b86] tries to remove part of matching label, not finished
    
    [c206d226] ability to draw and save top box of each caption + get accuracy on interval
    
    [aceb13ea] progress on getting accuracy scores
    
    [e9a5edea] some file refactoring
    
    [6c31205d] change to local computing (cached mdetr easier) + first good progress evaluation cosmos (draw top box)
    
    [33cf11ec] [colab] looking at first outputs from official mdetr colab + small correction of test annotation (231.jpg:small)
    
    [2eb1cb2f] adding cosmos annotations
    
    [dd9dc61e] Add LICENSE
    
    [a7969fbd] shorten readme
This project is licensed under the MIT License.