    squashing full commits for public repo · 0c77d444
    Steffen Roßkopf authored
    some cleaning + provide saved_captions for best version (+60 squashed commits)
    
    Squashed commit:
    
    [a8e8b43a] few final deletions of unnecessary stuff
    
    [a2414099] cleaning all over the place
    
    [56c1bb31] first iteration readme
    
    [e280078c] added script to save only ~240 visual news images used in our dataset
    
    [8c2d2152] FOLDER_DATASETS now in config file
    
    [43d1367d] last few experiments for thesis
    
    [8ec7ecb6] looking at naive NER once again and saving captions for different values of K
    
    [3ca75e79] dataset now finally ready
    
    [5b4ea12c] saving changed captions in json to save time + finetuning dataset examples
    
    [34334936] back to normal random assignments of falsified entries
    
    [c22f41fb] add naive ner to say we checked that in thesis
    
    [b1c86058] final dataset ready :)
    
    [5010066b] min score now in expand_dataset + quality control vars now also in config
    
    [d9b91918] finally caption1 always pristine!!!
    
    [712fdf8b] CaptionFrom to proper string values + no falsified indices for google queries needed, since we skip them in google_inverse_search
    
    [9265493f] only downloading pristine, even indices in inverse_annotations + adding min, max and some info to return_all_int_dirs, so user has some more control
    
    [383c9278] fixed bugs in picture savings: 1) show all boxes now, not just two, 2) second changed caption also shows up under picture
    
    [d38fdb2e] relative newsclipings path in dataset + adding google search timestamp to dataset
    
    [ace10e12] SET and SPLIT now in one place (config file) + rm spacy, just use sbert nearest TAG + some cleaning of comments and old code
    
    [7ab7ffbb] scripts for cosmos incl. saving both inferences with altered captions
    
    [36c0a55e] adding different enum methods for NER replacement of labels (bert + flair NER) + new random seed for falsified newsclipings examples
    
    [685f04e0] wip friday, not bad results for now with altering captions
    
    [f37316c3] some improvements in expand_dataset (include no_text and penalize their scores, min_selection > 1 to save) + person_split dataset first look
    
    [a079ce41] wip, mainly presentation and examples and trying to understand mdetr outputs and problems + fixed position of thresholds in alter_captions case
    
    [c518b253] added "is_falsified: {actual_context}"-text to saved images
    
    [f4f4f10d] some cleaning + introducing enums for specific strings + using dataset to save mdetr inference + adding demo dataset creation
    
    [9b7ef633] two options now to compose dataset from webresults, clip (alt) and now sbert wk between caption web and original caption (+1 squashed commit)
    
    [d2615a04] included code to make sure we have balanced dataset in write_dataset
    
    [387e467d] new dataset now in one json, starting to use it, todo: 50-50 split
    
    [9db96bb3] good progress on writing news dataset to file in json, but still have to think about composition of things in dataset, e.g. make sure 50% pristine?? (+1 squashed commit)
    
    [5b33336d] some progress in comparing captions from the web
    
    [b3919538] seems like billing needed, what to do?
    
    [4361d5c6] looking at search scripts, does not look bad, guess I can work with that
    
    [ce9bd096] first look at open domain data collection scripts
    
    [30c270a9] refactoring, mainly the whole file system, more order there now (+2 squashed commits)
    
    [325f9f40] refactoring done: replacement now without bugs [multiple spanning labels] and inside dict
    
    [2578e2c9] could be worse, progress towards refactoring the replacement of parts of captions
    
    [ab5dca06] start new approach, tuple (img, cap1, cap2) again like in cosmos
    
    [7bdda8d7] test if naively comparing bigger or smaller clip id-pairs works well (big Q: are we allowed to query them?) (+1 squashed commit)
    
    [d8b04c35] still confused about the task, but less so than before
    
    [beef0db5] first look at json
    
    [cfb9ddcf] first commit newsclipings
    
    (cherry picked from commit 86f7dc0f983be70cd17f69521fc08ce927808001)
    
    [5a9f4359] evaluation before meeting
    
    [0d9dee94] basic eval if sim is moving in right direction
    
    [4c917512] idea runs (still a bug if label leaves sth out in the middle) + still need to evaluate if this is useful
    
    [1308121b] fixing small bug with start_position (in special case added twice)
    
    [8538d214] progress on trying to find exact label span in caption. but not yet correct / bugs appear
    
    [230931d6] ICE II, utility function to find closest tag in sbert-space
    
    [f2efeea3] ICE commit, little break
    
    [800167a0] finally some progress
    
    [eeb7dae7] some random tries, progress mediocre
    
    [facb8b86] tries to remove part of matching label, not finished
    
    [c206d226] ability to draw and save top box of each caption + get accuracy on interval
    
    [aceb13ea] progress on getting accuracy scores
    
    [e9a5edea] some file refactoring
    
    [6c31205d] change to local computing (cached mdetr easier) + first good progress evaluation cosmos (draw top box)
    
    [33cf11ec] [colab] looking at first outputs from official mdetr colab + small correction of test annotation (231.jpg:small)
    
    [2eb1cb2f] adding cosmos annotations
    
    [dd9dc61e] Add LICENSE
    
    [a7969fbd] shorten readme
This project is licensed under the MIT License.