Identify unlinked images with a script
@cnorris asked me today if it was possible to loop through our repo and identify unlinked images. I've got a hacky partial solution but I can't figure out how to finish it.
find . -type f -print0 |
xargs -0 file --mime-type |
grep -E ': +image/' |
cut -d ':' -f 1 |
xargs -n 1 basename
I've gotten as far as printing out the full path to each image, and then extracting the basename of each image. Next, we'd need to grep the docset for each basename, and print out the filenames with no results.
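One way to finish this, as a rough sketch rather than a tested solution: for each image, grep the repo for its basename and print the image's path when nothing matches. The function name find_unlinked_images is made up for illustration. It assumes the docs mention images by filename in text files, and that skipping .git is what we want. It also matches ': image/' rather than 'image/', so a directory that happens to be named image doesn't count as a hit.

```shell
# Sketch: print image files whose basename never appears in any text file.
# Assumptions: docs reference images by filename; .git should be ignored.
find_unlinked_images() (
  root="${1:-.}"
  find "$root" -type f ! -path '*/.git/*' -print0 |
    xargs -0 file --mime-type |
    grep -E ': +image/' |          # keep only lines whose MIME type is image/*
    sed -E 's|: +image/.*$||' |    # strip the MIME suffix, leaving the path
    while IFS= read -r path; do
      name=$(basename "$path")
      # -r recurse, -I ignore binary files, -q quiet, -F fixed string
      if ! grep -rIqF --exclude-dir=.git -- "$name" "$root"; then
        printf '%s\n' "$path"
      fi
    done
)
```

Run it from the repo root as find_unlinked_images . to get one unlinked image path per line. The match is a plain substring search, so logo.png would still count as linked if only big-logo.png is referenced anywhere.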
Observations
- We have to drop most of the path, because we can't assume that the image link uses anything except img/ + the filename.
  - (That means we could get false results if two images share the same name in different directories.)
- Only checks a single repo, so no love for charts or runner
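To see how much the shared-name problem matters in practice, something like the following could list the image basenames that occur more than once. This is only a sketch: duplicate_image_names is a hypothetical name, and it reuses the same find/file steps as above, with the same assumption that .git should be skipped.

```shell
# Sketch: print image basenames that appear in more than one directory.
# Any name listed here can produce a false "linked" result.
duplicate_image_names() (
  root="${1:-.}"
  find "$root" -type f ! -path '*/.git/*' -print0 |
    xargs -0 file --mime-type |
    grep -E ': +image/' |          # keep only lines whose MIME type is image/*
    sed -E 's|: +image/.*$||' |    # strip the MIME suffix, leaving the path
    sed 's|.*/||' |                # drop the directory part, leaving the basename
    sort |
    uniq -d                        # print each repeated name once
)
```

If this prints nothing for the repo, the basename-only check is probably safe enough for a first pass.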
Resources I found
- https://stackoverflow.com/questions/16758105/list-all-graphic-image-files-with-find Finding all image files
- https://www.sitepoint.com/community/t/unlinked-images-clean-up/189094/10 reminded me about basename
- https://unix.stackexchange.com/questions/358840/what-does-this-command-line-echo-1-xargs-n-1-basename-cut-d-f1-d more basename musings