Replies: 6 comments 7 replies
-
Qwen 2 2B isn't the best image-capable model, but it is fast and small. I would try Gemma-3 4B or even 12B to get more intelligent output. With that said, this script was written to do one specific thing -- keyword and caption batches of images using a local LLM. Anything else is outside of those requirements. I'm glad you like the script and find it useful!
-
Oh, I see. Thank you for creating this metadata generator program; it's very
useful for me.
On Mon, 24 Mar 2025, 12:30, jabberjabberjabber ***@***.***> wrote:
… Qwen 2 2B isn't the best image capable model, but it is fast and small. I
would try using Gemma-3 4B or even 12B to get some more intelligent output.
With that said, this is a script that was written to do a specific thing
-- keyword and caption batches of images using a local LLM. Anything else
is outside of those requirements.
I'm glad you like the script and find it useful!
-
But the Gemma-3 4B model doesn't come with an image projector. Can I combine
Gemma-3 4B as the text model with Qwen2-VL-2B as the image projector?
On Mon, 24 Mar 2025, 15:40, Til ***@***.***> wrote:
… Thanks a lot for your quick reply. Ok cool, will check out Gemma too!
No worries, still love ImageIndexer! 😊👍
-
Thank you for your help. I've successfully set up the Gemma-3 4B model
in ImageIndexer on an i3 10100f, 32GB RAM, and GTX 1080 Ti
FTW. It outperforms the Qwen2-VL-2B model, although it takes longer per image.
Tested models:
- google_gemma-3-4b-it-Q2_K.gguf: 10s/image
- google_gemma-3-4b-it-Q6_K.gguf: 15s/image
- Qwen2-VL-2B: 7s/image
On Tue, 25 Mar 2025 at 01:48, jabberjabberjabber <***@***.***> wrote:
… Projector:
https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/blob/main/mmproj-google_gemma-3-4b-it-f16.gguf
-
Hello,
-
Don't just change the image projector; change the text model to match as well,
like this:
set "TEXT_MODEL=https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/blob/main/google_gemma-3-4b-it-Q2_K.gguf"
set "IMAGE_PROJECTOR=https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/blob/main/mmproj-google_gemma-3-4b-it-f16.gguf"
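The advice above (the text model and the image projector must belong together) can be sanity-checked before launching. The following is only a rough sketch, not part of ImageIndexer: the helper names `hf_repo_id` and `is_matching_pair` are made up for illustration, and "same Hugging Face repo plus an `mmproj` file name" is just a heuristic, assumed from the two URLs in this thread.

```python
from urllib.parse import urlparse


def hf_repo_id(url: str) -> str:
    """Return the 'owner/repo' part of a Hugging Face file URL (hypothetical helper)."""
    parts = urlparse(url.strip()).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"not a Hugging Face file URL: {url!r}")
    return "/".join(parts[:2])


def is_matching_pair(text_model_url: str, projector_url: str) -> bool:
    """Heuristic only: both files sit in the same repo and the projector is an mmproj file."""
    projector_name = urlparse(projector_url.strip()).path.rsplit("/", 1)[-1]
    return (
        hf_repo_id(text_model_url) == hf_repo_id(projector_url)
        and projector_name.startswith("mmproj")
    )


if __name__ == "__main__":
    text_model = (
        "https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/"
        "blob/main/google_gemma-3-4b-it-Q2_K.gguf"
    )
    projector = (
        "https://huggingface.co/bartowski/google_gemma-3-4b-it-GGUF/"
        "blob/main/mmproj-google_gemma-3-4b-it-f16.gguf"
    )
    print(is_matching_pair(text_model, projector))
```

A failed check does not prove the pair is incompatible (a projector can live in a different repo), so it is best treated as a warning rather than a hard error.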
-
Hi Jabberjabberjabber :)
This ImageIndexer of yours is an awesome tool! I do have a few questions:
For captioning lots of images of a cat called 'Felini', I tried using the prompt (in the ImageIndexer settings) to instruct the AI to mention Felini the cat in the description and keywords, but that didn't work. This is maybe a bit along the lines of @shoverians' request to be able to add a specific keyword to a batch of images?
When preparing images for the web, one of the most important tags is the Alt tag ("-AltTextAccessibility" in exiftool) because it helps people who cannot see to identify the image content. The descriptions/captions coming from ImageIndexer would just be perfect for this purpose. So I guess this is a bit like the suggestion from @eternalliving where those exif fields could be redirected...
I know these tags are super messy and probably very tough to direct... I dunno, just in case it helps: at the moment I have ChatGPT look at an image and instruct it to spit out a code snippet like the one below (giving it an example of all the tags to be used), which I copy and paste into exiftool to inject the tags.
exiftool -Creator="Til Vogt" -Copyright="Til & Felini" -Source="https://felini.rocks" -CreatorWorkEmail="meow@felini.rocks" -PersonInImage="Felini the Kitty" -City="Agra" -Country="India" -Subject="Felini Cat Relaxing at the Taj Mahal" -Title="Felini Cat Lounges in Front of the Taj Mahal" -Headline="Felini the Cat Enjoys a Relaxing Moment Near the Taj Mahal" -Description="Felini, a playful black-and-white cat, stretches out comfortably near the Taj Mahal, basking in the golden sunlight of Agra, India. His relaxed pose perfectly captures the essence of a peaceful journey." -AltTextAccessibility="Felini, a black-and-white cat, lounges near the Taj Mahal, enjoying the warm light of Agra, India." -keywords+="Felini cat" -keywords+="Taj Mahal" -keywords+="Agra India" -keywords+="travel photography" -keywords+="relaxing cat" -IntellectualGenre="Travel Photography" Felini-cat-world-trip_India_Agra_Taj-Mahal_B_01.jpg
To cut a few corners and be able to process a batch of images, I was kind of hoping that, given this example, Qwen2 might be able to spit out a similar code snippet that ImageIndexer could forward to exiftool... Sorry for these naive questions - my coding knowledge is super basic.
Anyhow, thanks a lot for your work and congrats on the great tool already!
Cheers,
Til
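The workflow described in this comment (a generated caption and keywords turned into an exiftool call per image) could be sketched roughly as below. This is an illustration under stated assumptions, not how ImageIndexer works internally: `build_exiftool_args` and `write_tags` are hypothetical names, the caption and keywords are assumed to come from whatever model you use, and the tag names are simply the ones from the example command above. Reusing the caption for `-AltTextAccessibility` is the redirect the comment asks for.

```python
import subprocess


def build_exiftool_args(image_path, caption, keywords, extra_tags=None):
    """Build an exiftool argument list for one image (nothing is executed here)."""
    args = ["exiftool"]
    # Fixed tags such as Creator or PersonInImage, identical for a whole batch.
    for tag, value in (extra_tags or {}).items():
        args.append(f"-{tag}={value}")
    # Reuse the generated caption for both the description and the alt text.
    args.append(f"-Description={caption}")
    args.append(f"-AltTextAccessibility={caption}")
    # "+=" adds to the existing keyword list instead of replacing it.
    for keyword in keywords:
        args.append(f"-keywords+={keyword}")
    args.append(image_path)
    return args


def write_tags(args):
    """Run exiftool (must be on PATH). A list of arguments avoids shell quoting problems."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    example = build_exiftool_args(
        "Felini-cat-world-trip_India_Agra_Taj-Mahal_B_01.jpg",
        caption="Felini, a black-and-white cat, lounges near the Taj Mahal.",
        keywords=["Felini cat", "Taj Mahal"],
        extra_tags={"PersonInImage": "Felini the Kitty"},
    )
    print(example)  # inspect the command before calling write_tags(example)
```

Keeping the batch-wide tags in `extra_tags` means a fixed name like "Felini the Kitty" does not depend on the model mentioning it in the caption.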