WARNING: No source added in .add_source_document #1172

Open
JKDevz opened this issue Apr 3, 2025 · 2 comments
Comments

@JKDevz
JKDevz commented Apr 3, 2025

I followed the LLMWare tutorial on YouTube (linked below) and ran into a warning when calling Prompter.add_source_document().

source = prompter.add_source_document(contracts_path, contract, query=key)

This gives me a Warning that the source was not added.

What could be the issue?

My contracts_path = ./docs/. My files do have spaces in their names, but I doubt that would be an issue. Could it be the file-type suffix?

Tutorial Video Followed: https://www.youtube.com/watch?v=8aV5p3tErP0

@mferris77
I'm encountering this as well. I haven't found the root cause yet, but I learned that the keys in the query list (e.g. 'base salary', 'vacation') are used to narrow the content down to only those chunks that contain the term. So if you use a label that doesn't appear in the document, no chunks of text are returned.
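
To make the filtering behavior described above concrete, here is a minimal sketch in plain Python. It is not llmware's actual implementation: `filter_chunks` and the sample chunks are hypothetical, and the case-insensitive substring match is an assumption made only for illustration.

```python
def filter_chunks(chunks, query=None):
    """Keep only chunks that contain the query text (hypothetical helper)."""
    if query is None:
        # No query: nothing is filtered out.
        return list(chunks)
    needle = query.lower()
    return [chunk for chunk in chunks if needle in chunk.lower()]


# Made-up sample text, standing in for parsed contract chunks.
chunks = [
    "The Employee shall receive a base salary of $100,000 per year.",
    "The Employee is entitled to 20 days of vacation annually.",
    "This Agreement is governed by the laws of the State of New York.",
]

matched = filter_chunks(chunks, query="base salary")   # term appears in one chunk
missing = filter_chunks(chunks, query="annual bonus")  # term appears nowhere

print(len(matched), len(missing))  # prints: 1 0
```

Under this model, a key that never appears in the text leaves an empty list, which would be consistent with a "no source added" warning.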

When I changed the query list to a term that appears on the first page, I got some results. But now it seems to parse only the first page, or only 10 chunks, so I'm not sure whether it's limiting itself to 10 chunks or hitting an issue with pages in the document.

Their example contracts have multiple pages so I'm not quite sure what the issue is.

@Pravalika-Batchu
Hey! I ran into that same issue with "prompter.add_source_document()" giving a warning that the source wasn't added. After looking into it, I think the main reason is that the query key we're using (like 'base salary' or 'vacation') has to actually exist exactly in the document. If it doesn’t match any content, no chunks get picked up and it doesn’t add the source.

To check, try using a super common word like “employee” or even set query=None just to see if the document gets loaded without filtering. That’ll tell us if the file itself is fine.
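
A rough sketch of that check, assuming only the call shape from the original post (`add_source_document(folder, filename, query=...)`). The helper name `try_queries` is made up, and what the call returns is not documented in this thread, so the sketch just collects whatever comes back for inspection.

```python
def try_queries(prompter, folder, filename, queries=("employee", None)):
    """Call add_source_document once per query and collect the results.

    `prompter` is any object exposing add_source_document(folder, filename,
    query=...), the call shape used in the original post. This helper is
    hypothetical and not part of llmware.
    """
    results = {}
    for query in queries:
        # Same call as in the issue, with only the query swapped out.
        results[query] = prompter.add_source_document(folder, filename, query=query)
    return results
```

With a real prompter this would be called as `try_queries(prompter, "./docs/", "contract1.pdf")`. If the `None` entry loads content but the keyword entry does not, the filter is the likely culprit rather than the file. Sources may accumulate across calls, so resetting the prompter between queries may be needed for a clean comparison.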

Also, the file type might matter: make sure it’s a .pdf or .txt. Spaces in the filename probably aren’t the issue, but to be safe, try renaming one file to something simple like contract1.pdf.
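
Here is a small standard-library sketch for checking both points at once. `check_folder` and `EXPECTED_SUFFIXES` are hypothetical names, and the suffix set only covers the two types mentioned above; it is not a list of everything llmware can parse.

```python
from pathlib import Path

# Only the two types named in this comment; an assumption, not llmware's full list.
EXPECTED_SUFFIXES = {".pdf", ".txt"}


def check_folder(folder):
    """Return (filename, suffix_ok, has_space) for every file in `folder`."""
    report = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            suffix_ok = path.suffix.lower() in EXPECTED_SUFFIXES
            has_space = " " in path.name
            report.append((path.name, suffix_ok, has_space))
    return report
```

Calling `check_folder("./docs/")` on the folder from the original post would list each file with two flags, which makes it easy to spot an unexpected suffix or a name worth renaming for the test.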

Lastly, it looks like it only pulls in up to 10 chunks, so if our key is on a later page or not found early, it might get skipped.
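
To show why a cap like that would matter, here is a sketch under a loud assumption: that only the first 10 chunks are considered. The thread has not confirmed that llmware works this way; the limit of 10 is just what was observed above, and both helper names are hypothetical.

```python
def first_match_index(chunks, query):
    """Return the index of the first chunk containing `query`, or None."""
    needle = query.lower()
    for index, chunk in enumerate(chunks):
        if needle in chunk.lower():
            return index
    return None


def within_limit(chunks, query, limit=10):
    """True if the first match falls inside the first `limit` chunks.

    The default of 10 mirrors the behavior observed in this thread; it is
    an assumption, not a documented llmware setting.
    """
    index = first_match_index(chunks, query)
    return index is not None and index < limit


# Twelve filler chunks, then the one chunk that mentions the key.
chunks = [f"filler paragraph {i}" for i in range(12)] + ["vacation: 20 days per year"]

print(first_match_index(chunks, "vacation"), within_limit(chunks, "vacation"))
# prints: 12 False
```

If the real behavior is a cap on matched results rather than on scanned chunks, a late match would still be found, so this only illustrates one possible reading.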
