• Vlyn@lemmy.zip
    English
    arrow-up
    4
    ·
    1 year ago

    You can't trust the result if you only do one pass, because that result could already be compromised. The entire point of the first pass is one simple question: is it safe, yes or no? Only when the answer is yes do you go for the actual result (which might be used somewhere else).

    If you try to pack both the check and the actual prompt into one request, the input might be able to break out of the check and deliver a bad result either way.
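
A rough sketch of the two-pass idea in Python, just to make it concrete. Everything here is an assumption for illustration: `llm` stands for whatever function sends a prompt to your model and returns its text reply, and `SAFETY_PROMPT`, `is_safe` and `answer` are made-up names, not any library's API.

```python
from typing import Callable, Optional

# Hypothetical: any function that sends a prompt to a model and returns its reply.
LLM = Callable[[str], str]

# Hypothetical wording for the first-pass check.
SAFETY_PROMPT = (
    "Answer with exactly one word, YES or NO. "
    "Is the following user input safe to process?\n\n{user_input}"
)


def is_safe(llm: LLM, user_input: str) -> bool:
    """First pass: ask only for a yes/no verdict, never for a usable result."""
    verdict = llm(SAFETY_PROMPT.format(user_input=user_input))
    # Anything other than an exact YES counts as unsafe (fail closed).
    return verdict.strip().upper() == "YES"


def answer(llm: LLM, task_prompt: str, user_input: str) -> Optional[str]:
    """Second pass: runs only if the first pass judged the input safe."""
    if not is_safe(llm, user_input):
        return None
    return llm(f"{task_prompt}\n\n{user_input}")
```

The first pass throws away everything except an exact YES, so a manipulated reply from the check can't leak through as the "result". It can still be talked into saying YES though, so this narrows the hole rather than closing it.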

    Overall though, it's insanity to use an LLM with user input where the result can influence other users. Someone will always find a way to break any protections you're trying to apply.