Prompt Ingection Attacks and How to handle them?

saurabhkamal14
5 days ago
1 min read

A prompt injection attack is when someone gives cleverly crafted input to an AI system that tricks into doing something you didn't intend - like revealing hidden instructions, or doing harmful actions.

How to handle prompt injection attacks?

Check the user input first (input validation)

Before the model sees it, examine what user submit.

Look for weird, dangerous, or clearly malicious content.
clean it up or block it if you see risky patterns.

This stop a lot of attacks before they can affect the model.

Check what the AI answers (Output filtering)

Even after the model generates a reply, verify the response before showing it to users.

Make sure it doesn't accidently leak internal instructions.
Block or correct replies that look manipulated or unsafe.

Use least privilege (Limit what the model can do)

Don't give the AI more power than it needs.

If it doesn't need access to sensitive database or sensitive actions, don't grant it access.
Only give it the minimum access required to do its job.

This reduce the damage if something goes wrong.

Human check for important tasks

For actions that could give big consequences (like sending emails, accessing secrets, doing transactions), ask a real person to review before proceeding.

This ensures an attacker can't just make the AI do harmful things automatically.

Putting it all together

If I combine all these practices - careful input checks, clear separation of system vs user text, output validation, limiting capabilities, and human review for sensitive actions - you can greatly reduce the chances of prompt injections succeeding.

GlobeFT

Data Driven Research LAB

Prompt Ingection Attacks and How to handle them?

Recent Posts

GlobeFT

Data Driven Research LAB

Tel. +44(0) 744 206 3191

GlobeFT

Data Driven Research LAB

Tel. +44(0) 744 206 3191

​

​