Which models do you cover?

OpenAI (GPT-4o / o1 / o3 family), Anthropic (Claude 4.x), Google (Gemini 2.x), Meta (Llama 3.x / 4), Mistral, open-source via vLLM / Ollama, and custom fine-tunes. Provider-agnostic — we test what you ship.

Is this just prompt-injection testing?

No. Prompt injection is one of seven phases. High-impact engagements typically uncover issues in RAG tenancy, agentic tool-use authorization, training-data leakage, or supply-chain trust (e.g. an embedding model pulled from an unknown HuggingFace org).

Can you also test our MCP servers?

Yes — MCP server security is a fast-growing engagement type. We enumerate exposed tools, fuzz parameters, test authorization boundaries and look for confused-deputy patterns where the agent invokes privileged MCP tools on the user's behalf.
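As a rough illustration of the parameter-fuzzing step, the sketch below drives a tool handler with boundary values and records how each call ends. It does not use any MCP SDK: `read_note`, `FUZZ_VALUES` and `fuzz_tool` are hypothetical names invented for this example, and a real harness would call the server over its actual transport.

```python
# Hypothetical boundary values; a real corpus would be far larger.
FUZZ_VALUES = ["", "../../etc/passwd", "A" * 10_000, None, -1, {"$ne": None}]


def fuzz_tool(handler, base_args):
    """Call handler once per (parameter, fuzz value) pair and classify the outcome."""
    findings = []
    for param in base_args:
        for value in FUZZ_VALUES:
            args = {**base_args, param: value}
            try:
                handler(**args)
                outcome = "accepted"  # the tool ran with the fuzzed value
            except (ValueError, TypeError):
                outcome = "rejected"  # input validation refused the value
            except Exception as exc:  # anything else is an unhandled failure
                outcome = f"crashed:{type(exc).__name__}"
            findings.append((param, repr(value)[:40], outcome))
    return findings


def read_note(name):
    """Hypothetical tool handler standing in for an exposed MCP tool."""
    if not isinstance(name, str) or not name.isalnum():
        raise ValueError("invalid note name")
    return f"contents of {name}"


if __name__ == "__main__":
    for row in fuzz_tool(read_note, {"name": "todo"}):
        print(row)
```

In this toy case the only accepted value is the 10,000-character string, which would point to a missing length limit rather than an injection path.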

OWASP LLM Top 10 · MITRE ATLAS · MLSecOps

AI / LLM Security Testing

Security testing for production AI / LLM systems — prompt injection, jailbreaks, data exfiltration via context windows, model supply-chain risks, RAG pipeline poisoning, agentic tool-call abuse and ML training-data integrity. Mapped to OWASP LLM Top 10 and MITRE ATLAS, with deliverables your AI safety team and your CISO both accept.

Engagement at a glance

Quote SLA: 48 hours
Typical engagement: 5–15 working days
Retest: Free within 30 days
Reporting format: CERT-In + ISO + SOC 2 ready
Team: 100% in-house · OSCP / OSWE / OSEP

What this actually looks like

An AI pentest engagement, in plain language.

AI security is not a generic pentest with the word 'AI' added. A Macksofy AI engagement tests the live model behind your chatbot, the RAG pipeline that retrieves context, the tools your agent can invoke, the training data your fine-tune ingests, and the supply chain (HuggingFace weights, embedding models, vector DBs) underneath. We chain: prompt injection → tool-call abuse → data exfiltration. We test for membership inference, model inversion, and PII leakage from training corpora. And we ship rules — guardrail prompts, output validators, sandboxing patterns — that your platform team can deploy on Monday.
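To make "output validators" concrete, here is a minimal sketch of one that redacts likely PII from model output before it reaches the user. The two regex patterns and the `validate_output` name are assumptions made for illustration, not the rules shipped in an engagement; production validators would be tuned to the data the application actually handles.

```python
import re

# Hypothetical patterns; deliberately simple and not exhaustive.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators

RULES = (("email", EMAIL), ("card", CARD))


def validate_output(text: str) -> tuple[str, list[str]]:
    """Redact every rule match and return the cleaned text plus the rules that fired."""
    fired = []
    for name, pattern in RULES:
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        if count:
            fired.append(name)
    return text, fired


if __name__ == "__main__":
    cleaned, fired = validate_output("Mail a@b.com, card 4111 1111 1111 1111 now")
    print(cleaned, fired)
```

Returning the list of fired rules alongside the cleaned text lets the caller log or block a response instead of silently redacting it.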

Business impact

De-risk customer-facing LLM products before regulator or media exposure
Satisfy emerging AI-governance frameworks: EU AI Act, India DPDP Act AI-system controls, NIST AI RMF, ISO/IEC 42001
Catch RAG-pipeline data leakage before it becomes a customer-data incident
Validate agentic systems (function-calling, tool-use, MCP) for unintended actions

Methodology

Phased delivery — every step documented.

1 · Threat model & scope

Architecture review: model, RAG, agent tools, fine-tune pipeline, deployment surface
Data-flow review: training data, embeddings, vector store, output channels
Threat model aligned to OWASP LLM Top 10 + MITRE ATLAS

Tooling

Industry-standard + custom.

We use the same tooling top BFSI red teams operate — combined with Macksofy in-house extensions and proprietary scripts where commercial tools fall short.

Tools we operate

PyRIT (Microsoft AI red-teaming)
Garak (LLM vulnerability scanner)
promptfoo (regression + eval)
LLM Guard / Lakera Guard / Guardrails AI
Custom Macksofy prompt-injection corpus
MITRE ATLAS technique playbooks
OpenAI Evals + Inspect AI

Industries served

Sectors we operate in

SaaS & Product Companies
Fintech & Payments
Banking & Financial Services
Healthcare & HealthTech
EdTech
Government & PSU
E-commerce & D2C
Series-A to Series-D startups

Deliverables

What you get

OWASP LLM Top 10 + MITRE ATLAS findings inventory
Reproducer prompts for every finding (copy-pasteable)
Recommended guardrail prompts + output-validator rules
RAG pipeline + vector-store hardening checklist
Agent tool-use sandboxing patterns
Training-data integrity + supply-chain risk register
Free retest within 30 days of guardrail deployment

Case studies

Anonymized engagement snapshots.

Fintech Chatbot SaaS (Bengaluru)

Scope · Customer-facing GPT-4o-powered support agent + RAG over support docs

Finding: Indirect prompt injection via a poisoned support article let an attacker exfiltrate other tenants' chat history via the agent's retrieval tool

Critical — patched via guardrail prompt + retrieval-tenancy enforcement; cross-tenant leakage closed before launch

Risk severity · Critical
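Retrieval-tenancy enforcement, as named in the remediation above, can be sketched roughly as follows. This is an in-memory toy and not the client's fix: `Chunk`, `retrieve` and the keyword-overlap scoring are hypothetical stand-ins for a real vector store. The point it illustrates is that the tenant filter comes from the authenticated session and is applied before ranking, so injected text in a retrieved document cannot widen it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str


def retrieve(store, query, session_tenant_id, k=3):
    """Return up to k chunks for the query, restricted to the session's tenant."""
    # The filter uses the server-side session value, never anything the model produced.
    allowed = [c for c in store if c.tenant_id == session_tenant_id]
    terms = set(query.lower().split())
    # Toy relevance score: number of query terms that appear in the chunk.
    ranked = sorted(
        allowed,
        key=lambda c: len(terms & set(c.text.lower().split())),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    store = [
        Chunk("a", "reset password steps"),
        Chunk("b", "reset password for admin billing"),
        Chunk("a", "billing invoice help"),
    ]
    for chunk in retrieve(store, "reset password billing", "a"):
        print(chunk)
```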

HealthTech Triage Bot (Mumbai)

Scope · Patient-symptom triage LLM with EMR tool-call access

Finding: Tool-call confused deputy: a carefully crafted patient prompt caused the agent to retrieve another patient's record via the EMR lookup tool

Critical: remediated by enforcing authorization at the tool layer rather than the agent layer; fixed before clinical rollout

Risk severity · Critical
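A minimal sketch of authorization at the tool layer, under stated assumptions: `Session`, `emr_lookup`, `AuthorizationError` and the in-memory `RECORDS` dict are hypothetical and stand in for a real EMR integration. The tool checks the caller's session itself, so a prompt that persuades the agent to request another patient's ID still fails.

```python
from dataclasses import dataclass


class AuthorizationError(Exception):
    """Raised when the session may not access the requested record."""


@dataclass(frozen=True)
class Session:
    user_id: str
    allowed_patient_ids: frozenset


# Hypothetical in-memory stand-in for an EMR backend.
RECORDS = {"p-001": "record for patient 1", "p-002": "record for patient 2"}


def emr_lookup(session: Session, patient_id: str) -> str:
    """Return a record only if the authenticated session is entitled to it."""
    # The check runs inside the tool, independent of what the agent was told.
    if patient_id not in session.allowed_patient_ids:
        raise AuthorizationError(f"{session.user_id} may not read {patient_id}")
    return RECORDS[patient_id]


if __name__ == "__main__":
    session = Session("u1", frozenset({"p-001"}))
    print(emr_lookup(session, "p-001"))
    try:
        emr_lookup(session, "p-002")
    except AuthorizationError as exc:
        print("denied:", exc)
```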

Indicative pricing · INR

Transparent tiers. No surprises at quote time.

Indicative price ranges based on typical Indian engagements. Final fixed-price quote within 72 hours of the discovery call.

Free 30-day retest · CERT-In format reports

Tier 01

Focused

₹2.5L–₹5L

Single asset or app

Manual + tooled testing
CERT-In format report
Free 30-day retest

Request a fixed-price quote

Tier 02

Stack

₹6L–₹12L

Multi-asset engagement

Everything in Focused
Web + API + mobile coverage
Executive + technical briefings

Tier 03

Programme

Starts at ₹15L

Quarterly retainer · large estate

Everything in Stack
Quarterly cycles + post-release retests
Same consultants throughout

Note · Indicative pricing in INR. Final quote depends on scope, asset count and engagement window. Fixed-price proposal within 72 hours.

What clients say · Trusted India + UAE

Rated 4.9 ★ from 612 client reviews.

CERT-In Empanelled

Govt of India · MeitY

EC-Council ATC

Authorized Training

ISO 27001 Certified

Info Security Mgmt

“We've worked with three Big 4 firms before Macksofy. None found what their team did in our payments stack. The most actionable report we've received in a decade.”

Aisha Khan

Information Security Manager · Listed Fintech · BKC, Mumbai

“The CHFI training Macksofy delivered for our cyber cell raised investigation quality measurably. Practical, India-context-aware, and respectful of our operational realities.”

Inspector K. Joshi

Cyber Cell · Maharashtra Police · Mumbai

“Came in with zero security background. 5 weeks later I was running Burp Suite and Metasploit confidently. Cleared CEH on the first attempt.”

Vivek Iyer

DevSecOps Lead · Healthcare SaaS · Hyderabad

Read all 612 reviews on Google →

FAQ

Things people ask before signing.

Both layers. Foundation-model behaviour (jailbreak resistance, refusal patterns) is part of the assessment, but the highest-impact findings usually live in your application layer — the RAG pipeline, the agent tools, the guardrails. We test the full stack.

Related services

Talk to us

Get a fixed-price proposal in 48 hours.

Tell us about your security need — pentest, audit, training or a wider engagement. A senior consultant will reply within a few business hours.

CERT-In Empanelled

Information Security Auditor · India

CERT-In Empanelled
EC-Council ATC · CompTIA Authorized
20,000+ professionals trained
India + UAE engagements

Often paired with this engagement.

Penetration Testing

Vulnerability Assessment & Penetration Testing (VAPT)

Web Application Security Testing
