Toolkit & Resources — PGN Limited

Testing Toolkit

The Tools
We Use

Every tool we use is chosen for a specific purpose. We do not apply a generic stack — our tooling combinations are built around your deployment architecture, regulatory context, and testing objectives.

pgn test --list-tools --certified-uk-compliant

# UK-GDPR certified tool stack

ragas evaluation · uk-financial, nhs, legal

deepeval evaluation · all sectors

promptfoo security · fca, ico adversarial

garak security · ncsc-aligned

k6 load · uk sector peak patterns

locust load · chaos + black friday

langsmith tracing · audit trail generation

helicone observ. · uk data residency

arize-phoenix observ. · semantic drift

mlflow tracking · ico-defensible audit

Evaluation

RAGAS

RAG evaluation framework, extended with UK domain-specific ground-truth datasets for financial services, healthcare clinical guidelines, and legal clause libraries.

RAG Pipelines

Evaluation

DeepEval

LLM unit testing framework with pytest integration. Supports LLM-as-judge scoring, hallucination detection, and CI/CD-ready test suites across any model or provider.

All Service Lines

Security

Promptfoo

Red team testing with UK-specific adversarial scenarios — regulatory evasion, UK financial regulator disclosure manipulation, UK data protection authority data extraction attempts, and social engineering patterns.

Red Team

Security

Garak

LLM vulnerability scanner aligned to UK AI Security Guidance Principles and OWASP LLM Top 10. Sector-specific probe libraries for UK financial, clinical, and legal contexts.

Security Audit

Load Testing

k6 & Locust

Load generation at UK sector peak traffic patterns — healthcare surge periods, financial quarter-end reporting, and peak retail load profiles.

Performance

Observability

LangSmith

Pipeline tracing and debugging with audit trail generation. Produces the structured logs required for UK regulatory framework model validation and regulatory accountability documentation.

Audit Trail

Observability

Helicone

LLM observability and cost monitoring with UK data residency support. Configured for GDPR-compliant log storage within Azure UK South or AWS eu-west-2.

Monitoring

Observability

Arize Phoenix

Tracing and semantic drift detection for production LLM systems. Alerts on quality degradation before it becomes a regulatory or operational issue.

Production Monitoring

Tracking

MLflow

Experiment tracking and model registry with UK data protection authority-defensible audit trail configuration. Every test run is logged with full reproducibility — essential for regulatory evidence packs.

Compliance Evidence

Tracking

Weights & Biases

Model performance tracking with visualised evidence packs designed for presentation to UK regulatory institutions and internal auditors — not just technical teams.

Regulatory Evidence

Evaluation

OpenAI Evals

Standardised evaluation harness extended with UK domain-specific test sets for financial services, healthcare, and legal clause consistency.

Benchmarking

Cloud

Azure OpenAI & AWS Bedrock

All testing conducted within UK data residency boundaries — Azure UK South and AWS eu-west-2. GCP europe-west2 available on request. Data sovereignty guaranteed.

UK Data Residency

Regulatory Frameworks

UK Compliance
Coverage

Every engagement produces evidence aligned to the specific regulatory frameworks your organisation must satisfy.

Financial Services

UK regulatory framework Model Risk

Three lines of defence documentation for AI model risk

Model inventory and validation evidence

Ongoing monitoring framework design

Independent challenge documentation for internal audit

Change management evidence for model updates

Data & Privacy

Regulatory Accountability & UK GDPR

DPIA-supporting technical assessment documentation

Automated decision-making transparency audit

Subject access and data minimisation review

Incident response logging configuration

Bias assessment for protected characteristics

Healthcare

sector compliance standards & sector regulatory standards

Technical security measures documentation for sector-specific compliance

UK sector regulator AI as a Medical Device classification support

Clinical safety documentation (DCB0129 / DCB0160)

sector regulator standards alignment evidence

Sector-specific security boundary configuration review

Government & AI Safety

UK AI Standards & UK cybersecurity guidance

UK AI Security Guidance Principles alignment assessment

UK AI Standards Responsible AI evaluation framework

OWASP LLM Top 10 coverage report

Online Safety Act 2023 content risk assessment

EU AI Act cross-border compliance mapping

Expertise

Who Does the Work

Our team is drawn from UK AI research, information security, and regulatory compliance backgrounds.

🤖

ML Practitioners & NLP Researchers

AI engineers and researchers with hands-on experience in production LLM deployments across UK regulated sectors.

PyTorch Transformers RAGAS MLflow

🛡️

AI Red Team Specialists

Penetration testing and adversarial AI specialists with experience in UK financial services and government security contexts.

OWASP LLM Garak Promptfoo UK cybersecurity guidance

📋

UK Regulatory Specialists

Former compliance professionals with direct experience of UK regulatory institutions across financial services, healthcare, legal, and public sector environments.

UK regulatory framework UK data protection authority GDPR sector compliance standards UK sector regulator

☁️

UK Cloud & MLOps Engineers

Cloud architects with UK data residency specialism — Azure UK South, AWS eu-west-2, and private on-premises LLM infrastructure.

Azure UK South AWS eu-west-2 Kubernetes CI/CD

Quarterly Reports

UK AI Risk Reports

Practical guidance on UK regulatory developments and LLM risk — written for technical and compliance teams, not for press releases.

Subscribe to the Quarterly Report

Delivered to your inbox each quarter. No marketing email — just the report.

Subscribe →

Q1 2025

UK regulatory framework: What UK Financial Services Need to Know About LLM Model Risk

Practical guide to the regulatory guidance model risk management framework as applied to LLM deployments — validation requirements, ongoing monitoring, and what internal audit will ask.

Request report

Q4 2024

AI Risk Classification: A Practical Guide for UK Healthcare Digital Teams

How to navigate UK sector regulator AI as a Medical Device classification for clinical LLM applications — software as a medical device criteria, regulatory pathway options, and documentation requirements.

Request report

Q3 2024

OWASP LLM Top 10: UK Implications and Practical Testing Approaches

What the OWASP LLM Top 10 means in practice for UK regulated deployments — from prompt injection to model theft — with sector-specific risk prioritisation guidance.

Request report

Q2 2024

AI Use in Regulated Legal Services Guidance: Implications for Legal Services Organisations

What UK regulatory institution guidance on AI use means in practice — consistency obligations, professional indemnity implications, and testing requirements for contract review and legal research tools.

Request report

Q1 2024

RAG in Regulated Environments: Hidden Risks and Testing Approaches

Why retrieval-augmented generation pipelines require a different testing approach — and what happens when retrieval quality, source attribution, or context window management goes wrong.

Request report

Q4 2023

UK AI Safety Institute: What the Evaluation Framework Means for Regulated Sectors

An overview of the UK AI Standards evaluation framework and its practical implications for financial services, healthcare, and government AI deployments under UK law.

Request report

Our Testing
Toolkit

The Tools
We Use

RAGAS

DeepEval

Promptfoo

Garak

k6 & Locust

LangSmith

Helicone

Arize Phoenix

MLflow

Weights & Biases

OpenAI Evals

Azure OpenAI & AWS Bedrock

UK Compliance
Coverage

UK regulatory framework Model Risk

Regulatory Accountability & UK GDPR

sector compliance standards & sector regulatory standards

UK AI Standards & UK cybersecurity guidance

Who Does the Work

ML Practitioners & NLP Researchers

AI Red Team Specialists

UK Regulatory Specialists

UK Cloud & MLOps Engineers

UK AI Risk Reports

Subscribe to the Quarterly Report

UK regulatory framework: What UK Financial Services Need to Know About LLM Model Risk

AI Risk Classification: A Practical Guide for UK Healthcare Digital Teams

OWASP LLM Top 10: UK Implications and Practical Testing Approaches

AI Use in Regulated Legal Services Guidance: Implications for Legal Services Organisations

RAG in Regulated Environments: Hidden Risks and Testing Approaches

UK AI Safety Institute: What the Evaluation Framework Means for Regulated Sectors

Start with a Free
Risk Assessment

Our TestingToolkit

The ToolsWe Use

RAGAS

DeepEval

Promptfoo

Garak

k6 & Locust

LangSmith

Helicone

Arize Phoenix

MLflow

Weights & Biases

OpenAI Evals

Azure OpenAI & AWS Bedrock

UK ComplianceCoverage

UK regulatory framework Model Risk

Regulatory Accountability & UK GDPR

sector compliance standards & sector regulatory standards

UK AI Standards & UK cybersecurity guidance

Who Does the Work

ML Practitioners & NLP Researchers

AI Red Team Specialists

UK Regulatory Specialists

UK Cloud & MLOps Engineers

UK AI Risk Reports

Subscribe to the Quarterly Report

UK regulatory framework: What UK Financial Services Need to Know About LLM Model Risk

AI Risk Classification: A Practical Guide for UK Healthcare Digital Teams

OWASP LLM Top 10: UK Implications and Practical Testing Approaches

AI Use in Regulated Legal Services Guidance: Implications for Legal Services Organisations

RAG in Regulated Environments: Hidden Risks and Testing Approaches

UK AI Safety Institute: What the Evaluation Framework Means for Regulated Sectors

Start with a FreeRisk Assessment

Our Testing
Toolkit

The Tools
We Use

UK Compliance
Coverage

Start with a Free
Risk Assessment