What is Perspective AI?

Perspective AI is a production-grade agentic AI system that performs critical discourse analysis on any input text—news articles, legal documents, financial reports, or social media content. It automates the kind of deep critical reading that typically requires years of training in linguistics and media studies, exposing how language choices shape perception, hide responsibility, and privilege certain viewpoints over others.

10 Analytical Lenses • 4 Analysis Layers • 3 Specialist Agents • 2 Deployment Modes

Core Capabilities

🔍

Presupposition Detection

Identifies hidden assumptions smuggled through word choice—like "restore" implying something was lost, or "stopped polluting" admitting guilt without stating it.
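
As a rough illustration of lexicon-driven trigger matching, the sketch below scans text for a few presupposition triggers. The lexicon entries, the regular expressions, and the name `find_presuppositions` are all hypothetical; the actual detector and its lexicon are not described in this document.

```python
import re

# Hypothetical mini-lexicon: trigger pattern -> the assumption it carries.
PRESUPPOSITION_TRIGGERS = {
    r"\brestor(?:e|es|ed|ing)\b": "something was lost or degraded",
    r"\bstop(?:s|ped|ping)?\s+\w+ing\b": "the activity was happening before",
    r"\bagain\b": "this has happened before",
}


def find_presuppositions(text: str) -> list[tuple[str, str]]:
    """Return (matched phrase, implied assumption) pairs in text order."""
    hits = []
    for pattern, assumption in PRESUPPOSITION_TRIGGERS.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), assumption))
    hits.sort()  # order by position in the text
    return [(phrase, assumption) for _, phrase, assumption in hits]
```

Calling `find_presuppositions("The plant stopped polluting the river.")` would surface the phrase "stopped polluting" together with the assumption that the activity had been going on.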

⚖️

Power Hierarchy Mapping

Analyzes who gets to be the subject (doer) versus object (done-to), active versus passive voice, and modal verbs to reveal implicit hierarchies.

🎯

Strategic Omission Analysis

Detects what's NOT said: the WHO, WHY, and HOW that have been strategically removed to shape interpretation.

🔄

Contradiction Detection

Cross-references statements across time via vector memory to surface internal or historical self-contradictions from the same source.

📝

Alternative Framing Generation

Produces alternative framings using the same facts but different linguistic choices, showing how else the story could have been told.

📊

Multidimensional Scoring

Provides a composite "Bullshit Score" across Factual Accuracy, Omission Severity, Manipulation Intensity, and Contradiction Index.

System Architecture

Perspective AI - System Architecture (Level 1)

  • Frontend Layer: React 18 + TypeScript • Tailwind CSS • React Query. Document Upload, Real-time Visualization, Historical Dashboard, Results Explorer.
  • API Gateway: FastAPI • Python 3.11. Auth, Rate Limiting, Request Routing.
  • Orchestration Layer: LangChain • LangGraph State Machine. Framing Detection Agent, Bias Excavation Agent, Counter-Narrative Agent.
  • AWS Lambda Functions: Serverless Compute • Event-Driven Processing. AWS Step Functions for multi-step workflow orchestration.
  • 4-Layer Analysis Engine: spaCy NLP • Custom Lexicons • Pattern Matching • NetworkX. (1) Syntactic Analysis, (2) Semantic Parsing, (3) Discourse Patterns, (4) Critical Synthesis.
  • Vector Database: Qdrant. Temporal Memory, Contradiction Detection, RAG.
  • Relational Database: PostgreSQL 15. User Data, Analysis Results, Audit Trails.
  • SaaS Mode LLMs: Claude Sonnet 4, GPT-4 Turbo (Multi-LLM Ensemble).
  • Private Mode LLMs: Ollama with Mistral and Llama 3 (On-Premise Inference).

LangGraph Orchestration Architecture (Level 2)

  • State Manager: LangGraph StateGraph. Manages conversation state & agent coordination.
  • Input Processing Node
  • Router
  • Agent 1: Framing Detection (LangChain AgentExecutor). Tools: spaCy parser, collocation analyzer. Detects: word pairing patterns, frame construction.
  • Agent 2: Bias Excavation (LangChain AgentExecutor). Tools: presupposition detector, voice analyzer. Detects: hidden assumptions, power dynamics.
  • Agent 3: Counter-Narrative (LangChain AgentExecutor). Tools: LLM rewriter, fact checker. Generates: alternative framings.
  • Synthesis Node: Conditional Edge Logic. Merges agent outputs → generates final analysis + BS Score.
  • Output Node: Structured Response + Metadata.

Shared State Schema (TypedDict):

  • input_text: str
  • framing_results: Dict
  • bias_results: Dict
  • counter_narrative: str
  • lenses_fired: List[int]
  • bullshit_score: float

LangGraph Execution Flow

1. State Initialization

LangGraph creates a TypedDict state object containing input text and empty result containers. This state persists across all agent invocations.
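
The shared state schema listed in the orchestration diagram can be written out as a plain `TypedDict`. The field names and types come from that schema; the class name `AnalysisState` and the helper `initial_state` are hypothetical, and the empty defaults are an assumption about what "empty result containers" means.

```python
from typing import Dict, List, TypedDict


class AnalysisState(TypedDict):
    """Shared state passed between nodes (fields as listed in the schema)."""
    input_text: str
    framing_results: Dict
    bias_results: Dict
    counter_narrative: str
    lenses_fired: List[int]
    bullshit_score: float


def initial_state(text: str) -> AnalysisState:
    """Build the starting state: the input text plus empty result containers."""
    return AnalysisState(
        input_text=text,
        framing_results={},
        bias_results={},
        counter_narrative="",
        lenses_fired=[],
        bullshit_score=0.0,
    )
```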

2. Router Decision

Conditional edge logic examines input length and complexity, routing to parallel agent execution or sequential processing based on document characteristics.
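
A routing decision of this kind reduces to a small function that reads the state and returns a branch name. The sketch below is only an illustration: the 1,500-word threshold, the use of word count as the sole complexity signal, and the branch names are assumptions, not values taken from the system.

```python
def route(state: dict, long_doc_words: int = 1500) -> str:
    """Pick an execution branch from the input length.

    Long documents go to parallel agent execution, short ones to
    sequential processing. The threshold is a placeholder value.
    """
    word_count = len(state["input_text"].split())
    return "parallel" if word_count >= long_doc_words else "sequential"
```

A callable with this shape (state in, branch name out) is the kind of function a conditional edge in a state graph would typically consult.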

3. Agent Execution

Each agent (implemented as LangChain AgentExecutor) runs with access to specialized tools. Agents update their portion of the shared state independently.
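
The idea of agents writing only to their own slice of the shared state can be sketched with the standard library alone. The three stub agents below return placeholder values and stand in for the real AgentExecutor instances; `run_agents` and the clash check are hypothetical helpers, not part of LangChain or LangGraph.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub agents: each returns only the state keys it owns (placeholder values).
def framing_agent(state: dict) -> dict:
    return {"framing_results": {"status": "stub"}}


def bias_agent(state: dict) -> dict:
    return {"bias_results": {"status": "stub"}}


def counter_narrative_agent(state: dict) -> dict:
    return {"counter_narrative": "(rewrite of: " + state["input_text"] + ")"}


def run_agents(state: dict, agents) -> dict:
    """Run agents concurrently and merge their partial updates into a copy."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        updates = list(pool.map(lambda agent: agent(state), agents))
    merged = dict(state)  # leave the caller's state untouched
    written = set()
    for update in updates:
        clash = written.intersection(update)
        if clash:
            raise ValueError(f"agents wrote the same keys: {sorted(clash)}")
        written.update(update)
        merged.update(update)
    return merged
```

Rejecting overlapping keys is one simple way to keep independent updates safe; a real graph might instead define how concurrent writes to the same key are combined.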

4. Synthesis & Scoring

The synthesis node receives the populated state, applies the 10-lens framework, generates the Bullshit Score composite rating, and produces the final structured output.

5. Error Handling & Retry

LangGraph's built-in retry logic handles LLM timeouts and agent failures. State checkpointing enables resumption from the last successful node.
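
The retry-and-resume behavior described here can be illustrated without any framework. This is a simplified sketch, not LangGraph's checkpointer: the function name, the in-memory `checkpoint` dict, and the choice to retry only on `TimeoutError` are assumptions made for the example.

```python
import time


def run_with_checkpoints(nodes, state, checkpoint, max_retries=3, base_delay=0.0):
    """Run (name, fn) nodes in order, retrying timeouts and recording progress.

    `checkpoint` is a dict that survives between runs; a rerun resumes after
    the last node that completed instead of starting from the beginning.
    """
    start = checkpoint.get("completed", 0)
    state = checkpoint.get("state", state)
    for index in range(start, len(nodes)):
        _name, fn = nodes[index]
        for attempt in range(max_retries):
            try:
                state = fn(state)
                break
            except TimeoutError:
                if attempt == max_retries - 1:
                    raise  # give up; the checkpoint still marks earlier nodes done
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        checkpoint["completed"] = index + 1
        checkpoint["state"] = state
    return state
```

If a node exhausts its retries, calling the function again with the same `checkpoint` skips every node that already finished, which is the cost-saving property the step above relies on.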

RAG Pipeline - Temporal Contradiction Detection (Level 2)

Ingestion:

  1. Document Ingestion: PDF/DOCX/HTML extraction + metadata tagging (source, timestamp, author, publication).
  2. Semantic Chunking: preserve context boundaries; 500-token chunks with 50-token overlap (LangChain RecursiveCharacterTextSplitter).
  3. Embedding Generation: sentence-transformers/all-MiniLM-L6-v2, 384-dimensional dense vectors.
  4. Qdrant Vector Store: collection "discourse_memory", indexed by cosine similarity.

Query time:

  5. Hybrid Search: semantic (vector) + keyword (BM25) fusion; filter by source metadata for contradiction detection.
  6. Cross-Encoder Reranking: scores query-document pairs for relevance refinement (ms-marco-MiniLM-L-6-v2 cross-encoder).
  7. Context Injection: top-k chunks → LLM prompt with source attribution; enables temporal contradiction detection.

Example: Contradiction Detection

  • Input (2026): "We need to destroy Iran's nuclear capabilities immediately."
  • Query Qdrant: filter: {source: "Speaker X", topic: "Iran nuclear"}
  • Retrieved (2025): "We successfully destroyed all of Iran's nuclear weapons in 2025."
  • Result: CONTRADICTION DETECTED. Temporal gap: 1 year. Severity score: 9.2/10.

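
The 500-token window with a 50-token overlap from the chunking step can be shown with a plain sliding window. This sketch works on an already tokenized list and is an assumption-laden stand-in: it does not reproduce how RecursiveCharacterTextSplitter picks split points, and `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split a token list into windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks
```

With the default settings each chunk repeats the final 50 tokens of its predecessor, so a statement that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.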
4-Layer Analysis Engine (Level 2)

Layer 1: Syntactic Analysis (spaCy Dependency Parser)

  • Part-of-speech tagging (NOUN, VERB, ADJ patterns)
  • Dependency tree construction (subject-verb-object)
  • Passive voice detection (auxpass dependency)
  • Modal verb analysis (can/should/must hierarchy)

Layer 2: Semantic Parsing (spaCy NER + Custom Lexicons)

  • Named Entity Recognition (PERSON, ORG, GPE)
  • Presupposition triggers (change-of-state verbs such as "stopped", factives such as "realized")
  • Definiteness analysis (the/a patterns)
  • Implicature detection via custom lexicon matching

Layer 3: Discourse Patterns (NetworkX + Collocation Analysis)

  • Co-occurrence matrices (which words appear together)
  • Frame detection via bigram/trigram clustering
  • Voice pattern analysis (who speaks, who's quoted)
  • Strategic omission detection (expected vs present info)

Layer 4: Critical Synthesis (10-Lens Framework Application)

  • Aggregate findings from Layers 1-3
  • Apply 10 analytical lenses with intensity scoring
  • Generate Bullshit Score (4-dimensional composite)
  • Produce alternative framing + structured output

Final Output Generation

5-Layer Structured Response: Fact Extraction • Bias Deconstruction • Contradictions • Alternative Narrative • Bullshit Score (0-10)

Example: Passive Voice Analysis Pipeline

  • Input text: "Protesters were confronted by police outside City Hall"
  • Layer 1 (Syntax): auxpass detected → passive; the agent is demoted to a by-phrase ("by police")
  • Layers 2-3: NER tags "police" = ORG and "protesters" = GROUP; the discourse pattern naturalizes police action and removes agency from police
  • Layer 4 (Synthesis): lenses fired are L4 Power Beneficiary and L6 Omission (who initiated?); alternative: "Police confronted protesters"
  • BS Score: Factual 8/10, Omission 6/10, Manipulation 7/10, Contradiction 0/10, Overall 5.2/10
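
The Layer 1 passive check on the example sentence can be sketched over dependency labels alone. To stay self-contained, the tokens below are annotated by hand with labels in the style of spaCy's English models (`nsubjpass`, `auxpass`, `agent`, `pobj`); in the real pipeline they would come from the parser. `Token` and `analyze_voice` are hypothetical names.

```python
from typing import NamedTuple


class Token(NamedTuple):
    text: str
    dep: str   # dependency label
    head: int  # index of the head token in the sentence


def analyze_voice(tokens: list[Token]) -> dict:
    """Report whether the clause is passive, and who is patient and agent."""
    passive = any(t.dep == "auxpass" for t in tokens)
    patient = next((t.text for t in tokens if t.dep == "nsubjpass"), None)
    agent = None
    for i, t in enumerate(tokens):
        if t.dep == "agent":  # the "by" that introduces the demoted agent
            agent = next(
                (c.text for c in tokens if c.dep == "pobj" and c.head == i), None
            )
    return {"passive": passive, "patient": patient, "agent": agent}


# Hand-annotated parse of the example sentence (an assumption, not parser output).
SENTENCE = [
    Token("Protesters", "nsubjpass", 2),
    Token("were", "auxpass", 2),
    Token("confronted", "ROOT", 2),
    Token("by", "agent", 2),
    Token("police", "pobj", 3),
    Token("outside", "prep", 2),
    Token("City", "compound", 7),
    Token("Hall", "pobj", 5),
]
```

For this sentence the function reports a passive clause with "Protesters" in subject position and "police" as the by-phrase agent, which is the raw material for the suggested active rewrite. A passive with no `agent` token at all would signal a fully omitted actor.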

LangChain & LangGraph Orchestration

Perspective AI uses LangChain's AgentExecutor framework and LangGraph's state machine for multi-agent coordination. This architecture enables parallel agent execution, shared state management, and conditional routing based on analysis results.

LangChain AgentExecutor

Purpose: Each specialist agent (Framing Detection, Bias Excavation, Counter-Narrative) is implemented as a LangChain AgentExecutor.

Tools per agent: spaCy parser, collocation analyzer, presupposition detector, voice analyzer, LLM rewriter, fact checker—all registered as LangChain Tools.

LangGraph State Machine

Purpose: Manages conversation state across agents using a TypedDict schema containing input text, intermediate results, and final outputs.

Conditional edges: Route based on document complexity, enabling parallel execution for long documents and sequential for short ones.

State Persistence

Mechanism: LangGraph checkpoints state after each node execution, enabling resume-from-failure and audit trail reconstruction.

Storage: PostgreSQL backend stores serialized state for session recovery and historical analysis comparison.

🎯 Why This Architecture Matters

LangGraph's state machine pattern solves the core orchestration challenge: how do you coordinate multiple AI agents with different specializations while maintaining a coherent analytical narrative?

A purely sequential pipeline pays the combined latency of every agent. Naive parallelism loses cross-agent context. LangGraph's shared state + conditional routing gives you both: parallel execution where safe, sequential where dependencies exist, with full state visibility across all agents.

Production benefit: Failed agent executions don't restart the entire pipeline—checkpoint recovery means you only re-run from the failed node, critical for cost management with expensive LLM calls.

Technology Stack

AI & Machine Learning

LLM Orchestration
LangChain + LangGraph
Multi-agent workflow with TypedDict state management, conditional routing, and checkpoint-based recovery
Primary LLMs (SaaS)
Claude Sonnet 4 • GPT-4 Turbo
Multi-LLM ensemble for analysis diversity and cross-validation
Private LLMs
Ollama (Mistral • Llama 3)
On-premise inference with zero data egress for regulated environments
NLP Framework
spaCy 3.7+
Dependency parsing, named entity recognition, POS tagging, custom pattern matching with Matcher API
Vector Database
Qdrant
High-performance vector similarity search with metadata filtering for temporal contradiction detection
Embeddings
sentence-transformers
all-MiniLM-L6-v2 for semantic embeddings (384-dim), ms-marco cross-encoder for reranking

Backend & Infrastructure

API Framework
FastAPI (Python 3.11)
Async API with Pydantic validation, automatic OpenAPI docs, WebSocket support for streaming
Serverless Compute
AWS Lambda
Event-driven functions for scalable analysis execution, cold start optimization via Lambda layers
Workflow Orchestration
AWS Step Functions
State machine for multi-step pipelines with error handling, retries, and exponential backoff
Relational Database
PostgreSQL 15
User data, analysis results, LangGraph state checkpoints, audit trails with JSONB columns
Caching Layer
Redis
Session management, API response caching, rate limiting counters
Container Runtime
Docker + Docker Compose
Local development environment and private deployment containerization with multi-stage builds

Frontend & User Experience

UI Framework
React 18 + TypeScript
Type-safe component architecture with React Query for server state sync
Styling
Tailwind CSS
Utility-first CSS with custom design tokens for brand consistency
State Management
React Query + Zustand
Server state via React Query, client state via Zustand for global UI state
Data Visualization
D3.js + Recharts
Interactive charts for Bullshit Score breakdown, lens activation heatmaps, historical trends

The Ten Analytical Lenses

Each lens targets a specific type of narrative distortion. Not all lenses fire on every input—the system reports which activate, with intensity scoring and textual evidence.

1 Logic & Physics Consistency
Detects claims that violate logical coherence or physical laws (e.g., "destroyed all nukes" then "need to destroy their nukes").
2 Language Complexity as Camouflage
Identifies unnecessarily complex language used to obscure simple facts or avoid accountability.
3 Financial Beneficiary
Maps who stands to gain financially from the framing of events or policies described.
4 Power Beneficiary
Analyzes who gains or maintains power through the narrative structure and voice distribution.
5 Fear & Insecurity Beneficiary
Detects fear-based framing and identifies who benefits from heightened public anxiety.
6 Omission Analysis
Identifies strategic gaps—missing WHO, WHY, HOW, or WHEN that shape interpretation through absence.
7 Non-Dominant Voice
Examines whose perspectives are centered versus marginalized in the text.
8 Historical Echo
Detects linguistic patterns that mirror historical propaganda or manipulation tactics.
9 Resource Contradiction
Identifies claims about resource scarcity or competition that don't match documented evidence.
10 Conflation as Silencing
Detects when legitimate criticism is deliberately mislabeled to shut down discourse.

Output Structure

For every analyzed text, Perspective AI produces a structured five-layer output:

1. Fact Extraction

Isolates verifiable claims from opinion, separating what can be fact-checked from interpretive framing.

2. Bias Deconstruction

Reports which analytical lenses fired, with intensity scores (0-10) and supporting textual evidence citations.

3. Contradiction Detection

Cross-references current text against historical statements from the same source via Qdrant vector search with temporal filtering.
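
The retrieval step behind this layer amounts to "filter by source and date, then rank by vector similarity". The sketch below does that over an in-memory list with toy vectors. It is a stand-in for the vector store, not Qdrant's API, and `Statement`, `cosine`, and `prior_statements` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Statement:
    source: str
    year: int
    text: str
    vector: list[float]  # toy embedding; real vectors would be 384-dimensional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prior_statements(memory, source, before_year, query_vector, top_k=3):
    """Earlier statements from the same source, most similar first."""
    candidates = [s for s in memory if s.source == source and s.year < before_year]
    candidates.sort(key=lambda s: cosine(s.vector, query_vector), reverse=True)
    return candidates[:top_k]
```

The candidates returned here are only topically similar; deciding whether they actually contradict the new statement is left to the downstream reranking and LLM steps.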

4. Alternative Narrative

Generates a neutral rewrite using the same facts but different linguistic framing choices, demonstrating how the story could be told without manipulation.

5. Bullshit Score (0-10)

Composite rating across four dimensions:

  • Factual Accuracy — ratio of verifiable to stated claims
  • Omission Severity — amount of critical missing context
  • Manipulation Intensity — number and strength of lens activations
  • Contradiction Index — degree of self-contradiction
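
How the four dimensions are combined into one number is not specified here, so the sketch below simply assumes an unweighted mean of the four 0-10 values. Under that assumption, the dimension values from the passive voice example (8, 6, 7, 0) average to 5.25, close to the 5.2 overall shown there. The function name and the range check are hypothetical.

```python
def bullshit_score(factual: float, omission: float,
                   manipulation: float, contradiction: float) -> float:
    """Combine four 0-10 dimension scores with an unweighted mean (assumed)."""
    dimensions = (factual, omission, manipulation, contradiction)
    for value in dimensions:
        if not 0 <= value <= 10:
            raise ValueError("each dimension must be between 0 and 10")
    return sum(dimensions) / len(dimensions)
```

A weighted variant would follow the same shape with one weight per dimension; which weighting the production system uses is not stated in this document.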

Version 2: Classical ML Enhancements

Perspective AI v1 is LLM orchestration with symbolic NLP. Version 2 adds supervised machine learning at strategic chokepoints to improve accuracy, reduce latency, and enable active learning.

Planned Enhancement

Hybrid Architecture: LLMs + Classical ML

The following ML models would complement (not replace) the existing LLM-based analysis, creating a two-tier system: fast ML classifiers for filtering and scoring, deep LLM reasoning for nuanced analysis.

1. Lens Classification Model

Purpose: Pre-filter which of the 10 analytical lenses are likely to fire before running full LLM analysis.

Architecture: XGBoost multi-label classifier

Features: TF-IDF vectors, linguistic features (passive voice %, modal verb counts), embedding cluster assignments

Training data: Labeled corpus of 5,000+ analyzed articles with lens activation ground truth

Value: Reduces LLM calls by 40% by skipping lenses with <0.3 probability
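
The pre-filtering step itself is a threshold over per-lens probabilities. In this sketch the probabilities are passed in as a plain dict keyed by lens number (in the planned design they would come from the classifier), and `lenses_to_run` is a hypothetical helper.

```python
def lenses_to_run(probabilities: dict[int, float], threshold: float = 0.3) -> list[int]:
    """Keep lenses whose predicted activation probability meets the threshold."""
    return sorted(lens for lens, p in probabilities.items() if p >= threshold)
```

For example, `lenses_to_run({1: 0.05, 4: 0.82, 6: 0.41, 9: 0.29})` keeps lenses 4 and 6, so only those two would be sent on for full LLM analysis.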

XGBoost • Multi-Label • Feature Engineering

2. Contradiction Scorer

Purpose: Fine-tuned semantic similarity model specifically for contradiction detection.

Architecture: Fine-tuned sentence-transformers cross-encoder

Base model: all-MiniLM-L6-v2 fine-tuned on SNLI + MultiNLI + custom political contradiction dataset

Output: 0-1 contradiction probability score, replacing basic cosine similarity

Value: Qdrant returns candidates; ML model provides precise contradiction scoring

Sentence Transformers • Transfer Learning • Fine-Tuning

3. Framing Detection Classifier

Purpose: Automated detection of discourse frames (crisis/opportunity, security/humanitarian, etc.)

Architecture: BERT-based sequence classifier

Classes: Multi-class across 15+ common political/media frames

Training: Manually labeled corpus of news articles across diverse sources

Value: Replaces manual collocation analysis with learned frame patterns

BERT • Sequence Classification • Hugging Face

4. Source Credibility Model

Purpose: Learn credibility patterns from historical analysis to weight contradictions by source reliability.

Architecture: LightGBM gradient boosting

Features: Publication history, fact-check record, correction frequency, lens activation patterns, average BS scores

Online learning: Model updates incrementally as new analyses complete

Value: Prioritize contradictions from historically reliable sources

LightGBM • Online Learning • Feature Engineering

5. Active Learning Pipeline

Purpose: Intelligently route documents to human review vs. auto-analysis based on model uncertainty.

Strategy: Uncertainty sampling with ensemble disagreement

Trigger: When Claude Sonnet 4 and GPT-4 Turbo outputs diverge significantly (more than 30% disagreement on which lenses fire), flag for a human analyst

Value: Efficient allocation of human expertise; focus on edge cases
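
One way to quantify ensemble disagreement is to compare the sets of lenses each model fired. The sketch below assumes Jaccard overlap as the agreement measure and reads the trigger as "more than 30% disagreement"; both the metric and that reading of the threshold are assumptions, and the function names are hypothetical.

```python
def lens_agreement(lenses_a: set[int], lenses_b: set[int]) -> float:
    """Jaccard overlap between the lens sets fired by two models."""
    union = lenses_a | lenses_b
    return len(lenses_a & lenses_b) / len(union) if union else 1.0


def needs_human_review(lenses_a: set[int], lenses_b: set[int],
                       max_disagreement: float = 0.30) -> bool:
    """Flag the document when the two models disagree beyond the threshold."""
    return (1.0 - lens_agreement(lenses_a, lenses_b)) > max_disagreement
```

If one model fires lenses {1, 4, 6} and the other {1, 4, 9}, the overlap is 2 of 4 distinct lenses, so disagreement is 50% and the document would be routed to an analyst.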

Active Learning • Uncertainty Sampling • Human-in-Loop

🎯 Why Hybrid Architecture > Pure LLM

Cost efficiency: LLM calls for 10-lens analysis on a 2,000-word article cost ~$0.15-0.20. ML pre-filtering reduces this to $0.08-0.10 by skipping irrelevant lenses. At 10,000 analyses/month, that's roughly $700-1,000 in savings when the low and high ends of the two ranges are paired, and up to $1,200 in the best case.

Latency: XGBoost inference is <50ms. Fine-tuned BERT is ~200ms. LLM calls are 3-8 seconds. ML models provide instant feedback for interactive use cases.

Accuracy: Task-specific fine-tuned models outperform general-purpose LLMs on narrow classification tasks. Contradiction detection via fine-tuned cross-encoder beats Claude/GPT-4 zero-shot by 12-15% F1 on labeled test sets.
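
The monthly figures in the cost paragraph follow directly from the per-analysis estimates. The sketch below works in whole cents to avoid floating-point drift; `monthly_savings_usd` is a hypothetical helper, and the inputs are the document's own estimates rather than measured costs.

```python
def monthly_savings_usd(before_cents: int, after_cents: int,
                        analyses_per_month: int = 10_000) -> float:
    """Monthly savings in dollars from a per-analysis cost drop given in cents."""
    return (before_cents - after_cents) * analyses_per_month / 100


low_end = monthly_savings_usd(15, 8)     # $0.15 -> $0.08 per analysis
high_end = monthly_savings_usd(20, 10)   # $0.20 -> $0.10 per analysis
best_case = monthly_savings_usd(20, 8)   # most expensive baseline, cheapest filtered run
```

Pairing like with like gives about $700 and $1,000 per month; the $1,200 figure only appears when the most expensive unfiltered run is compared with the cheapest filtered one.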

Interview Positioning: LLM Orchestration vs Classical ML

"Perspective AI v1 is LLM orchestration with symbolic NLP—the ML is in the embeddings and pre-trained spaCy models, not custom-trained classifiers. That's the right starting point because it lets you validate the product-market fit without ML engineering overhead.

Version 2 adds supervised ML at three strategic chokepoints: lens classification (XGBoost) to pre-filter analysis paths, contradiction scoring (fine-tuned sentence-transformers) to improve temporal detection accuracy, and framing detection (BERT classifier) to automate what's currently done via collocation analysis.

This creates a hybrid architecture where classical ML handles classification and scoring tasks, and LLMs handle reasoning and generation. That's the pattern you see in production AI systems—not pure LLM, not pure ML, but the right tool for each layer."

Dual Deployment Architecture

Perspective AI addresses the enterprise data sovereignty challenge with two deployment modes providing identical analytical capability under different trust boundaries.

Public SaaS

Cloud-Native Deployment

Frontier model performance with managed infrastructure.

Stack

  • Claude Sonnet 4 + GPT-4 Turbo (multi-LLM ensemble)
  • Anthropic/OpenAI API endpoints
  • AWS Lambda + Step Functions orchestration
  • Managed Qdrant Cloud
  • AWS RDS PostgreSQL with automated backups

Use Cases

  • Media analysis and fact-checking
  • Public discourse monitoring
  • Non-sensitive document review
  • Research and academic analysis

Private On-Premise

Air-Gapped Deployment

Complete data sovereignty with zero egress to external APIs.

Stack

  • Ollama (Mistral 7B + Llama 3 8B)
  • Local inference—no external API calls
  • Docker Compose orchestration
  • Self-hosted Qdrant container
  • Local PostgreSQL instance with volume persistence

Use Cases

  • Financial services compliance analysis
  • Legal document review (attorney-client privileged)
  • Healthcare/HIPAA-compliant analysis
  • Government/classified document processing

🎯 Strategic Design Decision

The dual deployment mode directly addresses the "how do we use GenAI when our data cannot leave our boundary?" question that every client in financial services, legal, healthcare, and government asks.

Trade-off accepted: The on-premise path sacrifices frontier-model capability for data sovereignty. Mistral 7B and Llama 3 8B via Ollama are not Claude Sonnet 4. For high-stakes nuanced reasoning (complex legal arguments, subtle political rhetoric), that gap matters. For many enterprise workloads (extraction, classification, structured analysis, policy compliance checking), it does not.

The architectural answer: Same analytical framework, same agent orchestration via LangChain/LangGraph, same 10-lens methodology, same output structure—different inference endpoints. The decision is use-case specific, not a blanket "cloud vs on-prem" debate. This is the reference pattern for regulated industries.
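
The "same framework, different inference endpoints" idea can be pictured as a single configuration switch. The sketch below is illustrative only: `InferenceConfig`, `DEPLOYMENTS`, `select_inference`, and the mode keys are hypothetical names, while the model lists mirror the two stacks described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    mode: str
    models: tuple[str, ...]
    external_egress: bool  # True when prompts leave the client's boundary


DEPLOYMENTS = {
    "saas": InferenceConfig("saas", ("Claude Sonnet 4", "GPT-4 Turbo"), True),
    "private": InferenceConfig("private", ("Mistral 7B", "Llama 3 8B"), False),
}


def select_inference(mode: str) -> InferenceConfig:
    """Return the inference settings for a deployment mode."""
    try:
        return DEPLOYMENTS[mode]
    except KeyError:
        raise ValueError(f"unknown deployment mode: {mode!r}") from None
```

Everything upstream of this switch (agents, lenses, output structure) stays the same in both modes; only the object returned here changes.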

Technical Differentiators

Temporal Contradiction Detection

Most analysis tools work on single documents. Perspective AI maintains a vector memory of prior statements by source, enabling cross-temporal contradiction detection—a politician's statement today vs. three years ago, automatically surfaced via Qdrant metadata filtering.

Qdrant Vector Search • Temporal Indexing • Source Taxonomy

Multi-LLM Orchestration

The SaaS deployment runs parallel analysis across Claude Sonnet 4 and GPT-4 Turbo, comparing outputs for consensus and divergence. This catches model-specific biases, improves analytical reliability, and enables active learning triggers when models disagree.

LangChain • Ensemble Analysis • Cross-Validation

Hybrid Symbolic + Neural NLP

Beyond LLMs, Perspective AI uses spaCy for dependency parsing, custom lexicons for domain-specific framing detection, and NetworkX for voice pattern analysis—combining rule-based precision with neural flexibility.

spaCy 3.7 • NetworkX • Pattern Matching

Graduated Trust Architecture

Two deployment modes represent a graduated trust model: public cloud for non-sensitive workloads, private on-premise for regulated data. Same capability, different boundaries—the pattern enterprise clients need to safely adopt GenAI.

Data Sovereignty • Zero Egress • Compliance-Ready