ExposureShield: Free Email Exposure Scan
This capstone presents a prototype scanner that uses publicly available breach sources and safe, ethical design choices to detect possible exposure and report risk in simple language.
Author: Eric Hiheglo • Program: Cybersecurity • Date:
Free Email Exposure Scan
Enter your email to check if it appears in known public breach databases. This scan is for awareness and education.
Requests go to /api/free-scan (server-side). The HIBP key stays on the server (Vercel environment variables).
Abstract
Individuals are frequently affected by data breaches, credential leaks, and the resale of stolen data on illegal marketplaces. While large organizations can purchase advanced monitoring tools, many people do not have access to simple and affordable exposure detection. This capstone introduces ExposureShield, a prototype scanner that detects, classifies, and reports personal exposure using publicly available breach sources. The system generates a risk level and a clear report that helps a user understand what happened and what actions to take.
The prototype integrates data acquisition, preprocessing and normalization, a classification and prioritization layer, and a reporting workflow. Evaluation focuses on standard metrics and usability outcomes, with an emphasis on privacy-first and ethical design.
1. Introduction
This section explains the problem, why it matters, and what the project delivers.
1.1 Background
Data breaches expose emails, passwords, and personal information. Attackers can resell stolen data and use it for account takeover, identity theft, phishing, and fraud.
1.2 Problem Statement
Most individuals do not have access to a reliable and ethical tool that detects and explains exposure using clear language, without requiring enterprise resources.
1.3 Purpose
The purpose is to design and evaluate a prototype that can detect, classify, and report individual exposure in a reliable and ethical manner.
Research Question & Objectives
Research Question
How can a prototype scanner, built on publicly available breach and dark-web sources, effectively detect, classify, and report individual exposure in a reliable and ethical manner?
Objectives
- Collect and normalize exposure-related data from public breach sources and controlled datasets.
- Classify exposure content and prioritize severity to reduce false alerts.
- Generate a clear report with recommended actions for the user.
- Evaluate performance using precision, recall, and F1-score, plus usability review.
2. Literature Review
This section summarizes the research foundation that informed the system design. It covers ethical data handling, text classification in cybersecurity, risk scoring, and limits in existing monitoring solutions.
| Topic Area | What the research shows | How ExposureShield uses it |
|---|---|---|
| Ethical research | Ethics and privacy are critical when analyzing breach or illicit ecosystem data. | Minimize collection, avoid harmful data handling, and protect keys server-side. |
| ML text classification | Text models can detect threat-related patterns and reduce manual review. | Use classification concepts to support reporting and prioritization. |
| Risk scoring | Prioritization reduces alert fatigue and improves practical response. | Assign simple risk levels and provide clear actions. |
| Existing tools | Many tools detect exposure but do not explain results clearly for users. | Focus on simple reporting and usability. |
3. Methodology
This section describes the research design, data sources, processing steps, and evaluation approach.
3.1 Research Design
- Applied research using a build-and-evaluate approach
- Prototype development with controlled testing
- Quantitative metrics + usability review
3.2 Data and Processing
- Public breach data sources (permitted use)
- Normalization and de-duplication
- Text preprocessing for classification and reporting
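The normalization and de-duplication step can be illustrated with a minimal sketch. The `BreachRecord` shape, its field names, and the choice to lowercase the full address are assumptions made for illustration; the prototype's actual record format may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreachRecord:
    """Hypothetical record shape; real source fields may differ."""
    email: str
    breach_name: str


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address (an assumption)."""
    return raw.strip().lower()


def normalize_and_dedupe(records: list[BreachRecord]) -> list[BreachRecord]:
    """Normalize each record and drop repeats, keeping first-seen order."""
    seen: set[tuple[str, str]] = set()
    unique: list[BreachRecord] = []
    for record in records:
        email = normalize_email(record.email)
        breach = record.breach_name.strip()
        # Compare breach names case-insensitively so "Example" == "example".
        key = (email, breach.casefold())
        if key in seen:
            continue
        seen.add(key)
        unique.append(BreachRecord(email=email, breach_name=breach))
    return unique
```

Keeping first-seen order makes the output deterministic, which simplifies controlled testing.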
Evaluation Plan
3.3 Metrics
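- Precision: share of raised alerts that are real exposures; higher precision means fewer false positives
- Recall: share of real exposures that the scanner detects; higher recall means fewer missed exposures
- F1-score: harmonic mean of precision and recall, giving a single balanced measure of performance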
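As a sketch of how these three metrics could be computed from test counts, the function below takes true positives, false positives, and false negatives. The function name and the convention of returning 0.0 when a denominator is zero are assumptions, not part of the original evaluation plan.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: real exposures correctly flagged
    fp: alerts raised for non-exposures
    fn: real exposures that were missed
    """
    # Return 0.0 instead of dividing by zero (a convention chosen here).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1
```

With purely illustrative counts of 8 true positives, 2 false positives, and 2 false negatives, all three values come out at approximately 0.8. These numbers are an example only, not results from the prototype.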
3.4 Usability
- Clarity of results and risk explanation
- Actionability of the recommendations
- Time to understand the outcome
4. Implementation
This section explains how the prototype is built and how data moves through the system.
4.1 High-Level Architecture
- Frontend: user enters email and views results
- Backend API: server endpoint calls breach source API securely
- Risk layer: simple risk scoring and reporting
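One way the risk layer might map breach results to a simple level is sketched below. The level names (Low, Medium, High), the threshold of five breaches, the set of sensitive data classes, and the `DataClasses` field name are all assumptions for illustration; the prototype's actual scoring rules are not specified here.

```python
# Hypothetical set of data classes treated as high impact.
SENSITIVE_DATA_CLASSES = {"Passwords", "Credit cards", "Social security numbers"}

# Hypothetical cutoff: this many breaches or more is treated as high risk.
HIGH_BREACH_COUNT = 5


def risk_level(breaches: list[dict]) -> str:
    """Return "Low", "Medium", or "High" for a list of breach records.

    Each record is assumed to be a dict with an optional "DataClasses"
    list describing what kind of data was exposed.
    """
    if not breaches:
        return "Low"
    exposes_sensitive = any(
        SENSITIVE_DATA_CLASSES & set(breach.get("DataClasses", []))
        for breach in breaches
    )
    if exposes_sensitive or len(breaches) >= HIGH_BREACH_COUNT:
        return "High"
    return "Medium"
```

Keeping the rules this small makes each level easy to explain to a user in one sentence, which fits the goal of simple reporting.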
4.2 Data Flow
- User submits an email.
- Frontend calls /api/free-scan.
- Server queries the breach source API and returns results.
- Frontend shows risk level, breaches, and recommended actions.
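The server-side step of this flow can be sketched as follows, assuming the breach source is the Have I Been Pwned (HIBP) v3 breached-account endpoint. The environment variable name `HIBP_API_KEY`, the user-agent string, and both function names are hypothetical; the actual /api/free-scan handler may be written differently.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

HIBP_BASE_URL = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_breach_request(email: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for one email without sending it."""
    # URL-encode the address so characters such as "@" are safe in the path.
    account = urllib.parse.quote(email.strip().lower(), safe="")
    url = f"{HIBP_BASE_URL}{account}?truncateResponse=false"
    headers = {
        "hibp-api-key": api_key,
        "user-agent": "ExposureShield-prototype",  # hypothetical value
    }
    return urllib.request.Request(url, headers=headers)


def fetch_breaches(email: str) -> list[dict]:
    """Query the breach source from the server and return breach records."""
    # The key is read from the server environment and never sent to the browser.
    api_key = os.environ["HIBP_API_KEY"]  # hypothetical variable name
    request = build_breach_request(email, api_key)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # HIBP answers 404 when the account appears in no known breach.
        if error.code == 404:
            return []
        raise
```

Separating request construction from the network call keeps the key handling in one place and lets the URL and headers be checked in tests without contacting the API.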
5. Evaluation & Results
Replace the placeholders below with your final results from testing.
| Metric | Value | Meaning |
|---|---|---|
| Precision | — | Lower false positives |
| Recall | — | Fewer missed exposures |
| F1-score | — | Overall balance |
Discussion
- What the prototype detects well
- Common false positives and why they happen
- What improved clarity for users
- Limitations and future improvements
6. Ethics, Privacy, and Compliance
ExposureShield is designed to reduce harm and protect users. The scan endpoint keeps API keys server-side and focuses on awareness and protective guidance.
- Minimal data retention
- No illegal content collection
- Server-side API key protection
- Clear user guidance and safe reporting
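As one possible illustration of minimal data retention, a server could log a keyed hash of the email instead of the address itself. This is a sketch under assumptions: the function name and the use of HMAC-SHA256 with a server-held secret are not taken from the prototype.

```python
import hashlib
import hmac


def pseudonymize_email(email: str, secret: bytes) -> str:
    """Return a keyed SHA-256 digest of the normalized email for logging.

    The secret is assumed to live only on the server, so the digest
    cannot be recomputed by someone who only sees the logs.
    """
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(secret, normalized, hashlib.sha256).hexdigest()
```

The same address always maps to the same digest, so repeated scans can still be counted without storing the email in plain text.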
Appendix
- Appendix A: System Architecture Diagram
- Appendix B: Model Configuration Parameters
- Appendix C: Sample Report Output