If you've been in DevOps long enough, you've seen it. A developer adds a quick debug line to trace a request, and suddenly full names, email addresses, credit card numbers (or worse, JWT tokens and API keys) are flowing into your centralized logging platform. By the time anyone notices, thousands of log lines have been indexed, replicated, and cached across your entire observability stack.
This isn't a hypothetical. In 2023, a major fintech company was fined €2.3 million under GDPR after customer transaction data was found unredacted in application logs accessible to third-party monitoring vendors. The logs had been sitting there for 14 months.
The fix isn't complicated, but it does require a deliberate approach.
Why Sensitive Data Ends Up in Logs in the First Place
Log hygiene is usually an afterthought. Developers are focused on making features work, and adding log.debug("Processing request for user: " + user.toString()) feels harmless in a dev environment. The problem is that toString() on a user object often serializes the entire model, including fields like email, phone, ssn, or password_hash.
Other common culprits:
- HTTP request/response logging: full request bodies logged for debugging, containing form data with passwords or credit card numbers
- Error stack traces: exception messages that include query parameters or object state with PII
- Authentication middleware: JWT tokens, session cookies, or API keys logged during auth flows
- Webhook payloads: third-party webhook bodies (Stripe, Twilio, etc.) often contain customer data
- Database query logs: raw SQL with interpolated user-supplied values
The scary part is that none of this is malicious. It's just normal development behaviour that wasn't reviewed through a security lens.
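To make the toString() trap concrete, here is a minimal Python sketch. The `LeakyUser` and `SafeUser` classes and their fields are hypothetical; the point is only that a default object representation serializes every field, while explicitly excluding sensitive fields keeps them out of any log line that formats the object.

```python
from dataclasses import dataclass, field


@dataclass
class LeakyUser:
    # Hypothetical model: the default repr includes every field.
    id: int
    email: str
    password_hash: str


@dataclass
class SafeUser:
    # Same hypothetical model, but sensitive fields are excluded from repr.
    id: int
    email: str = field(repr=False)
    password_hash: str = field(repr=False)


leaky = LeakyUser(1, "jane@example.com", "hash-value")
safe = SafeUser(1, "jane@example.com", "hash-value")

leaky_line = f"Processing request for user: {leaky}"
safe_line = f"Processing request for user: {safe}"

print(leaky_line)  # contains the email and the password hash
print(safe_line)   # Processing request for user: SafeUser(id=1)
```

The same idea applies in any language: make the safe representation the default, so a careless debug line cannot leak more than an identifier.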
The Regulatory Dimension: GDPR, HIPAA, and SOC 2
If you're operating in Europe or handling European customer data, GDPR Article 5 requires that personal data be processed in a way that ensures appropriate security, including protection against unauthorised processing. Storing PII in unencrypted, broadly accessible log files that third-party SaaS vendors ingest almost certainly violates this principle.
HIPAA (for US healthcare) is stricter still. Protected Health Information (PHI) in logs is a potential breach event even without evidence that an unauthorised party read it: under the Breach Notification Rule, an impermissible use or disclosure of PHI is presumed to be a breach unless a risk assessment shows a low probability that the data was compromised.
SOC 2 Type II auditors are increasingly scrutinising log pipelines as part of access control and data classification checks. Auditors want to see that your logs don't contain sensitive data, and that you have controls to prove it.
The bottom line: log masking isn't just good practice. For many organisations, it's a compliance requirement.
What to Redact: A Practical PII Checklist
Before you can mask anything, you need to know what you're looking for. Here's a working list of the data types that most commonly appear in application logs:
| Data Type | Example Pattern | Risk Level |
|---|---|---|
| Email addresses | user@example.com | High |
| Phone numbers | +1-555-867-5309 | High |
| Credit card numbers | 4111 1111 1111 1111 | Critical |
| Social Security Numbers | 123-45-6789 | Critical |
| JWT tokens | eyJhbGciOiJIUzI1... | Critical |
| API keys / secrets | sk_live_..., AKIA... | Critical |
| IP addresses | 192.168.1.100 | Medium |
| Names + addresses | John Smith, 42 Baker St | High |
| Auth headers | Bearer eyJ... | Critical |
| Passwords (plaintext) | password=mysecret | Critical |
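A checklist like this can be turned into a small scanner for spot-checking log samples. The sketch below is illustrative only: the pattern set covers a few rows of the table, the names `PATTERNS`, `luhn_valid`, and `find_pii` are made up for this example, and the regexes will need tuning for real log formats. Card candidates are additionally checked with the Luhn checksum to cut down on false positives from other 16-digit values.

```python
import re

# Illustrative patterns for a few rows of the table; tune them for your logs.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "card": re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "jwt": re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits pass the Luhn checksum used by payment cards."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_pii(line: str) -> set:
    """Return the names of the data types detected in a single log line."""
    found = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(line):
            # Skip 16-digit runs that fail the checksum (order IDs and similar).
            if name == "card" and not luhn_valid(match.group()):
                continue
            found.add(name)
    return found


hits = find_pii("user=jane@example.com card=4111 1111 1111 1111 ip=192.168.1.100")
print(sorted(hits))  # ['card', 'email', 'ipv4']
```

Reporting only the type of data found, rather than the matched value, keeps the scanner's own output safe to share.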
Four Approaches to Masking Sensitive Data in Logs
1. Application-Level Masking (Best Practice)
The ideal place to mask sensitive data is at the source: in your application code, before the log message is even written. Most logging libraries support custom serializers or filters.
In Node.js with pino:
const pino = require('pino');

const logger = pino({
  serializers: {
    // Log only an explicit allow-list of request fields
    req(req) {
      return {
        method: req.method,
        url: req.url,
        // deliberately omit req.body: never log raw request bodies
      };
    }
  }
});
In Python with the standard logging module, add a custom Filter class that scrubs patterns before records are emitted:
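import logging
import re

class PIIFilter(logging.Filter):
    """Scrub common PII patterns from log records before they are emitted."""
    PATTERNS = [
        # Email addresses
        (re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'), '[EMAIL REDACTED]'),
        # 16-digit card numbers, optionally grouped by spaces or hyphens
        (re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'), '[CARD REDACTED]'),
        # JWTs: three base64url segments, the first starting with "eyJ"
        (re.compile(r'eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+'), '[JWT REDACTED]'),
    ]

    def filter(self, record):
        # Format the message with its args first, so values passed as
        # logger.info("user %s", email) are scrubbed as well.
        record.msg = self._scrub(record.getMessage())
        record.args = None
        return True  # keep the record; only its text is changed

    def _scrub(self, text):
        for pattern, replacement in self.PATTERNS:
            text = pattern.sub(replacement, text)
        return text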
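Where the filter is attached matters. In the standard logging module, a filter added to a logger only sees records logged directly through that logger, while a filter added to a handler sees every record the handler processes, including those propagated from child loggers. The sketch below shows the handler-level setup; `EmailScrubFilter` is a hypothetical, cut-down stand-in for a fuller PII filter, and the logger names are made up.

```python
import io
import logging
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


class EmailScrubFilter(logging.Filter):
    """Hypothetical stand-in for a fuller PII filter: scrubs emails only."""

    def filter(self, record):
        # Render the message with its arguments, then scrub the result.
        record.msg = EMAIL.sub("[EMAIL REDACTED]", record.getMessage())
        record.args = None
        return True


stream = io.StringIO()  # in-memory stream so the output can be inspected
handler = logging.StreamHandler(stream)
handler.addFilter(EmailScrubFilter())  # attached to the handler, not the logger

app_logger = logging.getLogger("app")
app_logger.setLevel(logging.INFO)
app_logger.addHandler(handler)
app_logger.propagate = False

# A child logger: its records propagate up to the "app" handler.
logging.getLogger("app.payments").info("refund issued to %s", "jane@example.com")

output = stream.getvalue()
print(output)  # refund issued to [EMAIL REDACTED]
```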
2. Log Shipper Filters (Fluent Bit / Fluentd)
If you can't modify application code, or you're dealing with third-party services, the next best option is filtering at the log shipper level. Fluent Bit supports a lua filter, which can do the replacement with Lua patterns (similar to, but not the same as, regular expressions):
[FILTER]
    Name    lua
    Match   *
    Script  redact_pii.lua
    Call    redact
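-- redact_pii.lua
function redact(tag, timestamp, record)
    local log = record["log"]
    -- Leave records without a string "log" field untouched (0 = not modified)
    if type(log) ~= "string" then
        return 0, timestamp, record
    end
    -- Redact email addresses
    log = string.gsub(log, "[%w%._%%+%-]+@[%w%.%-]+%.%a%a+", "[EMAIL REDACTED]")
    -- Redact JWT tokens
    log = string.gsub(log, "eyJ[A-Za-z0-9_%-]+%.[A-Za-z0-9_%-]+%.[A-Za-z0-9_%-]+", "[JWT REDACTED]")
    record["log"] = log
    -- 1 = the returned timestamp and record replace the original
    return 1, timestamp, record
end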
3. Logstash Mutate + Gsub Filter
If your stack runs on ELK with Logstash as the pipeline:
filter {
  mutate {
    gsub => [
      "message", "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b", "[EMAIL REDACTED]",
      "message", "eyJ[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+\.[A-Za-z0-9_\-]+", "[JWT REDACTED]",
      "message", "(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*\S+", "[SECRET REDACTED]"
    ]
  }
}
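The third rule above replaces the whole key=value pair, which also removes the key name from the log line. If you would rather keep the key for debugging, a capture group can preserve it. The following Python sketch shows the idea so the pattern can be tried out on sample lines before it goes into a pipeline; `SECRET_KV` and `mask_secrets` are names invented for this example, and it is not a drop-in Logstash configuration.

```python
import re

# Same idea as the key/value rule, but the key and separator are captured and kept.
SECRET_KV = re.compile(
    r"(api[_-]?key|secret|token|password)(\s*[:=]\s*)\S+",
    re.IGNORECASE,
)


def mask_secrets(message: str) -> str:
    """Replace the value of key=value or key: value secrets, keeping the key."""
    return SECRET_KV.sub(r"\1\2[SECRET REDACTED]", message)


masked = mask_secrets("login ok user=jane password=mysecret api_key: sk_test_123")
print(masked)
# login ok user=jane password=[SECRET REDACTED] api_key: [SECRET REDACTED]
```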
4. Use an Online Log Masker Before Sharing Logs
There's one scenario that almost everyone overlooks: sharing logs manually, by pasting log snippets into Slack, Jira tickets, or support tickets to debug an issue. This is where a lot of accidental PII leakage happens in practice. Before you paste a log snippet anywhere, sanitise it first.
DevOpsArsenal Log Masker & Sensitive Data Anonymizer
Detects and redacts emails, phone numbers, credit cards, JWT tokens, API keys, AWS credentials, IP addresses, and more, all in your browser, with nothing sent to a server. Sanitise any log snippet in seconds before sharing it in Slack, Jira, or a support ticket.
Best Practices Summary
- Never log raw request or response bodies in production. Log structured metadata instead (method, URL, status code, duration).
- Treat logs as untrusted data: apply the same data classification rules you'd apply to a database table.
- Audit your logs quarterly: run regex scans across a sample of recent logs to check for PII leakage patterns you haven't caught yet.
- Set log retention policies: even if logs contain no PII today, retention limits reduce the blast radius of future mistakes.
- Test your redaction rules: add log masking unit tests alongside your application tests. Masking rules break when log formats change.
- Use structured logging: JSON logs with explicit fields are far easier to sanitise than free-form text strings.
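The structured-logging point is worth a short illustration: when a log event is a dictionary with explicit fields, redaction can work on key names instead of guessing at patterns in free text. This is a minimal sketch; the `SENSITIVE_KEYS` deny-list and the event shape are hypothetical and would need to match your own schema.

```python
import json

# Hypothetical deny-list of field names (compared in lowercase).
SENSITIVE_KEYS = {"password", "email", "ssn", "authorization", "card_number", "token"}


def redact(value):
    """Recursively copy a JSON-like structure, masking sensitive fields by key."""
    if isinstance(value, dict):
        # Keys are assumed to be strings, as they are in JSON.
        return {
            key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


event = {
    "event": "signup",
    "status": 201,
    "user": {"id": 42, "email": "jane@example.com", "password": "mysecret"},
    "headers": [{"Authorization": "Bearer abc"}],
}

clean = redact(event)
line = json.dumps(clean, sort_keys=True)
print(line)
```

Because `redact` returns a copy, the original event is left intact for the application to keep using. A function this small is also easy to cover with unit tests, which helps with the "test your redaction rules" item above.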
Frequently Asked Questions
Masking replaces a sensitive value with a placeholder (for example, [EMAIL REDACTED]) while preserving the log structure. Anonymization goes further by removing or transforming data in a way that makes re-identification impossible. For logs, masking is usually sufficient and is much easier to implement and reverse if needed for debugging.