Non-Functional Requirements
...
NFR-001: Performance Targets
Requirement: The system shall meet specified performance targets for processing throughput, API latency, and rendering speed to support high-volume Nordic utilities operations.
Priority: HIGH
Owner: Technical Architect
Performance Metrics
...
| Metric | Target | Measurement Method | Acceptance Threshold | Test Scenario |
|---|---|---|---|---|
| Batch Processing Throughput | 10M invoices/month | Monthly invoice count in Application Insights | ≥ 10M in peak month | Production monitoring |
| 100K Batch Processing Time | < 2 hours | Timestamp diff (queuedAt to completedAt) | ≤ 120 minutes | Load test TC-200 |
| API Response Time (p50) | < 200ms | Application Insights percentiles | ≤ 200ms | Load test TC-201 |
| API Response Time (p95) | < 500ms | Application Insights percentiles | ≤ 500ms | Load test TC-202 |
| API Response Time (p99) | < 1000ms | Application Insights percentiles | ≤ 1000ms | Load test TC-203 |
| PDF Generation Time (p95) | < 5 seconds/invoice | Custom metric tracking | ≤ 5 seconds | Render test TC-204 |
| Handlebars Rendering (p95) | < 2 seconds/invoice | Custom metric tracking | ≤ 2 seconds | Render test TC-205 |
| Queue Processing Lag | < 5 minutes | Queue depth / throughput calculation | ≤ 5 minutes | Queue monitoring |
| Database Query Time (p95) | < 100ms | PostgreSQL slow query log | ≤ 100ms | Query analysis |
| ParserService Parse Time (10K batch) | < 2 minutes | Parse duration measurement | ≤ 120 seconds | Parser test TC-206 |
Load Testing Scenarios
...
Scenario 1: Steady State (Normal Month)
- Duration: 24 hours
- Load: 333K invoices, evenly distributed
- Concurrent batches: 5-10
- Expected: All targets met, no errors
Scenario 2: Peak Load (Heating Season)
- Duration: 8 hours
- Load: 314K invoices (first-week-of-month concentration)
- Concurrent batches: 20+
- Expected: 2-hour SLA met, >99% success rate
Scenario 3: Spike Test
- Duration: 30 minutes
- Load: 10 batches (50K invoices) uploaded simultaneously
- Expected: System auto-scales and processes without degradation
Acceptance Criteria
...
| # | Criterion | Validation Method | Target |
|---|---|---|---|
| 1 | 100K-invoice batch completes within SLA | End-to-end load test | ≤ 2 hours |
| 2 | API maintains p95 < 500ms under load | Concurrent API requests (1000 RPS) | p95 ≤ 500ms |
| 3 | System processes 10M invoices in peak month | Production monitoring (Oct-Mar) | ≥ 10M in peak month |
| 4 | No performance degradation with 50 concurrent organizations | 50 orgs upload simultaneously | All SLAs met |
| 5 | Worker auto-scaling maintains lag < 5 min | Monitor queue depth during peaks | Lag ≤ 5 min |
| 6 | PDF generation stays within target | Render 1000 PDFs, measure | p95 ≤ 5 seconds |
Dependencies:
- Azure Container Apps auto-scaling configuration
- Application Insights performance monitoring
- Load testing tool (NBomber, k6, or JMeter)
- Blob storage premium tier for high IOPS
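The latency targets above are ultimately verified with the load-testing tool, but they can be smoke-checked with nothing more than HttpClient. The sketch below is illustrative only: the endpoint path, sample size, and percentile index are assumptions, not the load-test configuration.

```csharp
// Illustrative p95 latency probe against the public API (not a substitute for TC-201/202/203).
using System.Diagnostics;

var client = new HttpClient { BaseAddress = new Uri("https://api.egflow.com") };
var samples = new List<double>();

for (var i = 0; i < 500; i++)
{
    var sw = Stopwatch.StartNew();
    using var response = await client.GetAsync("/health");   // health endpoint, unauthenticated
    sw.Stop();
    samples.Add(sw.Elapsed.TotalMilliseconds);
}

samples.Sort();
var p95 = samples[(int)(samples.Count * 0.95) - 1];
Console.WriteLine($"p95 latency: {p95:F1} ms (target ≤ 500 ms)");
```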
Risks & Mitigation (Nordic Peak Season):
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| Heating season peaks exceed capacity (Oct-Mar) | MEDIUM | HIGH | - Historical data analysis for peak prediction - Pre-warm workers 1st/last week of month - Priority queue for SLA customers - Customer communication: off-peak scheduling - Capacity planning review quarterly | Operations Manager |
| Template complexity slows rendering | MEDIUM | HIGH | - Template performance guidelines - POC testing with customer templates - Recommend simple templates - Compiled template caching - Parallel rendering for 32 items | Technical Architect |
| Playwright memory issues at scale | HIGH | HIGH | - Semaphore: max 10 concurrent PDFs - Worker memory limit: 2GB - Browser instance pooling - Monitor memory usage - Scale horizontally (more workers) | Technical Architect |
| PostgreSQL connection exhaustion | MEDIUM | MEDIUM | - Connection pooling (max 50 per service) - Monitor active connections - Timeout settings (30 seconds) - Consider read replicas for heavy queries | Technical Architect |
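The connection-pooling mitigation in the last row above maps directly onto Npgsql connection-string settings. A minimal sketch, assuming the standard Npgsql keywords; host, database, and user names are placeholders:

```csharp
// Sketch: pooled Npgsql data source reflecting the mitigation values
// (max 50 connections per service, 30-second timeouts).
var connectionString =
    "Host=egflow-db.postgres.database.azure.com;" +
    "Database=egflow;" +
    "Username=egflow_api;" +      // in practice: managed identity, no password in config
    "Maximum Pool Size=50;" +
    "Minimum Pool Size=5;" +
    "Timeout=30;" +               // connection timeout, seconds
    "Command Timeout=30;";

await using var dataSource = Npgsql.NpgsqlDataSource.Create(connectionString);
await using var connection = await dataSource.OpenConnectionAsync();
```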
4.2 NFR-002: Scalability & Auto-Scaling
Requirement: The system shall scale horizontally without manual intervention to handle peak loads during Nordic heating season (October-March) and monthly invoice cycles.
Priority: HIGH
Scaling Configuration:
| Component | Min Instances | Max Instances | Trigger Metric | Threshold | Scale Up Time | Scale Down Time |
|---|---|---|---|---|---|---|
| CoreApiService | 5 | 20 | CPU Utilization OR Request Rate | 70% OR 1000 RPS | 2 minutes | 10 minutes |
| ParserService | 2 | 10 | Queue Length (batch-upload-queue) | Length > 0 | 1 minute | 5 minutes |
| DocumentGenerator | 2 | 100 | Queue Length (batch-items-queue) | Length > 32 | 1 minute | 5 minutes |
| EmailService | 5 | 50 | Queue Length (email-queue) | Length > 50 | 1 minute | 5 minutes |
| PostalService | 1 | 3 | Scheduled (not queue-based) | 12:00, 20:00 CET | N/A | After completion |
Peak Load Capacity:
Normal Load (non-heating season, mid-month):
- 333K invoices/day average
- 5-10 concurrent batches
- Worker instances: 10-20 total
Peak Load (heating season, first/last week):
- 2.2M invoices/week (95% of monthly volume)
- 314K invoices/day
- 20+ concurrent batches
- Worker instances: 80-100 total
Scaling Calculation:
Peak Day: 314,000 invoices
Processing time per invoice: 10 seconds (parse + render + PDF + deliver)
Total compute time: 314,000 × 10 s ≈ 872 worker-hours
Target completion: 8 hours
Required workers: 872 / 8 ≈ 109 workers
With 32 items/worker: 109 × 32 = 3,488 items processing simultaneously
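The same capacity estimate, expressed as a short calculation using only the values from this section (constants are taken from the figures above):

```csharp
// Worked version of the peak-day capacity estimate.
const int peakDayInvoices = 314_000;
const int secondsPerInvoice = 10;        // parse + render + PDF + deliver
const int targetHours = 8;
const int itemsPerWorkerMessage = 32;

double totalWorkerHours = peakDayInvoices * (double)secondsPerInvoice / 3600;  // ≈ 872
double requiredWorkers  = totalWorkerHours / targetHours;                      // ≈ 109
double itemsInFlight    = requiredWorkers * itemsPerWorkerMessage;             // ≈ 3,488

Console.WriteLine($"{totalWorkerHours:F0} worker-hours → {requiredWorkers:F0} workers, {itemsInFlight:F0} items in flight");
```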
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Auto-scaling triggered on queue depth | Monitor scaling events | Scales within 2 min |
| Scaling up completes within 2 minutes | Measure from trigger to ready | ≤ 2 minutes |
| Scaling down after 10 min low load | Monitor scale-down timing | ≥ 10 minutes |
| Performance maintained during scaling | Monitor API latency during scale events | No degradation |
| No message loss during scaling | Count messages before/after | 100% preserved |
| Pre-warming for known peaks | Schedule scale-up 1st/last week | Workers ready |
| Max 100 DocumentGenerator instances | Verify max instance count | ≤ 100 |
Pre-Warming Strategy (Heating Season):
Monthly Schedule:
- Day 1-7: Pre-warm to 50 instances at 00:00
- Day 8-23: Scale based on queue (2-20 instances)
- Day 24-31: Pre-warm to 50 instances at 00:00
Heating Season (Oct-Mar): Double pre-warm levels
- Day 1-7: Pre-warm to 80 instances
- Day 24-31: Pre-warm to 80 instances
...
Dependencies:
- Azure Container Apps with KEDA (Kubernetes Event-Driven Autoscaling)
- Queue depth monitoring
- Historical load data for pre-warming schedule
Risks & Mitigation:
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| Scale-up too slow for sudden spike | MEDIUM | HIGH | - Pre-warm during known peaks - Keep min instances higher during peak season - Queue backpressure (503) if overloaded - Customer scheduling guidance | Operations Manager |
| Azure Container Apps 100-instance limit | LOW | HIGH | - Priority queue for SLA customers - Queue backpressure to throttle intake - Consider split by organization - Plan for Phase 2: dedicated worker pools | Technical Architect |
| Cost escalation during sustained peaks | MEDIUM | MEDIUM | - Cost alerts at thresholds - Auto-scale down aggressively - Reserved instances for base load - Monitor cost per invoice | Finance Controller |
4.3 NFR-003: Availability & Reliability (Nordic 24/7 Operations)
Requirement: The system shall maintain 99.9% uptime with automatic failover, multi-region deployment, and recovery procedures to support Nordic utilities' 24/7 invoice delivery operations.
Priority: HIGH
Availability Targets:
| Metric | Target | Allowed Downtime | Measurement | Consequences of Breach |
|---|---|---|---|---|
| System Uptime | 99.9% | 43 min/month | Azure Monitor | SLA credit to customers |
| Batch Success Rate | > 99.5% | 50 failures per 10K | Processing logs | Investigation required |
| Delivery Success Rate | > 98% | 200 failures per 10K | Delivery tracking | Alert to organization |
| API Availability | 99.9% | 43 min/month | Health check monitoring | Incident escalation |
| MTTR (Mean Time To Recovery) | < 30 minutes | N/A | Incident timestamps | Process improvement |
| MTBF (Mean Time Between Failures) | > 720 hours (30 days) | N/A | Incident tracking | Root cause analysis |
Multi-Region Deployment:
Primary Region: West Europe (Azure westeurope)
- Sweden: Primary processing
- Denmark: Primary processing
Secondary Region: North Europe (Azure northeurope)
- Norway: Primary processing
- Finland: Primary processing
- Failover for Sweden/Denmark
Traffic Routing:
- Azure Traffic Manager with Performance routing
- Health check: /health endpoint every 30 seconds
- Auto-failover on 3 consecutive failed health checks
- Failover time: < 2 minutes
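A minimal sketch of the /health endpoint that Traffic Manager probes, using the standard ASP.NET Core health-check middleware; any service-specific checks (database, queues) would be registered on top and are not shown here.

```csharp
var builder = WebApplication.CreateBuilder(args);

// Add-on packages (AspNetCore.HealthChecks.*) can register PostgreSQL and queue
// probes here; the bare registration is enough for a liveness check.
builder.Services.AddHealthChecks();

var app = builder.Build();
app.MapHealthChecks("/health");   // 200 when healthy, 503 when any check fails
app.Run();
```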
...
Recovery Time Objectives:
| Scenario | RTO (Recovery Time) | RPO (Data Loss) | Recovery Method | Responsible Team |
|---|---|---|---|---|
| Worker Instance Crash | < 5 minutes | 0 (idempotent) | Automatic queue retry | Automatic |
| Database Failure | < 15 minutes | < 5 minutes | Auto-failover to read replica | Automatic + Ops verification |
| Primary Region Failure | < 30 minutes | < 15 minutes | Traffic Manager failover to secondary region | Ops Manager |
| Blob Storage Corruption | < 1 hour | < 1 hour | Restore from blob version/snapshot | Ops Team |
| Queue Service Outage | < 15 minutes | 0 (messages preserved) | Wait for Azure recovery, messages retained | Ops Manager |
| SendGrid Complete Outage | < 2 hours | 0 (fallback to postal) | Route all email invoices to postal queue | Ops Team |
| 21G SFTP Unavailable | < 4 hours | 0 (retry scheduled) | Retry at next scheduled time (12:00/20:00) | Ops Team |
Backup & Recovery Strategy:
Blob Storage:
- Replication: Geo-Redundant Storage (GRS)
  - Primary: West Europe
  - Secondary: North Europe
  - Automatic replication
- Soft Delete: 7-day retention (recover accidentally deleted blobs within 7 days)
- Blob Versioning: 30-day retention (previous versions accessible, rollback capability)
- Point-in-Time Restore: Not needed (blob versioning sufficient)
...
PostgreSQL:
- Backup Schedule: Daily automated backups
- Retention: 35 days
- Backup Window: 02:00-04:00 CET (low-traffic period)
- Point-in-Time Restore: 7 days
- Geo-Redundant: Enabled
- Read Replica: North Europe (for failover)
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Multi-region deployment operational | Verify services in both regions | Both regions active |
| Traffic Manager routes to healthy region | Simulate West Europe failure | Routes to North Europe |
| Database auto-failover tested | Simulate primary DB failure | Failover < 15 min |
| Blob geo-replication verified | Write to primary, read from secondary | Data replicated |
| Health checks on all services | GET /health on all endpoints | All return 200 |
| Automated incident alerts configured | Simulate service failure | Alert received within 5 min |
| Worker auto-restart on crash | Kill worker process | New instance starts |
| Queue message retry tested | Simulate worker crash mid-processing | Message reprocessed |
| Disaster recovery drill quarterly | Simulate complete region loss | Recovery within RTO |
| Backup restoration tested monthly | Restore database from backup | Successful restore |
Dependencies:
- Azure Traffic Manager configuration
- Multi-region resource deployment
- Database replication setup
- Automated failover testing procedures
- Incident response runbook
Risks & Mitigation (Nordic Context):
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| Both Azure regions fail simultaneously | VERY LOW | CRITICAL | - Extremely rare (Azure multi-region SLA 99.99%) - Accept risk (probability vs cost of 3rd region) - Communication plan for extended outage - Manual failover to Azure Germany (emergency) | Executive Sponsor |
| Network partition between regions | LOW | HIGH | - Each region operates independently - Eventual consistency acceptable - Manual reconciliation if partition >1 hour - Traffic Manager handles routing | Technical Architect |
| Database failover causes brief downtime | LOW | MEDIUM | - Accept 1-2 minutes downtime during failover - API returns 503 with Retry-After - Queue-based processing unaffected - Monitor failover duration | Operations Manager |
| Swedish winter storms affect connectivity | LOW | MEDIUM | - Azure datacenter redundancy within region - Monitor Azure status dashboard - Communication plan for customers - No physical office connectivity required | Operations Manager |
4.4 NFR-004: Security Requirements
Requirement: The system shall implement comprehensive security controls including OAuth 2.0 authentication, role-based access control, encryption, audit logging, and protection against OWASP Top 10 vulnerabilities.
Priority: CRITICAL
4.4.1 Authentication & Authorization
OAuth 2.0 Implementation:
Grant Type: Client Credentials Flow (machine-to-machine)
Token Provider: Microsoft Entra ID
Token Lifetime: 1 hour
Refresh Token: 90 days
Token Format: JWT (JSON Web Token)
Algorithm: RS256 (RSA signature with SHA-256)
...
Required Claims in JWT:
{ "aud": "api://eg-flow-api", "iss": "https://login.microsoftonline.com/{tenant}/v2.0", "sub": "user-object-id", "roles": ["Batch.Operator"], "organization_id": "123e4567-e89b-12d3-a456-426614174000", "exp": 1700226000, "nbf": 1700222400 }
...
Approval Section
Stakeholder Sign-Off
| Stakeholder Role | Name | Signature | Date | Status |
|---|---|---|---|---|
| Product Owner | | | | ☐ PENDING |
| Technical Architect | | | | ☐ PENDING |
Approval Criteria
- All CRITICAL requirements reviewed and accepted
- All HIGH requirements reviewed and accepted
- All dependencies identified and acknowledged
- All risks reviewed with mitigation strategies
- All acceptance criteria defined and measurable
- Budget and timeline implications understood
- Resource allocation confirmed
- Compliance requirements validated (GDPR, Bokföringslagen)
Change Control
Any changes to approved CRITICAL or HIGH priority requirements must follow the change control process:
- Document proposed change in Jira (tag with egflow version)
- Impact assessment (scope, timeline, cost)
- Re-approval by Product Owner and Technical Architect
- Update this document with version increment
- Communicate changes to development team
- Update affected Jira issues with new fixVersion
Version History
| Version | Date | Author | Changes | Release Target |
|---|---|---|---|---|
| 1.0 | 2025-11-20 | Product Owner | Initial draft | egflow-1.0.0 |
| 1.1 | 2025-11-21 | Product Owner | Added FR-003 details, updated acceptance criteria | egflow-1.0.0 |
| 1.2 | 2025-11-27 | Product Owner | Updated versioning strategy to match Gasell model | egflow-1.0.0 "Corny Flamingo" |
Role Definitions & Permissions:
| Role | Scope | Permissions | Use Case |
|---|---|---|---|
| Super Admin | Global (all organizations) | Full CRUD on all resources, cross-org visibility | EG internal support team |
| Organization Admin | Single organization | Manage org users, configure settings, view all batches | Utility IT manager |
| Template Admin | Single organization | Create/edit templates, manage template versions | Utility design team |
| Batch Operator | Single organization | Upload batches, start processing, view status | Utility billing team |
| Read-Only User | Single organization | View batches, download invoices, view reports | Utility customer service |
| API Client | Single organization | Programmatic batch upload and status queries | Billing system integration |
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| OAuth 2.0 token required for all endpoints (except /health) | Call API without token | 401 Unauthorized |
| JWT token validated (signature, expiration, audience) | Tampered token, expired token | 401 Unauthorized |
| Refresh tokens work for 90 days | Use refresh token after 30 days | New access token issued |
| All 6 roles implemented in PostgreSQL | Query roles table | 6 roles present |
| Users can only access their organization | User A calls Org B endpoint | 403 Forbidden |
| All actions logged to audit_logs table | Perform action, query audit_logs | Entry created |
| API authentication middleware on all routes | Attempt bypass | All protected |
| MFA enforced for Super Admin | Login as Super Admin | MFA challenge |
| MFA enforced for Org Admin | Login as Org Admin | MFA challenge |
| Failed logins logged | 3 failed login attempts | 3 entries in audit_logs |
| Account lockout after 5 failed attempts | 6 failed login attempts | 15-minute lockout |
| API key rotation every 90 days | Check Key Vault secret age | Alert at 80 days |
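The organization-isolation criterion above (403 Forbidden for cross-org access) is enforced in middleware. The following is a hypothetical sketch; the production OrganizationContextMiddleware listed in the project structure may differ.

```csharp
// Hypothetical sketch: reject requests whose route organization does not match
// the organization_id claim from the JWT (see claims example above).
public class OrganizationContextMiddleware
{
    private readonly RequestDelegate _next;

    public OrganizationContextMiddleware(RequestDelegate next) => _next = next;

    public async Task InvokeAsync(HttpContext context)
    {
        var tokenOrgId = context.User.FindFirst("organization_id")?.Value;
        var routeOrgId = context.Request.RouteValues["organizationId"]?.ToString();

        if (routeOrgId != null &&
            !string.Equals(tokenOrgId, routeOrgId, StringComparison.OrdinalIgnoreCase))
        {
            context.Response.StatusCode = StatusCodes.Status403Forbidden;
            return;   // users may only access their own organization
        }

        await _next(context);
    }
}
```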
4.4.2 Data Protection
Encryption Standards:
In Transit:
- TLS 1.3 minimum (TLS 1.2 acceptable)
- Cipher suites: AES-256-GCM, ChaCha20-Poly1305
- Certificate: Wildcard cert for *.egflow.com
- HSTS: max-age=31536000; includeSubDomains
At Rest:
- Azure Blob Storage: AES-256 (Microsoft-managed keys)
- PostgreSQL: AES-256 (Microsoft-managed keys)
- Backups: AES-256 encryption
- Customer-managed keys (CMK): Phase 2 option
Sensitive Data Fields (extra protection):
- Personnummer: Encrypted column in database (if stored)
- API keys: Azure Key Vault only
- Email passwords: Never stored
- Customer addresses: Standard blob encryption sufficient
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| All API traffic over HTTPS | Attempt HTTP request | Redirect to HTTPS or reject |
| TLS 1.3 or 1.2 enforced | Check TLS version in traffic | TLS ≥ 1.2 |
| Data encrypted at rest (blob) | Verify Azure encryption settings | Enabled |
| Data encrypted at rest (PostgreSQL) | Verify DB encryption | Enabled |
| Secrets in Azure Key Vault only | Code scan for hardcoded secrets | Zero secrets in code |
| No credentials in source control | Git history scan | Zero credentials |
| Database connections use managed identity | Check connection strings | No passwords |
| Personnummer not in URLs | URL pattern analysis | No personnummer patterns |
| Personnummer not in logs | Log analysis | No personnummer found |
4.4.3 Application Security (OWASP Top 10)
Security Measures:
| OWASP Risk | Mitigation | Validation |
|---|---|---|
| A01: Broken Access Control | Organization middleware, RBAC enforcement | Penetration testing |
| A02: Cryptographic Failures | TLS 1.3, AES-256, Key Vault | Security scan |
| A03: Injection | Parameterized queries, input validation | SQL injection testing |
| A04: Insecure Design | Threat modeling, security review | Architecture review |
| A05: Security Misconfiguration | Azure security baseline, CIS benchmarks | Configuration audit |
| A06: Vulnerable Components | Dependabot, automated scanning | Weekly scan |
| A07: Authentication Failures | OAuth 2.0, MFA, rate limiting | Penetration testing |
| A08: Software/Data Integrity | Code signing, SRI, checksums | Build verification |
| A09: Logging Failures | Comprehensive audit logging | Log completeness review |
| A10: SSRF | URL validation, allowlist | Security testing |
Input Validation:
```csharp
// Example: Batch upload validation with FluentValidation
public class BatchUploadValidator : AbstractValidator<BatchUploadRequest>
{
    public BatchUploadValidator()
    {
        RuleFor(x => x.File)
            .NotNull().WithMessage("File is required")
            .Must(BeValidXml).WithMessage("File must be valid XML")
            .Must(BeLessThan100MB).WithMessage("File must be less than 100MB");

        RuleFor(x => x.Metadata.BatchName)
            .NotEmpty().WithMessage("Batch name is required")
            .Length(1, 255).WithMessage("Batch name must be 1-255 characters")
            .Must(NotContainPathSeparators).WithMessage("Batch name cannot contain / or \\")
            .Must(NoSQLInjectionPatterns).WithMessage("Invalid characters in batch name");

        RuleFor(x => x.Metadata.Priority)
            .Must(x => x == "normal" || x == "high")
            .WithMessage("Priority must be 'normal' or 'high'");
    }

    private bool NoSQLInjectionPatterns(string input)
    {
        var sqlPatterns = new[] { "--", "/*", "*/", "xp_", "sp_", "';", "\";" };
        return !sqlPatterns.Any(p => input.Contains(p, StringComparison.OrdinalIgnoreCase));
    }
}
```
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Input validation on all API endpoints | Send malicious input | Rejected with error |
| SQL injection prevented | Attempt SQL injection in batch name | Sanitized/rejected |
| XSS prevented in templates | Inject script tags in template | Sanitized on render |
| XML external entity (XXE) attack prevented | Upload XXE payload | Parsing rejects |
| Billion laughs attack prevented | Upload billion laughs XML | Parsing rejects/times out safely |
| File upload size enforced | Upload 101MB file | Rejected at API gateway |
| Rate limiting prevents abuse | 1000 rapid API calls | 429 after limit |
| CSRF protection (future web UI) | Attempt CSRF attack | Blocked by token |
| Dependency vulnerabilities scanned weekly | Run Dependabot | Alerts for high/critical |
| Security headers present | Check HTTP response | X-Frame-Options, CSP, etc. |
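The XXE and billion-laughs criteria above rely on hardened XML parsing. A minimal sketch using the built-in XmlReader switches (no external library assumed); the stream variable and size limits are illustrative, with the document limit mirroring the 100MB upload cap.

```csharp
// Sketch: secure XML parsing that rejects DTDs and external entities.
using System.Xml;

var settings = new XmlReaderSettings
{
    DtdProcessing = DtdProcessing.Prohibit,          // blocks XXE and billion-laughs DTDs
    XmlResolver = null,                              // never resolve external entities
    MaxCharactersInDocument = 100 * 1024 * 1024,     // mirrors the 100MB upload limit
    MaxCharactersFromEntities = 1024
};

using var reader = XmlReader.Create(uploadStream, settings);   // uploadStream: placeholder
while (reader.Read())
{
    // streaming parse; a malicious payload fails fast instead of exhausting memory
}
```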
4.4.4 Network Security
Acceptance Criteria:
| Criterion | Status | Phase |
|---|---|---|
| DDoS protection enabled (Azure basic) | ✅ Included | Phase 1 |
| IP whitelisting support for API clients | ✅ Optional feature | Phase 1 |
| VNet integration for Container Apps | ⚠️ Phase 2 | Phase 2 |
| Private endpoints for Blob Storage | ⚠️ Phase 2 | Phase 2 |
| Network Security Groups (NSGs) | ⚠️ Phase 2 | Phase 2 |
| Azure Firewall for egress filtering | ⚠️ Phase 2 | Phase 2 |
Dependencies:
- FluentValidation library
- OWASP dependency check tools
- Penetration testing (external vendor)
- Security code review process
Risks & Mitigation (Nordic/EU Security Context):
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| NIS2 Directive compliance (EU critical infrastructure) | MEDIUM | CRITICAL | - Energy sector falls under NIS2 - Incident reporting procedures (24h to authorities) - Security measures documentation - Annual security audit - CISO designated | Legal/Compliance |
| Swedish Säkerhetspolisen (SÄPO) requirements | LOW | HIGH | - Enhanced security for critical infrastructure - Incident reporting to MSB (Swedish Civil Contingencies) - Employee background checks for production access - Security clearance for key personnel | Security Officer |
| API key theft/leakage | MEDIUM | HIGH | - Rotate keys every 90 days - Monitor for leaked keys (GitHub scanning) - Revoke compromised keys immediately - API key hashing in database - Never log full API keys | Security Officer |
| Insider threat (privileged access abuse) | LOW | CRITICAL | - Least privilege principle - All actions audited - Regular access reviews - Separation of duties - Anomaly detection in audit logs | Security Officer |
| Third-party vendor breach (SendGrid, 21G) | LOW | HIGH | - Data Processing Agreements (DPAs) signed - Regular vendor security assessments - Minimal data sharing - Encryption in transit to vendors - Vendor breach response plan | Legal/Compliance |
4.5 NFR-005: Data Retention & Lifecycle Management
Requirement: The system shall manage data retention according to Swedish accounting law (7-year invoice retention) with automated lifecycle policies for cost optimization.
Priority: HIGH
Retention Policies:
| Data Type | Legal Requirement | Retention Period | Storage Tier Transition | Disposal Method |
|---|---|---|---|---|
| Invoices (PDF/HTML/JSON) | Bokföringslagen (Swedish Accounting Act) | 7 years from fiscal year end | Day 0-365: Hot Day 366-2555: Cool Day 2556+: Archive | Permanent deletion after 7 years |
| Batch Source Files (XML) | None (internal processing) | 90 days | Day 0-30: Hot Day 31-90: Cool Day 91+: Delete | Automatic deletion |
| Batch Metadata JSON | Audit trail | 90 days | Day 0-90: Hot Day 91+: Delete | Automatic deletion |
| Audit Logs (PostgreSQL) | GDPR, Swedish law | 7 years | Year 0-1: PostgreSQL Year 1-7: Blob (compressed) | Deletion after 7 years |
| Application Logs | Operational | 90 days | Application Insights | Automatic deletion |
| Templates | Business continuity | Indefinite (archived versions) | Hot (active) Cool (archived) | Never deleted |
| Organization Config | Business continuity | Indefinite | Hot | Never deleted (updated in place) |
Azure Blob Lifecycle Policy:
{ "rules": [ { "enabled": true, "name": "invoice-lifecycle", "type": "Lifecycle", "definition": { "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["invoices-"] }, "actions": { "baseBlob": { "tierToCool": { "daysAfterModificationGreaterThan": 365 }, "tierToArchive": { "daysAfterModificationGreaterThan": 2555 }, "delete": { "daysAfterModificationGreaterThan": 2920 } } } } }, { "enabled": true, "name": "batch-source-cleanup", "type": "Lifecycle", "definition": { "filters": { "blobTypes": ["blockBlob"], "prefixMatch": ["batches-"] }, "actions": { "baseBlob": { "tierToCool": { "daysAfterModificationGreaterThan": 30 }, "delete": { "daysAfterModificationGreaterThan": 90 } } } } } ] }
...
Storage Growth Projection:
Assumptions:
- 10M invoices/month
- 85KB per invoice (50KB PDF + 30KB HTML + 5KB JSON)
- 7-year retention
Growth Over Time:
| Year | New Data/Month | Cumulative Total | Primary Storage Tier | Secondary Tier |
|---|---|---|---|---|
| Year 1 | 850 GB | 10.2 TB | Hot (10.2 TB) | - |
| Year 2 | 850 GB | 20.4 TB | Hot (10.2 TB) | Cool (10.2 TB) |
| Year 3 | 850 GB | 30.6 TB | Hot (10.2 TB) | Cool (20.4 TB) |
| Year 7 | 850 GB | 71.4 TB | Hot (10.2 TB) | Cool (10.2 TB), Archive (51 TB) |
Storage Tier Pricing Impact:
With lifecycle policies (Hot → Cool → Archive):
- Year 1-2: Manageable with hot storage
- Year 3-7: Significant savings with tiering (estimated 85% reduction vs all-hot)
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Automated lifecycle policies configured | Check Azure policy | Policies active |
| Data transitions to Cool after 1 year | Verify tier of 13-month-old invoice | Cool tier |
| Data transitions to Archive after 7 years | Verify tier of 7-year-old invoice | Archive tier |
| 7-year invoice retention enforced | Attempt to access 8-year-old invoice | Deleted (404) |
| Old batch files deleted after 90 days | Check for 91-day-old batch file | Deleted (404) |
| Retention policy exceptions supported | Tag invoice with legal hold | Not deleted despite age |
| Legal hold prevents deletion | Set legal hold, verify no deletion | Invoice retained |
| Data restoration from Archive within 24h | Request archived invoice | Retrieved within 24h |
| Templates never automatically deleted | Check template age | Old templates present (archived) |
Legal Hold Functionality:
{ "invoiceId": "uuid", "legalHold": { "enabled": true, "reason": "Customer dispute - case #12345", "appliedBy": "user-uuid", "appliedAt": "2025-11-21T10:00:00Z", "expiresAt": null } }
...
Dependencies:
- Azure Blob lifecycle management
- Legal hold tagging mechanism
- Retention compliance monitoring
- Archive tier data retrieval procedures
Risks & Mitigation (Swedish Legal Context):
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| Bokföringslagen (Accounting Act) non-compliance | LOW | CRITICAL | - 7-year retention strictly enforced - Legal opinion obtained - Retention policy reviewed by auditor - Automated compliance reporting - Skatteverket (Tax Agency) audit trail | Legal/Compliance |
| Premature invoice deletion | LOW | HIGH | - Lifecycle policy testing in staging - Deletion logging and alerts - Soft delete (7-day recovery) - Annual retention audit | Operations Manager |
| Storage costs exceed budget | MEDIUM | MEDIUM | - Lifecycle policies reduce costs 85% - Cost monitoring and alerts - Quarterly cost review - Consider compression for PDFs | Finance Controller |
| Archive retrieval SLA breach | LOW | MEDIUM | - Document 24-hour SLA for archive - Test archive retrieval monthly - Maintain critical invoices in Cool (not Archive) | Operations Manager |
4.6 NFR-006: Monitoring, Logging & Observability
Requirement: The system shall provide comprehensive monitoring through Application Insights, structured logging with Serilog, real-time dashboards with 5-minute refresh, and automated alerting with escalation procedures.
Priority: HIGH
4.6.1 Structured Logging Standards
Serilog Configuration:
```csharp
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
    .MinimumLevel.Override("Microsoft.AspNetCore", LogEventLevel.Warning)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Application", "EGU.Flow")
    .Enrich.WithProperty("Environment", environment)
    .Enrich.WithProperty("Region", azureRegion)
    .Enrich.WithMachineName()
    .Enrich.WithThreadId()
    .WriteTo.ApplicationInsights(
        connectionString,
        TelemetryConverter.Traces,
        LogEventLevel.Information)
    .WriteTo.Console(new CompactJsonFormatter())
    .CreateLogger();
```
...
Log Entry Structure:
{ "timestamp": "2025-11-21T10:30:00.123Z", "level": "Information", "messageTemplate": "Batch {BatchId} processing started for organization {OrganizationId}. Items: {ItemCount}", "message": "Batch 550e8400... processing started for organization 123e4567.... Items: 5000", "properties": { "BatchId": "550e8400-e29b-41d4-a716-446655440000", "OrganizationId": "123e4567-e89b-12d3-a456-426614174000", "ItemCount": 5000, "CorrelationId": "corr-abc-123", "Application": "EGU.Flow", "Environment": "Production", "Region": "westeurope", "MachineName": "worker-001" } }
...
PII Masking Rules:
NEVER log:
- Personnummer (Swedish social security numbers)
- Full customer names (log customer ID only)
- Email addresses (log domain only: ***@example.se)
- Phone numbers (log prefix only: +4670****)
- Street addresses (log city only)
- Bank account numbers
- API keys or tokens
SAFE to log:
- Organization IDs (UUIDs)
- Batch IDs (UUIDs)
- Invoice IDs (UUIDs)
- Invoice numbers (business references)
- Processing statistics
- Error codes and messages
- Performance metrics
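The masking rules above can be applied before log entries are written. The helper below is an illustrative sketch: the class name is hypothetical and the personnummer regex is an assumption covering the common 10/12-digit formats, not the production rule set.

```csharp
// Illustrative PII masking helper for log messages.
using System.Text.RegularExpressions;

public static class PiiMasker
{
    private static readonly Regex Personnummer =
        new(@"\b(\d{6}|\d{8})[-+]?\d{4}\b", RegexOptions.Compiled);

    private static readonly Regex Email =
        new(@"[^@\s]+@([^@\s]+)", RegexOptions.Compiled);

    public static string Mask(string message)
    {
        message = Personnummer.Replace(message, "**********");
        message = Email.Replace(message, m => "***@" + m.Groups[1].Value);  // keep domain only
        return message;
    }
}
```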
...
4.6.2 Application Insights Dashboards
Dashboard 1: Operations (Real-Time)
Refresh: Every 5 minutes
Metrics:
- Active batches count (gauge)
- Queue depths (4 queues, time series chart)
- Worker instance counts per service (bar chart)
- Processing throughput (items/minute, time series)
- Error rate (percentage, last hour vs last 24h)
- System health status (green/yellow/red indicators)
- Current API request rate (RPS)
Dashboard 2: Performance
Refresh: Every 5 minutes
Metrics:
- API response times (p50, p95, p99 - line chart)
- Batch processing duration (histogram)
- PDF generation times (p50, p95, p99)
- Handlebars rendering times (p50, p95, p99)
- Delivery latency by channel (email vs postal)
- Worker CPU/memory utilization
- Database query performance (slow query tracking)
Dashboard 3: Business
Refresh: Hourly
Metrics:
- Invoices processed (today, this week, this month - counters)
- Delivery channel breakdown (pie chart: email/postal)
- Failed deliveries by reason (bar chart)
- Top 10 organizations by volume (bar chart)
- Processing trends (daily invoice count, 30-day chart)
- Monthly invoice volumes (seasonal view)
Dashboard 4: Vendor Formats
Refresh: Hourly
Metrics:
- Batches by vendor format (pie chart: GASEL/XELLENT/ZYNERGY)
- Parsing success rate by vendor (percentage gauges)
- Average batch size by vendor
- Parsing duration by vendor
- Validation errors by vendor format
4.6.3 Alert Rules & Escalation
Critical Alerts (5-minute evaluation window):
| Alert Name | Condition (Kusto Query) | Severity | Recipients | Escalation (after 15 min) |
|---|---|---|---|---|
| High Error Rate | traces \| where severityLevel >= 3 \| count > 50 | High | Ops team | Dev team + On-call |
| Queue Depth Critical | customMetrics \| where name == 'Queue.Depth' and value > 10000 | High | Ops team | Product Owner |
| Worker Crash Spike | traces \| where message contains 'Worker crashed' \| count > 3 | Critical | Ops + Dev teams | CTO |
| Delivery Failure Rate | customMetrics \| where name startswith 'Delivery' and value < 0.9 | Medium | Ops team | Customer success |
| API Response Degraded | requests \| summarize p95=percentile(duration, 95) \| where p95 > 1000 | Medium | Ops team | Technical Architect |
| Batch Processing Timeout | customMetrics \| where name == 'Batch.Duration' and value > 120 | High | Ops team | Product Owner |
| Database Connection Errors | exceptions \| where type contains 'Npgsql' \| count > 10 | Critical | Ops + DBA | CTO |
| Blob Storage Throttling | exceptions \| where message contains '503 Server Busy' \| count > 20 | High | Ops team | Technical Architect |
| SendGrid Deliverability Drop | SendGrid webhook: bounceRate > 10% | High | Ops team | Email deliverability specialist |
| 21G SFTP Connection Failure | SFTP connection exceptions | High | Ops team | 21G account manager |
Alert Delivery:
- Primary: Email to ops team distribution list
- Secondary: SMS to on-call engineer
- Escalation: PagerDuty incident creation
- Integration: Create Jira ticket automatically
On-Call Rotation:
- European team: 24/7 coverage (Swedish, Danish time zones)
- Shift schedule: Week-long rotations
- Handoff procedure: Thursday 09:00 CET
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Dashboards refresh every 5 minutes | Check dashboard timestamp | ≤ 5 min old |
| Data retained for 90 days | Check oldest data in dashboard | 90 days accessible |
| Dashboards accessible to authorized users | Login as different roles | Appropriate access |
| Critical alerts trigger within 5 min | Simulate high error rate | Alert within 5 min |
| Alert escalation after 15 min | Don't acknowledge alert | Escalation triggered |
| PII masked in logs | Search logs for personnummer regex | Zero matches |
| Correlation IDs trace requests | Follow request across services | Same ID throughout |
| Log retention 90 days | Check Application Insights retention | 90 days |
| Structured logging in JSON format | Parse log entries | Valid JSON |
Custom Metrics Tracked:
```csharp
public class MetricsService
{
    private readonly TelemetryClient _telemetry;

    public void TrackBatchProcessing(BatchMetadata batch)
    {
        _telemetry.TrackMetric("Batch.TotalItems", batch.Statistics.TotalItems,
            new Dictionary<string, string>
            {
                ["OrganizationId"] = batch.OrganizationId,
                ["VendorCode"] = batch.VendorInfo.VendorCode
            });

        _telemetry.TrackMetric("Batch.Duration",
            (batch.Timestamps.CompletedAt - batch.Timestamps.StartedAt)?.TotalMinutes ?? 0);

        _telemetry.TrackMetric("Batch.SuccessRate",
            (double)batch.Statistics.SuccessfulItems / batch.Statistics.TotalItems * 100);
    }

    public void TrackDelivery(string channel, bool success, string organizationId)
    {
        _telemetry.TrackMetric($"Delivery.{channel}.Success", success ? 1 : 0,
            new Dictionary<string, string>
            {
                ["OrganizationId"] = organizationId,
                ["Channel"] = channel
            });
    }

    public void TrackQueueDepth(string queueName, int depth)
    {
        _telemetry.TrackMetric("Queue.Depth", depth,
            new Dictionary<string, string> { ["QueueName"] = queueName });
    }

    public void TrackVendorParsing(string vendorCode, bool success, long durationMs)
    {
        _telemetry.TrackMetric("Parser.Duration", durationMs,
            new Dictionary<string, string>
            {
                ["VendorCode"] = vendorCode,
                ["Success"] = success.ToString()
            });
    }
}
```
...
Kusto Queries for Common Operations:
Query 1: Failed Batches (Last 24 Hours)
```kusto
traces
| where timestamp > ago(24h)
| where customDimensions.BatchId != ""
| where severityLevel >= 3
| summarize ErrorCount = count(), FirstError = min(timestamp), LastError = max(timestamp)
    by BatchId = tostring(customDimensions.BatchId),
       OrganizationId = tostring(customDimensions.OrganizationId),
       ErrorMessage = message
| order by ErrorCount desc
| take 50
```
...
Query 2: Queue Depth Trending
customMetrics | where name == "Queue.Depth" | where timestamp > ago(24h) | extend QueueName = tostring(customDimensions.QueueName) | summarize AvgDepth = avg(value), MaxDepth = max(value), MinDepth = min(value) by QueueName, bin(timestamp, 5m) | render timechart
...
Query 3: Vendor Format Performance
customMetrics | where name == "Parser.Duration" | where timestamp > ago(7d) | extend VendorCode = tostring(customDimensions.VendorCode) | summarize p50 = percentile(value, 50), p95 = percentile(value, 95), p99 = percentile(value, 99), BatchCount = count() by VendorCode | order by p95 desc
...
Dependencies:
- Application Insights workspace (90-day retention)
- Serilog sinks for Application Insights and Console
- Alert action groups configured
- Dashboard permissions configured
- PagerDuty or similar on-call system
4.7 NFR-007: Disaster Recovery & Business Continuity
Requirement: The system shall have documented disaster recovery procedures with defined RTO/RPO targets, tested quarterly, to ensure business continuity for Nordic utility customers.
Priority: HIGH
Disaster Recovery Scenarios:
Scenario 1: Worker Instance Crash
Trigger: DocumentGenerator worker crashes during 32-item batch processing
Detection: Worker stops sending heartbeats, Azure Container Apps detects unhealthy instance
Recovery Procedure:
1. Azure Container Apps automatically starts new instance (2 min)
2. Queue message visibility timeout expires (5 min)
3. Message becomes visible in batch-items-queue again
4. New worker instance picks up message
5. Worker checks for already-processed invoices (idempotency)
6. Skips completed invoices, processes remaining items
7. Blob lease ensures no concurrent processing
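Steps 2-4 of the procedure above rely on queue message visibility timeouts: a message received with a 5-minute timeout simply reappears if the worker dies before deleting it. A minimal sketch with Azure.Storage.Queues; the connection string is a placeholder and the handler is the idempotent processing shown in NFR-008.

```csharp
// Sketch: receive with visibility timeout, delete only after successful processing.
using Azure.Storage.Queues;

var queue = new QueueClient(connectionString, "batch-items-queue");

var messages = await queue.ReceiveMessagesAsync(
    maxMessages: 1,
    visibilityTimeout: TimeSpan.FromMinutes(5));

foreach (var message in messages.Value)
{
    await ProcessBatchItemsAsync(message.Body);     // idempotent (see NFR-008)
    await queue.DeleteMessageAsync(message.MessageId, message.PopReceipt);
}
// If the worker crashes before DeleteMessageAsync, the message becomes visible
// again after 5 minutes and another instance picks it up.
```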
...
RTO: < 5 minutes (automatic)
RPO: 0 (no data loss, idempotent operations)
Responsible: Automatic + Operations Manager (monitoring)
Scenario 2: PostgreSQL Database Failure
Trigger: Primary PostgreSQL instance becomes unresponsive
Detection: Health check failures, connection timeout errors in logs
Recovery Procedure:
1. Azure detects primary failure via health probes (30 seconds)
2. Auto-failover to read replica in secondary region (5 min)
3. DNS updated to point to new primary
4. Application reconnects automatically (connection retry logic)
5. Verify data integrity post-failover
6. Notify stakeholders of failover event
7. Investigate root cause
...
RTO: < 15 minutes
RPO: < 5 minutes (replication lag)
Responsible: Operations Manager (monitoring), DBA (verification)
Scenario 3: Azure Region Failure (West Europe)
Trigger: Complete West Europe region outage
Detection: Traffic Manager health checks fail for all West Europe endpoints
Recovery Procedure:
1. Traffic Manager detects 3 consecutive health check failures (90 seconds)
2. Traffic Manager routes all traffic to North Europe (2 min)
3. North Europe region activates read replica database as primary
4. North Europe workers process queues (messages replicated via GRS)
5. Verify system operational in North Europe
6. Communicate to customers about region failover
7. Monitor for West Europe recovery
8. Plan failback when West Europe restored
...
RTO: < 30 minutes
RPO: < 15 minutes (blob replication lag)
Responsible: Operations Manager (execution), CTO (decision), Communications (customer notification)
Scenario 4: Blob Storage Corruption
Trigger: Critical blob (organization config, template) becomes corrupted or accidentally deleted
Detection: Blob read errors, validation failures, user reports
Recovery Procedure:
1. Identify corrupted blob path and organization
2. Check soft delete (7-day retention):
- If within 7 days: Undelete blob immediately
3. If soft delete expired, check blob versions:
- Restore from previous version
4. If no versions, restore from geo-redundant copy:
- Access secondary region blob storage
- Copy to primary region
5. Verify restored blob integrity
6. Test with sample batch
7. Root cause analysis
...
RTO: < 1 hour
RPO: < 1 hour (version interval)
Responsible: Operations Manager (execution), Technical Architect (verification)
Scenario 5: Complete Data Loss (Catastrophic)
Trigger: Theoretical scenario - both regions and all backups lost
Detection: N/A (highly unlikely with Azure GRS)
Recovery Procedure:
1. Declare disaster
2. Restore PostgreSQL from geo-redundant backup (35-day retention)
3. Organizations re-upload batch files (source systems have copies)
4. Customers re-notified about invoices
5. Incident post-mortem and Azure investigation
...
RTO: < 4 hours
RPO: < 24 hours (last backup)
Responsible: CTO (declaration), All teams (execution)
Note: This scenario has probability < 0.001% given Azure GRS + geo-redundant backups + multi-region deployment.
Disaster Recovery Testing:
| Test Type | Frequency | Scope | Pass Criteria |
|---|---|---|---|
| Worker Crash Test | Monthly | Kill random worker mid-processing | Recovery < 5 min, no data loss |
| Database Failover Test | Quarterly | Force failover to replica | Recovery < 15 min, queries work |
| Region Failover Drill | Annually | Simulate West Europe outage | Recovery < 30 min, all services operational |
| Backup Restoration Test | Monthly | Restore PostgreSQL from backup | Successful restore, data integrity verified |
| Blob Undelete Test | Quarterly | Delete critical blob, restore | Successful recovery within 1 hour |
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Multi-region deployment active | Verify services in both regions | Both regions operational |
| Traffic Manager failover tested | Simulate region failure | Failover < 2 min |
| Database auto-failover tested | Force primary DB failure | Failover < 15 min |
| Blob geo-replication verified | Write to primary, read from secondary | Data present |
| Disaster recovery procedures documented | Review runbook completeness | 100% complete |
| DR drill conducted quarterly | Check last drill date | Within 90 days |
| Backup restoration tested monthly | Check last restore test | Within 30 days |
| Recovery procedures automated where possible | Review manual steps | < 5 manual steps |
4.8 NFR-008: Data Consistency (Eventual Consistency Model)
Requirement: The system shall maintain data consistency using blob leases for exclusive access, ETag-based optimistic concurrency for metadata updates, and idempotent operations for safe retries.
Priority: HIGH
Consistency Guarantees:
Strong Consistency (Within Single Operation):
- Blob lease acquisition: One worker per 32-item batch
- PostgreSQL transactions: User/role operations ACID
- Queue message delivery: At-least-once delivery
Eventual Consistency (Across System):
- Batch statistics: Updated within 30 seconds
- Invoice status: Updated within 1 minute
- Dashboard metrics: Updated within 5 minutes
- Audit logs: Written asynchronously
Consistency Mechanisms:
1. Blob Lease for Exclusive Access:
```csharp
// Ensures only one worker processes a 32-item batch
var lease = await AcquireBlobLeaseAsync(
    container: "{org}-batches-{year}",
    blob: "locks/{batch-id}/{message-id}.lock",
    duration: TimeSpan.FromMinutes(5));

try
{
    // Process 32 invoices exclusively
    await ProcessBatchItemsAsync(message);
}
finally
{
    await ReleaseBlobLeaseAsync(lease);
}
```
...
2. ETag-Based Optimistic Concurrency:
```csharp
// Prevents lost updates to batch metadata
var download = await blobClient.DownloadContentAsync();
var etag = download.Value.Details.ETag;
var metadata = download.Value.Content.ToObjectFromJson<BatchMetadata>();

// Update metadata
metadata.Statistics.ProcessedItems += 32;
metadata.Metadata.UpdatedAt = DateTime.UtcNow;
metadata.Metadata.Version++;

// Upload with ETag condition; fails with 412 if another worker updated first
var content = BinaryData.FromObjectAsJson(metadata);
await blobClient.UploadAsync(content, new BlobUploadOptions
{
    Conditions = new BlobRequestConditions { IfMatch = etag }
});
```
...
3. Idempotent Operations:
```csharp
// Safe to retry without duplicates
public async Task ProcessInvoiceAsync(string invoiceId)
{
    // Check if already processed
    var metadata = await TryGetInvoiceMetadataAsync(invoiceId);
    if (metadata?.Status == "delivered")
    {
        _logger.LogInformation("Invoice {InvoiceId} already processed, skipping", invoiceId);
        return; // Idempotent: safe to skip
    }

    // Process invoice (creates new blobs, doesn't update existing)
    await RenderInvoiceAsync(invoiceId);
    await GeneratePdfAsync(invoiceId);
    await EnqueueForDeliveryAsync(invoiceId);
}
```
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Blob leases prevent concurrent processing | Send same message to 2 workers | Only 1 processes |
| ETags prevent lost updates | 2 workers update same metadata | No lost updates |
| Retry operations are idempotent | Retry invoice processing 3x | Processed once only |
| No duplicate invoices generated | Crash during processing, retry | One PDF created |
| Concurrent batch updates handled | 10 workers update statistics | All updates applied |
| Race conditions prevented | Concurrent access testing | No race conditions |
| Data integrity after crash | Kill worker, verify data state | Consistent state |
Dependencies:
- Azure Blob Storage lease API
- ETag support in blob operations
- Retry logic with idempotency checks
- Worker coordination mechanisms
4.9 NFR-009: Multi-Region Data Residency (Nordic Compliance)
Requirement: The system shall enforce data residency requirements for Nordic countries, ensuring Swedish/Danish data stays in West Europe and Norwegian/Finnish data stays in North Europe, with no cross-border transfers except encrypted backups.
Priority: HIGH
Data Residency Rules:
| Country | Customer Base | Primary Region | Data Residency | Backup Region | Rationale |
|---|---|---|---|---|---|
| Sweden (SE) | ~10M population, largest market | West Europe | Enforced | North Europe (encrypted) | GDPR, Swedish Data Protection Law |
| Denmark (DK) | ~6M population | West Europe | Enforced | North Europe (encrypted) | GDPR, Danish data laws |
| Norway (NO) | ~5M population | North Europe | Enforced | West Europe (encrypted) | GDPR, Norwegian data laws, EEA regulations |
| Finland (FI) | ~5M population | North Europe | Enforced | West Europe (encrypted) | GDPR, Finnish data laws |
Traffic Routing Logic:
```csharp
public class RegionRoutingService
{
    public async Task<string> GetProcessingRegionAsync(string organizationId)
    {
        var org = await _orgService.GetOrganizationAsync(organizationId);

        // Route based on the organization's country
        return org.CountryCode switch
        {
            "SE" => "westeurope",   // Sweden  → West Europe
            "DK" => "westeurope",   // Denmark → West Europe
            "NO" => "northeurope",  // Norway  → North Europe
            "FI" => "northeurope",  // Finland → North Europe
            _ => "westeurope"       // Default: West Europe
        };
    }

    public async Task<bool> ValidateDataResidencyAsync(string organizationId, string requestRegion)
    {
        var requiredRegion = await GetProcessingRegionAsync(organizationId);

        if (requestRegion != requiredRegion)
        {
            _logger.LogWarning(
                "Data residency violation attempted. Org: {OrgId}, Required: {Required}, Attempted: {Attempted}",
                organizationId, requiredRegion, requestRegion);
            return false;
        }

        return true;
    }
}
```
...
Organization Configuration:
{ "organizationId": "uuid", "organizationCode": "VATTENFALL-SE", "countryCode": "SE", "dataResidency": { "primaryRegion": "westeurope", "allowedRegions": ["westeurope"], "backupRegions": ["northeurope"], "crossRegionProcessing": false, "crossRegionBackup": true } }
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Swedish orgs process in West Europe | Verify blob container region | westeurope |
| Norwegian orgs process in North Europe | Verify blob container region | northeurope |
| Cross-region processing blocked | Attempt to process Swedish org in North Europe | Rejected |
| Cross-region backup allowed | Verify geo-redundant replication | Enabled |
| Latency < 100ms within region | API latency from Nordic countries | < 100ms |
| Automatic failover to secondary region | Simulate primary region failure | Failover works |
| Data residency config per organization | Update org config | Setting honored |
| Audit trail for cross-region access | Attempt cross-region, check logs | Attempt logged |
Dependencies:
- Azure Traffic Manager with geographic routing
- Blob storage geo-redundant replication
- Organization configuration enforcement
- Multi-region deployment automation
Risks & Mitigation (Nordic Legal Context):
| Risk | Likelihood | Impact | Mitigation Strategy | Owner |
|---|---|---|---|---|
| Schrems II implications (EU-US data transfer) | LOW | HIGH | - No US region usage - All data in EU (West/North Europe only) - Azure EU Data Boundary compliance - Standard Contractual Clauses with vendors | Legal/Compliance |
| Norwegian data sovereignty concerns | LOW | MEDIUM | - North Europe primary for Norwegian orgs - Option for Norway-only processing - No data transfer to other Nordics without consent - Compliance with Norwegian regulations | Legal/Compliance |
| Data residency audit by Datatilsynet (NO) or Datatilsynet (DK) | LOW | HIGH | - Documented data flows - Data residency configuration auditable - Logs prove compliance - Annual self-assessment | Legal/Compliance |
4.10 NFR-010: Maintainability & Code Quality
Requirement: The system shall be designed for long-term maintainability with clear code standards, comprehensive documentation, high test coverage, and adherence to .NET best practices.
Priority: MEDIUM
Code Quality Standards:
Project Structure (Updated per Oct 27 decision):
EGU.Flow/
├── EGU.Flow.AppHost/ # .NET Aspire orchestration
│ └── Program.cs
│
├── EGU.Flow.Core/ # Shared contracts, DTOs, interfaces
│ ├── DTOs/
│ │ ├── BatchUploadRequest.cs
│ │ ├── CanonicalInvoice.cs
│ │ └── DeliveryRequest.cs
│ ├── Enums/
│ │ ├── BatchStatus.cs
│ │ ├── DeliveryChannel.cs
│ │ └── VendorCode.cs
│ ├── Interfaces/
│ │ ├── IXmlParserService.cs
│ │ ├── ITemplateRenderingService.cs
│ │ └── IDeliveryService.cs
│ └── Common/
│ ├── Constants.cs
│ └── ErrorCodes.cs
│
├── EGU.Flow.Domain/ # Domain models and business logic
│ ├── Models/
│ │ ├── Organization.cs
│ │ ├── Batch.cs
│ │ ├── BatchItem.cs
│ │ ├── Template.cs
│ │ └── TemplateCategory.cs
│ └── DomainServices/
│ ├── VendorDetectionService.cs
│ └── DistributionRoutingService.cs
│
├── EGU.Flow.BusinessLogic/ # Business services
│ ├── Services/
│ │ ├── BatchService.cs
│ │ ├── OrganizationService.cs
│ │ ├── TemplateService.cs
│ │ └── SchemaRegistryService.cs
│ └── Mappers/
│ ├── GaselMapper.cs
│ ├── XellentMapper.cs
│ └── ZynergyMapper.cs
│
├── EGU.Flow.CoreApiService/ # ASP.NET Core REST API
│ ├── Controllers/
│ │ ├── BatchController.cs
│ │ ├── OrganizationController.cs
│ │ ├── TemplateController.cs
│ │ └── SchemaController.cs
│ ├── Middleware/
│ │ ├── OrganizationContextMiddleware.cs
│ │ ├── ErrorHandlingMiddleware.cs
│ │ └── RequestLoggingMiddleware.cs
│ └── Program.cs
│
├── EGU.Flow.ParserService/ # Console app: XML → JSON
│ ├── Workers/
│ │ └── BatchParserWorker.cs
│ ├── Parsers/
│ │ ├── GaselParser.cs
│ │ ├── XellentParser.cs
│ │ └── ZynergyParser.cs
│ └── Program.cs
│
├── EGU.Flow.DocumentGenerator/ # Console app: JSON → PDF
│ ├── Workers/
│ │ └── DocumentGeneratorWorker.cs
│ ├── Services/
│ │ ├── HandlebarsRenderingService.cs
│ │ └── PlaywrightPdfService.cs
│ └── Program.cs
│
├── EGU.Flow.EmailService/ # Console app: Email delivery
│ ├── Workers/
│ │ └── EmailDeliveryWorker.cs
│ └── Program.cs
│
├── EGU.Flow.PostalService/ # Console app: 21G integration
│ ├── Workers/
│ │ └── PostalBulkProcessor.cs
│ ├── Services/
│ │ ├── SftpService.cs
│ │ └── ZipArchiveService.cs
│ └── Program.cs
│
├── EGU.Flow.Web/ # Future: Blazor UI
│ └── (Phase 2)
│
└── EGU.Flow.Tests/ # Test projects
├── EGU.Flow.UnitTests/
├── EGU.Flow.IntegrationTests/
└── EGU.Flow.LoadTests/
...
Coding Standards:
- Style: Follow Microsoft C# coding conventions
- Naming: PascalCase for public members, camelCase for private
- Comments: XML documentation on all public APIs
- Async: Always use async/await, never .Result or .Wait()
- Logging: Structured logging with Serilog, correlation IDs
- Exceptions: Custom exception types, never swallow exceptions
- Null safety: Use nullable reference types (C# 12)
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Code follows .NET conventions | EditorConfig + Roslyn analyzers | Zero warnings |
| XML documentation on public APIs | Documentation coverage report | 100% of public members |
| Unit test coverage | Code coverage report | > 70% |
| Integration test coverage | Test execution report | > 25% of critical paths |
| Swagger/OpenAPI for all endpoints | Verify Swagger UI | All endpoints documented |
| Correlation IDs in all logs | Trace request through system | Same ID across services |
| Health check endpoints present | GET /health on all services | All respond |
| Feature flags for gradual rollout | Verify feature flag configuration | Flags configurable |
| Database migration scripts versioned | Check migrations folder | Sequential numbering |
| Infrastructure as Code (Bicep) | All resources in Bicep templates | 100% infrastructure |
| No hardcoded values | Code scan | All config in appsettings/KeyVault |
Documentation Requirements:
| Document Type | Location | Update Frequency | Owner |
|---|---|---|---|
| API Documentation | Swagger UI at /swagger | Every API change | Dev Team |
| Architecture Decisions | Confluence ADR page | Per decision | Technical Architect |
| Deployment Procedures | Confluence + Git (docs/) | Per change | Ops Team |
| Operations Runbook | Confluence | Monthly review | Ops Manager |
| Disaster Recovery Plan | Confluence (restricted) | Quarterly | Ops Manager |
| GDPR Documentation | Confluence (restricted) | Annual review | Legal/Compliance |
| Test Data Generation Guide | Git (docs/) | Per update | QA Team |
Dependencies:
- EditorConfig file in repository
- Roslyn analyzer NuGet packages
- SonarQube or similar code quality tool
- Documentation review in PR process
4.11 NFR-011: Usability & Developer Experience
Requirement: The API shall be intuitive, well-documented, easy to integrate, with clear error messages, code samples, and Postman collections.
Priority: MEDIUM
API Design Principles:
RESTful Standards:
- Resource-based URLs: /organizations/{id}/batches
- HTTP verbs: GET (read), POST (create), PUT (update), DELETE (not used in Phase 1)
- Status codes: 200 OK, 201 Created, 202 Accepted, 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found, 409 Conflict, 422 Unprocessable Entity, 429 Too Many Requests, 500 Internal Server Error, 503 Service Unavailable
- Consistent JSON response envelope (see the record sketch after this list)
- Pagination for list endpoints
- Filtering and sorting support
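One possible C# shape for the consistent response envelope; the error fields follow the "Good Example" payload below, while the record and property names are otherwise assumptions rather than the actual contract.

```csharp
// Sketch of a shared response envelope for all API responses.
public record ApiResponse<T>(
    bool Success,
    T? Data,
    IReadOnlyList<ApiError> Errors,
    PaginationInfo? Pagination);

public record ApiError(
    string Code,
    string Message,
    string? Field,
    IDictionary<string, object>? Details);

public record PaginationInfo(int Page, int PageSize, int TotalItems);
```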
Error Message Quality:
Bad Example:
{ "error": "Invalid input" }
...
Good Example:
{ "success": false, "errors": [ { "code": "INVALID_XML", "message": "XML file is not well-formed", "field": "file", "details": { "line": 142, "column": 23, "error": "Unexpected end tag: </Invoice>. Expected: </InvoiceHeader>", "suggestion": "Check that all opening tags have matching closing tags in the correct order", "documentationUrl": "https://docs.egflow.com/errors/INVALID_XML" } } ] }
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| RESTful design principles followed | API design review | All principles followed |
| Consistent request/response structure | Review all endpoints | Same envelope |
| Error messages include suggestions | Test error scenarios | Actionable guidance |
| Error messages include documentation links | Check error response | URLs present |
| Line/column numbers for XML errors | Upload invalid XML | Position info present |
| Comprehensive Swagger documentation | Review Swagger UI | All endpoints, examples |
| Code samples for common operations | Check documentation | C#, curl examples |
| Postman collection available | Import collection, run requests | All requests work |
| API versioning clear | Check URL structure | /v1/ in all paths |
| Deprecation warnings 6 months advance | Deprecate endpoint | Warning in response |
Postman Collection Contents:
EG Flow API Collection/
├── Authentication/
│ └── Get Access Token
├── Batch Management/
│ ├── Upload Batch
│ ├── Start Batch Processing
│ ├── Get Batch Status
│ ├── List Batch Items
│ └── Get Batch Item Details
├── Organization/
│ ├── Get Organization
│ └── Update Organization Config
├── Templates/
│ ├── List Templates
│ ├── Get Template
│ └── List Template Categories
└── Schema Management/
├── List Supported Formats
└── Validate XML
...
Dependencies:
- OpenAPI/Swagger generator
- Postman collection export
- Documentation website/portal
- Code sample generation
4.12 NFR-012: Internationalization (Nordic Languages)
Requirement: The system shall support Swedish language for invoices, email notifications, error messages, and UI elements, with foundation for future Norwegian, Danish, and Finnish support.
Priority: MEDIUM
Localization Requirements:
Phase 1: Swedish Only
- Invoice templates: Swedish text
- Email notifications: Swedish
- Error messages: Swedish (with English fallback in details)
- Date formats: YYYY-MM-DD (ISO 8601, universal)
- Number formats: Space as thousand separator, comma as decimal (Swedish standard)
- Example: 1 234 567,89 kr
- Currency: SEK (Swedish Krona)
- Time zone: CET/CEST (Europe/Stockholm)
Phase 2: Multi-Language (Future)
- Norwegian (Bokmål): For Norwegian utilities
- Danish: For Danish utilities
- Finnish: For Finnish utilities
- Language detection based on organization country
Swedish-Specific Formatting:
Dates:
- Invoice date: "2025-11-21" (ISO format, universal)
- Display: "21 november 2025" (Swedish long format)
Numbers:
- Amount: 1 234,56 (space separator, comma decimal)
- Percentage: 25,0% (comma decimal)
- Quantity: 1 234 (integer, space separator)
Currency:
- Symbol: "kr" or "SEK"
- Position: After amount "1 234,56 kr"
Swedish Terms:
- Faktura (Invoice)
- Förfallodatum (Due date)
- Mätpunkt (Metering point)
- Elförbrukning (Electricity consumption)
- Att betala (Amount to pay)
- Moms (VAT)
...
Handlebars Helpers for Swedish Formatting:
```handlebars
{{!-- Swedish number formatting --}}
{{formatNumber amount decimals=2}}      → "1 234,56"

{{!-- Swedish currency --}}
{{formatCurrency amount}}               → "1 234,56 kr"

{{!-- Swedish date --}}
{{formatDate date format="long"}}       → "21 november 2025"

{{!-- Swedish percentage --}}
{{formatPercent rate}}                  → "25,0%"
```
...
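The helpers above can be backed by the built-in sv-SE culture rather than hand-rolled formatting. A minimal sketch (note that .NET emits a non-breaking space as the Swedish thousands separator, and "P1" inserts a space before the percent sign):

```csharp
using System.Globalization;

var svSE = CultureInfo.GetCultureInfo("sv-SE");

string amount   = 1234567.89m.ToString("N2", svSE) + " kr";                 // "1 234 567,89 kr"
string percent  = 0.25m.ToString("P1", svSE);                               // "25,0 %"
string longDate = new DateTime(2025, 11, 21).ToString("d MMMM yyyy", svSE); // "21 november 2025"
```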
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| Invoice templates in Swedish | Review rendered PDF | Swedish text |
| Email notifications in Swedish | Receive test email | Swedish subject/body |
| Error messages in Swedish | Trigger various errors | Swedish messages |
| Numbers formatted Swedish style | Check invoice amounts | "1 234,56" format |
| Dates in ISO 8601 | Check invoice JSON | "2025-11-21" format |
| Currency symbol positioned correctly | Check rendered invoice | "kr" after amount |
| Swedish characters (åäö) render correctly | PDF with åäö | Characters correct |
| Time zone CET/CEST used | Check timestamps | Europe/Stockholm |
Dependencies:
- Swedish localization resources
- Handlebars custom helpers
- PDF font with Swedish character support
- Future: i18n library for multi-language
4.13 NFR-013: API Backward Compatibility
Requirement: API versions shall maintain backward compatibility for minimum 12 months after new version release, with clear deprecation warnings and migration guides.
Priority: HIGH
Versioning Strategy:
URL Path Versioning:
Current: https://api.egflow.com/v1/organizations/{id}/batches
Future: https://api.egflow.com/v2/organizations/{id}/batches
Both versions run simultaneously during transition period
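URL-path versioning can be implemented simply by letting v1 and v2 controllers coexist, so both routes keep responding during the transition period. The controller and action names below are illustrative.

```csharp
// Sketch: parallel v1 and v2 controllers for the same resource.
[ApiController]
[Route("v1/organizations/{organizationId:guid}/batches")]
public class BatchV1Controller : ControllerBase
{
    [HttpGet]
    public IActionResult List(Guid organizationId) => Ok(/* v1 contract */);
}

[ApiController]
[Route("v2/organizations/{organizationId:guid}/batches")]
public class BatchV2Controller : ControllerBase
{
    [HttpGet]
    public IActionResult List(Guid organizationId) => Ok(/* v2 contract */);
}
```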
...
Version Lifecycle:
v1.0 Released: January 2026
↓
v2.0 Released: January 2027
↓ (v1 deprecation announced)
v1 Supported: Until January 2028 (12 months)
↓
v1 Sunset: January 2028
...
Deprecation Warning (HTTP Header):
```http
HTTP/1.1 200 OK
Deprecation: true
Sunset: Sat, 31 Jan 2028 23:59:59 GMT
Link: <https://docs.egflow.com/api/v2/migration>; rel="successor-version"
```
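A sketch of emitting these headers from middleware on all v1 routes; the sunset date mirrors this section and would normally come from configuration rather than being hard-coded.

```csharp
// Fragment of Program.cs (sketch only): add deprecation headers to every /v1 response.
app.Use(async (context, next) =>
{
    if (context.Request.Path.StartsWithSegments("/v1"))
    {
        context.Response.Headers["Deprecation"] = "true";
        context.Response.Headers["Sunset"] = "Sat, 31 Jan 2028 23:59:59 GMT";
        context.Response.Headers["Link"] =
            "<https://docs.egflow.com/api/v2/migration>; rel=\"successor-version\"";
    }
    await next();
});
```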
...
Acceptance Criteria:
| Criterion | Validation Method | Target |
|---|---|---|
| v1 supported 12 months after v2 release | Verify both versions work | Both return 200 |
| Breaking changes only in major versions | Review v1.1, v1.2 changes | No breaking changes |
| Deprecation warnings 6 months advance | Check headers 6 months before sunset | Deprecation header present |
| Migration guide published | Review documentation | Complete guide available |
| v1 clients continue working during v2 rollout | Test v1 client after v2 deploy | No disruption |