Reference implementation built on a non-confidential 19-account practice ledger so the full workflow can stay public, runnable, and auditable. Full source on GitHub.
Architecture overview
The system is designed for transparency over scale. Every component is legible and traceable, so a reviewer can follow a single journal entry from ingestion to exception report without hitting a black box; a short trace sketch follows the component list.
- Data layer. SQLite with a normalised enterprise schema, chosen over a warehouse so the database can be opened, inspected, and queried without infrastructure.
- ML engine. scikit-learn with Z-score anomaly detection. I chose a statistical model instead of deep learning because the threshold and scoring logic are readable in the repo.
- Automation layer. Python task scheduler with declared dependency management. 17 tasks, explicit sequencing.
- Interface. Jupyter notebooks for the interactive walkthrough. No separate frontend in the core prototype.
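To make the traceability claim concrete, here is a minimal sketch of following one flagged entry through the data layer with nothing but the standard library. The database file name matches the configuration shown later; the transaction id is a placeholder.

import sqlite3

# Trace a single journal entry from the transactions table to its
# exception record; table and column names follow the schema below.
conn = sqlite3.connect("financial_data.db")
rows = conn.execute(
    """
    SELECT t.id, t.account_id, t.amount, a.anomaly_score, a.reviewed
    FROM transactions t
    JOIN anomalies a ON a.transaction_id = t.id
    WHERE t.id = ?
    """,
    (42,),  # placeholder transaction id
).fetchall()
print(rows)
conn.close()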
Key components
1. Data processing engine
- Batch transaction processing against the practice ledger.
- Automated data validation on ingestion (sketched after this list).
- Exception logging with account-level granularity.
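A minimal sketch of the validation step, assuming each batch arrives as a pandas DataFrame with the column names used in the schema below. The three checks and the hard-coded account list are illustrative, not the full rule set:

import pandas as pd

KNOWN_ACCOUNTS = {"1000", "1100", "2000"}  # illustrative; really loaded from the ledger

def validate_batch(batch: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that fail basic ingestion checks, tagged with a reason."""
    checks = {
        "missing_amount": batch["amount"].isna(),
        "unknown_account": ~batch["account_id"].isin(KNOWN_ACCOUNTS),
        "bad_date": pd.to_datetime(batch["transaction_date"], errors="coerce").isna(),
    }
    failures = []
    for reason, mask in checks.items():
        failed = batch[mask].copy()
        failed["reason"] = reason  # account-level detail survives into the exception log
        failures.append(failed)
    return pd.concat(failures, ignore_index=True)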
2. Machine learning module
Z-score anomaly detection at a 2.5σ threshold. The threshold is conservative on a clean practice ledger because it flags roughly the top one percent of movements without generating noise. On a real ledger, the threshold would need calibration per account type.
import numpy as np
from sklearn.preprocessing import StandardScaler

class ZScoreAnomalyDetector:
    """Flag values whose standardised score exceeds the sigma threshold."""

    def __init__(self, threshold=2.5):
        self.threshold = threshold
        self.scaler = StandardScaler()
        self.is_fitted = False

    def fit(self, X):
        # X: array of shape (n_transactions, n_features)
        self.scaler.fit(X)
        self.is_fitted = True

    def predict(self, X):
        if not self.is_fitted:
            raise ValueError("Model must be fitted before prediction")
        # StandardScaler's transform is exactly the per-feature z-score
        z_scores = self.scaler.transform(X)
        is_anomaly = np.any(np.abs(z_scores) > self.threshold, axis=1)
        return is_anomaly, z_scores
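Usage reduces to fit on historical amounts and predict on the new batch. The figures here are made up for illustration, not ledger data:

import numpy as np

detector = ZScoreAnomalyDetector(threshold=2.5)
history = np.array([[120.0], [135.5], [118.2], [127.9], [131.4]])
detector.fit(history)

flags, z_scores = detector.predict(np.array([[125.0], [980.0]]))
print(flags)  # [False  True]: the second amount sits far outside 2.5 sigma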
3. Workflow automation
17 close tasks modelled with declared dependencies. A delay in any task surfaces its downstream impact immediately rather than becoming visible only at the end of the cycle.
import json
from enum import Enum

class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

class Task:
    def __init__(self, task_id, name, dependencies):
        self.id = task_id
        self.name = name
        self.dependencies = dependencies
        self.status = TaskStatus.PENDING

class WorkflowManager:
    def __init__(self, config_path):
        with open(config_path, "r") as f:
            self.config = json.load(f)
        self.tasks = {}
        self.initialize_tasks()

    def initialize_tasks(self):
        # One Task per entry in the workflow definition below
        for spec in self.config["tasks"]:
            self.tasks[spec["id"]] = Task(
                spec["id"], spec["name"], spec["dependencies"]
            )

    def get_ready_tasks(self):
        # Ready = pending with every declared dependency completed
        return [
            t for t in self.tasks.values()
            if t.status == TaskStatus.PENDING
            and all(
                self.tasks[dep].status == TaskStatus.COMPLETED
                for dep in t.dependencies
            )
        ]
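A minimal driver loop on top of that, assuming a config file in the format shown under Workflow definition (the file name is illustrative). Task execution itself is elided:

manager = WorkflowManager("close_config.json")

# Run whatever is unblocked until every task has finished.
while any(t.status != TaskStatus.COMPLETED for t in manager.tasks.values()):
    ready = manager.get_ready_tasks()
    if not ready:
        break  # nothing runnable: a task failed, or the dependency graph has a cycle
    for task in ready:
        task.status = TaskStatus.IN_PROGRESS
        # ... execute the close task here ...
        task.status = TaskStatus.COMPLETED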
Database schema
CREATE TABLE transactions (
id INTEGER PRIMARY KEY,
account_id TEXT NOT NULL,
amount DECIMAL(10,2),
transaction_date DATE,
category TEXT,
risk_score DECIMAL(3,2)
);
CREATE TABLE close_tasks (
id INTEGER PRIMARY KEY,
task_name TEXT NOT NULL,
status TEXT DEFAULT 'pending',
dependencies TEXT, -- JSON array of task IDs
started_at DATETIME,
completed_at DATETIME
);
CREATE TABLE anomalies (
id INTEGER PRIMARY KEY,
transaction_id INTEGER REFERENCES transactions(id),
anomaly_score DECIMAL(5,4),
flagged_at DATETIME,
reviewed BOOLEAN DEFAULT FALSE
);
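The write path from the detector into this schema is a single insert; record_anomaly here is a hypothetical helper for illustration, not code from the repo:

import sqlite3
from datetime import datetime

def record_anomaly(conn: sqlite3.Connection, transaction_id: int, z_score: float) -> None:
    # Store the absolute z-score and leave the row unreviewed;
    # a human closes it out during exception review.
    conn.execute(
        "INSERT INTO anomalies (transaction_id, anomaly_score, flagged_at) "
        "VALUES (?, ?, ?)",
        (transaction_id, abs(z_score), datetime.now().isoformat()),
    )
    conn.commit()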
Performance characteristics
| Metric | Value |
|---|---|
| Monthly transactions | 1,000+ |
| Anomaly classification time | Under 2 seconds |
| Z-score threshold | 2.5σ |
| Model confidence signal | 87% (a confidence-style score, not a calibrated probability) |
These are prototype figures on a clean practice ledger. Real-world performance depends on data quality, volume, and the variance characteristics of the specific ledger.
Deployment requirements
Prerequisites
- Python 3.8 or later
- 4 GB RAM minimum
- 50 GB storage
Setup
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python init_database.py
Dependencies
scikit-learn==1.3.0
pandas==2.0.3
numpy==1.24.3
python-dotenv==1.0.0
Configuration
DATABASE_URL=sqlite:///financial_data.db
ML_MODEL_PATH=./models/anomaly_detector.pkl
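python-dotenv is already in the dependency list, so loading these values takes a few lines; a minimal sketch:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory
database_url = os.getenv("DATABASE_URL", "sqlite:///financial_data.db")
model_path = os.getenv("ML_MODEL_PATH", "./models/anomaly_detector.pkl")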
Workflow definition
{
"tasks": [
{
"id": "data_collection",
"name": "Data Collection",
"dependencies": [],
"automation": true,
"duration_estimate": 30
},
{
"id": "validation",
"name": "Data Validation",
"dependencies": ["data_collection"],
"automation": true,
"duration_estimate": 45
}
]
}
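Declared dependencies only surface delays correctly if the graph is acyclic; a cycle would leave get_ready_tasks returning an empty list forever. A small check worth running when the config loads (a sketch of Kahn's algorithm; assert_acyclic is a hypothetical helper):

def assert_acyclic(tasks):
    # tasks: the "tasks" list from the workflow definition above
    deps = {t["id"]: set(t["dependencies"]) for t in tasks}
    resolved = set()
    while deps:
        ready = [tid for tid, d in deps.items() if d <= resolved]
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(deps)}")
        for tid in ready:
            resolved.add(tid)
            del deps[tid]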
Security considerations
These target a production deployment; the public prototype runs entirely on the non-confidential practice ledger.
- Role-based access control on the API layer.
- Data encryption at rest and in transit.
- Audit logging for all transactions and anomaly reviews.
Testing
# Unit tests
python -m pytest tests/ -v
# Integration tests
python -m pytest tests/integration/ -v
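A representative unit test for the detector, assuming the class shown earlier lives in a module named anomaly.py (the module path is illustrative):

import numpy as np
from anomaly import ZScoreAnomalyDetector

def test_flags_large_deviation():
    detector = ZScoreAnomalyDetector(threshold=2.5)
    detector.fit(np.array([[100.0], [102.0], [98.0], [101.0], [99.0]]))
    flags, _ = detector.predict(np.array([[100.5], [500.0]]))
    assert flags.tolist() == [False, True]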
Full source code and additional documentation are available on GitHub.