Welcome to the ARC Tool

Archetype Representation and Clustering Tool

About PREDICT

PREDICT (Predictive Renovation Data Intelligence Clustering Toolchain) is a project focused on enabling intelligent renovation pathways across Europe. It aims to use clustering algorithms and data analytics to uncover patterns in building stock performance and renovation potential that are not visible in the raw data.

The ARC Tool is one of PREDICT’s core innovations, designed to cluster and visualise building data, detect outliers, and inform renovation and investment decisions with explainable AI models.

What is the ARC Tool?

The ARC Tool (Archetype Representation and Clustering Tool) is a modular web-based platform that empowers stakeholders—energy agencies, local authorities, building owners, and analysts—to:

Cluster buildings using intelligent algorithms (e.g. K-Means, DBSCAN, Agglomerative)
Evaluate renovation groupings with visual and statistical metrics
Detect outliers (anomalous records) in building datasets
Assess renovation opportunities based on indicators such as Smart Readiness, Energy Performance Certificate (EPC) rating, and Energy Use Intensity (EUI)
Filter, visualise, and export results to support investment planning

The ARC Tool supports the creation of representative archetypes critical for Renovation Wave strategies and EU Green Deal compliance.
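
As a rough illustration of what a representative archetype can mean in practice, the sketch below picks, for each cluster, the building closest to the cluster mean. This is only one possible selection rule, not necessarily the one the ARC Tool applies. The function name pick_archetypes, the two features, and the sample values are all hypothetical, and the label -1 is assumed to mark outliers.

```python
from math import dist
from statistics import fmean


def pick_archetypes(features, labels):
    """Return {cluster_label: index of the building closest to the cluster mean}."""
    clusters = {}
    for idx, label in enumerate(labels):
        if label == -1:  # assumption: -1 marks outliers, which get no archetype
            continue
        clusters.setdefault(label, []).append(idx)

    archetypes = {}
    for label, members in clusters.items():
        # Mean of each feature over the members of this cluster.
        centre = [fmean(column) for column in zip(*(features[i] for i in members))]
        # The archetype is the real building nearest to that mean.
        archetypes[label] = min(members, key=lambda i: dist(features[i], centre))
    return archetypes


# Hypothetical, already-scaled features per building: (EUI, EPC score).
features = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),  # cluster 0
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.85),  # cluster 1
    (5.00, 5.00),                              # outlier
]
labels = [0, 0, 0, 1, 1, 1, -1]

archetypes = pick_archetypes(features, labels)
print(archetypes)  # {0: 2, 1: 5}
```

Choosing an actual building rather than the numeric mean keeps the archetype interpretable, since every attribute of the representative corresponds to a real record.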

Work Package Structure

WP1: Data Harmonisation & Preprocessing
WP2: Clustering Engine Design (ARC Core)
WP3: Evaluation Metrics & Visualisation
WP4: Integration with PREDICT Dashboard
WP5: Validation with Pilot Datasets
WP6: Replication & Exploitation Plan

Clustering Algorithms & Evaluation Metrics

The ARC Tool uses advanced unsupervised learning methods to cluster buildings into renovation-relevant groups. Supported algorithms include:

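K-Means: Partitions buildings into k groups by assigning each one to its nearest centroid
DBSCAN: Density-based clustering that groups core points in dense regions and labels isolated points as noise
Agglomerative: Hierarchical clustering that starts with each building as its own cluster and repeatedly merges the two closest clusters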
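
The three algorithms can be compared side by side with a few lines of scikit-learn. This is a minimal sketch under stated assumptions: the data is synthetic (make_blobs stands in for real building features), and the parameter values (three clusters, eps=0.3, min_samples=5) are illustrative choices, not ARC Tool defaults.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for building features (e.g. EUI, floor area).
X_raw, _ = make_blobs(
    n_samples=300, centers=3, n_features=2, cluster_std=0.6, random_state=42
)
# Scale features so that no single unit dominates the distance computation.
X = StandardScaler().fit_transform(X_raw)

models = {
    "K-Means": KMeans(n_clusters=3, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.3, min_samples=5),
    "Agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    n_clusters = len(set(labels) - {-1})   # DBSCAN marks noise with label -1
    n_noise = int(np.sum(labels == -1))
    results[name] = (n_clusters, n_noise)
    print(f"{name}: {n_clusters} clusters, {n_noise} noise points")
```

K-Means and Agglomerative need the number of clusters up front, whereas DBSCAN derives it from the density parameters and is the only one of the three that reports noise points directly.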

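Clusters are evaluated using three validation scores and the elbow heuristic:

Silhouette Score: Measures how well each building fits its own cluster compared with the nearest other cluster; ranges from -1 to 1 (0.5–1 is good)
Davies-Bouldin Index: Measures average similarity between each cluster and its most similar neighbour; lower is better, with 0 as the minimum (< 0.5 is good)
Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion; higher is better, but the scale depends on dataset size and dimensionality, so compare values only across runs on the same data (> 2000 is treated as strong)
Elbow Method: Estimates a suitable number of clusters by plotting the within-cluster sum of squares (WCSS) against the cluster count and locating the point where further clusters yield little improvement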
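
The scores above can be computed with scikit-learn as sketched below. The data is again synthetic, the range of cluster counts (2 to 7) is an arbitrary choice for illustration, and K-Means is used because its inertia_ attribute gives the WCSS needed for the elbow method.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled building features.
X_raw, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.7, random_state=7)
X = StandardScaler().fit_transform(X_raw)

scores = {}
for k in range(2, 8):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = {
        "wcss": model.inertia_,  # within-cluster sum of squares (elbow method)
        "silhouette": silhouette_score(X, model.labels_),
        "davies_bouldin": davies_bouldin_score(X, model.labels_),
        "calinski_harabasz": calinski_harabasz_score(X, model.labels_),
    }

for k, s in scores.items():
    print(
        f"k={k}  WCSS={s['wcss']:.1f}  silhouette={s['silhouette']:.3f}  "
        f"DB={s['davies_bouldin']:.3f}  CH={s['calinski_harabasz']:.1f}"
    )

# One simple way to shortlist k: the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k]["silhouette"])
print("k with highest silhouette:", best_k)
```

The scores do not always agree on the same cluster count, so it is usually worth reading them together with the WCSS curve rather than relying on a single number.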

Visual panels and metrics guide users in selecting the best algorithm and filtering strategy.