Welcome to the ARC Tool

Archetype Representation and Clustering Tool

About PREDICT

PREDICT (Predictive Renovation Data Intelligence Clustering Toolchain) is a project focused on enabling intelligent renovation pathways across Europe. It aims to leverage clustering algorithms and data analytics to uncover hidden patterns in building stock performance and renovation potential.

The ARC Tool is one of PREDICT’s core innovations, designed to cluster and visualise building data, detect outliers, and inform renovation and investment decisions with explainable AI models.

What is the ARC Tool?

The ARC Tool (Archetype Representation and Clustering Tool) is a modular web-based platform that empowers stakeholders—energy agencies, local authorities, building owners, and analysts—to:

  • Cluster buildings using intelligent algorithms (e.g. K-Means, DBSCAN, Agglomerative)
  • Evaluate renovation groupings with visual and statistical metrics
  • Detect anomalies in building datasets (outliers)
  • Assess renovation opportunities based on Smart Readiness, Energy Performance Certificate (EPC) ratings, Energy Use Intensity (EUI), and more
  • Filter, visualise, and export results to support investment planning

The ARC Tool supports the creation of representative building archetypes, which are critical for Renovation Wave strategies and EU Green Deal compliance.

Work Package Structure

  1. WP1: Data Harmonisation & Preprocessing
  2. WP2: Clustering Engine Design (ARC Core)
  3. WP3: Evaluation Metrics & Visualisation
  4. WP4: Integration with PREDICT Dashboard
  5. WP5: Validation with Pilot Datasets
  6. WP6: Replication & Exploitation Plan

Clustering Algorithms & Evaluation Metrics

The ARC Tool uses advanced unsupervised learning methods to cluster buildings into renovation-relevant groups. Supported algorithms include:

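  • K-Means: Partitions buildings into k groups by assigning each building to its nearest cluster centroid
  • DBSCAN: Density-based clustering that grows clusters around dense core points and labels isolated points as noise
  • Agglomerative: Hierarchical clustering that starts from individual buildings and repeatedly merges the closest groups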
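
The centroid-based partitioning described for K-Means can be illustrated with a minimal sketch in plain Python. This is not the ARC Tool's implementation: the function name kmeans, the evenly spaced initialisation, and the toy buildings data (two features assumed to be already normalised to the 0–1 range) are all assumptions made for illustration. In practice a library such as scikit-learn, which provides KMeans, DBSCAN, and AgglomerativeClustering, would normally be used instead.

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    # Deterministic start for this sketch: pick k points spread over the input order.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical buildings: (normalised EUI, normalised envelope heat loss).
buildings = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),   # efficient buildings
    (0.80, 0.85), (0.85, 0.90), (0.78, 0.88),   # inefficient buildings
]

labels, centroids = kmeans(buildings, k=2)
print(labels)
print(centroids)
```

Each centroid can be read as a candidate archetype: the "average" building of its group. Real data would need feature scaling first, because distance-based methods are dominated by whichever feature has the largest numeric range.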

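Clusters are evaluated using three validation scores and one heuristic. The thresholds below are rules of thumb, not strict cut-offs:

  • Silhouette Score: Measures how well each building fits its own cluster compared with the nearest other cluster; ranges from -1 to 1 (0.5–1 is good)
  • Davies-Bouldin Index: Measures how similar each cluster is to its most similar neighbour; lower is better, with 0 as the minimum (< 0.5 is good)
  • Calinski-Harabasz Index: Ratio of between-cluster to within-cluster dispersion; higher is better (> 2000 is strong, but the scale depends on dataset size, so compare values on the same dataset only)
  • Elbow Method: Plots the within-cluster sum of squares (WCSS) against the number of clusters and picks the point where adding clusters stops giving a clear improvement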
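
To make two of these measures concrete, the sketch below computes the mean silhouette coefficient and the WCSS used by the elbow method in plain Python. It is an illustration under stated assumptions, not the ARC Tool's code: the function names silhouette and wcss, the toy points, and the hand-written label lists are hypothetical. The Davies-Bouldin and Calinski-Harabasz indices are omitted for brevity; scikit-learn offers silhouette_score, davies_bouldin_score, and calinski_harabasz_score for real datasets.

```python
import math

def wcss(points, labels):
    """Within-cluster sum of squares: squared distance of each point to its cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters and distinct points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the sum includes the zero distance from p to itself).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items() if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical, already normalised building features.
points = [
    (0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
    (0.80, 0.85), (0.85, 0.90), (0.78, 0.88),
]

good_labels = [0, 0, 0, 1, 1, 1]    # matches the two visible groups
mixed_labels = [0, 1, 0, 1, 0, 1]   # deliberately poor grouping

sil_good = silhouette(points, good_labels)
sil_mixed = silhouette(points, mixed_labels)

# Elbow method input: WCSS for candidate groupings with k = 1, 2, 3.
wcss_by_k = {
    1: wcss(points, [0] * 6),
    2: wcss(points, good_labels),
    3: wcss(points, [0, 0, 0, 1, 1, 2]),
}
print(sil_good, sil_mixed)
print(wcss_by_k)
```

On this toy data the WCSS drops sharply from one cluster to two and only marginally from two to three, which is the "elbow" pattern that suggests two clusters. The well-separated grouping also scores a much higher silhouette than the mixed one.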

Visual panels and metrics guide users in selecting the best algorithm and filtering strategy.

Ready to Start?