Article · Oct 1, 2016 · Closed access

The Synthetic Data Vault

IIT@MIT

Indexed in Crossref

Abstract

The goal of this paper is to build a system that automatically creates synthetic data to enable data science endeavors. To achieve this, we present the Synthetic Data Vault (SDV), a system that builds generative models of relational databases. We are able to sample from the model and create synthetic data, hence the name SDV. When implementing the SDV, we also developed an algorithm that computes statistics at the intersection of related database tables. We then used a state-of-the-art multivariate modeling approach to model this data. The SDV iterates through all possible relations, ultimately creating a model for the entire database. Once this model is computed, the same relational information allows the SDV…
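The abstract mentions computing statistics at the intersection of related tables. As a rough illustration only, the sketch below summarizes the child rows that reference each parent row and attaches those summaries to the parent as extra columns. It is not the SDV's actual algorithm, which the abstract does not spell out. The function name `extend_parent_rows`, the `users` and `orders` tables, and the choice of count, mean, and standard deviation as summaries are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def extend_parent_rows(parents, children, parent_key, foreign_key, numeric_columns):
    """Attach per-parent summary statistics of child rows to each parent row.

    Hypothetical helper for illustration; not part of any SDV API.
    """
    # Group child rows by the parent they reference.
    grouped = defaultdict(list)
    for child in children:
        grouped[child[foreign_key]].append(child)

    extended = []
    for parent in parents:
        rows = grouped.get(parent[parent_key], [])
        summary = dict(parent)  # copy so the input rows stay untouched
        summary["child_count"] = len(rows)
        for column in numeric_columns:
            values = [row[column] for row in rows]
            # Parents with no children get None instead of a statistic.
            summary[f"{column}_mean"] = mean(values) if values else None
            summary[f"{column}_std"] = pstdev(values) if values else None
        extended.append(summary)
    return extended


# Toy two-table database (made-up data): users is the parent, orders the child.
users = [{"user_id": 1}, {"user_id": 2}]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 20.0},
    {"order_id": 11, "user_id": 1, "amount": 40.0},
    {"order_id": 12, "user_id": 2, "amount": 5.0},
]

extended_users = extend_parent_rows(users, orders, "user_id", "user_id", ["amount"])
```

Each extended parent row then carries a fixed-width summary of its children, so a single multivariate model could in principle be fit to the parent table alone. Applying the same step across every relation would be one way to cover a whole database, under the assumptions stated above.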

Citation impact

  • Total citations: 557
  • FWCI: 11.99
  • Percentile: 100%
  • References: 13

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Synthetic data
  • Data modeling
  • Data mining
  • Relational database
  • Data model (GIS)
  • Intersection (aeronautics)
  • Generative model
UN Sustainable Development Goals
  • Industry, innovation and infrastructure