Data mastering is the process of unifying multiple independently constructed data sets about an entity type, for example customers, suppliers, or parts. Every large enterprise has this information in data silos and must perform unification to get full value from its data.
The newest candidate solution is to use Large Language Models (LLMs), such as ChatGPT, for data mastering. LLMs join other candidate technologies, including rule engines, traditional machine learning, and deep neural networks. In this talk, I explain why LLMs are unlikely to work, why deep neural networks are generally avoided, and why rule systems don't work at scale. This leaves traditional machine learning as the "last candidate standing".
Adjunct Professor, MIT; CTO, Tamr