I get questions like this a lot:

  • Where did this data come from?
  • How do I know I can trust the source?
  • What types of QA checks were applied to this data?

Data lineage is such a chronic issue in data engineering. This blog post from Airbyte gives a good overview & mentions some interesting products/projects that might help with it.

Unfortunately, I have limited flexibility to purchase or install tools for this in my current role. Has anyone rolled their own solution?
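For context, here's a rough sketch of the bare-minimum thing I'd end up rolling myself. It's Python, and every name in it (`LineageRecord`, `QACheck`, `fingerprint`) is hypothetical and doesn't come from any existing tool. It keeps one record per output dataset that answers the three questions above: where the data came from (`sources`, `transform`), whether the input is the one we expected (a SHA-256 `source_fingerprint` of the input rows), and which QA checks ran and whether they passed.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class QACheck:
    # One named QA check and its outcome (hypothetical structure)
    name: str
    passed: bool
    detail: str = ""


@dataclass
class LineageRecord:
    # Which dataset this record describes and what produced it
    dataset: str
    sources: list[str]
    transform: str
    # Hash of the input rows, so a later run can tell if the source changed
    source_fingerprint: str
    qa_checks: list[QACheck] = field(default_factory=list)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def add_check(self, name: str, passed: bool, detail: str = "") -> None:
        # Record the result of a QA check against this dataset
        self.qa_checks.append(QACheck(name, passed, detail))

    def all_passed(self) -> bool:
        # True only if every recorded check passed
        return all(c.passed for c in self.qa_checks)

    def to_json(self) -> str:
        # Serialize the whole record (nested checks included) for storage
        return json.dumps(asdict(self), indent=2)


def fingerprint(rows: list[dict]) -> str:
    # Deterministic SHA-256 of the rows; sort_keys makes key order irrelevant
    payload = json.dumps(rows, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Example usage with made-up data
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
rec = LineageRecord(
    dataset="orders_clean",
    sources=["crm.orders"],
    transform="clean_orders.py",
    source_fingerprint=fingerprint(rows),
)
rec.add_check("no_null_ids", all(r["id"] is not None for r in rows))
rec.add_check(
    "no_null_amounts",
    all(r["amount"] is not None for r in rows),
    detail="row id=2 has a null amount",
)
print(rec.all_passed())
print(rec.to_json())
```

The idea would be to write one of these JSON records next to every output table (or into a small metadata table), which works without buying or installing anything. The obvious gap is that it only tracks one hop, so full upstream lineage would mean chaining records by dataset name.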