Q&A with Mark Nelson, VP of Engineering at Tableau
This month we welcome Mark Nelson to talk about all things data. Mark is a VP of Engineering at Tableau. Mark has spent over a decade in software engineering leadership roles helping build and manage some of the largest software products in the world today. Most of those products involved creating, managing, or integrating big data. If you use any SaaS products today, odds are that Mark worked on some foundational technology for that software. Mark was previously at Twilio and Salesforce.
Building a data-driven organization is easier said than done. What best practices do you see companies implement to use data to their advantage?
Being data-driven essentially means being able to easily get trusted insights from appropriate data sources so you can take action. Getting to those trusted insights is definitely easier said than done. It starts with having trusted data sources with named owners for both the contents of the data and the schema definition (metadata). Organizations need to define their sources of truth, which can include databases, data warehouses and/or data lakes (Editor’s Note: A data lake or warehouse refers to a centralized software solution where you store all of your raw data). If you have lots of random data sources (e.g., Excel files on personal laptops) with no fidelity or control over change history, the integrity of the data is suspect, and the question becomes: do you really have a source of truth you can trust?
Typically, mature organizations have a data governance owner who is responsible for curating the data needed to run the organization. In order to have trusted data sources, the data governance owner leverages or builds a data catalog (Editor’s Note: A data catalog is more of a dictionary. It’s a tool that provides detailed annotation of the raw data in your data lake so that the end user of the data can create accurate reports). The data catalog gives insight into all of the curated data sources in an organization, with an owner for each data source and definitions for the data and field types, and it can track metadata changes over time (lineage). Having a well-maintained data catalog means the signals you are getting downstream can be trusted, whether they feed Machine Learning models or a Data Analyst building Business Intelligence reports and visualizations.
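(Editor's Note: To make the catalog idea concrete, here is a minimal Python sketch of a single catalog entry with a named owner, field definitions, and a change log. The class name `CatalogEntry`, its fields, and the example source `crm.opportunities` are hypothetical and chosen only for illustration; real data catalog products have their own data models and do much more.)

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """One curated data source in a hypothetical, in-memory catalog."""
    name: str                      # e.g. a table or dataset name
    owner: str                     # the named owner of this source
    fields: dict[str, str]         # field name -> field type
    description: str = ""
    # Each history item is (UTC timestamp, who made the change, what changed).
    history: list[tuple[str, str, str]] = field(default_factory=list)

    def change_field(self, field_name: str, new_type: str, changed_by: str) -> None:
        """Add or retype a field and record the metadata change."""
        old_type = self.fields.get(field_name, "<new field>")
        self.fields[field_name] = new_type
        timestamp = datetime.now(timezone.utc).isoformat()
        self.history.append(
            (timestamp, changed_by, f"{field_name}: {old_type} -> {new_type}")
        )


# Hypothetical usage: register a source, then record a schema change.
entry = CatalogEntry(
    name="crm.opportunities",
    owner="sales-ops",
    fields={"amount": "integer", "stage": "string"},
    description="Sales opportunities (illustrative example)",
)
entry.change_field("amount", "decimal", changed_by="jane")

for timestamp, who, what in entry.history:
    print(timestamp, who, what)
```

The point of the sketch is only that every source has an owner and every metadata change leaves a record of who changed what and when.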
Further to that, there is classification of data sensitivity and personally identifiable information (PII). In this day and age of GDPR, Schrems II, CCPA, etc., privacy is an even bigger concern. Getting legal direction on your organization's position on data, and identifying what you can use as a controller vs. a processor, is key.
What resources would you suggest for SMBs to get smarter about building a data-driven organization?
Start by identifying your sources of truth and cataloging them. Define who has access to change the data and/or the metadata, and have a mechanism to track changes. Yes, this is work, but it is critical to ensuring you can trust your data.
Work with your legal team to create a compliance posture for privacy and how you manage data privacy.
Determine what business questions you want to answer. Where your data is kept and which answers you need will help you weigh your options for getting the most important insights. Defining those business questions is the most critical step in evaluating which options will best meet your organization's needs, so you don't get enamored by demoware.
Most SMBs don't have the resources to hire data analysts — how can companies with limited resources get the most out of their data?
Again, it starts with identifying what questions you need answered and working backwards from there. If you have software engineers, most of them can solve these problems with SQL or scripting. Additionally, there are Systems Integrators (SIs) who can come in to help you build your dashboards and reports to get your business going. If none of those are an option and you need a self-service option, many Business Intelligence (BI) tools are designed to make it pretty straightforward to prepare your data, and they even give you recommendations on how to build your insights. For example, Tableau and Power BI offer tooling that makes it easier for business users to connect to many different data sources, curate them and create an insight. Communities of data visualization creators, such as Tableau's, are a particularly great place to get help with building data-driven insights.
Data is only as valuable as the insights it provides. What can help a CEO cut through the noise to get to the signal?
It really comes down to the questions you need answered and working backwards. All too often, people feel that the more dashboards and reports you provide, the smarter you must be, when the complete opposite is true. For example, if the job-to-be-done for the CEO is to provide visibility into open sales pipeline, closed deal rates, and deal close times, focus on identifying the data sources that have the necessary information to provide those curated insights. Less is truly more.
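(Editor's Note: As a rough illustration of working backwards from the question, the Python sketch below computes only the three numbers mentioned above from a single list of deals. The deal records are made up, the field names are hypothetical, and we are assuming "closed deal rate" means won deals divided by all closed deals; your CRM's definitions may differ.)

```python
from datetime import date

# Made-up deal records for illustration only; not real data.
deals = [
    {"stage": "open", "amount": 10_000, "opened": date(2023, 1, 5), "closed": None},
    {"stage": "won", "amount": 20_000, "opened": date(2023, 1, 1), "closed": date(2023, 1, 31)},
    {"stage": "lost", "amount": 5_000, "opened": date(2023, 1, 10), "closed": date(2023, 1, 20)},
    {"stage": "won", "amount": 15_000, "opened": date(2023, 2, 1), "closed": date(2023, 2, 21)},
]


def pipeline_summary(deals):
    """Return open pipeline value, win rate, and average days to close a won deal."""
    open_deals = [d for d in deals if d["stage"] == "open"]
    closed_deals = [d for d in deals if d["stage"] in ("won", "lost")]
    won_deals = [d for d in closed_deals if d["stage"] == "won"]

    open_pipeline = sum(d["amount"] for d in open_deals)
    # Assumption: "closed deal rate" = won deals / all closed deals.
    win_rate = len(won_deals) / len(closed_deals) if closed_deals else None
    avg_days_to_close = (
        sum((d["closed"] - d["opened"]).days for d in won_deals) / len(won_deals)
        if won_deals
        else None
    )
    return {
        "open_pipeline": open_pipeline,
        "win_rate": win_rate,
        "avg_days_to_close": avg_days_to_close,
    }


summary = pipeline_summary(deals)
print(summary)
```

Three numbers from one trusted source answer the CEO's question; anything beyond that should earn its place on the dashboard.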
Machine learning is a hot topic these days, but ML is far from a silver bullet. How should SMBs think about the potential for ML? What's required from a data perspective for it to be effective?
(Editor's Note: If you missed our previous blog on AI, or want more detail on what AI and ML are, Andreessen Horowitz published a great, non-technical starter guide about AI.)
ML...yeah. ML is powerful, but only if you know everything about your data and can trust its fidelity and completeness. Having really strong data hygiene (through a data catalog) and enough data to train a model on is key. The most common mistakes I see from people starting out with ML are that 1) they don't know where their data came from and 2) they're training models on far too little data. If you don't know where the data came from or who has touched it, and you're working with a small dataset that doesn't pressure test your model, how can you trust it?
Finally, I can't reiterate this enough: data privacy is a thing and it's here to stay. Using sensitive data to train models needs to be worked out in close conjunction with your legal and compliance teams to ensure you don't open yourself up to risk.
What are you reading or watching these days?
With two young daughters, the youngest of whom has autism, I'm reading and learning more about human behavior so I can understand her better and communicate more effectively given how she processes what is going on around her. Turns out it's useful knowledge for working with people in tech as well! Haha!
From an industry perspective, I'm reading more and learning about what's happening with data mesh and open data standards. More and more organizations want freedom from ecosystem lock-ins to be able to use best-of-breed technologies to help turn all of their people into data-driven knowledge workers.
In my personal time, my wife and I are catching up on Yellowstone and need to start watching Ted Lasso!