Data Science Maturity Models (DSMMs)
About Big Data Maturity Models: Much recent data science work (2015-20) has focused on Big Data. Some Big Data Maturity Models (BDMMs) parallel traditional models, identifying shared levels, domains, and attributes. These models contributed to the model below. BDMMs that focus more on attributes unique to Big Data (e.g., the impact of volume and velocity on data capture and processing) were not included.
Source Types Reviewed
DSMMs, DSMM assessment tools, evaluations and/or comparisons of DSMMs, Big Data maturity models, and models in adjacent sub-fields. Sources came from vendors, large tech companies, educational centers, and researchers.
DSMM Core Features
Levels (designate the level of capability or performance), Domains (designate the performance areas measured across levels), Attributes (descriptors of domain performance within each level).
The majority of Data Science Maturity Models (DSMMs) reviewed cover 4-5 levels, with outliers reaching 6 or 7 levels by breaking out levels more discretely or by including a level prior to data awareness. Many maturity models for core data science (and adjacent fields like data warehousing, AI, and Big Data) use variations or synonyms of level terms from earlier models, going back as far as the Capability Maturity Model (CMM), the predecessor of Capability Maturity Model Integration (CMMI), which originated at Carnegie Mellon in the mid-1980s. The “Consensus Composite” model presented here is congruent with the majority of these.
Most domains were similar to, or sub-components of, each other across reviewed models. Their similarity can be seen when multiple DSMMs are compared for domain alignment (see Comuzzi). The “Consensus Composite” model uses these domains:
- Organization (awareness, people, sponsorship, roles, segmentation, strategy, enterprise, culture)
- Infrastructure (architecture, tech, platforms, tools)
- Data Management (“the” data, size, sources, complexity, methodology, etc.)
- Analytics (what is being done with data, practices, automated, integrated, etc.)
- Governance (who controls data or makes it available: access, security, privacy, asset management, etc.)
- Best Practices (representing best actions available at a level, and/or actions critical to capability growth)
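The three core features (levels, domains, attributes) can be sketched as a small data structure. This is only an illustration, not part of any reviewed model: the class and function names are hypothetical, and the sample attributes are short paraphrases of statements made in the level descriptions in this document.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import List


class Level(IntEnum):
    """The five levels of the composite model, ordered least to most mature."""
    AD_HOC = 1
    FOUNDATIONAL = 2
    INTEGRATED = 3
    CULTURAL = 4
    TRANSFORMATIONAL = 5


class Domain(Enum):
    """The six performance areas measured across levels."""
    ORGANIZATION = "Organization"
    INFRASTRUCTURE = "Infrastructure"
    DATA_MANAGEMENT = "Data Management"
    ANALYTICS = "Analytics"
    GOVERNANCE = "Governance"
    BEST_PRACTICES = "Best Practices"


@dataclass(frozen=True)
class Attribute:
    """A descriptor of domain performance within one level."""
    level: Level
    domain: Domain
    description: str


# Illustrative entries only (paraphrased); a real model would hold many more.
ATTRIBUTES: List[Attribute] = [
    Attribute(Level.AD_HOC, Domain.GOVERNANCE, "Governance is IT-centric"),
    Attribute(Level.FOUNDATIONAL, Domain.GOVERNANCE,
              "Departments enforce policies in their own silos"),
    Attribute(Level.INTEGRATED, Domain.ANALYTICS,
              "Automation is part of the analytics workflow"),
]


def attributes_for(level: Level, domain: Domain) -> List[str]:
    """Return the attribute descriptions recorded for one level/domain cell."""
    return [a.description for a in ATTRIBUTES
            if a.level == level and a.domain == domain]


print(attributes_for(Level.AD_HOC, Domain.GOVERNANCE))
```

Using an ordered integer enum for levels makes comparisons such as `Level.AD_HOC < Level.INTEGRATED` meaningful, which matches the idea of levels as a progression.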
Many common attributes exist across reviewed models (implementing a data management strategy, department-level data projects, etc.). However, there is little commonality for when these attributes occur. Because levels are not defined by calendar duration, the delta between attributes in each model is unclear. Attributes included below were identified as key due to commonality across models, and/or criticality for continued capability building to the next level.
DSMMs can be used to place an organization on the DSM spectrum by simple comparison, or by completing one of the available DSMM Assessments. The DSMM can help identify key challenges within each domain or level, and identify crucial steps as organizational capacity is built.
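Placement by simple comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not a published assessment method: the function name `place_organization` is hypothetical, the per-domain ratings are self-assessed, the overall level is taken to be the weakest domain's level (an assumed aggregation rule), and the Best Practices domain is left out because it lists actions rather than a state to be rated.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class Level(IntEnum):
    AD_HOC = 1
    FOUNDATIONAL = 2
    INTEGRATED = 3
    CULTURAL = 4
    TRANSFORMATIONAL = 5


# Domains that are rated; Best Practices is omitted (assumption, see above).
RATED_DOMAINS = ("Organization", "Infrastructure", "Data Management",
                 "Analytics", "Governance")


def place_organization(ratings: Dict[str, Level]) -> Tuple[Level, List[str]]:
    """Return (overall level, lagging domains) from per-domain ratings.

    Assumed rule: the organization sits at the level of its weakest domain,
    and the domains at that level are the ones holding it back.
    """
    missing = [d for d in RATED_DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    overall = min(ratings[d] for d in RATED_DOMAINS)
    lagging = sorted(d for d in RATED_DOMAINS if ratings[d] == overall)
    return overall, lagging


# Hypothetical self-assessment of one organization.
example = {
    "Organization": Level.FOUNDATIONAL,
    "Infrastructure": Level.AD_HOC,
    "Data Management": Level.FOUNDATIONAL,
    "Analytics": Level.INTEGRATED,
    "Governance": Level.AD_HOC,
}
overall, lagging = place_organization(example)
print(overall.name, lagging)  # AD_HOC ['Governance', 'Infrastructure']
```

Other aggregation rules (a median, or a weighted score) would be equally defensible; the weakest-domain rule is chosen here only because it points directly at the domains that need attention.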
Level One: Ad-Hoc
At the beginning of the Data Science Maturity Model (DSMM), companies have no strategy governing the application of data science. Those involved are often motivated by the abundant references in industry journals, buzz around Artificial Intelligence (AI) and Machine Learning (ML), awareness of competitors’ work, or, most importantly, the nascent desire to answer critical business questions. The specialized jargon of data scientists and the multitude of specialized platforms and tool sets available make decisions difficult for organizations developing competency without data specialists already on board. They may be asking key data questions, but they have few resources to answer them and no obvious roadmap to get there. In general, the executives with sponsorship abilities are unlikely to understand the power of analytics to drive decisions, and interest is low.
Infrastructure for data analysis is absent, or a specific business may have developed a system for its own isolated activities. Platforms and tools in use for data efforts are repurposed from other functions, not designed for Business Intelligence (BI) or analytics. Data volumes are typically small and incomplete, and activities are limited by desktop hardware. At most, one business may purchase a front-end BI tool allowing a power user to create simple visualizations of past data so an executive can see the business at a glance.
Data management is not governed by a strategy; instead, primary internal data sources are owned by IT, and data management amounts to little more than ownership of several databases.
Analytics is limited to financial, regulatory, and compliance data, a few historical dashboards for management reporting, and performance management using accessible KPIs. Building a data set to analyze is a slow, manual process, merging data from different spreadsheets using an ad-hoc methodology.
Data projects are isolated, typically unbudgeted, and disconnected from any existing business strategy. During this initial level, data activities happen in silos, and no group is aware of another’s work product or activities. More often, business heads simply ask IT for data or reports that describe what happened last week or last month, which they can compare to scheduled goals. Because of the isolated work effort, data project results are limited in reach and business value.
Any governance will likely be IT-centric rather than business or business/IT centric, as the ownership of data at this stage is synonymous with ownership of hardware and technology. Unfortunately, business leaders who want more data to answer questions may not ask IT because of the slow response and difficulty associated with a net new request. Often, new business data requests are beyond current IT capabilities.
Best Practices for Practitioners at Level One: Ad-Hoc
- Identify use cases within a business area by clearly articulating business needs and projected impact. Proving analytics value to stakeholders is key to driving greater adoption of data science as a core business competency.
- Become a change agent and find ways to help manage large-scale change. Remember that at this level decision-making is not data-driven. No sponsorship (or need) is acknowledged for making all data available to everyone who could use it to perform better, much less for an integrated data platform and toolset that spans the enterprise and drives the business model. Help identify the vision for analytics and the steps needed to achieve that vision.
- Collaborate with multiple stakeholders to foster relationships among parties who may be protective of their own data ownership. Opportunities to collaborate will increase in the Foundational Level.
- Ensure collaboration is encouraged at the data scientist level as a matter of practicality: a new challenge for one data scientist may be an old challenge to another. Sharing best practices early will encourage a culture of finding best solutions, not insisting that the solution must be invented in your group.
Level Two: Foundational
During the foundational level, more data enthusiasts are beginning to self-educate, attending webinars, reviewing industry articles, and increasing the knowledge base among multiple business lines. Dedicated teams and defined use cases are at work, but data science may still be seen by the company as little more than a few novelty special projects. Multiple organizations are focusing on functional area excellence, and beginning to experiment with internal and external data to improve parts of their business. Often an executive sponsor drives discussion beyond a business segment, and at this point other areas such as marketing may be asking business data questions.
While minimal data infrastructure supports data science teams, the need for an enterprise data infrastructure has become apparent from the timelines of projects that rely on manual and disparate data sources, tool variations across the enterprise, and the inherent errors of combining data manually. Different business lines use a variety of self-service tools and front-end BI, often acquired for a specific project purpose. Localized data warehouses are likely, but limited to current systems and sources. Within some business areas, larger data volumes may become part of increasingly complex data projects, with some ability to assess unstructured data; however, this is usually an exception.
Analytics is still rudimentary overall, but advances are being made with the inclusion of ML and predictive analytics for solving business problems. Depending on company sector and experience (e.g., some history of data analysis such as loan approval or credit risk analysis), some groups or individuals may be more adept at advanced analysis. They would, however, be operating locally at the department or business line level. With the advances in user-friendly BI tool sets, some teams may be developing reports beyond month-end goal status, and gaining support from business heads.
True governance is in its earliest stages, with most organizations yet to identify a steering committee to address it at an enterprise level. Individual departments enforce policies and controls in their respective silos. However, without centralized governance, practitioners cannot discover what data assets are available throughout the enterprise except through “key people” contacts who control legacy sources.
Best Practices for Practitioners at Level Two: Foundational
- Use case opportunities begin to surface widely, and small project-based collaborations in analytics begin between multiple businesses and IT. Using these opportunities to drive acceptance, show value, and encourage other practitioners can accelerate the move to enterprise-level data science adoption.
- Sponsoring a corporate data management steering committee (with IT chairing the effort) could drive critical decisions for governance and a master data management strategy.
- Center the Business: At each step in the DSMM, ensure that the business or operation goals, as measured by clearly linked data, are always the center and focus. Avoid allowing the data project to become its own goal instead of the enabler to drive business results.
- Begin investigating mobile analytics: accessing, measuring, and analyzing mobile data. Combining this data stream with traditionally used data may surface new use cases.
Level Three: Integrated
In this level, integration occurs across the enterprise and the larger data science platform, providing the breadth, depth, and stability that eventually supports the cultural level of maturity of the data-driven enterprise.
Executives recognize that pursuing data is key to meeting future goals. Successful data science projects in multiple business lines with a variety of staff have raised the awareness of data’s business relevance and ROI. Advanced data modelers and statisticians in IT and the business have joined business analysts and other practitioners. The data science team now works with an integrated data architecture, BI tools, and specialized apps, using targeted technology for data mining and analytics to achieve insights from disparate data sources. Data approaches are aligned across the enterprise. The company recognizes data science as a fundamental competitive advantage, and the business strategy now welcomes insight gained from data, helping drive data analytics practices across the enterprise.
Infrastructure now consists of enterprise-level elements. Integrating and automating workflows begins, accelerating data projects while removing manual effort and associated errors. Easily accessible integrated data tools allow more participants with little formal data experience, expanding the pool of practitioners.
An enterprise data management strategy becomes a realistic initiative, with efforts to identify, organize, and evaluate all current data assets. The need to address large volume data is recognized, and may be the first step toward a separate but parallel trajectory to create a Big Data strategy.
Data science projects are now business centric. Collaboration is recognized as a necessity across the enterprise, with data projects crossing traditional barriers between different functional units. Automation is now part of the analytics workflow, replacing repetitive manual tasks (collecting, cleaning, processing data) and shortening data project timelines. While predictive analytics was introduced earlier, it is now used frequently to predict the likelihood of specific outcomes for particular business processes. The continued growth of data-driven decision-making across the enterprise and into each business and functional area helps advance the data-driven mission to the cultural level.
At the integration level, comprehensive governance becomes an obvious necessity as numerous data practitioners and data sources become visible at an enterprise level. Earlier, multiple stakeholders functioned as gatekeepers of siloed kingdoms; now there is an opportunity to surface valuable sources of data insight, provide security, and systematically implement access policies that drive innovation while maintaining data integrity.
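The kind of automation described here (collecting, cleaning, and processing data without manual steps) might look like the sketch below. It is a toy illustration using only the Python standard library; the function names, the `region` and `amount` columns, and the in-memory sample data are all hypothetical and stand in for whatever sources an organization actually has.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def collect(sources: Iterable[str]) -> List[dict]:
    """Read rows from several CSV texts into one list of dictionaries."""
    rows: List[dict] = []
    for text in sources:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def clean(rows: Iterable[dict]) -> List[dict]:
    """Trim whitespace, normalize region names, drop incomplete or non-numeric rows."""
    cleaned = []
    for row in rows:
        region = (row.get("region") or "").strip()
        amount = (row.get("amount") or "").strip()
        if not region or not amount:
            continue  # skip rows with missing values
        try:
            value = float(amount)
        except ValueError:
            continue  # skip rows whose amount is not a number
        cleaned.append({"region": region.title(), "amount": value})
    return cleaned


def process(rows: Iterable[dict]) -> Dict[str, float]:
    """Total the amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(sources: Iterable[str]) -> Dict[str, float]:
    """Chain the three steps so no manual hand-off is needed."""
    return process(clean(collect(sources)))


# Hypothetical exports from two departments, with typical inconsistencies.
SALES_A = "region,amount\nnorth, 100\nsouth,250\n"
SALES_B = "region,amount\nNorth,50\nsouth,\neast,n/a\n"

print(run_pipeline([SALES_A, SALES_B]))  # {'North': 150.0, 'South': 250.0}
```

Once the steps are functions rather than spreadsheet operations, the same pipeline can be rerun on new exports without repeating the manual merge.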
Best Practices for Practitioners at Level Three: Integrated
- Democratizing data access allows all members to use available data to understand their business and its performance. To make this possible, analytics activities must be made available to more people. Watch out for any gatekeeping approach that uses data as leverage to maintain power for an individual or group of people.
- At the early integration level, new types of collaboration between analytics teams and business teams should be part of any effort for data-driven business solutions. Look for opportunities to “crowdsource” data project tasks outside the team, allowing non-data scientists to perform steps aligned with their expertise (e.g., data selection, data cleaning, etc.). Increasing enterprise-wide involvement is essential to promoting a culture of data-driven business.
- Big Data (BD) competency is likely a part of any organization reaching for a transformative data science level. Fundamental differences exist between DSMMs and BDMMs, for example the specialized tools required for manipulating high volume and velocity data streams. Nevertheless, adoption of BD competencies does require many familiar steps of a technical maturity model: identifying a use case, building associated technical competencies, localized adoption, and corporate adoption.
- Examine new types of analytics outside of your current practices: for example, text analytics, geospatial analytics, and clickstream analytics. Some or all might create valuable connections with currently used data.
Level Four: Cultural
The most noticeable feature of the cultural level is the almost complete lack of silo-contained ownership. Data-driven decision-making is embraced by the enterprise, and data science resources are fully supported with staffing, technology, and funding. A Chief Data Officer may be responsible for overseeing data as a corporate asset. Businesses are now effectively data-driven.
The company continues to invest in enterprise-level infrastructure, maintaining standardized, integrated data science platforms and tool sets, enabling collaboration, modeling, and tracking of data science projects. Best practices are in place for data science products, and metadata tools are introduced. Quantifiable metrics are introduced for evaluating data science projects.
Enterprise analytics is now possible. Data projects now span functions and business lines, with the ability to quantify the value of insights applied to different business activities. Data teams now go beyond traditional descriptive and diagnostic analytics, and commonly use predictive and prescriptive analytics to understand the business well enough to know which decisions will provide the best possible outcome. Use cases now include AI/ML throughout the organization. A data-driven insights culture is established across the businesses, integrating results into new business policies and processes that create added value for the company.
Enterprise-level governance is established for data science elements, and access to data sources is managed quickly and efficiently. Confidence in enterprise governance is reflected in the data-driven decision-making culture across all business units.
Best Practices for Practitioners at Level Four: Cultural
- Best practices should be integrated into infrastructure across the data life cycle. Best practice parameters should drive data science processes, from workflow integration with analytics and metadata tagging within projects to data project methodology itself. With integration tools now available, new best practices should continue to be codified into all systems.
- Regularly assess new methodologies and tools for new insights and improved data scientist productivity. Business-embedded analytics is more likely to proactively anticipate needs, providing updated solutions much faster than traditionally separated teams.
Level Five: Transformational
The transformational level is unique: few companies reach it. The prerequisites are comprehensive and complex (i.e., the culmination of capability-building actions from all prior levels). Companies that reach this level were often founded as data-driven companies from the beginning, and are typically the leaders defining their business sector and driving change not only for their industry, but for adjacent fields dependent on it. Awareness of data primacy for the company is extremely high, and spreads outside of the organization. Data science underpins the entire organization at this point, and refined practices and strategies create continuous business model innovation, enabling complete market disruption (think Netflix, Google, Amazon).
At this level, the increasingly high data standards are part of everyday business. Resources are constantly refined by use cases, technical drivers, and thought leadership. The company looks to data scientists for innovations, and understands the accompanying technical and organizational needs. The company is likely operating as a data service provider, for external and internal clients, partners, and employees.
The company has inspected every data source available for ML, and has pursued high value use cases. Metadata management tools are integrated, formalized, and applied for all data assets throughout the life cycle. A core skill is the ability to consolidate disparate heterogeneous data into complete high-quality data sets in a consumable format.
Formalized data science methodology best practices are integrated into all projects across the enterprise. Employees and external partners may seamlessly collaborate and share analytics throughout the enterprise.
Information governance is integrated throughout business processes. Much day-to-day governance is automated, with access tracking maintaining a record of all work across the data life cycle and enterprise. On-demand access to data resources is available on premises and in the cloud, with seamless governance controls.
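Automated governance with access tracking can be illustrated with a small sketch. It is hypothetical throughout: the `GovernedCatalog` class, its role-based policy format, and the dataset names are assumptions made for illustration, not a description of any particular governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Set


@dataclass(frozen=True)
class AccessEvent:
    """One entry in the audit trail."""
    user: str
    dataset: str
    allowed: bool
    timestamp: datetime


@dataclass
class GovernedCatalog:
    """Applies access policies automatically and records every request."""
    policies: Dict[str, Set[str]]  # dataset name -> roles permitted to read it
    audit_log: List[AccessEvent] = field(default_factory=list)

    def request_access(self, user: str, role: str, dataset: str) -> bool:
        # Unknown datasets have no permitted roles, so they are denied.
        allowed = role in self.policies.get(dataset, set())
        # Both granted and denied requests are logged.
        self.audit_log.append(
            AccessEvent(user, dataset, allowed, datetime.now(timezone.utc))
        )
        return allowed


# Hypothetical policy: analysts may read sales data, only HR staff may read HR data.
catalog = GovernedCatalog(policies={"sales": {"analyst", "manager"}, "hr": {"hr_staff"}})

print(catalog.request_access("ana", "analyst", "sales"))  # True
print(catalog.request_access("ana", "analyst", "hr"))     # False
print(len(catalog.audit_log))                             # 2
```

Because the decision and the log entry happen in the same call, access is granted on demand while the record of who touched which data is kept without manual effort.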
Best Practices for Practitioners at Level Five: Transformational
At level five, the organization has codified innovation by integrating approaches that extract the best from people, process and technology, and then re-integrating the discovered or innovated value back into the system. The ethos of always assessing new approaches and toolsets is part of the culture, and will surface, capture, and integrate best practices.
References
Bardess (2019). The Bardess Data Science Maturity Curve. Bardess Blog. https://www.bardess.com/the-bardess-data-science-maturity-curve/
Braun, Henrik (2015). Evaluation of Big Data Maturity Models: A Benchmarking Study to Support Big Data Maturity Assessment in Organizations. Tampere University of Technology.
Comuzzi, M., Patel, A. (2016). How Organizations Leverage Big Data: A Maturity Model. Industrial Management and Data Systems.
Data Science Central (2012). Big Data Analytics Maturity Model. Source: IDC Asia/Pacific Business Analytics Practice (2011). https://www.datasciencecentral.com/profiles/blogs/big-data-analytics-maturity-model
El-Darwiche, B., Koch, V., Meer, D., Shehadi, R., & Tohme, W. (2014). Big Data maturity: An action plan for policymakers and executives. In: The Global Information Technology Report. World Economic Forum, 43–51. Note: Article from Booz & Company.
Guerra, P., Borne, K. (2016). 10 Signs of Data Science Maturity. O’Reilly. https://www.oreilly.com/content/10-signs-of-data-science-maturity/
Halper, F., Stodder, D. (2016). A Guide to Achieving Big Data Analytics Maturity. TDWI.
Halper, F., Stodder, D. (2014). TDWI Analytics Maturity Model Guide. TDWI Research.
Hornick, Mark (2018). A Data Science Maturity Model for Enterprise Assessment. Oracle Blog. https://blogs.oracle.com/r/a-data-science-maturity-model-for-enterprise-assessment-part-1 and https://blogs.oracle.com/r/data-science-maturity-model-summary-table-for-enterprise-assessment-part-12
Hornick, Mark (2018). Data Science Maturity Model V1.3 XLSX Summary Table for Enterprise Assessment. Oracle Blog. https://app.compendium.com/api/post_attachments/9a4f0341-e293-489c-a3fe-6a5bad665029/view
Hortonworks (2016). Big Data Maturity Model. A Hortonworks White Paper. http://hortonworks.com/wp-content/uploads/2016/04/Hortonworks-Big-Data-Maturity-Assessment.pdf
IDC (2013). Big Data Maturity Tool. http://csc.bigdatamaturity.com/
Infotech (2013). Big Data maturity assessment tool. http://www.infotech.com/research/it-big-data-maturity-assessment-tool
Knowledgent (2014). Big Data Maturity Assessment. https://bigdatamaturity.knowledgent.com (site offline after Accenture acquisition)
Luersen, Seth (2017). Data Warehouse in the Age of AI Maturity. memSQL Blog. https://www.memsql.com/blog/memsql-maturity-framework/
Malchi, Yoni (2019). Six Stages of Transforming into a More Data-Driven Organization. World Wide Technology. https://www.wwt.com/article/data-maturity-curve
McKinsey Global Institute (2016). The Age of Analytics: Competing in a Data-Driven World. McKinsey & Co.
Moore, Danny T. (2014). Roadmaps and Maturity Models: Pathways Toward Adopting Big Data. 2014 Proceedings of the Conference for Information Systems Applied Research.
Nott, Chris (2014). Big Data & Analytics Maturity Model. IBM Blog. https://www.ibmbigdatahub.com/blog/big-data-analytics-maturity-model
Onis, Teresa de (2016). The Four Stages of the Data Maturity Model. CIO Digital Magazine. June 2016.
Physioc, Heather (2018). How to Diagnose Your SEO Client’s Search Maturity. MOZ. https://moz.com/blog/seo-client-maturity
Radcliffe, J. (2014). Leverage a Big Data maturity model to build your Big Data roadmap. Tech. rep., Radcliffe Advisory Services Ltd.
Steele, Mac (2016). Introducing the Data Science Maturity Model. Domino Blog. https://blog.dominodatalab.com/introducing-the-data-science-maturity-model/
Veenstra, Fleur Van (2013). Big Data in Small Steps: Assessing the Value of Data. ECP-jaarcongres 2013.