If data scientists were professional athletes, they’d be ditching their swim caps and donning cleats right about now.
Traditional data science teams function like swim teams. Although all members strive for one goal, winning the meet, each works individually, concentrating on his or her own lane or heat. As each swimmer finishes a heat, the team’s score is tallied. If enough team members hit their individual goals, then the team succeeds. But team members’ efforts are isolated and desynchronized: some members might pull off incredible times while others lag behind in their heats.
Today, the industry is moving toward open data science (ODS), which enables data scientists to function more like a soccer team: Team members aren’t restricted to their own lanes but are instead free to move about the field. Even goalies can score in soccer, and ODS developers are similarly encouraged to contribute wherever their skills intersect with development challenges. No team members are relegated to “second-string” or niche roles: In a given project, developers might contribute to model building, while domain experts like quants offer insights about code structure or visualization. As with soccer, the pace is fast, and the process is engaging and fun.
Why ODS?
Data is the indisputable king of the modern economy, but too many data scientists still function like swimmers, each working with his own set of tools to manage heaps of data. To work as a true team, data scientists (and their tools) must function together.
ODS is the revolution rising to meet this challenge. Instead of forcing data scientists to settle on a single language and proprietary toolset, open data science champions inclusion. Just as ODS encourages data scientists to function as one, it also recognizes the potential of open source tools — for data, analytics and computation — to form a connected, collaborative ecosystem.
Developers, analysts and domain experts adhering to ODS principles gain these key advantages over traditional and proprietary approaches:
- Availability: Open source tools offer many packages and approaches within a single ecosystem, which makes it faster to build solutions.
- Innovation: The lack of vendor lock-in means the developer community is continuously updating tools, which is ideal for collaborative workflows and quick-to-fail projects.
- Interoperability: Rather than pigeonholing developers into a single language or set of tools, ODS embraces a rich ecosystem of compatible open source tools.
- Transparency: Open data science tools facilitate transparency between teams through innovations such as the Jupyter/IPython notebook, which allows developers to easily share code and visualizations.
ODS Is Making Analytics More Precise
Over the past few years — largely thanks to the growth of ODS — the pace of innovation has quickened in the analytics field. Now, researchers and academics immediately release their algorithms to the open source community, which has empowered data scientists using these tools to become ever more granular in their analyses.
Because open data science has provided more ways to build, deploy and consume analytics, businesses can know their customers more personally than ever before.
Previously, a sales analysis might have yielded insights like “Women over 50 living in middle-class households are most likely to buy this product.” Now, these analyses are shedding ever-greater light on buyer personas. Today, an analyst might tell marketers, “Stay-at-home moms in suburban areas who tend to shop online between 3 p.m. and 6 p.m. are more likely to buy the premium product than the entry-level one.”
Notice the difference? The more precise the analysis is, the more useful it is to business professionals. Information about where, how and when buyers purchase a product is invaluable to marketers.
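To make the jump from a coarse segment to a granular one concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the purchase records are invented toy data, and the names `premium_share` and `afternoon` are hypothetical helpers, not part of any ODS tool. It simply compares the share of premium purchases by area alone against the share by area plus the 3 p.m. to 6 p.m. shopping window mentioned above.

```python
from collections import defaultdict

# Hypothetical toy purchase records, invented for illustration only:
# (area, hour of purchase in 24-hour time, product tier)
purchases = [
    ("suburban", 15, "premium"),
    ("suburban", 16, "premium"),
    ("suburban", 17, "entry"),
    ("suburban", 10, "entry"),
    ("suburban", 21, "entry"),
    ("urban", 15, "entry"),
    ("urban", 20, "premium"),
    ("urban", 11, "entry"),
]


def premium_share(records, key):
    """Return {group: share of premium purchases}, grouping records by `key`."""
    totals = defaultdict(int)
    premium = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        if record[2] == "premium":
            premium[group] += 1
    return {group: premium[group] / totals[group] for group in totals}


def afternoon(hour):
    """True for purchases made between 3 p.m. and 6 p.m."""
    return 15 <= hour < 18


# Coarse view: one number per area.
coarse = premium_share(purchases, key=lambda r: r[0])

# Granular view: area combined with the afternoon shopping window.
fine = premium_share(purchases, key=lambda r: (r[0], afternoon(r[1])))

for group, share in sorted(fine.items()):
    print(group, round(share, 2))
```

In this toy data, the granular view separates suburban afternoon shoppers from the rest of the suburban segment, which is the kind of distinction the coarse view averages away. A real analysis would need far more data and a proper statistical treatment before drawing any conclusion.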
ODS has also broadened data scientists’ options for visualization. Let’s say a telecommunications company needs to illustrate rotational churn in a way that shows which customers are likely to leave over a period of time. Perhaps the company is using Salesforce, but Salesforce doesn’t offer the rotational churn analysis the business is looking for. If the business’s data scientist is using ODS tools, then that data scientist can create a model for the rotational churn and embed it into Salesforce. The analysis can include rich, contextual visualization that clearly illustrates customers’ rotational churn patterns.
Data Science Isn’t One-Size-Fits-All
Just as there’s no one right way to run a business or market a product, there’s no one right way to do data science.
For instance, one data scientist might employ a visual component framework, while another accomplishes the same task with a command line interface or an integrated development environment. Business analysts and data engineers on the data science team may prefer spreadsheets and visual exploration.
Thanks to this flexibility, ODS has become the foundation of modern predictive analytics, and one platform has emerged to manage its tools successfully: Anaconda.
Anaconda is the leading open source analytics ecosystem and full-stack ODS platform for Python, R, Scala and many other data science languages. It adeptly manages packages, dependencies and environments for multiple languages.
Data science teams using Anaconda can share models and results through Jupyter/IPython notebooks and can draw on an enormous trove of libraries for analytics and visualization. Furthermore, Anaconda supercharges Python to deliver high-performance analytics for data science teams that want to scale their prototypes up and out without incurring delays when moving to production.
With Anaconda, data science teams decrease project times by better managing and illustrating data sets and offering clearer insights to business professionals. Get your data science team out of the pool by making the switch to ODS and Anaconda. Before you know it, your data scientists will be scoring goals and on their way to the World Cup.
If you are interested in learning more about this topic and are attending the Gartner BI & Analytics Summit in Grapevine, TX, join me in our session on Why Open Data Science Matters, Tuesday, March 15th, from 10:45 to 11:30 a.m. CT.