Company Blog

Integrating resources isn’t easy. Traditionally, businesses chose vendors with all-in-one solutions to cover the task. This approach may seem convenient, but what freedoms must be sacrificed in order to achieve it?
This vendor relationship resembles the troubled history of coal mining towns in the Old West, where one company would own everything for sale in the town. Workers were paid with vouchers that could only be redeemed at company-owned shops.
The old folk tune "Sixteen Tons" put it best: "I owe my soul to the company store."
As with any monopoly, these vendors have no incentive to optimize their products and services. There's only one option available: take it or leave it. But just as some miners would leave these towns and make their way across the Wild West, many companies have chosen to forge their own trails with the freedom of Open Data Science, and they've never looked back.
Open Data Science: Providing Options
Innovation and flexibility are vital to the evolving field of Data Science, so any alternative to the locked-in vendor approach is attractive. Fortunately, Open Data Science provides the perfect ecosystem of options for true innovation.
Vendors do sometimes innovate, as with the infrastructure surrounding linear programming. That doesn’t mean they can provide an out-of-the-box solution for every team: adapting products to different businesses and industries still requires work.
Most of the real innovation is emerging from the open source world. The tremendous popularity of Python and R, for example, drives innovation across all kinds of analytics approaches.
Given the wish to avoid a “mining town scenario” and the burgeoning innovation in Open Data Science, why are so many companies still reluctant to adopt it?
Companies Should Not Hesitate to Embrace Open Source
There are several reasons companies balk at Open Data Science solutions:
- Licensing. Open source has many licenses: Apache, BSD, GPL, MIT, etc. This wide array of choices can produce analysis paralysis. Some licenses, such as the GPL, also require that source code be made available when the software is redistributed.
- Diffuse contact. Unlike vendor products, open source doesn’t provide a single point of contact. It’s a non-hierarchical effort. Companies have to keep their software current themselves, and this can feel overwhelming without a steady guide they can rely on.
- Education. With rapid change, companies find it difficult to stay on top of the many acronyms, project names, and new techniques required with each piece of open source software.
Fortunately, these concerns are surmountable. Most licenses are appropriate for commercial applications, and many companies are finding open source organizations that can act as a contact point within the Open Data Science world. Such an organization also serves as a built-in guide to the ever-changing landscape of open source, which addresses the education problem as well.
The Best Approach for Starting an Open Data Science Initiative
There are several tactics organizations can use to effectively adopt Open Data Science.
For instance, education is crucial, and that means establishing a serious training program. One focus here has to be reproducibility: team members should know how the latest graph was produced and how to generate the next iteration of it.
Much of this requires understanding the architecture of the applications one is using, so dependency management is important. Anything that makes the process transparent to the team will promote understanding and engagement.
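One small way to make dependencies transparent is to record exactly which packages were installed when a result was produced. The following is a minimal sketch in Python, using only the standard library; the function name snapshot_environment is our own invention for illustration, not part of any tool mentioned here.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return a dict describing the interpreter and installed packages.

    Hypothetical helper for illustration: it records the Python version,
    the platform, and the version of every installed distribution.
    """
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": sys.platform,
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored next to a report or graph.
    print(json.dumps(snapshot_environment(), indent=2))
```

Saving this output alongside each graph gives the team a record of what was installed when it was made. It is only a starting point: a real setup would more likely rely on the environment files that a package manager already produces.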
Flexible governance models are also valuable, allowing intelligent assessment of open source pros and cons. For example, it shouldn’t be difficult to create an effective policy on what sort of open source licenses work best.
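A license policy can be as simple as a short list that software is checked against. The sketch below, in Python, is one possible shape for such a check. The policy lists, the function names, and the package names in the example are all hypothetical; which licenses belong in which list is a decision for each organization and its legal counsel, not something this sketch settles.

```python
# Hypothetical policy for illustration only, not legal advice.
ALLOWED = ("MIT", "BSD", "Apache")
NEEDS_REVIEW = ("AGPL", "LGPL", "GPL")


def classify(license_name):
    """Classify a license string as 'allowed', 'review', or 'unknown'."""
    text = (license_name or "").upper()
    # Check the review list first, so a string that mentions both kinds
    # of license is flagged for review rather than allowed.
    if any(marker.upper() in text for marker in NEEDS_REVIEW):
        return "review"
    if any(marker.upper() in text for marker in ALLOWED):
        return "allowed"
    return "unknown"


def audit(packages):
    """Group package names by policy outcome.

    `packages` maps a package name to its declared license string.
    """
    report = {"allowed": [], "review": [], "unknown": []}
    for name, license_name in sorted(packages.items()):
        report[classify(license_name)].append(name)
    return report


if __name__ == "__main__":
    # Made-up package names and license strings.
    example = {"alpha": "MIT License", "beta": "GPL-3.0", "gamma": ""}
    print(audit(example))
```

Simple substring matching like this will miss unusual license names, so anything that lands in the "unknown" group should go to a person rather than be treated as safe.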
Finally, committing resources to successful adaptation and change management should be central to any Open Data Science arrangement. This will almost always require coding to integrate solutions into one’s workflow. But this effort also supports retention: companies that shun open source risk losing developers who seek the cutting edge.
Tackle the Future Before You're Left Behind
Unlike the world of the old mining monopolies, Data Science is a rapidly changing field with many competing participants. Companies that do not commit resources to education, understanding, governance, and change management risk getting left behind as newer companies commit fully to open source.
Well-managed open source analytics environments, like Anaconda, provide a compatible, up-to-date suite of modern analytics programs. Combine this with a steady guide to traverse the changing landscape of Open Data Science, and the Data Science gold mine becomes ripe for the taking.