Microservices for Big Data: Flipping the Paradigm

“Microservices are essentially applications that are broken down into small pieces. Using microservices, businesses can prevent large-scale failure by isolating problems, and save on computing resources,” commented Jack Norris, Senior Vice President of Data and Applications at MapR. Microservices are now altering a major Data Management paradigm of the last 30 years.

In a recent DATAVERSITY® interview, Norris said, “This class of applications delivers real-time analytics, high-frequency decision making and other solution architectures that require immediate operations on large volumes of data.” When these applications work together as a cohesive unit instead of in isolated silos, he said, “this architecture leads to greater responsiveness, better decisions, less complexity, lower costs, and lower risks.” Norris said MapR’s goal as a company is “to be the foundation at the center of this data revolution.”

MapR started in 2009, and according to Norris, “We’ve been on a straight line of innovation since.” With a technical team that had deep roots in enterprise storage and a CTO who had worked in Google’s BigTable group, MapR’s founders understood the need for innovation at the data platform layer. Their goals were to provide high-scale enterprise storage together with real-time database operations, and to create a single platform that converges file system, database, and streams. As Norris put it, “If you have that as an underpinning, that really opens things up and [it] will transform how applications are developed and delivered in the enterprise.”

The MapR Paradigm

Traditionally, Norris said, data is stored and organized by a series of application-driven processes that extract, transform, and load (ETL) data from hundreds of separate silos; analytic functions run in their own data store, and operational production systems are kept apart from the analytics. When we look at the volume and variety of data, he said, “we tend to start at the endpoint and then marvel at it.” In reality, “most of that content is machine-generated content, most of that data was created one event at a time,” and it takes at least a day before any of that production data finds its way into the data warehouse.

Microservices that can process operational and analytical data simultaneously can remove those separations and delays, creating a “revolutionary” level of responsiveness to real-time events. That responsiveness pays off in many contexts: improving customer experience, anticipating downtime caused by equipment breakdowns, or reducing the impact of fraud and security risks as they happen.

Emerging Issues

Norris noted that IDC and Gartner predict flat IT budget growth for the next five years, “and that kind of flat, stable growth masks some big changes underneath.” With spending on legacy technologies expected to keep declining and spending on next-generation technologies rising, the typical CIO can’t wait the usual year and a half for an application to roll out, he said. Growing discussion of containers and reusability, along with rising interest in microservices, stems from a desire to lower costs and increase productivity without sacrificing innovation, Norris said, but “the stumbling block is the data.” Ephemeral, lightweight applications are considered good candidates for this kind of deployment, “but if it requires sharing data, if it requires a stable application, or in-depth analytics, those are typically passed over for deployment” because of data constraints. Norris said, however, that in the right configuration, microservices can overcome those limitations.

Convergence is Key

As an example, Norris described how a manufacturing company could overcome those data limitations by combining multiple microservices to anticipate and respond to downtime caused by broken parts:

“You’ve got a microservice inspecting events as they pass through. You’ve got another microservice that’s looking at patterns of data and identifying what’s an interesting match. You’ve got another one that, based on [whether that’s an interesting match or not], is doing a database query, looking at availabilities for replacement parts. You’ve got another microservice that’s looking at labor schedules and who could be scheduled to replace that part.”

All of this might be handled by perhaps seven different microservices working together, each with its own role, he said. Some help identify and replace parts before they break; some perform file operations, some database operations, and some streaming analytics. “When those fire off, it’s based on events, and so it’s an orchestration,” carried out through a series of published and subscribed events, each carrying the data relevant to it. Norris said:

 “What we’re talking about now is not just a one-way stream, but a series of bi-directional streams and events and processes that are all part of the same fabric, that not only make it simpler to manage the data, but [also to] squeeze out the latency, because we’re not working across separate systems; we’re working on the same converged data platform.”


Instead of looking back and reacting to past events, organizations with access to real-time data can anticipate and adjust before potential problems arrive.
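
The event-driven orchestration Norris describes can be sketched in a few lines of Python. This is a minimal, in-process illustration under stated assumptions, not MapR's implementation: the `EventBus` class, the topic names, the vibration threshold, and the inventory and technician tables are all hypothetical stand-ins for a real streaming platform and database. It wires four of the services from his example (event inspection, pattern matching, parts lookup, labor scheduling), each subscribing to one topic and publishing an event that carries the data the next step needs.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-process stand-in for a shared event stream (topic -> subscribers)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[tuple[str, dict]] = []  # every published event, in order

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.log.append((topic, event))
        for handler in self._subscribers[topic]:
            handler(event)


# Hypothetical shared data the services read directly (stands in for database tables).
PARTS_INVENTORY = {"bearing-7": 3, "belt-2": 0}
TECHNICIANS = {"alice": ["bearing-7"], "bob": ["belt-2"]}
VIBRATION_LIMIT = 0.8  # assumed threshold, not from the article


def wire_services(bus: EventBus) -> None:
    def inspect(event: dict) -> None:
        # Service 1: inspect each raw sensor event as it passes through.
        if "vibration" in event:
            bus.publish("readings.checked", event)

    def match_pattern(event: dict) -> None:
        # Service 2: flag readings that look like an impending part failure.
        if event["vibration"] > VIBRATION_LIMIT:
            bus.publish("parts.at_risk", {"machine": event["machine"], "part": event["part"]})

    def check_inventory(event: dict) -> None:
        # Service 3: query availability of a replacement for the flagged part.
        in_stock = PARTS_INVENTORY.get(event["part"], 0) > 0
        bus.publish("parts.availability", {**event, "in_stock": in_stock})

    def schedule_labor(event: dict) -> None:
        # Service 4: assign the first technician who can replace an in-stock part.
        if not event["in_stock"]:
            return
        for name, skills in TECHNICIANS.items():
            if event["part"] in skills:
                bus.publish("repairs.scheduled", {**event, "technician": name})
                return

    bus.subscribe("sensors.raw", inspect)
    bus.subscribe("readings.checked", match_pattern)
    bus.subscribe("parts.at_risk", check_inventory)
    bus.subscribe("parts.availability", schedule_labor)


# Example run with hypothetical sensor readings.
bus = EventBus()
wire_services(bus)
bus.publish("sensors.raw", {"machine": "press-1", "part": "bearing-7", "vibration": 0.93})
bus.publish("sensors.raw", {"machine": "press-2", "part": "belt-2", "vibration": 0.95})
bus.publish("sensors.raw", {"machine": "press-3", "part": "bearing-7", "vibration": 0.2})
```

Only press-1 ends with a scheduled repair: press-2 is flagged but its part is out of stock, and press-3 never crosses the threshold. Because each service knows only the topics it consumes and produces, another step (for example, ordering an out-of-stock part) could be added by subscribing to an existing topic without changing the others.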

Microservices in Practice

A stateful application has limitations that converged microservices do not, “and that’s an issue,” Norris said, because “if you move [the application] from this group of servers over here to this web server farm over here, it needs to find the data; and how you find the data, and what the links are, change.” With a converged data platform, all of those links stay consistent: “You have a single namespace across all of the content, so you can move data across any location and it’s the same mountpoint.”
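
To make the namespace point concrete, here is a hypothetical contrast in Python. The server-group names, root directories, and the `/data/cluster1` mount point are invented for illustration and are not MapR paths; the sketch only shows why a per-silo lookup changes when an application moves, while a single global namespace does not.

```python
from pathlib import PurePosixPath

# Hypothetical siloed layout: each server group keeps data under its own root,
# so the path an application must use depends on where it is running.
SILOED_ROOTS = {
    "app-servers": PurePosixPath("/srv/appdata"),
    "web-farm": PurePosixPath("/var/web/shared"),
}

# Hypothetical converged layout: a single namespace exposed at one mount point.
GLOBAL_MOUNT = PurePosixPath("/data/cluster1")


def siloed_path(server_group: str, dataset: str) -> PurePosixPath:
    # Raises KeyError if the app lands on a group nobody configured a root for.
    return SILOED_ROOTS[server_group] / dataset


def converged_path(server_group: str, dataset: str) -> PurePosixPath:
    # server_group is deliberately ignored: moving the app does not change the path.
    return GLOBAL_MOUNT / dataset


for group in SILOED_ROOTS:
    print(f"{group}: siloed={siloed_path(group, 'orders/latest.csv')} "
          f"converged={converged_path(group, 'orders/latest.csv')}")
```

In the siloed version, relocating an application means updating its configuration to the new root; in the converged version, the same path works from either server group.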

The Paradigm Flips

This “fundamental structural change” can also create new ways of working with existing processes. “Now you’ve opened up all those applications to all of the stateful applications as well, and then the microservices that are running in those containers are free to access that data directly,” he said, which creates opportunities for greater productivity.

 “Once developers are free to not have to worry about where data is located or how they’re accessing it, administrators are then free to focus on the data as a whole,” he said. It’s “a separation of roles that leads to both groups being much more productive.”

The current operating paradigm assumes that production and analytics are two completely different infrastructures, separated not only by technology but also by time. Norris sees the potential to change that paradigm to one in which all of those formerly separate processes operate at the same time, and he believes microservices can accomplish that. He said his customers are seeing the potential as well:

“We did a survey a year ago and 18% of our customers have 50 or more applications running on a single cluster. So that’s further proof that what we’re doing is really flipping the development process – starting with the data and bringing the applications to the data, rather than the other way around. It’s a huge trend, and we’re at the beginning of it.”
