Connected Data is everywhere. Visualizing connections makes them accessible and adds value. But what is the best way to do this?
Do you still work with big data? You probably do, and if you do, you know: there’s not much point in having tons of data, per se. The value comes from being able to get insights from your data by making connections. This is where visualization can help.
“Big data” is passé: not because it is no longer around, but because everyone has it. Definitions and “4Vs” aside, big data comes down to having more data than you can comfortably navigate and comprehend.
By now, we have all sorts of databases, in the cloud and on premises, to store everything we ever wished we could store, and more. We are moving past the infrastructure phase and towards the value phase. The question now is: what are you going to do with your data?
Ideally, you’d like to use your data to derive insights. To do this, it’s not so much the volume of data that counts, but more the connections. Where do customers in high-performing outlets in your network come from? What does a fraud attempt look like, in terms of the perpetrator’s background and connections?
These are the types of questions that leveraging connections in your data can help answer. There are a number of approaches you can take to turn your big data into connected data, and visualization is a valuable tool in your toolset. It’s just the way our perception works: visualization makes the abstract tangible and helps you navigate vast amounts of data.
So what are some options for visualizing connected data? First, we have low-level database / datasource viewers. These are tools that are readily available, and using them you can access data directly. Think of data navigation tools that come bundled with databases, for example. But such tools are mostly useful for technical users. Business users may find them hard to use, and giving them access to raw data may not be a good idea.
Another option is generic visualization applications. These work on a higher level, so they are easier to use, they don’t necessarily require technical expertise, and they can offer some analytics. Still, they do not work with all back-ends (typically they focus on SQL), their visualizations may be too generic, and the cost per user may be significant.
But there is a third, empowering option: software libraries for visualization. Remember: with great power comes great responsibility. Great power here means flexibility in terms of data acquisition, transformation, display, interaction, and integration. Great responsibility means putting in the work to develop your custom end-user experience.
yFiles by yWorks is a family of software libraries for the visualization of graphs, diagrams, and networks of all kinds. It has many things going for it: A complete set of automatic layout algorithms. The ability to view, edit, and create diagrams leveraging a visualization engine for all diagramming needs. It’s customizable and extensible, and runs on all major software platforms.
But rather than writing about yFiles, it’s best to show and tell a story of yFiles visualization. Every year, the Graph Drawing and Network Visualization conference organizes a contest where they provide data to be visualized. In 2019, one of the creative topics was exploring the relationships between Marvel superheroes and the movies in which they appear.
The yWorks team won the award for the best visualization, and they have released it as a demo for everyone to play with. Let’s see how this was built, following Sebastian Müller, yWorks CTO. Sebastian presented this work at Connected Data London 2019, and it was warmly received by the audience.
In this visualization, all movies have been arranged according to their in-universe timeline. Characters are shown as edges that pass through the movies they appear in. The thickness of edges models the relative screen time of characters within a movie. The overall layout is automatically computed based on the hierarchical layout style in yFiles.
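To make the thickness idea concrete, here is a minimal sketch in plain Python of how relative screen time could be mapped to an edge width. It does not use the yFiles API; the function name, the width range, and the screen-time figures are all made up for illustration.

```python
def stroke_widths(screen_minutes, min_width=1.0, max_width=12.0):
    """Map each character's share of screen time in one movie to an edge width.

    Hypothetical helper: the width range is an arbitrary choice.
    """
    total = sum(screen_minutes.values())
    if total == 0:
        # No screen time recorded: draw every edge at the minimum width.
        return {name: min_width for name in screen_minutes}
    return {
        name: min_width + (max_width - min_width) * minutes / total
        for name, minutes in screen_minutes.items()
    }


# Invented screen times (in minutes) for one hypothetical movie.
widths = stroke_widths({"Hero A": 60, "Hero B": 30, "Hero C": 10})
for name, width in widths.items():
    print(f"{name}: {width:.1f}")
```

Normalizing by the movie's total keeps widths comparable within a movie, which matches the "relative screen time" reading; a real implementation might scale differently.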
It all begins by importing the data. In this case, the data was in GraphML, a specialized XML format for graph data. But yFiles can connect and import data from virtually any source which allows programmatic access: JSON, XML, text, binary, database connectors. As an example, if you are already using a graph database, yFiles can connect to Neo4j.
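For readers who have not seen GraphML, the sketch below reads a tiny GraphML document using only Python's standard library. It illustrates the format rather than how yFiles imports it; the sample graph and the `load_graphml` helper are invented for this example.

```python
import xml.etree.ElementTree as ET

# The standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# A made-up, minimal GraphML document: two heroes linked to one movie.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="undirected">
    <node id="hero_a"/>
    <node id="hero_b"/>
    <node id="movie_1"/>
    <edge source="hero_a" target="movie_1"/>
    <edge source="hero_b" target="movie_1"/>
  </graph>
</graphml>"""


def load_graphml(text):
    """Return (node ids, (source, target) pairs) from a GraphML string."""
    root = ET.fromstring(text)
    nodes = [n.get("id") for n in root.findall("g:graph/g:node", GRAPHML_NS)]
    edges = [
        (e.get("source"), e.get("target"))
        for e in root.findall("g:graph/g:edge", GRAPHML_NS)
    ]
    return nodes, edges


nodes, edges = load_graphml(SAMPLE)
print(nodes)
print(edges)
```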
To get an initial layout, users have several out-of-the-box options: Organic, Hierarchic, Orthogonal, Tree, Circular, Balloon, or Radial. Then comes adding text labels, which can be done for nodes, edges, ports, and background. Any number of labels, any color, font, size, including icons, can be added with a few lines of code.
Different node types can be styled using different shapes, colors, fills, opacities, strokes, icons, and badges. Having done that, it’s time for some graph analysis. yFiles lets users apply graph algorithms such as centrality, clustering, paths, flows, reachability, and cycle detection. This is a great feature for connected data analysis, so more on this to follow.
What’s next in the journey to insights? Using properties of nodes and edges, including those extracted through algorithms, to nest and group them. With yFiles, you can have arbitrary nesting depths, and you can drill down, expand, or collapse on any visualization.
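As a rough illustration of grouping by property, the following plain-Python sketch buckets nodes by the value of one attribute. The node records, the `kind` property, and the `group_by` helper are assumptions made for this example, not part of yFiles.

```python
from collections import defaultdict


def group_by(nodes, key):
    """Bucket node ids by the value of one property (hypothetical helper)."""
    groups = defaultdict(list)
    for node in nodes;
        pass
```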
Sebastian continued to unfold this journey of data processing and visualization with yFiles, leading us through Metaball Blob Groups, layouts with custom constraints, and data enrichment and transformation.
You can have fun playing with the Marvel Universe visualization. Since the graph offers many details to explore, yWorks created an interactive app that provides features like hover effects, filtering, zooming, and panning. But there is something more than fun and games here, and a couple of points worth making.
We did mention how yFiles lets you use graph algorithms out of the box. This may not sound like much if you are not familiar with graph algorithms, but it’s a big deal. Using graph algorithms lets you get insights about your data. Which nodes are the most central in your network? What is the shortest path to connect any two nodes?
These are questions graph algorithms can help answer. The ability to apply graph algorithms is a big part of the drive towards connected data, and having a library that supports this means you can use them regardless of your back-end. No matter what format your data is in, yFiles can get graph insights out of it for you.
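To give a feel for what a centrality question looks like in code, here is a minimal sketch of degree centrality, one of the simplest centrality measures, in plain Python. It is not the yFiles implementation, and the hero and movie names are placeholders.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Placeholder data: which hero appears in which movie.
appearances = [
    ("Hero A", "Movie 1"),
    ("Hero B", "Movie 1"),
    ("Hero A", "Movie 2"),
    ("Hero C", "Movie 2"),
    ("Hero A", "Movie 3"),
]
scores = degree_centrality(appearances)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

Other measures, such as betweenness or closeness centrality, weigh a node's position in the whole network rather than just its direct neighbors.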
How are two Marvel superheroes connected, for example? Getting that information off the top of your head will be hard even for hard-core Marvel fans. Getting that information out of your raw data is doable, but requires a lot of work, with many hops and joins. Getting that information out of your connected, visualized data is fast and intuitive; with just a little bit of customized interaction, that information is just a click away.
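Under the hood, a question like this is a shortest-path search. The sketch below uses breadth-first search over a small hero-movie graph in plain Python; it is an illustration under assumed data, not the code behind the demo, and all names are placeholders.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Placeholder graph: heroes connect to the movies they appear in, and vice versa.
graph = {
    "Hero A": ["Movie 1"],
    "Hero B": ["Movie 1", "Movie 2"],
    "Hero C": ["Movie 2"],
    "Hero D": [],
    "Movie 1": ["Hero A", "Hero B"],
    "Movie 2": ["Hero B", "Hero C"],
}
print(shortest_path(graph, "Hero A", "Hero C"))
```

Here the two heroes never share a movie, but they are linked through a third hero who appears alongside each of them.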
The broader point this goes to show is that yFiles gives you flexibility. The programming library can be used not just to visualize existing data, but also to modify or even create data from scratch. It’s up to the application or demo developer whether this is enabled in the interface. To see this in action, just try out the graph algorithms demo. You’re in for a surprise.
With yFiles, you can attach arbitrary data and properties to nodes, edges, labels, and ports. The data can be used to drive the visualization, the algorithms, and the layout, or it can just sit there and be stored along with the elements. As an example, the graph analysis demo lets users modify weights on edges by pressing F2 on a selected edge or double-clicking a label or edge.
And this is not just a toy: edits can be persisted back to a database, communicated to a 3rd party service or storage system, or used locally, transiently, for the visualization. yFiles supports GraphML and a simple JSON format for serializing and deserializing diagrams, but developers can create their own adapters to their data storage of choice.
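The idea of persisting edits can be sketched as a simple round trip: change an edge weight, serialize the graph, and read it back. The JSON layout below is an assumed, made-up schema for illustration and is not the JSON format that yFiles uses.

```python
import json


def to_json(nodes, edges):
    """Serialize a graph to a JSON string (hypothetical schema)."""
    return json.dumps({"nodes": nodes, "edges": edges}, sort_keys=True)


def from_json(text):
    """Rebuild the node and edge lists from the JSON string."""
    data = json.loads(text)
    return data["nodes"], data["edges"]


# Placeholder graph with one weighted edge.
nodes = [{"id": "hero_a"}, {"id": "movie_1"}]
edges = [{"source": "hero_a", "target": "movie_1", "weight": 1.0}]

# Simulate a user edit, then persist and reload.
edges[0]["weight"] = 2.5
restored_nodes, restored_edges = from_json(to_json(nodes, edges))
print(restored_edges[0]["weight"])
```

The same serialized string could just as well be written to a file, a database column, or a web service, which is where a custom adapter would come in.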
Would you like to go beyond standard data visualization, and get the benefits of automatic layouts, algorithms, and two-way editing customized for your specific use case? Then you should definitely have a look at yFiles and give it a try.