At a special session of SciPy 2018 in Austin, representatives of a wide range of open-source Python visualization tools shared their visions for the future of data visualization in Python. We heard updates on Matplotlib and many other tools. I attended as a representative of Datashader and several other projects, and my Anaconda colleague Jean-Luc Stevens also attended as a project representative. This first post surveys the packages currently available and shows how they are linked; subsequent posts will discuss how these tools have been evolving in recent years and how they will go forward from here.
The Current Landscape
To set the stage, I showed Jake VanderPlas’s overview of how the many different visualization libraries in Python currently relate to each other:
Here, you can see several main groups of libraries, each with a different origin, history, and focus. One clearly separable group is the “SciVis” libraries for visualizing physically situated data (in the lower left of the figure). These tools primarily build on the 1992 OpenGL graphics standard, delivering graphics-intensive visualizations of physical processes in three or four dimensions (3D over time), for regularly or irregularly gridded data. These libraries predate HTML5’s support for rich web applications, generally focusing on high-performance desktop-GUI applications in engineering or scientific contexts.
The other libraries nearly all fall into the “InfoVis” group, focusing on visualizations of information in arbitrary spaces, not necessarily the three-dimensional physical world. InfoVis libraries use the two dimensions of the printed page or computer screen to make abstract spaces interpretable, typically with axes and labels. The InfoVis libraries can be further broken down into numerous subgroups:
Matplotlib is one of the oldest and by far the most popular of the InfoVis libraries. It was released in 2003 and offers a very extensive range of 2D plot types and output formats. Matplotlib also predated HTML5’s support for rich web applications, focusing instead on static images for publication along with interactive figures using desktop-GUI toolkits like Qt and GTK. Matplotlib includes some 3D support, but it is much more limited than what the SciVis libraries provide.
A variety of tools have built on Matplotlib’s 2D-plotting capability over the years, either using it as a rendering engine for a certain type of data or in a certain domain, providing a higher-level API on top to simplify plot creation, or extending it with additional types of plots.
Once HTML5 allowed rich interactivity in browsers, many libraries arose to provide interactive 2D plots for web pages and Jupyter notebooks, either using custom JS or primarily wrapping existing JS libraries like D3. Wrapping existing JS makes it easy to add new plots created for the large JS market (as for Plotly), while using custom JS allows defining lower-level JS primitives that can be combined into completely new plot types from within Python (as for Bokeh).
Many other libraries, even beyond those listed in Jake’s diagram, provide other complementary functionality (e.g. for visualizing networks).
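To make the "wrapping existing JS" idea concrete, here is a minimal sketch of the general pattern: Python builds a plain, declarative description of a plot, serializes it to JSON, and hands it to a JavaScript library that does the actual drawing in the browser. This is an illustration using only the standard library, not how any particular Python package is implemented. The helper names `build_figure_spec` and `render_html` are hypothetical, and the CDN URL is an assumption about where a copy of Plotly.js can be loaded from.

```python
import json

# Assumed location of a hosted Plotly.js bundle; swap in whichever copy you use.
PLOTLY_JS_URL = "https://cdn.plot.ly/plotly-latest.min.js"


def build_figure_spec(x, y, title):
    """Return a plain dict describing a line plot in the data/layout
    shape that Plotly.js accepts (hypothetical helper for illustration)."""
    return {
        "data": [{"type": "scatter", "mode": "lines", "x": list(x), "y": list(y)}],
        "layout": {"title": {"text": title}},
    }


def render_html(spec, div_id="plot"):
    """Embed the JSON-serialized spec in a standalone HTML page.

    All rendering happens in the browser; Python only ships the data.
    """
    payload = json.dumps(spec)
    # Escape "</" so that no string in the data can close the <script> tag early.
    payload = payload.replace("</", "<\\/")
    return f"""<!DOCTYPE html>
<html>
<head><script src="{PLOTLY_JS_URL}"></script></head>
<body>
<div id="{div_id}"></div>
<script>
const fig = {payload};
Plotly.newPlot("{div_id}", fig.data, fig.layout);
</script>
</body>
</html>"""


spec = build_figure_spec([0, 1, 2, 3], [0, 1, 4, 9], "Squares")
html = render_html(spec)
```

Writing `html` to a file and opening it in a browser should show an interactive line plot. The point of the sketch is the division of labor: the Python side stays small because the JS library already knows how to draw, which is also why new JS plot types become available to Python users with relatively little extra work.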
Differentiating Factors Between Viz Tools
The above breakdown by history and technology helps explain how we got to the current profusion of Python viz packages, but it also helps explain why there are such major differences in user-level functionality between the various packages. Specifically, there are major differences in the supported plot types, data sizes, user interfaces, and API types that make the choice of library not just a matter of personal preference or convenience, and so they are very important to understand:
Plot Types
The most basic plot types are shared between multiple libraries, but others are only available in certain libraries. Given the number of libraries, plot types, and their changes over time, it is very difficult to precisely characterize what’s supported in each library, but it is usually clear what the focus is if you look at the example galleries for each library. As a rough guide:
Data Size
The architecture and underlying technology for each library determine the data sizes supported, and thus whether the library is appropriate for large images, movies, multidimensional arrays, long time series, meshes, or other sizeable datasets:
Because of the wide range in data size (and thus to some extent data type) supported by these types of libraries, users who need to work with large datasets should choose an appropriate library at the outset.
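One way to see why architecture limits data size is to compare two strategies for getting a scatter plot to the screen: sending every point to the renderer, or first aggregating the points into a fixed-size grid and sending only the grid. The sketch below is a pure-Python illustration of that trade-off under simple assumptions (uniform random points, JSON as the transport format, a 64x64 grid). The `rasterize` function is a hypothetical helper, not the API of Datashader or any other library, though server-side rasterization of this general kind is the strategy such tools are built around.

```python
import json
import random


def rasterize(points, width, height, x_range, y_range):
    """Count how many points fall into each cell of a width x height grid.

    Points outside the given ranges are ignored; points exactly on the
    upper edge are assigned to the last row or column.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        grid[row][col] += 1
    return grid


random.seed(0)
for n in (1_000, 100_000):
    points = [(random.random(), random.random()) for _ in range(n)]
    # Strategy 1: ship every point; the payload grows with the dataset.
    raw_size = len(json.dumps(points))
    # Strategy 2: ship a fixed-size grid of counts; the payload is bounded
    # by the grid resolution, not by the number of points.
    grid = rasterize(points, 64, 64, (0.0, 1.0), (0.0, 1.0))
    grid_size = len(json.dumps(grid))
    print(f"{n} points: raw payload {raw_size} chars, grid payload {grid_size} chars")
```

The raw payload grows in proportion to the number of points, while the grid payload stays about the same size because it depends on the output resolution. The cost is that individual points are no longer available to the renderer, which is why this choice tends to be made at the architectural level and is hard to retrofit.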
User Interfaces and Publishing
The various libraries differ dramatically in the ways that plots can be used.
Standalone web-based dashboards and apps: Plotly graphs can be used in separate deployable apps, and Bokeh, HoloViews, and GeoViews can be deployed using Bokeh Server. Most of the other InfoVis libraries (including at least Matplotlib, Altair, Plotly, Datashader, hvPlot, Seaborn, plotnine, and yt) can be deployed as dashboards using a newer dashboarding library. However, despite their web-based interactivity, the ipywidgets-based libraries (ipyleaflet, pythreejs, ipyvolume, bqplot) are difficult to deploy as public-facing apps because the Jupyter protocol allows arbitrary code execution (but see the defunct Jupyter dashboards project and flask-ipywidgets for potential solutions).
Users thus need to consider whether a given library will cover the range of uses they expect for their visualizations.
API Types
The various InfoVis libraries offer a huge range of programming interfaces suitable for very different types of users and different ways of creating visualizations. These APIs differ by orders of magnitude in how much code is needed to do common tasks and in how much control they provide to the user for handling uncommon tasks and for composing primitives into new types of plots:
Each of these APIs is suited to users with different backgrounds and goals, making some tasks easy and concise, and others more difficult. Apart from Matplotlib, most libraries support one or at most two alternative APIs, making it important to choose a library whose approach fits with each user’s technical background and preferred workflows.
As you can see, there is a huge range of visualization functionality available for Python, with a diversity in approach and focus that is reflected in the large number of libraries available. Differences between approaches remain important and have far-reaching implications, meaning that users need to take these differences into consideration before investing deeply into any particular approach. But as we saw at SciPy 2018, trends toward convergence are helping make it less crucial which libraries users select. To learn more about these emerging trends, stay tuned for Part II of this series, Python Data Visualization 2018: Moving Toward Convergence.