Channel: Planet Python

Continuum Analytics News: Introducing GeoViews

Posted Thursday, September 1, 2016
Jim Bednar
Continuum Analytics

Philipp Rudiger
Continuum Analytics


GeoViews is a new Python library that makes it easy to explore and visualize geographical, meteorological, oceanographic, weather, climate, and other real-world data. GeoViews was developed by Continuum Analytics, in collaboration with the Met Office. GeoViews is completely open source, freely available under a BSD license for both commercial and non-commercial use, and can be obtained as described at the GitHub site.

GeoViews is built on the HoloViews library for building flexible visualizations of multidimensional data. GeoViews adds a family of geographic plot types, transformations, and primitives based primarily on the Cartopy library, plotted using either the Matplotlib or Bokeh packages. GeoViews objects are just like HoloViews objects, except that they have an associated geographic projection based on cartopy.crs. For instance, you can overlay temperature data with coastlines using simple expressions like gv.Image(temperature)*gv.feature.coastline, and easily embed these in complex, multi-figure layouts and animations using both GeoViews and HoloViews primitives, while always being able to access the raw data underlying any plot.

This post shows you how GeoViews makes it simple to use point, path, shape, and multidimensional gridded data on both geographic and non-geographic coordinate systems.

import numpy as np
import xarray as xr
import pandas as pd
import holoviews as hv
import geoviews as gv
import iris
import cartopy
from cartopy import crs
from cartopy import feature as cf
from geoviews import feature as gf

hv.notebook_extension('bokeh','matplotlib')
%output backend='matplotlib'
%opts Feature [projection=crs.Robinson()]
HoloViewsJS, MatplotlibJS, BokehJS successfully loaded in this cell.

Built-in geographic features

GeoViews provides a library of basic features based on cartopy.feature that are useful as backgrounds for your data, to situate it in a geographic context. Like all HoloViews Elements (objects that display themselves), these GeoElements can easily be laid out alongside each other using '+' or overlaid together using '*':

gf.coastline+gf.ocean+gf.ocean*gf.land*gf.coastline
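To make the '+' and '*' notation less mysterious, here is a minimal sketch of how such operators can be built with ordinary Python operator overloading. The classes below are toy stand-ins invented for illustration; they are not the HoloViews or GeoViews classes and make no claim about how those libraries are implemented.

```python
# Toy stand-ins, NOT the HoloViews classes: they only show how '+' and '*'
# can build containers through operator overloading.
class Element:
    def __init__(self, name):
        self.name = name

    def __mul__(self, other):
        # '*' stacks items on the same axes
        return Overlay(_items(self, Overlay) + _items(other, Overlay))

    def __add__(self, other):
        # '+' places items side by side
        return Layout(_items(self, Layout) + _items(other, Layout))

    def __repr__(self):
        return self.name


class Overlay(Element):
    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return "Overlay(" + " * ".join(map(repr, self.items)) + ")"


class Layout(Element):
    def __init__(self, items):
        self.items = list(items)

    def __repr__(self):
        return "Layout(" + " + ".join(map(repr, self.items)) + ")"


def _items(obj, container):
    # Flatten an existing container of the same kind, wrap anything else
    return obj.items if isinstance(obj, container) else [obj]


coastline, ocean, land = Element("coastline"), Element("ocean"), Element("land")
result = coastline + ocean + ocean * land * coastline
print(result)
```

Because '*' binds more tightly than '+' in Python, the expression produces a three-item layout whose last item is a three-layer overlay, which matches the shape of the figure produced by the GeoViews expression above.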

Other Cartopy features not included by default can be used similarly by explicitly wrapping them in a gv.Feature GeoViews Element object:

%%opts Feature.Lines (facecolor='none' edgecolor='gray')
graticules = gv.Feature(cf.NaturalEarthFeature(category='physical', name='graticules_30',scale='110m'), group='Lines')
graticules

The '*' operator used above is a shorthand for hv.Overlay, which can be used to show the full set of feature primitives provided:

%%output size=450
features = hv.Overlay([gf.ocean, gf.land, graticules, gf.rivers, gf.lakes, gf.borders, gf.coastline])
features

Projections

GeoViews allows incoming data to be specified in any coordinate system supported by Cartopy's crs module. This data is then transformed for display in another coordinate system, called the Projection. For instance, the features above are displayed in the Robinson projection, which was declared at the start of the notebook. Some of the other available projections include:

projections=[crs.RotatedPole,crs.Mercator,crs.LambertCylindrical,crs.Geostationary,crs.Gnomonic,crs.PlateCarree,crs.Mollweide,crs.Miller,crs.LambertConformal,crs.AlbersEqualArea,crs.Orthographic,crs.Robinson]

When using matplotlib, any of the available coordinate systems from cartopy.crs can be used as output projections, and we can use hv.Layout (what '+' is shorthand for) to show each of them:

hv.Layout([features.relabel(group=p.__name__)(plot=dict(projection=p()))
           for p in projections]).display('all').cols(3)

The Bokeh backend currently only supports a single output projection type, Web Mercator, but as long as you can use that projection, it offers full interactivity, including panning and zooming to see detail (after selecting tools using the menu at the right of the plot):

%%output backend='bokeh'
%%opts Overlay [width=600 height=500 xaxis=None yaxis=None] Feature.Lines (line_color='gray' line_width=0.5)
features
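To give a sense of what displaying data in Web Mercator involves, here is a minimal sketch of the standard spherical Web Mercator formula in plain Python. The function name and constant are illustrative only and are not part of GeoViews, Cartopy, or Bokeh, which perform this transformation for you.

```python
import math

EARTH_RADIUS_M = 6378137.0  # sphere radius conventionally used by Web Mercator


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# A point in central Australia, given as plain longitude/latitude
print(lonlat_to_web_mercator(133.870, -23.700))
```

The y value grows without bound towards the poles, which is why Web Mercator maps are normally cut off at high latitudes.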

Tile Sources

As you can see if you zoom in closely to the above plot, the shapes and outlines are limited in resolution, due to the need to have relatively small files that can easily be downloaded to your local machine. To provide more detail when needed for zoomed-in plots, geographic data is often divided up into separate tiles that can be downloaded individually and then combined to cover the required area. GeoViews lets you use any tile provider supported by Matplotlib (via cartopy) or Bokeh, which lets you add image or map data underneath any other plot. For instance, different sets of tiles at an appropriate resolution will be selected for this plot, depending on the extent selected:

%%output dpi=200
url = 'https://map1c.vis.earthdata.nasa.gov/wmts-geo/wmts.cgi'
gv.WMTS(url, layer='VIIRS_CityLights_2012', crs=crs.PlateCarree(), extents=(0, -60, 360, 80))

Tile servers are particularly useful with the Bokeh backend, because the data required as you zoom in isn't requested until you actually do the zooming, which allows a single plot to cover the full range of zoom levels provided by the tile server.

%%output backend='bokeh'
%%opts WMTS [width=450 height=250 xaxis=None yaxis=None]

from bokeh.models import WMTSTileSource
from bokeh.tile_providers import STAMEN_TONER

tiles = {'OpenMap': WMTSTileSource(url='http://c.tile.openstreetmap.org/{Z}/{X}/{Y}.png'),
         'ESRI': WMTSTileSource(url='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'),
         'Wikipedia': WMTSTileSource(url='https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png'),
         'Stamen Toner': STAMEN_TONER}

hv.NdLayout({name: gv.WMTS(wmts, extents=(0, -90, 360, 90), crs=crs.PlateCarree())
            for name, wmts in tiles.items()}, kdims=['Source']).cols(2)

If you select the "wheel zoom" tool in the Bokeh tools menu at the upper right of the above figure, you can use your scroll wheel to zoom into all of these plots at once, comparing the level of detail available at any location for each of these tile providers. Any WMTS tile provider that accepts URLs with an x and y location and a zoom level should work with Bokeh; you can find more at openstreetmap.org.
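For readers curious how an x and y location and a zoom level end up in such a URL, the following sketch uses the common "slippy map" tile numbering scheme to find the tile containing a given point and fills in a {Z}/{X}/{Y} template like the ones above. The helper name tile_url is made up for this illustration; Bokeh does this work itself when it requests tiles.

```python
import math


def tile_url(template, lon_deg, lat_deg, zoom):
    """Fill a {Z}/{X}/{Y} URL template with the tile containing a point."""
    n = 2 ** zoom  # number of tiles along each side at this zoom level
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or extreme latitudes stay inside the tile grid
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return template.format(Z=zoom, X=x, Y=y)


template = 'http://c.tile.openstreetmap.org/{Z}/{X}/{Y}.png'
print(tile_url(template, 133.870, -23.700, 4))
```

Each extra zoom level doubles the number of tiles along each side, which is why fetching tiles only when the user zooms keeps the amount of downloaded data manageable.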

Point data

Bokeh, Matplotlib, and GeoViews are mainly intended for plotting data, not just maps, and so the above tile sources and cartopy features are typically in the background of the actual data being plotted. When there is a data layer, the extent of the data will determine the extent of the plot, and so extent will not need to be provided explicitly as in the previous examples.

The simplest kind of data to situate geographically is point data: longitude and latitude coordinates for locations on the Earth's surface. GeoViews makes it simple to overlay such plots onto Cartopy features, tile sources, or other geographic data. For instance, let's load a dataset of all the major cities in the world with their population counts over time:

cities = pd.read_csv('./assets/cities.csv', encoding="ISO-8859-1")
population = gv.Dataset(cities, kdims=['City', 'Country', 'Year'])
cities.tail()

       City         Country                             Latitude  Longitude  Year    Population
10025  Valencia     Venezuela (Bolivarian Republic of)     10.17     -68.00  2050.0   2266000.0
10026  Al-Hudaydah  Yemen                                  14.79      42.94  2050.0   1854000.0
10027  Sana'a'      Yemen                                  15.36      44.20  2050.0   4382000.0
10028  Ta'izz       Yemen                                  13.57      44.01  2050.0   1743000.0
10029  Lusaka       Zambia                                -15.42      28.17  2050.0   2047000.0

Now we can convert this text-based dataset to a set of visible points mapped by the latitude and longitude, and containing the population, country, and city name as values. The longitudes and latitudes in the dataframe are supplied in simple Plate Carree coordinates, which we will need to declare explicitly, since each value is just a number with no inherently associated units. The .to conversion interface lets us do this succinctly, giving us points that are instantly visualizable either on their own or in a geographic context:

cities = population.to(gv.Points, kdims=['Longitude', 'Latitude'],
                       vdims=['Population', 'City', 'Country'], crs=crs.PlateCarree())

%%output backend='bokeh'
%%opts Overlay [width=600 height=300 xaxis=None yaxis=None] 
%%opts Points (size=0.005 cmap='inferno') [tools=['hover'] color_index=2]
gv.WMTS(WMTSTileSource(url='https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png')) * cities

Note that since we did not assign the Year dimension to the points key or value dimensions, it is automatically assigned to a HoloMap, rendering the data as an animation using a slider widget. Because this is a Bokeh plot, you can also zoom into any location to see more geographic detail, which will be requested dynamically from the tile provider (try it!). The matplotlib version of the same plot, using the Cartopy Ocean feature to provide context, provides a similar widget to explore the Year dimension not mapped onto the display:

%%output size=200
%%opts Feature [projection=crs.PlateCarree()]
%%opts Points (s=0.000007 cmap='inferno') [color_index=2]
gf.ocean * cities[::4]

However, the matplotlib version doesn't provide any interactivity within the plot; the output is just a series of PNG images encoded into the web page, with one image selected for display at any given time using the Year widget.
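The way a leftover key dimension such as Year turns into one frame per value can be pictured as a plain grouping step. The sketch below is only an analogy in pure Python, with made-up rows and a hypothetical helper frames_by; it does not reproduce the HoloMap machinery.

```python
from collections import defaultdict

# Illustrative rows only (city, longitude, latitude, year, population);
# the numbers are made up and are not taken from cities.csv.
rows = [
    ('A', 10.0, 50.0, 2000, 1.0e6),
    ('B', 20.0, 40.0, 2000, 2.0e6),
    ('A', 10.0, 50.0, 2010, 1.5e6),
    ('B', 20.0, 40.0, 2010, 2.5e6),
]


def frames_by(rows, leftover_index):
    """Group rows into one frame per value of the leftover dimension."""
    frames = defaultdict(list)
    for row in rows:
        key = row[leftover_index]
        # Drop the leftover column: it becomes the frame key instead
        frames[key].append(row[:leftover_index] + row[leftover_index + 1:])
    return dict(frames)


frames = frames_by(rows, leftover_index=3)
for year in sorted(frames):
    print(year, frames[year])
```

Each key of the resulting dictionary plays the role of one position of the Year widget, and each value holds the points drawn in that frame.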

Shapes

Points are zero-dimensional objects with just a location. It is also important to be able to work with one-dimensional paths (such as borders and roads) and two-dimensional areas (such as land masses and regions). GeoViews provides special GeoElements for paths (gv.Path) and polygons (gv.Polygons). The GeoElement types are extensions of the basic HoloViews Elements hv.Path and hv.Polygons that add support for declaring geographic coordinate systems via the crs parameter and support for choosing the display coordinate systems via the projection parameter. Like their HoloViews equivalents, gv.Path and gv.Polygons accept lists of NumPy arrays or Pandas dataframes, which is good for working with low-level data.

In practice, the higher-level GeoElements gv.Shape (which wraps around a shapely shape object) and gv.Feature (which wraps around a Cartopy Feature object) are more convenient, because they make it simple to work with large collections of shapes. For instance, the various features like gf.ocean and gf.coastline introduced above are gv.Feature types, based on cartopy.feature.OCEAN and cartopy.feature.COASTLINE, respectively.

We can easily access the individual shapely objects underlying these features if we want to work with them separately. Here we will get the geometry corresponding to the Australian continent and display it using shapely's inbuilt SVG repr (not yet a HoloViews plot, just a bare SVG displayed by Jupyter directly):

land_geoms = list(gf.land.data.geometries())
land_geoms[21]

Instead of letting shapely render it as an SVG, we can now wrap it in the gv.Shape object and let matplotlib or bokeh render it, alone or with other GeoViews or HoloViews objects:

%%opts Points (color="black")
%%output dpi=120
australia = gv.Shape(land_geoms[21], crs=crs.PlateCarree())

australia * hv.Points([(133.870,-23.700)]) * hv.Text(133.870,-21.5, 'Alice Springs')

The above plot uses HoloViews elements (notice the prefix hv.), which do not have any information about coordinate systems, and so the plot works properly only because it was specified as PlateCarree coordinates (bare longitude and latitude values). You can use other projections safely as long as you specify the coordinate system for the Text and Points objects explicitly, which requires using GeoViews Elements (prefix gv.):

%%opts Points (color="black")
pc=crs.PlateCarree()
australia(plot=dict(projection=crs.Mollweide(central_longitude=133.87))) * \
gv.Points([(133.870,-23.700)],crs=pc) * gv.Text(133.870,-21.5, 'Alice Springs',crs=pc)

You can see why the crs parameter is important if you change the above cell to omit ,crs=pc from gv.Points and gv.Text; neither the dot nor the text label will then be in the correct location, because they won't be transformed to match the Mollweide projection used for the rest of the plot.

Multiple shapes can be combined into an NdOverlay object either explicitly:

%output dpi=120 size=150
%%opts NdOverlay [aspect=2]
hv.NdOverlay({i: gv.Shape(s, crs=crs.PlateCarree()) for i, s in enumerate(land_geoms)})

or by loading a collection of shapes from a shapefile, such as this collection of UK electoral district boundaries:

%%opts NdOverlay [aspect=0.75]
shapefile='./assets/boundaries/boundaries.shp'
gv.Shape.from_shapefile(shapefile, crs=crs.PlateCarree())

One common use for shapefiles is to create choropleth maps, where each part of the geometry is assigned a value that will be used to color it. Constructing a choropleth by combining a bunch of shapes one by one can be a lot of effort and is error prone, but is straightforward when using a shapefile that assigns standardized codes to each shape. For instance, the shapefile for the above UK plot assigns a well-defined geographic code for each electoral district's MultiPolygon shapely object:

shapes = cartopy.io.shapereader.Reader(shapefile)
list(shapes.records())[0]

<Record: <shapely.geometry.multipolygon.MultiPolygon object at 0x11786a0d0>, {'code': 'E07000007'}, <fields>>

To make a choropleth map, we just need a dataset with values indexed using these same codes, such as this dataset of the 2016 EU Referendum result in the UK:

referendum = pd.read_csv('./assets/referendum.csv')
referendum = hv.Dataset(referendum)
referendum.data.head()

   leaveVoteshare  regionName    turnout  name                  code
0        4.100000  Gibraltar   83.500000  Gibraltar             BS0005003
1       69.599998  North East  65.500000  Hartlepool            E06000001
2       65.500000  North East  64.900002  Middlesbrough         E06000002
3       66.199997  North East  70.199997  Redcar and Cleveland  E06000003
4       61.700001  North East  71.000000  Stockton-on-Tees      E06000004

To make it simpler to match up the data with the shape files, you can use the .from_records method of the gv.Shape object to build a gv.Shape overlay that automatically merges the data and the shapes to show the percentage of each electoral district who voted to leave the EU:

%%opts NdOverlay [aspect=0.75] Shape (cmap='viridis')
gv.Shape.from_records(shapes.records(), referendum, on='code', value='leaveVoteshare',
                     index=['name', 'regionName'], crs=crs.PlateCarree())
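Conceptually, merging shapes and data on a shared code is a keyed join. The sketch below shows that idea in plain Python, using two rows from the referendum table and placeholder strings in place of real shapely geometries. The helper join_on is hypothetical, and how gv.Shape.from_records treats shapes without a matching data row is not shown in this post, so here they are simply listed separately.

```python
# Vote shares copied from the referendum table above; the geometry entries
# are placeholder strings standing in for shapely geometries.
referendum_rows = [
    {'code': 'E06000001', 'name': 'Hartlepool', 'leaveVoteshare': 69.599998},
    {'code': 'E06000002', 'name': 'Middlesbrough', 'leaveVoteshare': 65.500000},
]
shape_records = [
    {'code': 'E06000002', 'geometry': '<geometry for E06000002>'},
    {'code': 'E06000001', 'geometry': '<geometry for E06000001>'},
    {'code': 'E07000007', 'geometry': '<geometry for E07000007>'},
]


def join_on(shape_records, data_rows, on, value):
    """Pair each shape with the data value that shares its `on` key."""
    lookup = {row[on]: row[value] for row in data_rows}
    joined, unmatched = [], []
    for record in shape_records:
        key = record[on]
        if key in lookup:
            joined.append((record['geometry'], lookup[key]))
        else:
            unmatched.append(key)  # no data row for this shape
    return joined, unmatched


joined, unmatched = join_on(shape_records, referendum_rows,
                            on='code', value='leaveVoteshare')
print(joined)
print(unmatched)
```

The value paired with each geometry is what a colormap such as 'viridis' would then turn into a fill color.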

As usual, the matplotlib output is static, but the Bokeh version of the same data is interactive, allowing both zooming and panning within the geographic area and revealing additional data such as the county name and numerical values when hovering over each shape:

%%output backend='bokeh'
%%opts Shape (cmap='viridis') [xaxis=None yaxis=None tools=['hover'] width=400 height=500]
gv.Shape.from_records(shapes.records(), referendum, on='code', value='leaveVoteshare',
                     index='name', crs=crs.PlateCarree(), group='EU Referendum')

As you can see, essentially the same code as was needed for the static Matplotlib version now provides a fully interactive view of this dataset.

For the remaining sections, let's set some default parameters:

%opts Image [colorbar=True] Curve [xrotation=60] Feature [projection=crs.PlateCarree()]
hv.Dimension.type_formatters[np.datetime64]='%Y-%m-%d'
%output dpi=100 size=100

Here in this blog post we will use only a limited number of frames and plot sizes, to avoid bloating the web page with too much data, but when working on a live server one can append widgets='live' to the %output line above. In live mode, plots are rendered dynamically using Python based on user interaction, which allows agile exploration of large, multidimensional parameter spaces, without having to precompute a fixed set of plots.

Gridded data

In addition to point, path, and shape data, GeoViews is designed to make full use of multidimensional gridded (raster) datasets, such as those produced by satellite sensing, systematic land-based surveys, and climate simulations. This data is often stored in netCDF files that can be read into Python with the xarray and Iris libraries. HoloViews and GeoViews can use data from either library in all of their objects, along with NumPy arrays, Pandas data frames, and Python dictionaries. In each case, the data can be left stored in its original, native format, wrapped in a HoloViews or GeoViews object that provides instant interactive visualizations.

To get started, let's load a dataset (originally taken from iris-sample-data) containing surface temperature data indexed by 'longitude', 'latitude', and 'time':

xr_dataset = gv.Dataset(xr.open_dataset('./sample-data/ensemble.nc'), crs=crs.PlateCarree(),
                        kdims=['latitude', 'longitude', 'time'],
                        vdims=['surface_temperature'])
xr_dataset

:Dataset   [latitude,longitude,time]   (surface_temperature)

Here there is one "value dimension", i.e. surface temperature, whose value can be obtained for any combination of the three "key dimensions" (coordinates) longitude, latitude, and time.

We can quickly build an interactive tool for exploring how this data changes over time:

surf_temp = xr_dataset.to(gv.Image, ['longitude', 'latitude']) * gf.coastline
surf_temp[::2]

Here the slider for 'time' was generated automatically, because we instructed HoloViews to lay out two of the key dimensions as x and y coordinates in an Image when we called .to(), with the value dimension 'surface_temperature' mapping onto the color of the image pixels by default, but we did not specify what should be done with the remaining 'time' key dimension. HoloViews is designed to make everything visualizable somehow, so it automatically generates a slider to cover this "extra" dimension, allowing you to explore how the surface_temperature values change over time. In a static page like this blog post, each frame is embedded into the page; in a live Jupyter notebook, however, it is trivial to explore large datasets by rendering each frame dynamically.

You could instead have told HoloViews to lay out the remaining dimension spatially (as faceted plots), in which case the slider will disappear because there is no remaining dimension to explore. As an example, let's take every second frame, then lay the selected frames out spatially:

surf_temp[::2].layout()

Normalization

By default, HoloViews will normalize items displayed together as frames in a slider or animation, applying a single colormap across all items of the same type sharing the same dimensions, so that differences are clear. In this particular dataset, the range changes relatively little, so that even if we turn off such normalization in layouts (or in animation frames using {+framewise}) the results are similar:

%%opts Image {+axiswise}
surf_temp[::2].layout()

Here you can see that each frame has a different range in the color bar, but it's a subtle effect. If we really want to highlight changes over a certain range of interest, we can set explicit normalization limits. For this data, let's find the maximum temperature in the dataset, and use it to set a normalization range by using the redim method:

max_surface_temp = xr_dataset.range('surface_temperature')[1]
print(max_surface_temp)
xr_dataset.redim(surface_temperature=dict(range=(300, max_surface_temp))).\
  to(gv.Image,['longitude','latitude'])[::2] * gf.coastline(style=dict(edgecolor='white'))

317.331787109

Now we can see a clear cooling effect over time, as the yellow and white areas close to the top of the normalization range (317K) vanish in the Americas and in Africa. Values outside this range are clipped to the ends of the color map.
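The effect of an explicit normalization range, including the clipping just mentioned, can be illustrated with a few lines of plain Python. This is a generic sketch of linear normalization with a made-up helper name, not the code path HoloViews or Matplotlib actually uses.

```python
def normalize(value, low, high):
    """Map value linearly onto [0, 1] over (low, high), clipping outside it."""
    fraction = (value - low) / (high - low)
    return min(1.0, max(0.0, fraction))


low, high = 300.0, 317.331787109  # the range used in the example above
for temperature in (280.0, 300.0, 310.0, 317.331787109, 330.0):
    print(temperature, round(normalize(temperature, low, high), 3))
```

A colormap then only has to translate the resulting fraction into a color, so everything at or below 300K shares the color at the bottom of the map and everything at or above the maximum shares the color at the top.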

Non-Image views of gridded data

gv.Image Elements are a common way to view gridded data, but the .to() conversion interface supports other types as well, such as filled or unfilled contours and points:

%%output size=100 dpi=100
%%opts Points [color_index=2 size_index=None] (cmap='jet')
hv.Layout([xr_dataset.to(el,['longitude', 'latitude'])[::5, 0:50, -30:40] * gf.coastline
           for el in [gv.FilledContours, gv.LineContours, gv.Points]]).cols(3)

Non-geographic views of gridded data

So far we have focused entirely on geographic views of gridded data, plotting the data on a projection involving longitude and latitude. However, the .to() conversion interface is completely general, allowing us to slice and dice the data in any way we like. To illustrate this, let's load an expanded version of the above surface temperature dataset that adds an additional 'realization' dimension.

kdims = ['realization', 'longitude', 'latitude', 'time']
xr_ensembles = xr.open_dataset('./sample-data/ensembles.nc')
dataset = gv.Dataset(xr_ensembles, kdims=kdims, vdims=['surface_temperature'], crs=crs.PlateCarree())
dataset

:Dataset   [realization,longitude,latitude,time]   (surface_temperature)

The realization is effectively a certain set of modelling parameters that leads to different predicted values for the temperatures at given times. We can see this clearly if we map the data onto a temperature versus time plot:

%%output backend='bokeh'
%%opts Curve [xrotation=25] NdOverlay [width=600 height=400 legend_position='left']
sliced = dataset.select(latitude=(0, 5), longitude=(0,10))
sliced.to(hv.Curve, 'time').overlay('realization')

Here there is no geographic organization to the visualization, because we selected non-geographic coordinates to display. Just as before, the key dimensions not selected for display have become sliders, but in this case the leftover dimensions are longitude and latitude. (Realization would also be left over and thus generate a slider, if it hadn't been mapped onto an overlay above.)

Because this is a static web page, we selected only a small portion of the data to be available in the above plot, i.e. all data points with latitude in the range 0 to 5 and longitude in the range 0 to 10. If this code were running on a live Python server, one could instead access all the data dynamically:

hv.Layout([dataset.to(hv.Curve,'time',dynamic=True).overlay('realization')])

We can also make non-geographic 2D plots, for instance as a HeatMap over time and realization, again at a specified longitude and latitude:

%%opts HeatMap [show_values=False colorbar=True]
sliced.to(hv.HeatMap, ['realization', 'time'])

In general, any HoloViews Element type (of which there are many!) can be used for non-geographic dimensions selected in this way, while any GeoViews GeoElement type can be used for geographic data.

Reducing and aggregating gridded data

So far all the conversions shown have incorporated each of the available coordinate dimensions, either explicitly as dimensions in the plot, or implicitly as sliders for the leftover dimensions. However, instead of revealing all the data individually in this way, we often want to see the spread of values along one or more dimensions, pooling all the other dimensions together.

A simple example of this is a box plot where we might want to see the spread of surface_temperature on each day, pooled across all latitude and longitude coordinates. To pool across particular dimensions, we can explicitly declare the "map" dimensions, which are the key dimensions of the HoloMap container rather than those of the Elements contained in the HoloMap. By declaring an empty list of mdims, we can tell the conversion interface '.to()' to pool across all dimensions except the particular key dimension(s) supplied, in this case the 'time' (plot A) and 'realization' (plot B):

%%opts BoxWhisker [xrotation=25 bgcolor='w']
hv.Layout([dataset.to.box(d, mdims=[]) for d in ['time', 'realization']])

This approach also gives us access to other statistical plot types. For instance, with the seaborn library installed, we can use the Distribution Element, which visualizes the data as a kernel density estimate. In this way we can visualize how the distribution of surface temperature values varies over time and the model realizations. We do this by omitting 'latitude' and 'longitude' from the list of mdims, generating a lower-dimensional view into the data, where a temperature histogram is shown for every 'realization' and 'time', using GridSpace:

%opts GridSpace [shared_xaxis=True fig_size=150] 
%opts Distribution [bgcolor='w' show_grid=False xticks=[220, 300]]
import seaborn
dataset.to.distribution(mdims=['realization','time']).grid()
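The pooling controlled by mdims in the examples above can be pictured as collecting every value that shares a coordinate along the chosen dimension, regardless of the other dimensions. The following pure-Python sketch uses made-up samples and a hypothetical helper pool_by to show the idea; the summary statistic here is just the median, whereas a box plot would also show quartiles and outliers.

```python
from collections import defaultdict
from statistics import median

# Made-up samples: (realization, longitude, latitude, time, temperature)
samples = [
    (0, 0.0, 0.0, 'day1', 290.0),
    (0, 10.0, 0.0, 'day1', 294.0),
    (1, 0.0, 0.0, 'day1', 292.0),
    (0, 0.0, 0.0, 'day2', 288.0),
    (1, 0.0, 0.0, 'day2', 291.0),
    (1, 10.0, 0.0, 'day2', 293.0),
]
DIMS = ('realization', 'longitude', 'latitude', 'time')


def pool_by(samples, dim):
    """Collect every temperature sharing a value of `dim`, ignoring the rest."""
    index = DIMS.index(dim)
    pooled = defaultdict(list)
    for sample in samples:
        pooled[sample[index]].append(sample[-1])
    return dict(pooled)


by_time = pool_by(samples, 'time')
by_realization = pool_by(samples, 'realization')
print({key: median(values) for key, values in by_time.items()})
print({key: median(values) for key, values in by_realization.items()})
```

Pooling by 'time' mixes values from all realizations and locations into one list per day, which is the kind of spread the box plots above summarize.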

Selecting a particular coordinate

To examine one particular coordinate, we can select it, cast the data to Curves, reindex the data to drop the now-constant latitude and longitude dimensions, and overlay the remaining 'realization' dimension:

%%opts NdOverlay [xrotation=25 aspect=1.5 legend_position='right' legend_cols=2] Curve (color=Palette('Set1'))
dataset.select(latitude=0, longitude=0).to(hv.Curve, ['time']).reindex().overlay()

Aggregating coordinates

Another option is to aggregate over certain dimensions, so that we can get an idea of distributions of temperatures across all latitudes and longitudes. Here we compute the mean temperature and standard deviation by latitude and longitude, casting the resulting collapsed view to a Spread Element:

%%output backend='bokeh'
lat_agg = dataset.aggregate('latitude', np.mean, np.std)
lon_agg = dataset.aggregate('longitude', np.mean, np.std)
hv.Spread(lat_agg) * hv.Curve(lat_agg) + hv.Spread(lon_agg) * hv.Curve(lon_agg)
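As a rough picture of what aggregating by latitude with a mean and a standard deviation computes, here is a standard-library sketch on made-up numbers. It uses statistics.pstdev because np.std computes the population standard deviation by default; the helper aggregate_by_latitude is hypothetical and is not the HoloViews aggregate method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Made-up (latitude, temperature) pairs, with longitude, time, and
# realization already flattened away for brevity.
samples = [(0.0, 300.0), (0.0, 302.0), (0.0, 304.0), (5.0, 290.0), (5.0, 294.0)]


def aggregate_by_latitude(samples):
    """Return {latitude: (mean, population standard deviation)}."""
    groups = defaultdict(list)
    for latitude, temperature in samples:
        groups[latitude].append(temperature)
    return {lat: (mean(vals), pstdev(vals)) for lat, vals in sorted(groups.items())}


aggregated = aggregate_by_latitude(samples)
for latitude, (avg, spread) in aggregated.items():
    print(latitude, avg, round(spread, 3))
```

The mean corresponds to the Curve in the plot above and the standard deviation to the width of the Spread band around it.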

As you can see, with GeoViews and HoloViews it is very simple to select precisely which aspects of complex, multidimensional datasets you want to focus on. See holoviews.org and geo.holoviews.org to get started!

