Easy access to environmental data for analyzing environmental drivers of the dynamics of small pelagic fish

This notebook demonstrates how easy it is to retrieve data relevant to analyzing environmental drivers of small pelagics using ERDDAP servers and Python. In the examples, the code needed to actually get the data, which is short and simple, is separated from the code that maps or graphs the data once they are in memory, so that the more complicated plotting code doesn't obscure how simple it is to extract the desired data.

ERDDAP servers provide access to literally petabytes of data, including satellite data, fisheries survey data, glider data, animal tracking data and more.
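
An ERDDAP request is just a URL: it names the dataset, the response format (here .nc or .csv), the variable, and one [(start):stride:(stop)] constraint per dimension. As a sketch of that pattern (the pieces below simply mirror the MUR SST request used later in this notebook), such a URL can be assembled from its parts:

In [ ]:
base = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/'
datasetID = 'jplMURSST41'   # the ERDDAP dataset id
fileType = '.nc'            # other formats such as .csv or .json also work
variable = 'analysed_sst'
# one [(start):stride:(stop)] constraint per dimension, in dimension order
constraints = '[(last):1:(last)][(22.):1:(51.)][(-140.):1:(-105.)]'
myURL = base + datasetID + fileType + '?' + variable + constraints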

To start we define some functions that are used throughout.

In [2]:
from mpl_toolkits.basemap import Basemap
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
import numpy.ma as ma
import cmocean
try:
    from urllib import urlretrieve           # Python 2
except ImportError:
    from urllib.request import urlretrieve   # Python 3
import pandas as pd
from netCDF4 import Dataset
import datetime
In [3]:
def getURL(myURL, outFile):
    # download the contents of myURL and save them to the local file outFile
    response = urlretrieve(myURL, outFile)
    return(response)
In [4]:
def plot_map(grid, lats, lons, myPalette):
    # map bounds taken from the data, padded by 10% on each side
    southlat = lats.min()
    northlat = lats.max()
    westlon = lons.min()
    eastlon = lons.max()
    latdif = abs(northlat - southlat)
    londif = abs(westlon - eastlon)
    westedge = westlon - londif * 0.1
    eastedge = eastlon + londif * 0.1
    northedge = northlat + latdif * 0.1
    southedge = southlat - latdif * 0.1
    mylons, mylats = np.meshgrid(lons, lats)
    plt.figure(figsize=(12,8))
    map = Basemap(projection = 'merc', lat_0 = (southlat + northlat)/2, lon_0 = (westlon + eastlon)/2,
        resolution = 'h', area_thresh = 0.1,
        llcrnrlon = westedge, llcrnrlat = southedge,
        urcrnrlon = eastedge, urcrnrlat = northedge)
    map.drawcoastlines()
    map.drawcountries()
    im1 = map.fillcontinents(color = '0.3')
    im2 = map.drawmapboundary()
    im3 = map.drawmeridians(np.arange((westedge - np.mod(westedge, 5)), (eastedge + np.mod(eastedge, 5)),5),
        labels=[True, False, False, True])
    im4 = map.drawparallels(np.arange((southedge - np.mod(southedge, 5)), (northedge + np.mod(northedge, 5)),5),
        labels=[True, False, False, True])
    im5 = map.pcolormesh(mylons, mylats, grid, shading = 'flat',cmap = myPalette, latlon = True)
    cb = map.colorbar(im5, "right", size = "5%", pad = "2%")
    plt.show()

MUR SST

MUR (Multi-scale Ultra-high Resolution) is an analyzed SST product at 0.01-degree resolution going back to 2002, providing one of the longest satellite-based time series at such high resolution (see https://podaac.jpl.nasa.gov/dataset/MUR-JPL-L4-GLOB-v4.1). The latest available data for a region off the west coast of the United States can be extracted and downloaded by:

In [7]:
murURL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/jplMURSST41.nc?analysed_sst[(last):1:(last)][(22.):1:(51.)][(-140.):1:(-105)]'
result = getURL(murURL, 'murSST.nc')

and then the downloaded netCDF file is read into memory:

In [ ]:
sstFile = Dataset('murSST.nc')
sst = sstFile.variables['analysed_sst'][:,:,:]
latitude = sstFile.variables['latitude'][:]
longitude = sstFile.variables['longitude'][:]
sstFile.close()
sst = np.squeeze(sst)

and then plotted using matplotlib and basemap:

In [10]:
plot_map(sst, latitude, longitude, cmocean.cm.thermal)

VIIRS SST and Chlorophyll

VIIRS (Visible Infrared Imaging Radiometer Suite) is a scanning radiometer that collects visible and infrared imagery and radiometric measurements of the land, atmosphere, cryosphere, and oceans. VIIRS data are used to measure cloud and aerosol properties, ocean color, sea and land surface temperature, ice motion and temperature, fires, and Earth's albedo. Both NASA and NOAA provide VIIRS-based high-resolution SST and chlorophyll products.

We look at the latest 3-day composite SST product at 750-meter resolution, developed by ERD from a real-time NOAA product (see http://coastwatch.noaa.gov/cwn/cw_products_sst.html). Download the data from ERDDAP:

In [14]:
viirsSSTUrl = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdVHsstaWS3day.nc?sst[(last):1:(last)][(0.0):1:(0.0)][(41.):1:(31.)][(-128.):1:(-115.)]'
result = getURL(viirsSSTUrl, 'viirsSST.nc')

read in the data from the netcdf file:

In [ ]:
viirssstFile = Dataset('viirsSST.nc')
viirsSST = viirssstFile.variables['sst'][:,:,:,:]
latitude = viirssstFile.variables['latitude'][:]
longitude = viirssstFile.variables['longitude'][:]
viirssstFile.close()
viirsSST = np.squeeze(viirsSST)

and plot the data:

In [15]:
plot_map(viirsSST, latitude, longitude, cmocean.cm.thermal)

We can obtain a time series at a location, here (36., -126.), by requesting the data as a .csv file, which can be read directly into pandas:

In [5]:
myURL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdVHsstaWS3day.csv?sst[(2015-01-01):1:(2015-12-31)][(0.0):1:(0.0)][(36):1:(36)][(-126):1:(-126)]'
junk = pd.read_csv(myURL, parse_dates=True, index_col='time', skiprows=[1])

and plot the data using pandas plot methods:

In [6]:
junk['sst'].plot()
plt.show()

We look at a similar 3-day composite for chlorophyll for the same region, from a scientific-quality product developed by NOAA (see http://coastwatch.noaa.gov/cwn/cw_products_sst.html), first extracting the data using ERDDAP:

In [16]:
viirsChlaURL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdVHNchla3day.nc?chla[(last):1:(last)][(0.0):1:(0.0)][(41.):1:(31.)][(-128.):1:(-115.)]'
result = getURL(viirsChlaURL, 'viirsChla.nc')

then reading in the data from the netcdf file:

In [ ]:
viirsChlaFile = Dataset('viirsChla.nc')
viirsChla = viirsChlaFile.variables['chla'][:,:,:,:]
latitude = viirsChlaFile.variables['latitude'][:]
longitude = viirsChlaFile.variables['longitude'][:]
viirsChlaFile.close()
viirsChla = np.squeeze(viirsChla)

then plotting the data:

In [19]:
plot_map(np.log(viirsChla), latitude, longitude, cmocean.cm.algae)

Temperature at 70m in the North Pacific from the SODA model output

This is an example of an extract from a 4-D dataset (results from the "Simple Ocean Data Assimilation (SODA)" model; see http://www.atmos.umd.edu/~ocean/), and it illustrates the case where the z-coordinate does not have the default name "altitude". Water temperature at 70 m depth is extracted for the North Pacific Ocean. As in the other examples, first we download the desired data from the ERDDAP server:

In [22]:
soda70URL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/hawaii_d90f_20ee_c4cb.nc?temp[(2010-12-15):1:(2010-12-15)][(70.02):1:(70.02)][(20.25):1:(60.25)][(135.25):1:(240.25)]'
result = getURL(soda70URL, 'soda70.nc')

then read in the data from the netcdf file:

In [ ]:
soda70File = Dataset('soda70.nc')
soda70Temp = soda70File.variables['temp'][:,:,:,:]
latitude = soda70File.variables['latitude'][:]
longitude = soda70File.variables['longitude'][:]
soda70File.close()
soda70Temp = np.squeeze(soda70Temp)

and then plot the result:

In [24]:
plot_map(soda70Temp, latitude, longitude, cmocean.cm.thermal)
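
Because the z-coordinate name and units vary between datasets, it can help to inspect the downloaded file before indexing into it; a minimal sketch using the netCDF4 library:

In [ ]:
# list each variable in soda70.nc with its dimensions and units,
# to discover the actual name of the depth coordinate
ncFile = Dataset('soda70.nc')
for name, var in ncFile.variables.items():
    print('%s  dims=%s  units=%s' % (name, var.dimensions, getattr(var, 'units', '')))
ncFile.close()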

IFREMER

The French agency IFREMER also runs an ERDDAP server. We obtain salinity data at 75 meters from the "Global Ocean, Coriolis Observation Re-Analysis CORA4.1" model, off the west coast of the United States. Download the data:

In [25]:
ifrURL = 'http://www.ifremer.fr/erddap/griddap/ifremer_tds0_6080_109e_ed80.nc?PSAL[(2013-05-15):1:(2013-05-15)][(75.):1:(75.)][(30.):1:(50.)][(-140.0):1:(-110.)]'
result = getURL(ifrURL, 'ifr.nc')

read in the data from the netcdf file:

In [ ]:
ifrFile = Dataset('ifr.nc')
ifrPSAL = ifrFile.variables['PSAL'][:,:,:,:]
latitude = ifrFile.variables['latitude'][:]
longitude = ifrFile.variables['longitude'][:]
ifrFile.close()
ifrPSAL = np.squeeze(ifrPSAL)

and plot the result:

In [26]:
plot_map(ifrPSAL, latitude, longitude, cmocean.cm.haline)

CalCOFI data

CalCOFI (California Cooperative Oceanic Fisheries Investigations - http://www.calcofi.org) is a multi-agency partnership formed in 1949 to investigate the collapse of the sardine population off California. The organization's members are from NOAA Fisheries Service, Scripps Institution of Oceanography, and California Department of Fish and Wildlife. The scope of this research has evolved into the study of marine ecosystems off California and the management of its fisheries resources. The nearly complete CalCOFI data, both physical and biological, are available through ERDDAP.

The following retrieves the data that are used in a script developed by Dr. Andrew Leising of the Southwest Fisheries Science Center. The original script has been used to automate the generation of several yearly reports about the California Current Ecosystem. The script gets chlorophyll and primary productivity data from the hydrocasts, and then calculates a seasonally adjusted chlorophyll anomaly as well as a seasonally adjusted primary productivity. The full modified script can be seen in the R notebook. First the data are accessed as a .csv file using pandas:

In [7]:
ccsURL = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/siocalcofiHydroCasts.csv?year,month,intchl,intc14&time>=1984-01-01T00:00:00Z&time<=2014-04-17T05:35:00Z'
junk = pd.read_csv(ccsURL, parse_dates=True,  skiprows=[1])

then the data are grouped by month and year, averaged over those groups, and plotted:

In [8]:
junk1 = junk.groupby(['year', 'month']).mean()
# pull the year and month values back out of the MultiIndex
years = np.array(junk1.index.get_level_values('year'))
months = np.array(junk1.index.get_level_values('month'))
df = pd.DataFrame({'year':years, 'month': months, 'day':1})
myTimes = pd.to_datetime(df)
intch1 = np.array(junk1['intchl'])
intch1 = pd.Series(np.log(intch1), index=myTimes)
intch1.plot()
plt.show()
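
The seasonal adjustment mentioned above is not shown in the cell; as a hedged sketch of one common way to compute it from the monthly series intch1 built above, subtract each calendar month's long-term mean (the monthly climatology) from the observations:

In [ ]:
# sketch: seasonally adjusted (log) chlorophyll anomaly
climatology = intch1.groupby(intch1.index.month).mean()    # mean for each calendar month
anomaly = intch1 - climatology[intch1.index.month].values  # observation minus its month's mean
anomaly.plot()
plt.show()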

CPS Trawl Surveys

The CPS (Coastal Pelagic Species) Trawl Life History Length Frequency Data contains the length distribution of a subset of individuals from a species (mainly non-target) caught during SWFSC-FRD fishery-independent trawl surveys of coastal pelagic species. Measured lengths for the indicated length type (fork, standard, total, or mantle) were grouped in 10 mm bins (identified by the midpoint of the length class), and counts are recorded by sex.

We will look at the number and location of sardines (Sardinops sagax) in the tows in April 2010 and April 2011, and compare with monthly satellite SST. First we extract the sardine data using pandas:

In [5]:
cpsURL2010 = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/FRDCPSTrawlLHHaulCatch.csv?latitude%2Clongitude%2Ctime%2Cscientific_name%2Csubsample_count&time%3E=2010-04-01&time%3C=2010-04-30&scientific_name=%22Sardinops%20sagax%22'
cpsURL2011 = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/FRDCPSTrawlLHHaulCatch.csv?latitude%2Clongitude%2Ctime%2Cscientific_name%2Csubsample_count&time%3E=2011-04-01&time%3C=2011-04-30&scientific_name=%22Sardinops%20sagax%22'
cps2010 = pd.read_csv(cpsURL2010, parse_dates=True, skiprows=[1])
cps2011 = pd.read_csv(cpsURL2011, parse_dates=True, skiprows=[1])

then we extract the monthly MODIS SST data and read in the data from the netcdf files:

In [ ]:
sst201004URL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWsstdmday.nc?sst[(2010-04-16T12:00:00Z):1:(2010-04-16T12:00:00Z)][(0.0):1:(0.0)][(22.0):1:(51.0)][(220.0):1:(255.0)]'
sst201104URL = 'http://coastwatch.pfeg.noaa.gov/erddap/griddap/erdMWsstdmday.nc?sst[(2011-04-16T12:00:00Z):1:(2011-04-16T12:00:00Z)][(0.0):1:(0.0)][(22.0):1:(51.0)][(220.0):1:(255.0)]'
result = getURL(sst201004URL, 'sst201004.nc')
sst201004File = Dataset('sst201004.nc')
sst201004 = sst201004File.variables['sst'][:,:,:,:]
latitude = sst201004File.variables['latitude'][:]
longitude = sst201004File.variables['longitude'][:]
sst201004File.close()
sst201004 = np.squeeze(sst201004)
result = getURL(sst201104URL, 'sst201104.nc')
sst201104File = Dataset('sst201104.nc')
sst201104 = sst201104File.variables['sst'][:,:,:,:]
sst201104File.close()
sst201104 = np.squeeze(sst201104)

and plot the counts over the SST for each year:

In [8]:
southlat = 22.
northlat = 51.
westlon = -130
eastlon = -110
latdif = abs(northlat - southlat)
londif = abs(westlon - eastlon)
westedge = westlon - londif * 0.1
eastedge = eastlon + londif * 0.1
northedge = northlat + latdif * 0.1
southedge = southlat - latdif * 0.1
lats = np.array(cps2010['latitude'])
lons = np.array(cps2010['longitude'])
subsample_count = np.array(cps2010['subsample_count'])
# 2010 Plot
plt.figure(figsize=(12,8))
map = Basemap(projection = 'merc', lat_0 = (southlat + northlat)/2, lon_0 = (westlon + eastlon)/2,
    resolution = 'h', area_thresh = 0.1,
    llcrnrlon = westedge, llcrnrlat = southedge,
    urcrnrlon = eastedge, urcrnrlat = northedge)
# this dataset's longitudes run 0-360 east; shift by 360 to match the map
mylons, mylats = np.meshgrid(longitude - 360, latitude)
x, y = map(lons, lats)
map.drawcoastlines()
map.drawcountries()
im1 = map.fillcontinents(color = '0.3')
im2 = map.drawmapboundary()
im3 = map.drawmeridians(np.arange((westedge - np.mod(westedge, 5)), (eastedge + np.mod(eastedge, 5)),5),
    labels=[True, False, False, True])
im4 = map.drawparallels(np.arange((southedge - np.mod(southedge, 5)), (northedge + np.mod(northedge, 5)),5),
    labels=[True, False, False, True])
im5 = map.pcolormesh(mylons, mylats, sst201004, shading = 'flat',cmap = cmocean.cm.thermal, latlon = True)
im6 = map.scatter(x,y, marker='o', c=subsample_count)
plt.show()
# 2011 Plot
lats = np.array(cps2011['latitude'])
lons = np.array(cps2011['longitude'])
subsample_count = np.array(cps2011['subsample_count'])
plt.figure(figsize=(12,8))
map = Basemap(projection = 'merc', lat_0 = (southlat + northlat)/2, lon_0 = (westlon + eastlon)/2,
    resolution = 'h', area_thresh = 0.1,
    llcrnrlon = westedge, llcrnrlat = southedge,
    urcrnrlon = eastedge, urcrnrlat = northedge)
x, y = map(lons, lats)
map.drawcoastlines()
map.drawcountries()
im1 = map.fillcontinents(color = '0.3')
im2 = map.drawmapboundary()
im3 = map.drawmeridians(np.arange((westedge - np.mod(westedge, 5)), (eastedge + np.mod(eastedge, 5)),5),
    labels=[True, False, False, True])
im4 = map.drawparallels(np.arange((southedge - np.mod(southedge, 5)), (northedge + np.mod(northedge, 5)),5),
    labels=[True, False, False, True])
im5 = map.pcolormesh(mylons, mylats, sst201104, shading = 'flat',cmap = cmocean.cm.thermal, latlon = True)
im6 = map.scatter(x,y, marker='o', c=subsample_count)
plt.show()

We can also look at the distribution of sardines through the years. First get the data:

In [9]:
sardinesURL = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/FRDCPSTrawlLHHaulCatch.csv?latitude%2Clongitude%2Ctime&scientific_name=%22Sardinops%20sagax%22'
sardinesData = pd.read_csv(sardinesURL, parse_dates=True, skiprows=[1])

then plot the result with a different color for each year:

In [10]:
southlat = 22.
northlat = 51.
westlon = -130
eastlon = -110
latdif = abs(northlat - southlat)
londif = abs(westlon - eastlon)
westedge = westlon - londif * 0.1
eastedge = eastlon + londif * 0.1
northedge = northlat + latdif * 0.1
southedge = southlat - latdif * 0.1
lats = np.array(sardinesData['latitude'])
lons = np.array(sardinesData['longitude'])
sardineYear = pd.to_datetime(np.array(sardinesData['time']))
plt.figure(figsize=(12,8))
map = Basemap(projection = 'merc', lat_0 = (southlat + northlat)/2, lon_0 = (westlon + eastlon)/2,
    resolution = 'h', area_thresh = 0.1,
    llcrnrlon = westedge, llcrnrlat = southedge,
    urcrnrlon = eastedge, urcrnrlat = northedge)
x, y = map(lons, lats)
map.drawcoastlines()
map.drawcountries()
im1 = map.fillcontinents(color = '0.3')
im2 = map.drawmapboundary()
im3 = map.drawmeridians(np.arange((westedge - np.mod(westedge, 5)), (eastedge + np.mod(eastedge, 5)),5),
    labels=[True, False, False, True])
im4 = map.drawparallels(np.arange((southedge - np.mod(southedge, 5)), (northedge + np.mod(northedge, 5)),5),
    labels=[True, False, False, True])
im5 = map.scatter(x, y, marker='o', c=sardineYear.year)
plt.show()

NDBC Buoys

NOAA's National Data Buoy Center (NDBC) collects worldwide data from buoys in the ocean. ERDDAP can be searched for the locations of all buoys in a bounding box with latitudes (37N, 47N) and longitudes (124W, 121W):

In [11]:
ndbcURL = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?station,longitude,latitude&longitude>=-124&longitude<=-121&latitude>=37&latitude<=47'
junk = pd.read_csv(ndbcURL, skiprows=[1])

and the results plotted:

In [12]:
lats = np.array(junk['latitude'])
lons = np.array(junk['longitude'])
stations = np.array(junk['station'])
southlat = 37
northlat = 47.
westlon = -130
eastlon = -110
latdif = abs(northlat - southlat)
londif = abs(westlon - eastlon)
westedge = westlon - londif * 0.1
eastedge = eastlon + londif * 0.1
northedge = northlat + latdif * 0.1
southedge = southlat - latdif * 0.1
plt.figure(figsize=(12,8))
map = Basemap(projection = 'merc', lat_0 = (southlat + northlat)/2, lon_0 = (westlon + eastlon)/2,
    resolution = 'h', area_thresh = 0.1,
    llcrnrlon = westedge, llcrnrlat = southedge,
    urcrnrlon = eastedge, urcrnrlat = northedge)
x, y = map(lons, lats)
map.drawcoastlines()
map.drawcountries()
im1 = map.fillcontinents(color = '0.3')
im2 = map.drawmapboundary()
im3 = map.drawmeridians(np.arange((westedge - np.mod(westedge, 5)), (eastedge + np.mod(eastedge, 5)),5),
    labels=[True, False, False, True])
im4 = map.drawparallels(np.arange((southedge - np.mod(southedge, 5)), (northedge + np.mod(northedge, 5)),5),
    labels=[True, False, False, True])
im5 = map.scatter(x,y, marker='o', color='r')
plt.show()

Looking at wind speed for 2012 for station "46012":

In [50]:
ndbcURL1 = 'http://coastwatch.pfeg.noaa.gov/erddap/tabledap/cwwcNDBCMet.csv?time,wspd&station="46012"&time>=2012-01-01&time<=2013-01-01'
junk = pd.read_csv(ndbcURL1, parse_dates=True, index_col='time', skiprows=[1])
junk['wspd'].plot()
plt.show()

IOOS Glider Data

The mission of the IOOS Glider DAC is to provide glider operators with a simple process for submitting glider data sets to a centralized location, enabling the data to be visualized, analyzed, widely distributed via existing web services and the Global Telecommunications System (GTS), and archived at the National Centers for Environmental Information (NCEI). The IOOS Glider DAC is accessible through ERDDAP at http://data.ioos.us/gliders/erddap/. Here we extract salinity data from part of the path of one glider deployed by the Scripps Institution of Oceanography:

In [13]:
gliderURL = 'https://data.ioos.us/gliders/erddap/tabledap/sp064-20161214T1913.csv?time,latitude,longitude,depth,salinity&time>=2016-12-14&time<=2016-12-23'
junk = pd.read_csv(gliderURL, parse_dates=True, index_col='time', skiprows=[1])

and making a 3-D plot of the result:

In [14]:
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(junk['longitude'],junk['latitude'], -junk['depth'], c=junk['salinity'],  marker='o', cmap = cmocean.cm.haline)
plt.show()