Easy, scriptable way to sub-sample unstructured THREDDS data?

I am trying to get a subset of data from a triangular mesh model that is served via THREDDS. I would like to be able to specify a LON/LAT bounding box and get just the data that falls within that box. Data URL:

http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_MET_FORECAST.nc

With gridded data, it is fairly easy to subset data from a THREDDS server. Does anyone know the best way to get a subdomain of a triangular grid that is served by THREDDS?

For gridded data, I use Ferret as my OPeNDAP client, and I can script the whole process. I would like to do something similar here, although I could use Matlab, Python, or other tools.
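(For comparison, here is a rough sketch of what a scripted gridded subset looks like in Python; the URL and variable names are placeholders, since the point is only that a lat/lon box on a regular grid maps onto one contiguous slice per dimension:)

    import netCDF4
    import numpy as np

    # hypothetical regular grid with 1-D coordinate variables
    url = 'http://example.com/thredds/dodsC/some_regular_grid.nc'
    nc = netCDF4.Dataset(url)
    lon = nc.variables['lon'][:]
    lat = nc.variables['lat'][:]

    # on a regular grid, a lat/lon box is two contiguous index ranges,
    # so a single hyperslab request fetches exactly the box
    i = np.where((lon >= -71.4) & (lon <= -70.2))[0]
    j = np.where((lat >= 41.0) & (lat <= 41.5))[0]
    temp_box = nc.variables['temp'][0, j.min():j.max()+1, i.min():i.max()+1]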

Thanks,

Steve

+5
2 answers

OPeNDAP and NetCDF do not allow extraction using arbitrary lists of indices. You can request only a start, a stop, and a stride along each dimension.
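To make that limitation concrete, here is a minimal sketch of the DAP hyperslab syntax behind it; the zeta variable is real for this dataset, but the index ranges are just for illustration:

    # A DAP constraint expression takes [start:stride:stop] per dimension,
    # so "time step 0, every 2nd node of the first 1000" is one legal request:
    url = 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc'
    subset_url = url + '?zeta[0:1:0][0:2:999]'
    # THREDDS will also serve the same request in human-readable form for testing:
    ascii_url = url + '.ascii?zeta[0:1:0][0:2:999]'
    # ...but "nodes 5, 17, and 942" cannot be expressed in a single request.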

And since this is a triangular grid, there is no guarantee that nodes in the same region have nearby indices. So if you want only the nodes inside the bounding box, you have to request them one at a time, which is slow. In many cases it is therefore faster to determine the minimum and maximum indices, request the entire contiguous chunk in one go, and then pick out the indices you need locally.

Here is a comparison of the two approaches in Python. In this example, extracting the contiguous chunk that spans all the indices is about 10 times faster than looping over each index and extracting its time series:

    import time
    import netCDF4
    import numpy as np

    url = 'http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc'
    nc = netCDF4.Dataset(url)
    ncv = nc.variables
    lon = ncv['lon'][:]
    lat = ncv['lat'][:]
    tim = ncv['time'][:]

    # find node indices inside the box [lon_min, lat_min, lon_max, lat_max]
    box = [-71.4, 41, -70.2, 41.5]
    ii = (lon >= box[0]) & (lon <= box[2]) & (lat >= box[1]) & (lat <= box[3])
    # jj will hold just the indices of the nodes inside the box:
    jj = np.where(ii)[0]

If we iterate over each index, it is slow:

    # loop over indices, extracting the time series one node at a time
    time0 = time.time()
    zi = np.zeros((len(tim), len(jj)))
    k = 0
    for j in jj:
        zi[:, k] = ncv['zeta'][:, j]
        k += 1
    print('elapsed time: %d seconds' % (time.time() - time0))

    elapsed time: 56 seconds

But if we request the full min-to-max range at each time step and then subset in memory, it is much faster:

    time0 = time.time()
    zi2 = np.zeros((len(tim), len(jj)))
    jmin = jj.min()
    jmax = jj.max()
    for i in range(len(tim)):
        ztmp = ncv['zeta'][i, jmin:jmax+1]  # one contiguous request per time step
        zi2[i, :] = ztmp[jj - jmin]         # keep only the nodes inside the box
    print('elapsed time: %d seconds' % (time.time() - time0))

    elapsed time: 6 seconds
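As a quick sanity check (not part of the original timing), the two approaches should produce identical arrays:

    # both extraction strategies should agree node-for-node
    print(np.allclose(zi, zi2))  # expect: True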

Of course, the results will vary depending on the size of the unstructured grid, how close together the points in your subset are, the number of points you retrieve, and so on. But hopefully this gives you an idea of the issues involved.

+4

Rich's answer gives the best way to do this if you have really tight performance constraints. But, as he says, the results may vary depending on the grid and the subset region.

If you are only interested in a single time (no pun intended), the cost of simply grabbing the entire spatial domain from the DAP server is marginal, and the code is much simpler:

    time0 = time.time()
    zi3 = ncv['zeta'][0, :]  # grab the whole spatial domain at one time step
    zi3 = zi3[jj]            # then subset in memory
    print('elapsed time: %d seconds' % (time.time() - time0))

    elapsed time: 0 seconds

When profiled (even with pre-allocation of the arrays), the coarse min/max subsetting that Rich does still comes out about 2x as fast, though, given this grid and this particular spatial subset. The NECOFS grid is relatively small in the world of unstructured grids; you might run into problems with, for example, Atlantic-wide ADCIRC grids.

POSTSCRIPT: note that you grab the entire grid anyway when you download lat and lon to determine the subset indices.
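If that one-time cost matters, one workaround, assuming the grid is fixed across forecast cycles (the cache filename here is made up), is to compute the indices once and reuse them:

    import numpy as np

    # pay the full lat/lon download once, then cache the subset indices
    np.save('necofs_box_indices.npy', jj)

    # later runs skip lat/lon entirely and reuse the cached indices
    jj = np.load('necofs_box_indices.npy')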

+2
