Help: availability v.1

Description

The ph5ws-availability web service returns detailed time span information describing what time series data is available in the DMC archive.

There are two service query methods:

/extent

Produces lists of available time extents (earliest to latest) for selected channels (network, station, location, channel and quality) and time ranges.

/query

Produces lists of contiguous time spans for selected channels (network, station, location, channel and quality) and time ranges.


Sample queries

/extent Sample queries

Extent information for all channels of network YW, station 1002, in text format (default)
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&station=1002

Extent information for all channels of network YW, station 1002, in text format (default), limited to a given time range
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&station=1002&start=2016-06-21T16:59:08&end=2016-06-21T17:29:08

Extent information for all channels of network YW, station 1002, in JSON format
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&station=1002&format=json

Extent information for all of network YW, sorted by number of timespans (descending), limited to 100 rows
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&orderby=timespancount_desc&limit=100

Any channel with more than 1,000,000 timespans cannot be processed by the /query method. The following query reveals which channels, if any, cannot be processed (i.e., those with more than 1,000,000 timespans):
http://service.iris.edu/ph5ws/availability/1/extent?network=*&orderby=timespancount_desc&limit=500

Extent information for all of network YW, sorted by update date (oldest first), limited to 100 rows
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&orderby=latestupdate&limit=100

Extent information for all of network YW between two dates, with qualities and sample rates merged
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&start=2016-06-21&end=2016-06-22&merge=quality,samplerate

Restriction information for all of network ZI in text format, with restricted data included (includerestricted=true). For this network, all of the listed data are OPEN.
http://service.iris.edu/ph5ws/availability/1/extent?network=ZI&includerestricted=true

/query Sample queries

Demonstrations of wildcards and of multiple selections via CSV (comma-separated values)

All DP? channels for network YW
http://service.iris.edu/ph5ws/availability/1/query?start=2016-06-21&end=2016-06-22&network=YW&channel=DP?

Network Z1, stations 10006 and 10007, location -- (empty), channel DH2
http://service.iris.edu/ph5ws/availability/1/query?start=2011-10-19T00:00:00&end=2011-10-26T23:59:00&network=Z1&location=--&station=10006,10007&channel=DH2

Note: the , (comma) and ? (question mark) characters may be displayed as %2C and %3F after you click on the previous two links.
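Scripts that build these URLs get the same percent-encoding automatically. The following is a minimal sketch using Python's standard urllib.parse.urlencode. The helper build_query_url is hypothetical, and the base URL is the one used in the sample queries above.

```python
from urllib.parse import urlencode

# Base URL taken from the sample queries on this page.
BASE_URL = "http://service.iris.edu/ph5ws/availability/1"


def build_query_url(method: str, params: dict) -> str:
    """Build a GET URL for /extent or /query (hypothetical helper).

    urlencode() percent-encodes reserved characters: ',' becomes %2C,
    '?' becomes %3F and ':' becomes %3A, while '-' is left unchanged.
    """
    return f"{BASE_URL}/{method}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url("query", {"network": "YW", "channel": "DP?",
                                    "start": "2016-06-21", "end": "2016-06-22"}))
```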

HTTP POST queries

The /extent and /query methods can also be accessed via HTTP POST. All parameters that can be submitted with the GET method are also allowed in POST.

The general form of a POST is parameter=value pairs, one per line, followed by an arbitrary number of channel and, optionally, time window selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan [StartTime EndTime]
Net Sta Loc Chan [StartTime EndTime]
...

Start and end times can be specified globally, for example:

...
start=2011-10-19T18:29:16.430000
end=2011-10-19T19:29:16.430000
ZI 10006 -- DP2
ZI 10007 -- DP2
...

or per line:

...
ZI 10006 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
ZI 10007 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
...

If not given, the start and end times default to the full available time range. Global time ranges can also be mixed with individual time ranges.

Using individual time ranges allows multiple time windows to be selected, one per line. For example:

...
ZI 10006 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
ZI 10007 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
...

Example POST body:

$ cat availability.request
mergequality=true
mergesamplerate=true
format=text
ZI 10006 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
ZI 10007 -- DP2 2011-10-19T18:29:16.430000 2011-10-19T19:29:16.430000
YW 100?  -- DP? 2016-06-22T14:04:37.420000 2016-07-26T07:34:37.420000

This example contains parameters common to both /extent and /query methods.
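A request file in this format can also be generated from a script. This is a minimal sketch: build_post_body and the Selection tuple are hypothetical names. Writing "--" for an empty location and omitting per-line times when none are given follow the examples above.

```python
from typing import Iterable, Mapping, Optional, Tuple

# (network, station, location, channel, start, end); start/end may be None.
Selection = Tuple[str, str, str, str, Optional[str], Optional[str]]


def build_post_body(params: Mapping[str, str], selections: Iterable[Selection]) -> str:
    """Assemble parameter=value lines followed by selection lines."""
    lines = [f"{key}={value}" for key, value in params.items()]
    for net, sta, loc, chan, start, end in selections:
        # An empty location code is written as "--", as in the examples above.
        fields = [net, sta, loc or "--", chan]
        if start and end:
            # Per-line window; otherwise global start=/end= or the full range applies.
            fields += [start, end]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_post_body(
        {"mergequality": "true", "mergesamplerate": "true", "format": "text"},
        [("ZI", "10006", "", "DP2", "2011-10-19T18:29:16.430000", "2011-10-19T19:29:16.430000")],
    ), end="")
```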

Submitting POST request files via wget and curl

Requests can be made with a selection file using either the wget or curl Unix command-line utilities. The commands below POST the selection file to the server and save the results in a text file:

$ wget --post-file=availability.request -O availability.txt http://service.iris.edu/ph5ws/availability/1/query
$ curl -L --data-binary @availability.request -o availability.txt http://service.iris.edu/ph5ws/availability/1/query
$ wget --post-file=availability.request -O extents.txt http://service.iris.edu/ph5ws/availability/1/extent
$ curl -L --data-binary @availability.request -o extents.txt http://service.iris.edu/ph5ws/availability/1/extent

We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.

When using curl, you may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted. See http://curl.haxx.se/docs/manpage.html for more information.
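The same POST can also be made from Python's standard library. This is a sketch rather than a drop-in equivalent of the curl command. make_post_request and post_file are hypothetical helpers. urllib raises urllib.error.HTTPError for 4xx/5xx responses, which plays a role similar to curl's -f. However, urllib does not follow every redirect of a POST request the way curl -L does.

```python
import urllib.request

QUERY_URL = "http://service.iris.edu/ph5ws/availability/1/query"


def make_post_request(url: str, body: bytes) -> urllib.request.Request:
    """Passing data= makes urllib send the request as a POST."""
    return urllib.request.Request(url, data=body)


def post_file(path: str, url: str = QUERY_URL) -> bytes:
    """Roughly: curl --data-binary @path url (sketch, not tested against the service)."""
    with open(path, "rb") as f:
        request = make_post_request(url, f.read())
    # urlopen raises urllib.error.HTTPError on 4xx/5xx status codes.
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.read()

# Usage: open("availability.txt", "wb").write(post_file("availability.request"))
```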

Virtual Network Support

The ph5ws-availability service supports the selection of virtual networks. The list of current virtual networks can be viewed with the IRIS DMC MetaData Aggregator. This information can also be queried with the Virtualnetwork web service.

Virtual networks contain groupings of stations from different networks. Virtual network names start with the underscore character (_) and, unlike regular network names, are not limited to two characters; for example, _GSN.

In addition to logically grouping stations, virtual networks also impose implicit time ranges on stations. These time ranges can vary between stations.

When virtual networks are specified in queries to the ph5ws-availability service, the implicit station level time windows are applied to availability information.

It is generally not a good idea to mix queries from different virtual networks, or from virtual and regular networks, because the application of the implicit station-level time windows can become quite confusing.

Example Query:
The following query shows extent information for the entire _GSN virtual network for all BHZ channels with merged qualities and sample rates: http://service.iris.edu/ph5ws/availability/1/extent?net=_GSN&cha=BHZ&merge=quality,samplerate

Restricted Data Support

A small percentage of time series data held at the IRIS DMC is restricted. Accessing restricted time series data requires email and password credentials (see, for example, Accessing restricted data).

A confusing aspect of data restriction is that it only restricts access to time series data, not access to metadata such as the information returned by the ph5ws-availability service. The ph5ws-availability service can reveal the availability of any time series data, whether or not authentication is used. The service can therefore be used to determine which data requires authentication, and also what data is available when authenticated with a valid email and password.

Intervals of time series data can be considered to be in one of three states:

  • OPEN: No intervals require authentication to access.
  • RESTRICTED: All intervals require authentication to access.
  • PARTIAL: Some, but not all, intervals require authentication to access.
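The three states follow directly from whether each interval requires authentication. This is a minimal sketch that assumes a hypothetical list of booleans, one per interval. The service does not expose data in this form; the sketch only restates the definitions above.

```python
from typing import Iterable


def restriction_state(interval_restricted: Iterable[bool]) -> str:
    """Classify intervals as OPEN, RESTRICTED or PARTIAL.

    Each flag is True if that interval requires authentication to access.
    An empty input is treated as OPEN.
    """
    flags = list(interval_restricted)
    if not any(flags):
        return "OPEN"        # no interval requires authentication
    if all(flags):
        return "RESTRICTED"  # every interval requires authentication
    return "PARTIAL"         # some, but not all, intervals do
```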

By default, the ph5ws-availability service only returns information about OPEN data. The includerestricted=<true|false> parameter, which is common to both the /query and /extent methods, controls whether availability information about restricted data is also returned. Its default value is false. If includerestricted=true is specified, information about both restricted and non-restricted data is returned.

Example query
http://service.iris.edu/ph5ws/availability/1/extent?network=1A&cha=DPZ&show=restriction&includerestricted=true

The /query method accepts includerestricted but does not support show=restriction. By default, only OPEN timespans are returned; with includerestricted=true, OPEN, RESTRICTED and PARTIAL timespans are returned.

Authenticated Access: /extentauth /queryauth

The ph5ws-availability service also supports /extentauth and /queryauth methods. These behave identically to the /extent and /query methods, except that they require HTTP digest access authentication. The information returned by these methods reflects the data that the given credentials grant access to.

For testing and software development purposes, the authentication credentials: {[email protected], password=anonymous} may be used. Using these credentials, information returned by the /extentauth and /queryauth methods will be identical to the non-authenticated methods /extent and /query.
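In Python, digest authentication can be handled by the standard urllib.request handlers. This is a minimal sketch: digest_opener is a hypothetical helper, and the credentials are parameters, so substitute the test credentials given above or your own. Registering the base URL with realm=None offers the credentials for both /extentauth and /queryauth, whatever realm the server names.

```python
import urllib.request

# Base URL from this page; it prefixes both /extentauth and /queryauth.
BASE_URL = "http://service.iris.edu/ph5ws/availability/1/"


def digest_opener(user: str, password: str,
                  base_url: str = BASE_URL) -> urllib.request.OpenerDirector:
    """Build an opener that answers HTTP digest challenges under base_url."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm the server announces.
    passwords.add_password(None, base_url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))

# Usage (needs network access):
# opener = digest_opener(USER, PASSWORD)
# print(opener.open(BASE_URL + "extentauth?network=YW&station=1002").read().decode())
```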

Chaining requests with /extent?...format=request...

The output from the /extent method, when format=request is specified, is compatible with the POST request input to the ph5ws-dataselect web service.

This makes it useful for chaining requests from ph5ws-availability to ph5ws-dataselect.

When format=request is specified, the request parameters mergesamplerate=true and mergequality=true are automatically applied and there will be one row in the response per [network,station,location,channel,timerange] tuple.

The following two-step example shows how to fetch miniSEED data for all 1C/DPZ channels for the time interval 2008-01-27T03:59:55 to 2008-01-29T08:15:05 using the wget command.

Step 1: Get the availability list and save it to the file 1C-DPZ.request:

$ wget -O 1C-DPZ.request "http://service.iris.edu/ph5ws/availability/1/extent?net=1C&cha=DPZ&start=2008-01-27T03:59:55.000000&end=2008-01-29T08:15:05.000000&format=request" -nv
2019-04-02 14:20:11 URL:http://service.iris.edu/ph5ws/availability/1/extent?net=1C&cha=DPZ&start=2008-01-27T03:59:55.000000&end=2008-01-29T08:15:05.000000&format=request [13987] -> "1C-DPZ.request" [1]

Inspect the first few lines of the response:

$ head 1C-DPZ.request
1C 2001 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
1C 2002 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
1C 2003 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
1C 2004 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
1C 2005 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
1C 2006 -- DPZ 2008-01-27T03:59:55.000000Z 2008-01-29T07:15:05.000000Z
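Each line of the format=request output is a whitespace-separated network, station, location, channel, start, end row, so the list can be filtered in a script before Step 2. This is a minimal sketch: RequestLine and parse_request_text are hypothetical names, and the six-column layout is assumed from the sample output above.

```python
from typing import List, NamedTuple


class RequestLine(NamedTuple):
    network: str
    station: str
    location: str
    channel: str
    start: str
    end: str


def parse_request_text(text: str) -> List[RequestLine]:
    """Turn format=request output into one record per six-field line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6:  # skip blank or unexpected lines
            rows.append(RequestLine(*fields))
    return rows

# Usage: keep some stations, then write a new POST file for ph5ws-dataselect.
# rows = parse_request_text(open("1C-DPZ.request").read())
# keep = [r for r in rows if r.station in {"2001", "2003"}]
# open("subset.request", "w").write("\n".join(" ".join(r) for r in keep) + "\n")
```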

Step 2: Retrieve miniSEED data from ph5ws-dataselect:

$ wget -O 1C-DPZ.miniSEED --post-file=1C-DPZ.request http://service.iris.edu/ph5ws/dataselect/1/query -nv

The file 1C-DPZ.miniSEED contains the miniSEED data.

Row Sorting

The orderby parameter is useful for quickly identifying channels with large numbers of timespans and channels that have recently been updated.

Warning on webservice performance when using row sorting

WARNING: For general availability queries, users are advised to use the default orderby=nslc_time_quality_samplerate row sorting. The other sorting options can significantly reduce web service performance and, in some circumstances, cause queries to time out.

Row sorting examples

The top 100 entries in the YW network, sorted by timespan count (largest first).
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&orderby=timespancount_desc&limit=100

The top 100 entries in the YW network sorted by update time. This shows the most recently updated rows.
http://service.iris.edu/ph5ws/availability/1/extent?network=YW&orderby=latestupdate_desc&limit=100

Default Sorting: orderby=nslc_time_quality_samplerate

Sorting priority order is:

  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

With this sorting option there is, by default, no limit to the number of rows returned.

Sorting by timespan count: orderby=timespancount, orderby=timespancount_desc

Sorting priority order is:

  • number of timespans (small to large: timespancount; large to small: timespancount_desc)
  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

Defaults: With these sorting options, the row limit defaults to 1000 (limit=1000), and the timespan-count field is shown.

Sorting by updated time: orderby=latestupdate, orderby=latestupdate_desc

Sorting priority order is:

  • latest data update time (past to recent: latestupdate; recent to past: latestupdate_desc)
  • network code
  • station code
  • location code
  • channel code
  • earliest time
  • latest time
  • quality code (if mergequality=false, default)
  • sample rate (if mergesamplerate=false, default)

Defaults: With these sorting options, the row limit defaults to 1000 (limit=1000), and the updated field is shown.
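The three orderings and their default limits can be modelled client-side with a stable sort. This is a sketch, not the service's implementation. Rows are hypothetical dicts with the keys used below, and quality/sample rate simply drop out of the key when a row has been merged and lacks them. Timestamps are assumed to be ISO-8601 strings in one consistent format, so they sort correctly as text.

```python
from typing import Dict, List, Optional

# Primary sort field for each non-default ordering listed above.
PRIMARY_FIELD = {"timespancount": "timespancount", "latestupdate": "updated"}


def default_key(row: Dict) -> tuple:
    """network, station, location, channel, earliest, latest, quality, sample rate."""
    return (row["network"], row["station"], row["location"], row["channel"],
            row["earliest"], row["latest"],
            row.get("quality", ""), row.get("samplerate", 0.0))


def order_rows(rows: List[Dict], orderby: str = "nslc_time_quality_samplerate",
               limit: Optional[int] = None) -> List[Dict]:
    ordered = sorted(rows, key=default_key)
    if orderby == "nslc_time_quality_samplerate":
        # No row limit by default for this ordering.
        return ordered if limit is None else ordered[:limit]
    field, _, direction = orderby.partition("_")
    if field not in PRIMARY_FIELD or direction not in ("", "desc"):
        raise ValueError(f"unsupported orderby: {orderby}")
    # sorted() is stable even with reverse=True, so rows that tie on the
    # primary field keep the default order computed above.
    ordered = sorted(ordered, key=lambda row: row[PRIMARY_FIELD[field]],
                     reverse=(direction == "desc"))
    return ordered[:1000 if limit is None else limit]
```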

Timespan Merging Logic

When the cache of assembled timespans is compiled, timespans from identical network, station, location, channel, sample-rate and quality tuples are merged together where possible.

As illustrated in the following figure, timespan A can be merged with timespan B if the start of B falls within the window from End-of-A + 1/2 sample period to End-of-A + 3/2 sample periods.


By default, timespans from identical network, station, location, channel and sample-rate tuples but different qualities are merged together using the logic shown above. This can be disabled by setting mergequality=false. If mergesamplerate=true is chosen, the same logic shown above will be applied, with the sample-period taken from Timespan B.
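The contiguity rule can be written as a small function. This is a minimal sketch, not the service's code. It assumes timespans are (start, end, sample_rate) tuples with times in seconds, and it takes the sample period from timespan B, as described for mergesamplerate. can_merge and merge_contiguous are hypothetical names.

```python
from typing import List, Tuple

# (start, end, sample_rate); times in seconds (an assumption of this sketch).
Span = Tuple[float, float, float]


def can_merge(a: Span, b: Span) -> bool:
    """True if B starts within [End-of-A + T/2, End-of-A + 3T/2], T = B's period."""
    period = 1.0 / b[2]
    return a[1] + 0.5 * period <= b[0] <= a[1] + 1.5 * period


def merge_contiguous(spans: List[Span]) -> List[Span]:
    """Merge spans, in start order, whose start falls in the window above."""
    merged: List[Span] = []
    for span in sorted(spans):
        if merged and can_merge(merged[-1], span):
            # Extend A to B's end; the stored rate is B's (informational only).
            merged[-1] = (merged[-1][0], span[1], span[2])
        else:
            merged.append(span)
    return merged
```

At 100 Hz (T = 0.01 s), a span ending at 9.99 s can absorb a span starting anywhere from 9.995 s to 10.005 s.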

In general, the distribution of timespans for a network, station, location, channel, sample-rate and quality tuple can be quite complicated. This is illustrated in this figure:

If merge=overlap is provided, timespans that overlap in time are merged together, as are timespans separated by less than 1/2 sample period.

The mergegaps=<seconds> option will suture together timespans that are separated by no more than the given time.
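Both options reduce to a test on the separation start(B) - end(A), which is negative for overlapping spans. This sketch uses hypothetical helper names and (start, end) tuples in seconds. How the two options combine with each other, or with the contiguity rule, is not specified here, so each is modelled on its own. Joining overlapping spans under mergegaps is an assumption of the sketch.

```python
from typing import Callable, List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def _merge(spans: List[Span], should_join: Callable[[float], bool]) -> List[Span]:
    """Join each span to the previous one when should_join(separation) is true."""
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged and should_join(start - merged[-1][1]):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def merge_overlap(spans: List[Span], sample_rate: float) -> List[Span]:
    """merge=overlap: join overlaps and spans less than 1/2 sample period apart."""
    half_period = 0.5 / sample_rate
    return _merge(spans, lambda separation: separation < half_period)


def merge_gaps(spans: List[Span], max_gap_seconds: float) -> List[Span]:
    """mergegaps=<seconds>: join spans separated by no more than max_gap_seconds."""
    return _merge(spans, lambda separation: separation <= max_gap_seconds)
```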

Limitations

Cache Latency

In order to be performant, the ph5ws-availability service uses a cache of assembled timespan information. The cache is derived from an internal database which tracks miniSEED data in the DMC archive, and it sutures together the time segment information recorded in that database. The cache takes over an hour to assemble and is refreshed several times per day.

The vast majority of the data contained in the miniSEED archive does not change between cache refreshes; however, there will always be some disagreement between the cache used by the web service and the archive.

Realtime Data

The ph5ws-availability service only catalogs data in the archive, not data in the realtime system (BUD). Consequently, it is generally not useful for querying data availability close to realtime. It usually takes between 4 and 26+ hours for data to be copied from the BUD into the DMC archive. Data is archived in 24-hour segments by GMT day, so data from just before the end of a GMT day is placed into the archive sooner than data from just after the start of a GMT day.

Memory Limitations

/query method

A small number of channels cannot be processed by the service's /query method because they have too many timespans to load into memory. The maximum processable limit is currently 1,000,000 timespans; any channel with more timespans than this cannot be processed by the service. The link http://service.iris.edu/ph5ws/availability/1/extent?orderby=timespancount_desc&limit=500 shows the top 500 channels sorted by number of timespans. As can be seen, only a comparatively small number of channels (fewer than 100) cannot be processed.

/extent method

Queries to the /extent method that contain no time constraints are not subject to timespan memory limitations. For time-constrained queries on channels with over 500,000 timespans, calendar-day availability information is used rather than detailed timespan information when calculating availability extents. The returned earliest and latest times are then based on the days on which data was available rather than on the detailed timespans, and the displayed timespan counts for such channels are reported as -1.

For example, for a channel with more than 500,000 timespans, if a query of ...start=2015-02-01T12:34:56&end=2015-02-04T10:00:00... is given and the channel has some data on days 2015-02-01 and 2015-02-04, then the returned earliest and latest times will exactly match the query times: 2015-02-01T12:34:56 to 2015-02-04T10:00:00. However, if the channel has no data during day 2015-02-01 but does have data during the next day, then the returned earliest and latest extent times would be 2015-02-02T00:00:00 to 2015-02-04T10:00:00.
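The day-based calculation in this example can be reproduced by clipping whole data days to the query window. This is a minimal sketch under stated assumptions: day_based_extent is a hypothetical helper, days_with_data stands in for the cache's calendar-day availability, times are naive GMT datetimes, and edge cases such as a query ending exactly at midnight are not handled specially.

```python
from datetime import date, datetime, time, timedelta
from typing import Iterable, Optional, Tuple


def day_based_extent(days_with_data: Iterable[date], start: datetime,
                     end: datetime) -> Optional[Tuple[datetime, datetime]]:
    """Earliest/latest times from calendar-day availability, clipped to the query."""
    days = sorted(d for d in days_with_data if start.date() <= d <= end.date())
    if not days:
        return None  # no data day inside the query window
    first_day_start = datetime.combine(days[0], time.min)
    last_day_end = datetime.combine(days[-1] + timedelta(days=1), time.min)
    return max(start, first_day_start), min(end, last_day_end)
```

With data on 2015-02-02 and 2015-02-04 and the query from the example, this returns 2015-02-02T00:00:00 to 2015-02-04T10:00:00.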

Be aware that if a virtual network is selected in the query, implicit time ranges are attached to its stations. Thus, even if no time constraints are present in the query, time constraints may be applied.

Missing Metadata

A small amount of time series miniSEED data in the archive lacks metadata. Currently, the ph5ws-availability service shows this data as available. Requesting this data with a tool such as ph5ws-dataselect may not work, because the extraction logic looks for the metadata before extracting the data.

Latest Update-Date Inaccuracies

In some circumstances, the latest update-dates reported by the /query method may be later (but never earlier) than their actual values.

This is a result of how the ph5ws-availability service catalogs these dates in its internal cache. In the cache, latest update-dates are stored per GMT calendar day for each network, station, location, channel, quality and sample-rate tuple. Because of the way in which most data is archived, this caching method usually results in accurate update-dates being reported. However, if the requested time segment does not cover the part of a day that was most recently loaded, the reported time may be later than its actual value.

For the majority of the data this is not an issue. It is worth emphasizing that this behavior never results in update-dates being reported as earlier than their actual values, only later.
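A toy model shows why a per-day cache can only overstate the update-date. In this sketch, a hypothetical day is loaded in two pieces at different times. The cache keeps only the later time, so a request for the earlier piece reports a date later than its true one. Load, build_daily_cache and actual_update are illustrative names. The real cache is also keyed by network, station, location, channel, quality and sample rate, which the sketch omits.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple

# (day, first_hour, end_hour, update_time): one piece of data loaded into the archive.
Load = Tuple[date, int, int, datetime]


def build_daily_cache(loads: List[Load]) -> Dict[date, datetime]:
    """Keep only the latest update time per GMT calendar day."""
    cache: Dict[date, datetime] = {}
    for day, _, _, updated in loads:
        cache[day] = max(updated, cache.get(day, updated))
    return cache


def actual_update(loads: List[Load], day: date, hour_from: int, hour_to: int) -> datetime:
    """True latest update of the loads overlapping [hour_from, hour_to) on that day."""
    return max(updated for d, h0, h1, updated in loads
               if d == day and h0 < hour_to and hour_from < h1)
```

Because the cached value is a maximum over the whole day, it is always greater than or equal to the true value for any part of that day.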

/extent Timespan Count Inaccuracies

For performance reasons, when merge=quality and/or merge=samplerate are selected, a simplification is used when calculating timespan counts: the timespan counts from the different qualities and/or sample rates are simply added together. If the different qualities and sample rates do not overlap, the returned values will be accurate; if they do overlap, the values may be higher than they should be. As an extreme example, if two different qualities were selected and both contained identical data, the returned timespan count would be double the actual value.
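The extreme case can be checked with a small comparison between summing per-quality counts and counting after a true merge. This is a sketch with hypothetical helper names and (start, end) tuples in seconds. For brevity, count_merged treats spans as separate only when one starts after the previous one ends, which is a simplification of the sample-period rule.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def count_by_summing(per_quality: List[List[Span]]) -> int:
    """The /extent shortcut: add the per-quality timespan counts together."""
    return sum(len(spans) for spans in per_quality)


def count_merged(spans: List[Span]) -> int:
    """Timespans remaining after merging overlapping or touching spans."""
    count, current_end = 0, None  # type: int, Optional[float]
    for start, end in sorted(spans):
        if current_end is None or start > current_end:
            count += 1
            current_end = end
        else:
            current_end = max(current_end, end)
    return count
```

For two qualities holding identical spans, summing gives twice the merged count. For qualities that never overlap, the two methods agree.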


Problems with this service?

Please send an email report of which service you were using, your URL query, and any error feedback to:
[email protected]
We will address your issue as soon as possible.