Extract data

Functions to extract particular data from a signal.

Select an interval of a signal

`signal.loc`

Retrieve a single data point, or all data points within an interval, based on timestamps. The
input dates are provided in square brackets.

If the input is a single date, e.g. signal.loc['2023-12-31'], the result is a single scalar
value (i.e. without the timestamp). This can, for example, be used to normalize a time series to
be 1 at a specific point in time, by writing:

signal / signal.loc['2000-01-01']

If the input is an interval, e.g. signal.loc['2024-01-01':'2024-12-31'], the result is a time
series consisting of the values in the given interval. Both endpoints are inclusive.
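The two forms can be mimicked in plain Python to make the return types and the inclusive endpoints concrete. This is only a sketch: the `loc` helper and the sample data below are hypothetical stand-ins for a signal, not part of the signal API.

```python
from datetime import date

# Hypothetical stand-in for a signal: ISO date string -> value.
series = {
    "2023-12-31": 95.0,
    "2024-01-01": 100.0,
    "2024-06-30": 110.0,
    "2024-12-31": 120.0,
    "2025-01-01": 125.0,
}


def loc(data, start, end=None):
    """Mimic the lookup described above (illustrative helper, not the real API).

    One date returns a bare value; two dates return every point in the
    interval, with both endpoints included.
    """
    if end is None:
        return data[start]
    lo, hi = date.fromisoformat(start), date.fromisoformat(end)
    return {d: v for d, v in data.items() if lo <= date.fromisoformat(d) <= hi}


single = loc(series, "2024-01-01")                  # scalar, no timestamp
interval = loc(series, "2024-01-01", "2024-12-31")  # both endpoints are kept
normalized = {d: v / single for d, v in series.items()}  # equals 1.0 on 2024-01-01
```

Because the single-date form returns a bare number, it can be used directly as a divisor, which is what the normalization example relies on.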

`signal.iloc`

Retrieve a single data point or a range of data points based on integer indexes.

The input indexes are provided in square brackets.

If the input is a single integer, e.g. signal.iloc[5], the result is a single scalar value
(i.e. without the timestamp).

If the input is an interval, e.g. signal.iloc[10:20], the result is a time series consisting
of the values with the given indexes. In this case, the first endpoint is inclusive and the last
one is exclusive.

The index is 0-based, so iloc[0] refers to the first data point in the time series, iloc[1]
is the second and so on. If a negative integer is provided, the data points are counted from the
end, with iloc[-1] referring to the last data point, iloc[-2] to the penultimate and so
on.

It is also possible to specify a step as a third argument in order to return every nth data
point, e.g. signal.iloc[10:20:4].
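The indexing rules described here match those of ordinary Python sequence slicing, so a plain list can illustrate them. The list below is a hypothetical stand-in for the values of a signal, not the signal object itself.

```python
# Hypothetical values of a 25-point time series: 0, 1, ..., 24.
values = list(range(25))

sixth = values[5]          # single integer -> scalar (0-based, so the sixth point)
window = values[10:20]     # start inclusive, end exclusive -> indexes 10 to 19
last = values[-1]          # negative indexes count from the end
stepped = values[10:20:4]  # every 4th point within the window: indexes 10, 14, 18
```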

Examples

Select the first data point as a scalar value:

signal.iloc[0]

Select the last data point as a scalar value:

signal.iloc[-1]

Select all but the first and last data points:

signal.iloc[1:-1]

Select every other data point, including the first one:

signal.iloc[::2]

`signal.at_time()`

signal.at_time(time: str)

Return a time series which consists only of the single data point at the requested time.

Parameters:
time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31,
or a fiscal period. See Time arguments.

`signal.at()`

signal.at(time: str)

Return the value at the given time as a scalar value.

Since this transformation returns a scalar value, it cannot be plotted in Plotter,
but it can be included in calculations involving other time series. For example:

signal / signal.at('2024-01-01')

rescales the time series so that it has the value 1.0 on 1 Jan 2024.

Parameters:
time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31,
or a fiscal period. See Time arguments.
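The difference between at_time and at is only in the shape of the result, which a small plain-Python sketch can make explicit. The helpers and data below are hypothetical stand-ins. They assume the requested time matches a data point exactly, and they do not model fiscal-period arguments or missing data.

```python
# Hypothetical stand-in for a signal: ISO date string -> value.
series = {"2023-12-29": 90.0, "2024-01-01": 100.0, "2024-01-02": 103.0}


def at_time(data, time):
    """Return a one-point series (timestamp kept), as at_time is described."""
    return {time: data[time]}


def at(data, time):
    """Return only the value (timestamp dropped), as at is described."""
    return data[time]


point = at_time(series, "2024-01-01")  # still a series, with one point
base = at(series, "2024-01-01")        # a bare number
rescaled = {d: v / base for d, v in series.items()}
```

Only the scalar form can serve as the divisor in the rescaling step, while the one-point series keeps its timestamp.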

Select time series from multi-time-series signal

Some signals return multiple time series and you may only be interested in some of them. This is
relevant for entity-independent signals that return multiple time series, and for entity-dependent
signals that return multiple time series per entity. In these cases the different time series have
different names, and these can be used to select the time series you are interested in.

Exact match

You can select a time series (column) by specifying its full name inside square brackets.
Multiple columns are selected by providing a tuple of strings. The selection is case-insensitive.

Examples

Select the beta column of the underlying signal:

my_signal['beta']

Select both the alpha and beta columns:

my_signal[('alpha', 'beta')]

Substring and regex filtering

Alternatively, you can select columns with the filter_columns method, which matches column names
based on a substring or regex search.

`signal.filter_columns()`

signal.filter_columns(pattern: str, *, case: bool = True, regex: bool = True)

Parameters:
- pattern – The pattern to search for.
- case – Whether the search is case sensitive.
- regex – Whether the pattern is treated as a regular expression.

Examples

Select all columns that contain the substring beta:

my_signal.filter_columns('beta')

Select all columns that contain the substring beta regardless of case (e.g. BETA or BeTa):

my_signal.filter_columns('beta', case=False)

Select all columns that contain either alpha or beta:

my_signal.filter_columns('alpha|beta')

Select all columns that contain the character | (since it has a special meaning in regular
expressions, regex search must be disabled):

my_signal.filter_columns('|', regex=False)
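The matching behaviour in these examples can be approximated with Python's re module. This is a sketch under assumptions, not the actual implementation: the function below works on a plain list of hypothetical column names, and it assumes the pattern may match anywhere in the name.

```python
import re


def filter_columns(columns, pattern, *, case=True, regex=True):
    """Keep the column names in which `pattern` is found (illustrative only)."""
    if not regex:
        pattern = re.escape(pattern)  # treat every character literally
    flags = 0 if case else re.IGNORECASE
    return [name for name in columns if re.search(pattern, name, flags)]


# Hypothetical column names.
columns = ["alpha", "beta", "BETA_1y", "alpha|beta", "gamma"]

exact_case = filter_columns(columns, "beta")
any_case = filter_columns(columns, "beta", case=False)
either = filter_columns(columns, "alpha|beta")
pipe = filter_columns(columns, "|", regex=False)
# In Python's re, a bare '|' is an alternation of two empty patterns and
# matches every string, so nothing is filtered out here.
everything = filter_columns(columns, "|")
```

The last call shows why a literal search for | needs regex=False: as a regular expression, the pattern matches every name.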

📘 Note
If you are working with data obtained by traversing relationships in the entity graph, you should
use the graph filtering functionality if possible, rather than the bracket syntax or
filter_columns. Graph filtering avoids evaluating the underlying signal for entities you are not
interested in, whereas square brackets and filter_columns filter only after the underlying signal
has been evaluated.

Top and bottom n columns

If there are many columns, it can be useful to select the top or bottom n columns based on
either the last value or the aggregation of the values across the evaluation period.

`signal.top_n()`

signal.top_n(n: int, func: str = 'last', *, show_other=False)

Select the top n columns for each evaluation entity.

The func argument specifies how the ranking value is calculated from each column and
can be any of:

last
max
min
mean
median

When using last, we first find the last date where any column has data, and then
pick the value from each column on that date. This means that columns which do not have data on that
date will be left out.

Note that the choice of columns depends on the evaluation time period. Signal transformations may extend or modify
the evaluation period, so using this signal as the underlying signal in other transformations may lead to
unexpected results. Also, different parts of the app may evaluate signals over different time periods.

Parameters:
- n – The number of columns to return.
- func – The function to use when selecting the ranking value from each column.
- show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when
  there are more than n+1 columns. If True and there are exactly n+1 columns, we
  include all the columns rather than aggregating a single column as the 'Other' column.
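The ranking and the show_other rule can be sketched in plain Python. This is an illustration under stated assumptions, not the actual implementation. Columns are hypothetical dicts of {date: value}. Columns dropped by the last rule are assumed to be excluded from 'Other' as well, and the n+1 check is assumed to count only the ranked columns.

```python
from statistics import mean, median

RANKERS = {"max": max, "min": min, "mean": mean, "median": median}


def top_n(columns, n, func="last", *, show_other=False):
    """Sketch of the selection described above (illustrative only)."""
    if func == "last":
        # Last date on which any column has data; columns without a value
        # on that date are left out of the ranking.
        last_date = max(d for col in columns.values() for d in col)
        scores = {name: col[last_date] for name, col in columns.items() if last_date in col}
    else:
        scores = {name: RANKERS[func](col.values()) for name, col in columns.items()}

    ranked = sorted(scores, key=scores.get, reverse=True)
    if show_other and len(ranked) == n + 1:
        # A single leftover column is shown as-is instead of as 'Other'.
        return {name: columns[name] for name in ranked}

    result = {name: columns[name] for name in ranked[:n]}
    rest = ranked[n:]
    if show_other and rest:
        other = {}
        for name in rest:
            for d, v in columns[name].items():
                other[d] = other.get(d, 0.0) + v
        result["Other"] = other
    return result


# Hypothetical monthly sales per brand; brand E has no data in the last month.
sales = {
    "A": {"2024-01": 10.0, "2024-02": 12.0},
    "B": {"2024-01": 30.0, "2024-02": 8.0},
    "C": {"2024-01": 5.0, "2024-02": 20.0},
    "D": {"2024-01": 1.0, "2024-02": 2.0},
    "E": {"2024-01": 50.0},
}

top2_last = top_n(sales, 2)                        # E is skipped: no value on the last date
top2_max = top_n(sales, 2, "max")                  # E now ranks first
with_other = top_n(sales, 2, show_other=True)      # B and D are summed into 'Other'
all_shown = top_n(sales, 3, show_other=True)       # exactly n+1 ranked columns
```

Under the same assumptions, a bottom_n sketch would differ only in sorting the ranking values in ascending order.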

`signal.bottom_n()`

signal.bottom_n(n: int, func: str = 'last', *, show_other=False)

Select the bottom n columns for each evaluation entity.

See the signal.top_n() method for more information.

Parameters:
- n – The number of columns to return.
- func – The function to use when selecting the ranking value from each column.
- show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when
  there are more than n+1 columns. If True and there are exactly n+1 columns, we
  include all the columns rather than aggregating a single column as the 'Other' column.

Example

Show the sales of the top 3 brands, along with the rest aggregated into an 'Other' time series:

data('ns.sales').for_type('ns.brand').top_n(3, 'last', show_other=True)