Extract data
Functions to extract particular data from a signal.
Select an interval of a signal
signal.loc
Retrieve a single data point or a range of data points in an interval based on timestamps. The input dates are provided in square brackets.
If the input is a single date, e.g. signal.loc['2023-12-31'], the result is a single scalar
value (i.e. without the timestamp). This can, for example, be used to normalize a time series to
be 1 at a specific point in time, by writing:
signal / signal.loc['2000-01-01']
If the input is an interval, e.g. signal.loc['2024-01-01':'2024-12-31'], the result is a time
series consisting of the values in the given interval. Both endpoints are inclusive.
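The scalar and interval behaviour described above resembles label-based indexing in pandas. The following is a minimal sketch in plain pandas, not the signal DSL itself; the series values and dates are made up for illustration, and it assumes that the DSL's `loc` follows the same conventions as pandas.

```python
import pandas as pd

# Hypothetical daily series standing in for a signal (values are made up).
index = pd.date_range("2024-01-01", periods=5, freq="D")
series = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=index)

# A single timestamp returns a scalar, without the timestamp attached.
base = series.loc[pd.Timestamp("2024-01-02")]

# Dividing by that scalar normalizes the series to 1 on that date.
normalized = series / base

# A label slice includes both endpoints.
window = series.loc["2024-01-02":"2024-01-04"]
```

Here `window` holds three data points because both the start and the end date are included.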
signal.iloc
Retrieve a single data point or a range of data points based on integer indexes.
The input indexes are provided in square brackets.
If the input is a single integer, e.g. signal.iloc[5], the result is a single scalar value
(i.e. without the timestamp).
If the input is an interval, e.g. signal.iloc[10:20], the result is a time series consisting
of the values with the given indexes. In this case, the first endpoint is inclusive and the last
one is exclusive.
The index is 0-based, so iloc[0] refers to the first data point in the time series, iloc[1]
is the second and so on. If a negative integer is provided, the data points are counted from the
end, with iloc[-1] referring to the last data point, iloc[-2] to the penultimate and so
on.
It is also possible to specify a step as a third argument in order to return every nth data
point, e.g. signal.iloc[10:20:4].
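The indexing rules above (0-based, exclusive end, negative indexes, optional step) match Python slicing. As a rough illustration, here is the same behaviour in plain pandas; this is an analogy under the assumption that the DSL's `iloc` mirrors pandas, and the data is made up.

```python
import pandas as pd

# Hypothetical series whose values equal their own positions (0..29).
index = pd.date_range("2024-01-01", periods=30, freq="D")
series = pd.Series(range(30), index=index)

sixth = series.iloc[5]           # scalar at position 5 (the sixth data point)
chunk = series.iloc[10:20]       # positions 10..19; position 20 is excluded
last = series.iloc[-1]           # negative indexes count from the end
stepped = series.iloc[10:20:4]   # every 4th point: positions 10, 14, 18
```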
Examples
Select the first data point as a scalar value:
signal.iloc[0]
Select the last data point as a scalar value:
signal.iloc[-1]
Select all but the first and last data points:
signal.iloc[1:-1]
Select every other data point, including the first one:
signal.iloc[::2]
signal.at_time()
signal.at_time(time: str)
Return a time series which consists only of the single data point at the requested time.
- Parameters:
time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31 or a fiscal period. See Time arguments.
signal.at()
signal.at(time: str)
Return the value at the given time as a scalar value.
Since this transformation returns a scalar value, it cannot be plotted in Plotter, but it can be included in calculations involving other time series. For example:
signal / signal.at('2024-01-01')
scales the time series so that it has the value 1.0 at 1 Jan 2024.
- Parameters:
time – A string specifying a timestamp. This can be either a specific date such as 2024-05-31 or a fiscal period. See Time arguments.
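The difference between at_time (a one-point time series) and at (a bare scalar) can be illustrated with a pandas analogy. This sketch is not the DSL; the data is made up, and note that pandas has its own `Series.at_time` with a different meaning (time of day), so plain `loc` is used here instead.

```python
import pandas as pd

# Hypothetical series standing in for a signal (values are made up).
index = pd.date_range("2024-01-01", periods=3, freq="D")
series = pd.Series([4.0, 8.0, 12.0], index=index)
when = pd.Timestamp("2024-01-01")

# Analogue of at_time: a time series holding only the requested data point.
point_series = series.loc[[when]]

# Analogue of at: the value alone, with no timestamp attached.
scalar = series.loc[when]

# A scalar can be used directly in arithmetic with the full series.
scaled = series / scalar
```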
Select time series from multi-time-series signal
Some signals return multiple time series and you may only be interested in some of them. This is relevant for entity-independent signals that return multiple time series, and for entity-dependent signals that return multiple time series per entity. In these cases the different time series have different names, and these can be used to select the time series you are interested in.
Exact match
You can select time series by specifying the full name of the time series inside square brackets. Multiple columns are selected by providing a tuple of strings. The selection is case insensitive.
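To make the case-insensitive exact matching concrete, here is a small pandas sketch. The helper `select_exact` is hypothetical and only approximates the bracket syntax described above; columns of a DataFrame stand in for the named time series.

```python
import pandas as pd

def select_exact(frame, names):
    """Hypothetical helper: keep columns whose full name matches, ignoring case."""
    if isinstance(names, str):
        names = (names,)
    wanted = {name.lower() for name in names}
    keep = [col for col in frame.columns if col.lower() in wanted]
    return frame[keep]

# Made-up data with three named columns.
frame = pd.DataFrame({"Alpha": [1, 2], "Beta": [3, 4], "Gamma": [5, 6]})

beta_only = select_exact(frame, "beta")
alpha_beta = select_exact(frame, ("alpha", "BETA"))
```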
Examples
Select the beta column of the underlying signal:
my_signal['beta']
Select both the alpha and beta columns:
my_signal[('alpha', 'beta')]
Substring and regex filtering
Alternatively you can select columns with the filter_columns method, which matches columns based
on a substring or regex search.
signal.filter_columns()
signal.filter_columns(pattern: str, *, case: bool = True, regex: bool = True)
- Parameters:
pattern – The pattern to search for.
case – Whether the search is case sensitive.
regex – Whether the pattern is treated as a regular expression.
Examples
Select all columns that contain the substring beta:
my_signal.filter_columns('beta')
Select all columns that contain the substring beta regardless of case (e.g. BETA or BeTa):
my_signal.filter_columns('beta', case=False)
Select all columns that contain either alpha or beta:
my_signal.filter_columns('alpha|beta')
Select all columns that contain the character | (since it has a special meaning in regular expressions, regex search must be disabled):
my_signal.filter_columns('|', regex=False)
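The interplay of the case and regex arguments can be sketched with Python's `re` module. The function below is a hypothetical pandas stand-in that reuses the documented name and defaults; it is an approximation for illustration, not the platform's implementation, and the column names are made up.

```python
import re
import pandas as pd

def filter_columns(frame, pattern, *, case=True, regex=True):
    """Hypothetical stand-in: keep columns whose name contains the pattern."""
    flags = 0 if case else re.IGNORECASE
    # With regex disabled, escape the pattern so it is matched literally.
    expr = pattern if regex else re.escape(pattern)
    matcher = re.compile(expr, flags)
    keep = [col for col in frame.columns if matcher.search(col)]
    return frame[keep]

# Made-up columns, one of which contains a literal '|'.
frame = pd.DataFrame({"alpha_1": [1], "BETA_1": [2], "beta|raw": [3], "gamma": [4]})

sensitive = filter_columns(frame, "beta")
insensitive = filter_columns(frame, "beta", case=False)
either = filter_columns(frame, "alpha|beta")
literal_pipe = filter_columns(frame, "|", regex=False)
```

Passing `'|'` with regex enabled would match every column in this sketch, because an empty alternative matches any string; that is why the literal search needs `regex=False`.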
Note: If you are working with data obtained by traversing relationships in the entity graph, you should use the graph filtering functionality if possible, rather than the bracket syntax. This avoids evaluating the underlying signal for entities you are not interested in. If you use square brackets or the filter_columns operation, the filtering happens after the underlying signal is evaluated.
Top and bottom n columns
If there are many columns, it can be useful to select the top or bottom n columns based on
either the last value or the aggregation of the values across the evaluation period.
signal.top_n()
signal.top_n(n: int, func: str = 'last', *, show_other=False)
Selects the top n columns for each evaluation entity.
The func argument specifies how the ranking value is calculated from each column and
can be any of:
- last
- max
- min
- mean
- median
When using last, we first find the last date where any column has data, and then
pick the value from each column on that date. This means that columns which do not have data on that
date will be left out.
Note that which columns are chosen depends on the evaluation time period. Signal transformations may extend or modify the evaluation period, so using this signal as an underlying signal in other transformations may lead to unexpected results. Also, different parts of the app may evaluate signals with different time periods.
- Parameters:
n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, we include all the columns rather than aggregating a single column as the 'Other' column.
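The ranking and 'Other' rules described above can be approximated in pandas as follows. This is a sketch under stated assumptions rather than the platform's implementation: the function is a hypothetical stand-in operating on a DataFrame, the data is made up, and the handling of missing values when summing the 'Other' column is a guess.

```python
import pandas as pd

def top_n(frame, n, func="last", *, show_other=False):
    """Hypothetical pandas approximation of the documented top_n behaviour."""
    if func == "last":
        # Last date on which any column has data; columns without data
        # on that date drop out of the ranking.
        last_date = frame.dropna(how="all").index[-1]
        ranking = frame.loc[last_date].dropna()
    else:
        # 'max', 'min', 'mean' or 'median' aggregated over the whole period.
        ranking = frame.agg(func).dropna()
    chosen = list(ranking.sort_values(ascending=False).index[:n])
    rest = [col for col in frame.columns if col not in chosen]
    if not show_other or not rest:
        return frame[chosen]
    if len(frame.columns) == n + 1:
        # Exactly n+1 columns: show them all instead of a one-column 'Other'.
        return frame[chosen + rest]
    result = frame[chosen].copy()
    # Assumption: missing values are skipped when summing excluded columns.
    result["Other"] = frame[rest].sum(axis=1, min_count=1)
    return result

nan = float("nan")
index = pd.date_range("2024-01-01", periods=3, freq="D")
frame = pd.DataFrame(
    {"a": [1.0, 2.0, 9.0], "b": [8.0, 7.0, 6.0], "c": [3.0, 3.0, 3.0],
     "d": [5.0, 5.0, 1.0], "e": [10.0, 10.0, nan]},
    index=index,
)

by_last = top_n(frame, 2)            # 'e' has no data on the last date
by_max = top_n(frame, 2, "max")      # 'e' ranks first by maximum
with_other = top_n(frame, 2, show_other=True)
```

In this made-up data the choice of func changes the result: column e is excluded when ranking by last because it has no value on the final date, but it ranks first when ranking by max.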
signal.bottom_n()
signal.bottom_n(n: int, func: str = 'last', *, show_other=False)
Select the bottom n columns for each evaluation entity.
See the signal.top_n(..) method for more information.
- Parameters:
n – The number of columns to return.
func – The function to use when selecting the ranking value from each column.
show_other – Whether to include an 'Other' time series containing the sum of all the excluded columns when there are more than n+1 columns. If True and there are exactly n+1 columns, we include all the columns rather than aggregating a single column as the 'Other' column.
Example
Show the sales of the top 3 brands, along with the rest aggregated into an 'Other' time series:
data('ns.sales').for_type('ns.brand').top_n(3, 'last', show_other=True)