Defining a Data Model
Modelling entities, relationships, signals and time series in your data
Exabel allows you to create flexible data models to represent your data, and subsequently query, transform and analyze the data to derive investment insights.
You can create new entities (e.g. brands that belong to a company) and relationships (e.g. a company's ownership of brands), and link these to the company entities that Exabel already provides. You can then import time series data for any entity in your data model, and use our signal DSL to query and transform this data.
Example - why is this important?
Gap Inc. owns several brands, including Banana Republic. Banana Republic has stores across many countries. You have data at a very granular level, showing Gap's sales performance by brand and also by country.
An analyst may want to deep-dive into Gap at this granular level (e.g. Banana Republic in the USA), but also to roll the lower-level metrics up to understand Gap's overall performance.
A well-defined data model gives the analyst the flexibility to:
- Query and analyze data on a granular level
- Systematically aggregate data up from a lower entity-level to a higher one
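To illustrate what systematic aggregation means, here is a minimal sketch in plain Python. It assumes sales values keyed by hypothetical (brand, country) entities and a hypothetical parent map recording which entity each one belongs to; the entity names and values are made up. On the Exabel platform, this kind of roll-up would be expressed with the signal DSL rather than in Python.

```python
from collections import defaultdict

# Hypothetical parent map: each entity points to the entity that owns it.
PARENT = {
    "brand_a_us": "brand_a",
    "brand_a_uk": "brand_a",
    "brand_b_us": "brand_b",
    "brand_a": "company_x",
    "brand_b": "company_x",
}

# Made-up sales for the most granular entities, for a single period.
granular_sales = {"brand_a_us": 120.0, "brand_a_uk": 30.0, "brand_b_us": 50.0}


def roll_up(values: dict[str, float], parent: dict[str, str]) -> dict[str, float]:
    """Sum each entity's value into its parent, one level up."""
    totals: dict[str, float] = defaultdict(float)
    for entity, value in values.items():
        totals[parent[entity]] += value
    return dict(totals)


brand_sales = roll_up(granular_sales, PARENT)  # {"brand_a": 150.0, "brand_b": 50.0}
company_sales = roll_up(brand_sales, PARENT)   # {"company_x": 200.0}
```

Keeping the granular values and the parent links, rather than only pre-aggregated totals, is what lets the analyst drill down to brand_a_us or roll up to company_x from the same data.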
Best practice: data models
Data models do not have to be complex! In many cases, simple company-level data models will suffice.
Aim to ultimately connect entities to the company entity type, so that data can be aggregated up to company level.
Investment users (analysts / PMs) should define the desired granularity of analysis up front. This will help your data engineering team design the data model and ensure that the right data is imported.
Before reviewing the example data models below, it is important to understand these key concepts:
Example #1: company-level data
The simplest data model, applicable if your data exists at company level or you choose to pre-aggregate it to company level:
Notes
- A single sales signal is defined for company entities, and encapsulates multiple time series: one for each company.
- company is a global entity type that is provided by the Exabel platform.
- Company entities are pre-loaded on the Exabel platform, so you don't have to import these.
Steps
- Import time series data for each company, specifying sales as the signal. Whether you are using the file uploader or the Exabel SDK, you can specify the signal that each time series is tied to.
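To make the shape of this model concrete, here is a minimal in-memory sketch. It assumes flat (company, date, value) rows and shows one sales signal holding a separate time series per company entity. The row layout and company identifiers are hypothetical and do not represent the file uploader's format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical flat rows: (company entity, date, sales value).
rows = [
    ("company_x", date(2023, 3, 31), 10.0),
    ("company_x", date(2023, 6, 30), 12.0),
    ("company_y", date(2023, 3, 31), 7.5),
]


def build_signal(rows):
    """Group flat rows into one time series ({date: value}) per entity."""
    series = defaultdict(dict)
    for entity, day, value in rows:
        series[entity][day] = value
    return dict(series)


# One signal ("sales"), many time series: one per company.
signals = {"sales": build_signal(rows)}
```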
Example #2: companies with child entities
Another common data model has lower-level child entities that belong to each company. This example has segment child entities; these might be business or geographical segments, depending on your data:
Notes
- You can link each company to multiple segment entities.
- We create segments as child entities of companies, because each segment can only be owned by one company.
- We define the sales signal at both segment and company level, and import time series at both levels. Alternatively, you could choose to define the signal and import data only at segment level, and aggregate to company level dynamically in the Exabel platform.
- The HAS_SEGMENT relationship type is set to be an ownership relationship. This denotes that the from-entity (company) "owns" the to-entity (segment). In most cases, you will be creating ownership relationships.
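The ownership semantics can be sketched as plain data. This assumes a hypothetical Relationship record with from/to entities and an is_ownership flag (illustrative names, not the Exabel SDK's classes), and checks the rule from the notes that each segment is owned by exactly one company.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record: from_entity relates to to_entity via rel_type."""
    rel_type: str
    from_entity: str
    to_entity: str
    is_ownership: bool


relationships = [
    Relationship("HAS_SEGMENT", "company_x", "segment_x_retail", True),
    Relationship("HAS_SEGMENT", "company_x", "segment_x_online", True),
    Relationship("HAS_SEGMENT", "company_y", "segment_y_retail", True),
]


def segments_with_multiple_owners(rels):
    """Return entities that appear on the owned (to) side more than once."""
    owners = Counter(r.to_entity for r in rels if r.is_ownership)
    return sorted(seg for seg, count in owners.items() if count > 1)


def segments_of(company, rels):
    """Follow ownership edges from a company (the from-entity) to its segments."""
    return sorted(r.to_entity for r in rels
                  if r.is_ownership and r.from_entity == company)
```

Following these edges from company to segment is exactly the path a company-level aggregation of segment sales would traverse.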
Steps
You will need to use the Exabel SDK in order to import entities, relationships, and non-company signals.
- Create the segment entity type in your namespace, as this is not a pre-defined global entity type. For now, you will need to ask Exabel to create it for you.
- Import all segment entities that exist in your data, with the segment entity type.
- Import relationships connecting each segment to a company, using the HAS_SEGMENT relationship type.
- Import time series data for each segment and each company, specifying sales as the signal.
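Most of these steps amount to deriving entities, relationships and time series from your raw data before uploading them. A minimal sketch, assuming hypothetical flat rows of (company, segment, period, sales): it collects the distinct segment entities, one HAS_SEGMENT (company, segment) pair per segment, and per-entity sales series. Here the company-level series is simply the sum of its segments, which assumes the segments cover the whole company. No Exabel SDK calls are shown; the upload itself is left out.

```python
from collections import defaultdict

# Hypothetical raw rows: (company, segment, period, sales).
rows = [
    ("company_x", "segment_x_retail", "2023-Q1", 8.0),
    ("company_x", "segment_x_online", "2023-Q1", 2.0),
    ("company_y", "segment_y_retail", "2023-Q1", 5.0),
]


def plan_import(rows):
    """Derive segment entities, HAS_SEGMENT pairs and sales series from rows."""
    segments = set()
    has_segment = set()        # (company, segment) pairs
    sales = defaultdict(dict)  # entity -> {period: value}
    for company, segment, period, value in rows:
        segments.add(segment)
        has_segment.add((company, segment))
        sales[segment][period] = value
        # Assumes segments cover the whole company, so summing them
        # gives a company-level value for the same period.
        sales[company][period] = sales[company].get(period, 0.0) + value
    return sorted(segments), sorted(has_segment), dict(sales)


segments, has_segment, sales = plan_import(rows)
```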
Example #3: multiple top-level entities
Your data may have additional dimensions that you want to model, beyond the standard company-level entities.
In this example, we have an alternative data set for company job postings, segmented by both company and occupation (e.g. "sales jobs"):
Notes
- We model occupation as a top-level entity type, because the data set defines a limited number of standardized occupations that are seen across all companies.
- We create a company_and_occupation associative entity type that is owned by both a company and an occupation.
- Example entities in this model might be:
  - occupation: sales, engineering
  - company: Apple, Microsoft
  - company_and_occupation: apple_sales, apple_engineering, microsoft_sales, microsoft_engineering
- There are 3 signals defined for the company_and_occupation entity type: jobs_created, jobs_deleted, and jobs_active.
  - The raw signals can therefore be retrieved for a given company_and_occupation entity.
  - You may also create signals that aggregate these up to company level or occupation level, by using the signal DSL.
- The jobs_active_duration signal is defined for all 3 entity types. This means that you must import time series data for this signal for entities of all 3 entity types.
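Because each company_and_occupation entity is owned by one company and one occupation, the same raw series can be rolled up along either parent. A minimal sketch of that idea in plain Python, assuming a hypothetical mapping from each associative entity to its two owners and made-up jobs_active values for a single date; on the platform you would express this with the signal DSL instead.

```python
from collections import defaultdict

# Hypothetical ownership: associative entity -> (company, occupation).
OWNERS = {
    "apple_sales": ("apple", "sales"),
    "apple_engineering": ("apple", "engineering"),
    "microsoft_sales": ("microsoft", "sales"),
    "microsoft_engineering": ("microsoft", "engineering"),
}

# Made-up jobs_active values for a single date.
jobs_active = {
    "apple_sales": 40,
    "apple_engineering": 100,
    "microsoft_sales": 60,
    "microsoft_engineering": 150,
}


def aggregate(values, owners, parent_index):
    """Sum values by one parent: index 0 = company, index 1 = occupation."""
    totals = defaultdict(int)
    for entity, value in values.items():
        totals[owners[entity][parent_index]] += value
    return dict(totals)


by_company = aggregate(jobs_active, OWNERS, 0)
by_occupation = aggregate(jobs_active, OWNERS, 1)
```

Summing suits counts such as jobs_active. It would not, for example, turn per-occupation durations into a meaningful company-level duration, so the aggregation has to be chosen per signal.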
Steps
You will need to use the Exabel SDK in order to import entities, relationships, and non-company signals.
- Create the occupation and company_and_occupation entity types in your namespace, as these are not pre-defined global entity types. Define company_and_occupation as an associative entity type. For now, you will need to ask Exabel to create these for you.
- Import all occupation and company_and_occupation entities that exist in your data.
- Import relationships connecting each company_and_occupation entity to the appropriate company and occupation, using the corresponding relationship type.
- Import time series data for the jobs_created, jobs_deleted, and jobs_active signals, for the company_and_occupation entities.
- Import time series data for the jobs_active_duration signal, for the company_and_occupation, company, and occupation entities.
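Since jobs_active_duration has to be imported for entities of all three types, it can help to check coverage before uploading. A minimal sketch, assuming a hypothetical table of which signals each entity type requires and a set of (signal, entity) pairs you have prepared for import; these names are illustrative and not part of the Exabel SDK.

```python
# Hypothetical requirements derived from the data model in this example.
REQUIRED_SIGNALS = {
    "company_and_occupation": {"jobs_created", "jobs_deleted", "jobs_active",
                               "jobs_active_duration"},
    "company": {"jobs_active_duration"},
    "occupation": {"jobs_active_duration"},
}

# Entities in the model, keyed by entity type.
ENTITIES = {
    "company": ["apple", "microsoft"],
    "occupation": ["sales", "engineering"],
    "company_and_occupation": ["apple_sales", "apple_engineering",
                               "microsoft_sales", "microsoft_engineering"],
}


def missing_series(prepared):
    """List (signal, entity) pairs the model requires but `prepared` lacks."""
    missing = []
    for entity_type, entities in ENTITIES.items():
        for entity in entities:
            for signal in sorted(REQUIRED_SIGNALS[entity_type]):
                if (signal, entity) not in prepared:
                    missing.append((signal, entity))
    return missing


# Example: every required series prepared except the occupation-level ones.
prepared = {(s, e) for t, es in ENTITIES.items() for e in es
            for s in REQUIRED_SIGNALS[t] if t != "occupation"}
gaps = missing_series(prepared)
# gaps == [("jobs_active_duration", "sales"), ("jobs_active_duration", "engineering")]
```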