Basic Concepts and Operations
Date and Timezone
- Customer Timezone: each Kubit customer should specify a default timezone to be used when analyzing event data. Typically this is the timezone used for the business calendar.
- Week: the first day of the week is Monday, given the seasonality pattern of mobile users (highest engagement on weekends).
- Install Days: D0 is the date of the install, D1 is the next day, and so on.
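As a rough illustration of how Install Days interact with the Customer Timezone, the sketch below converts UTC timestamps into the customer's calendar date before counting days. The fixed UTC-8 offset and the names `CUSTOMER_TZ`, `local_date` and `install_day` are assumptions made for this example; they are not part of Kubit.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: the customer timezone is a fixed UTC-8 offset.
# A real setup would more likely use a named zone with daylight-saving rules.
CUSTOMER_TZ = timezone(timedelta(hours=-8))


def local_date(ts_utc: datetime) -> date:
    """Calendar date of a UTC timestamp in the customer timezone."""
    return ts_utc.astimezone(CUSTOMER_TZ).date()


def install_day(install_ts_utc: datetime, event_ts_utc: datetime) -> int:
    """Return n for 'Dn': 0 on the install date, 1 on the next day, and so on."""
    return (local_date(event_ts_utc) - local_date(install_ts_utc)).days


install = datetime(2021, 5, 10, 6, 0, tzinfo=timezone.utc)    # 22:00 on 05/09 local time
same_day = datetime(2021, 5, 10, 7, 30, tzinfo=timezone.utc)  # 23:30 on 05/09 local time
next_day = datetime(2021, 5, 10, 9, 0, tzinfo=timezone.utc)   # 01:00 on 05/10 local time

print(install_day(install, same_day))  # D0
print(install_day(install, next_day))  # D1
```

Note that both events fall on 05/10 in UTC, yet they land on different Install Days once the customer timezone is applied.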
Product analytics starts with analyzing user behavior data. Instead of relying on professionals to write SQL or build reports, Kubit offers several dedicated analytics tools called Formulas that let everyone get instant product insights in a self-service manner.
Here are some common controls that appear in most formulas.
Every schema is a completely separate namespace, which may contain the dataset for a certain app or a specific data model (eg a star schema). Use the schema input to switch between schemas.
An Event is an action a user takes. Each event must have several key properties:
- Timestamp: when the event takes place
- Event Name: the name of the event
- User Id: user identifier
You can select one or more events by their Event Names from the dropdown list, or type in the text to search for a certain event.
- The "Any" option includes all events.
Filter is the constraint applied during analysis. There are two kinds of filters:
- Event Filter: filters applied to the events
- Global Filter: filters applied to the formula globally
Each filter contains three parts:
- Field: a field of the data model, eg App Version, Device Name, OS, Language
  - Each field has a data type: string, number, boolean, JSON or BigQuery Struct
  - JSON or BigQuery Struct fields can expose their internal data for filter or breakdown through Nested Property
- Operator: the comparison to apply, depending on the Field data type
  - Exist operators: exists TRUE or FALSE
  - String operators: =, !=, starts with, ends with, contains etc.
    - Note: = and != can have multiple values
  - Math operators: =, !=, >, >=, <, <=, between. Applicable to numeric and date data types
- Value(s): depending on the Field data type and the selected Operator, one or more values can be selected from a list of lookup values or entered manually.
Whether it is an Event Filter or a Global Filter, each supports up to two Filter Groups, which are connected with a logic operator (AND/OR). You can think of each Filter Group as a pair of parentheses.
Within a Filter Group you can specify up to three filters and connect them with AND/OR operators. Note: the evaluation order of filters within a Filter Group follows the standard precedence of logical operators, where AND binds tighter than OR, i.e. A or B and C = A or ( B and C ).
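The precedence rule above can be checked with a small truth table. Python's `and`/`or` happen to follow the same precedence, so the sketch below uses them directly; the function names are made up for this illustration.

```python
from itertools import product


def as_written(a: bool, b: bool, c: bool) -> bool:
    """A or B and C, relying on operator precedence."""
    return a or b and c


def and_first(a: bool, b: bool, c: bool) -> bool:
    """A or (B and C): the documented evaluation order."""
    return a or (b and c)


def left_to_right(a: bool, b: bool, c: bool) -> bool:
    """(A or B) and C: what a strict left-to-right reading would give."""
    return (a or b) and c


rows = list(product([False, True], repeat=3))

# The unparenthesized form always matches A or (B and C).
same_everywhere = all(as_written(*r) == and_first(*r) for r in rows)

# It differs from the left-to-right reading when A is true and C is false.
differing = [r for r in rows if as_written(*r) != left_to_right(*r)]

print(same_everywhere)
print(differing)
```

If the left-to-right reading is what you want, the usual approach is to put A and B in one Filter Group and C in the other, since each group acts as a pair of parentheses.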
Groupby is a technical term for slice-and-dice, which means splitting the data into groups. There are two kinds of groupbys:
A Breakdown splits the events into groups based on the values of a certain field. Events with the same field value are grouped into the same slice/group and then measured. You can select up to two fields, including Nested Properties, to be used as breakdowns.
Up to 5 cohorts can be selected as groupbys. These split the users into groups based on their cohorts, and the formula is then applied to each group.
There are three controls specifying the time parameters for the formula.
Specify a Start Date and End Date.
- Customer Timezone is assumed instead of browser's timezone
- Live Date: the date for which data is still arriving, eg today, when only partial-day data is available
- Default End Date: usually the day before the Live Date, to reduce the confusion and cost of querying partial data
Select how the data should be bucketed:
- Day: presented as a Date string, eg 05/09/2021
- Week (starting on Monday): presented as the Date string for the Monday of the week, eg 05/17/2021
- Month (calendar month): presented as the Date string for the first day of the month, eg 05/01/2021
- Year (calendar year): presented as the Date string for the first day of the year, eg 01/01/2021
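The bucketing rules above can be sketched as a small helper that maps any date to the first day of its bucket and formats it the way the examples do. `bucket_start` and `label` are hypothetical names used only for this illustration.

```python
from datetime import date, timedelta


def bucket_start(d: date, unit: str) -> date:
    """Return the first day of the Day/Week/Month/Year bucket containing d."""
    if unit == "day":
        return d
    if unit == "week":
        # date.weekday() is 0 for Monday, so this steps back to the Monday.
        return d - timedelta(days=d.weekday())
    if unit == "month":
        return d.replace(day=1)
    if unit == "year":
        return d.replace(month=1, day=1)
    raise ValueError(f"unknown bucket unit: {unit}")


def label(d: date) -> str:
    """Format a bucket date as MM/DD/YYYY, matching the examples above."""
    return d.strftime("%m/%d/%Y")


wednesday = date(2021, 5, 19)
for unit in ("day", "week", "month", "year"):
    print(unit, label(bucket_start(wednesday, unit)))
```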
Specify which date field to use for analyses that return a Time Series dataset:
- Event: the default and most common choice. Use the events' timestamp to group events together.
- Install: use the users' Install Date to group events together. This is useful when computing Dn Retention.
Query is a Formula used to get Simple Measures from user events using mathematical functions like counting and aggregation (min/max/sum/avg), and to compute Compound Measures by applying operators such as + and / to multiple Simple Measures.
A Measure consists of one Function (eg "Count Events") applied to at least one set of Events, each of which can have its own Event Filters.
- Name: you can provide a name for each Measure so they can be easily identified on the result analysis chart. If the name is not specified, a default name like M1, M2 will be used.
- Format: you can decide how to format the Measure in the chart by selecting between #, % and $, and also specify the number of digits to keep.
- Saved Measure: you can save a measure when it is completely defined. All Saved Measures will be available in the Measure dropdown list for others to reuse.
You can use the math operator sign buttons in the lower-right corner to combine multiple Simple Measures into a Compound Measure, mostly ratios. For example:
- Listens/DAU = Count Events(Listen) / Unique Users (Login)
- Download Failure Rate = Count Events(Download Failed) / ( Count Events(Download Success) + Count Events(Download Failed))
Each Function can be applied to multiple groups of Event(s), each of which may have different filters.
- Count Events: count number of events
- Unique Users: count unique users with distinct User Id
- Unique Values: count unique number of values for a certain Event Property
- Min/Max/Sum/Average: aggregation functions on a certain Event Property
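To make the Functions concrete, here is a minimal sketch over an in-memory list of events, ending with a Compound Measure in the style of Listens/DAU. The event layout (dictionaries with `user_id` and `event_name` keys), the sample data and the helper names are all assumptions for this example, not Kubit's data model or API.

```python
from statistics import mean

# Hypothetical sample events; property names are made up for illustration.
events = [
    {"user_id": "u1", "event_name": "Login"},
    {"user_id": "u2", "event_name": "Login"},
    {"user_id": "u1", "event_name": "Login"},
    {"user_id": "u1", "event_name": "Listen", "song_id": "s1", "duration": 30},
    {"user_id": "u1", "event_name": "Listen", "song_id": "s2", "duration": 50},
    {"user_id": "u2", "event_name": "Listen", "song_id": "s1", "duration": 40},
]


def select(events, name):
    """Events with the given Event Name."""
    return [e for e in events if e["event_name"] == name]


def count_events(events, name):
    return len(select(events, name))


def unique_users(events, name):
    return len({e["user_id"] for e in select(events, name)})


def unique_values(events, name, prop):
    return len({e[prop] for e in select(events, name) if prop in e})


def aggregate(events, name, prop, fn):
    """Apply min, max, sum or mean to one Event Property."""
    return fn([e[prop] for e in select(events, name) if prop in e])


# Compound Measure: Count Events(Listen) / Unique Users(Login)
listens_per_dau = count_events(events, "Listen") / unique_users(events, "Login")

print(listens_per_dau)
print(unique_values(events, "Listen", "song_id"))
print(aggregate(events, "Listen", "duration", mean))
```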
After specifying Measure(s) with Function and Event(s), you can also provide Global Filters (applicable to all Measures in the Query) and Groupbys (Breakdown or Cohort), then specify the Time criteria to execute the Query to get the analysis presented as a chart.
- Chart Type: you can toggle between different chart types, including Line, Bar, Stacked Bar and Stacked Percentage Area (last two only available when Groupbys are specified).
- Data points: mouse over a certain date to see the values for each group on that day
- Moving Average: display 7-Day Moving Average (MA7) line
- Hide/Show Certain Groups: the decimal number displayed next to each group in the dropdown list is the Pearson correlation coefficient between that group's line and the total line.
- Export to CSV or Jupyter Notebook: from the context menu on the top-right corner of the chart
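For readers unfamiliar with the two statistics mentioned in this list, the sketch below shows one common way to compute a 7-day moving average and a Pearson correlation coefficient. It assumes a trailing window that is undefined for the first six points; how Kubit handles the window edges is not stated here, so treat that as an assumption.

```python
from math import sqrt


def moving_average(values, window=7):
    """Trailing moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


daily_total = [1, 2, 3, 4, 5, 6, 7, 8]
ma7 = moving_average(daily_total)

group = [2, 4, 6, 8, 10, 12, 14, 16]  # moves exactly with the total
print(ma7[6], ma7[7])
print(round(pearson(group, daily_total), 6))
```

A coefficient near 1 means the group tracks the total closely, while a value near 0 or below suggests the group behaves differently from the overall trend.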
Funnel is a formula to analyze the conversion rates between multiple steps in a flow. Kubit counts the unique users at each step, then divides that count by the unique users at the initial step to compute the conversion rates.
- Each step can have one or more Events with Filters.
- You can give a name to each step to make it easier to differentiate in the chart. If no name is provided, default step names like S1, S2, S3 ... will be used.
- You can add or remove steps using the controls on the left side.
- You can also drag-n-drop steps to rearrange their order.
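The conversion calculation described above can be sketched as follows. This version assumes a user only counts for a step if they completed all earlier steps in timestamp order, and it ignores any conversion time window; both are assumptions for illustration rather than a statement of how Kubit evaluates funnels. The `funnel` function and the sample events are hypothetical.

```python
from collections import defaultdict


def funnel(events, steps):
    """Count unique users reaching each step in order, plus rates vs. step 1.

    events: iterable of (user_id, event_name, timestamp) tuples
    steps:  ordered list of event names
    """
    by_user = defaultdict(list)
    for user, name, _ts in sorted(events, key=lambda e: e[2]):
        by_user[user].append(name)

    reached = [0] * len(steps)
    for names in by_user.values():
        step = 0
        for name in names:
            if step < len(steps) and name == steps[step]:
                reached[step] += 1
                step += 1

    rates = [r / reached[0] if reached[0] else 0.0 for r in reached]
    return reached, rates


events = [
    ("u1", "Launch", 1), ("u1", "Browse", 2), ("u1", "Purchase", 3),
    ("u2", "Launch", 1), ("u2", "Browse", 2),
    ("u3", "Launch", 1),
    ("u4", "Browse", 1), ("u4", "Launch", 2),  # Browse happened before Launch
]

reached, rates = funnel(events, ["Launch", "Browse", "Purchase"])
print(reached)
print(rates)
```

Because every rate is divided by the first step's count, removing or changing the first step changes all downstream conversion rates.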
The funnel chart shows the conversion rates between steps for each date, from which you can easily identify the trend.
- You can switch between Funnel and Line chart view.
- You can right-click on a date and "Expand this funnel" to display the funnel for that date only.
- You can click on each step's icon below the chart to enable/disable that step in the chart.
- Note: the conversion rates will also change if the first step is disabled.
Path is a formula that uses a Sankey diagram to display the different paths users take between two events. It helps product people learn the user journey in the app.
- Specify a Starting Event and an Ending Event.
- Choose between Forward or Backward Direction.
- The Sankey diagram displays the different paths users take.
- Mouse over an edge to see the number of unique users on both sides and the conversion rate.
- Use the two arrows above the diagram to increase or decrease the number of steps.
Retention is a formula that shows how different cohorts of users (defined by Starting Events on a date) retain over time, that is, how many come back on a later date with at least one Returning Event.
- Specify the time bucket for Cohort and Retention, default is Day
- Select the Starting Event(s) and Returning Event(s)
The Retention chart displays each cohort as a row identified by Cohort Date, followed by how many unique users exist on that initial date, then how many of them came back on the following dates.
- Toggle between # and % to show the number of unique users or the % of the total (retention rate).
- Use Line chart to show the trend of the retention for different cohorts
- If Groupby is used, only the All (total retention) or at most one group's retention can be displayed at a time.
- If a specific cohort date is selected in the dropdown box at the top of the chart instead of "Date - All", you can compare all groups' retention for that cohort date together.
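As a minimal sketch of the retention table described above, the code below builds one row per Cohort Date with the cohort size and the number of users who came back N days later. It assumes daily buckets and that a user joins the cohort of every date on which they performed the Starting Event; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def retention(events, start_event, return_event):
    """Map cohort date -> (cohort size, {day offset: returning unique users}).

    events: iterable of (user_id, event_name, date) tuples
    """
    starters = defaultdict(set)   # cohort date -> users with the Starting Event
    returners = defaultdict(set)  # date -> users with the Returning Event
    for user, name, day in events:
        if name == start_event:
            starters[day].add(user)
        if name == return_event:
            returners[day].add(user)

    table = {}
    for cohort_date, users in starters.items():
        row = {}
        for day, active in returners.items():
            offset = (day - cohort_date).days
            came_back = len(users & active)
            if offset > 0 and came_back:
                row[offset] = came_back
        table[cohort_date] = (len(users), row)
    return table


d0 = date(2021, 5, 10)
d1, d2 = d0 + timedelta(days=1), d0 + timedelta(days=2)
events = [
    ("u1", "Install", d0), ("u2", "Install", d0), ("u3", "Install", d1),
    ("u1", "Open", d1), ("u2", "Open", d1),
    ("u1", "Open", d2), ("u3", "Open", d2),
]

table = retention(events, "Install", "Open")
size, row = table[d0]
print(size, row)
print({n: count / size for n, count in row.items()})  # retention rate per day
```

The last line corresponds to toggling from # to %: each count is divided by the cohort's initial size.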
Cohort as a formula means a group of users who match certain criteria. You can use any formula like Query, Funnel and Retention to construct these criteria, or even combine them with AND/OR operators.
Once a cohort is built and saved, it can be used either as a Groupby for any other formula, or in Cohort Import for marketing automation.
- For convenience, any Query, Funnel or Retention formula to be used in a cohort should be saved with a name through "Rename Chart". For example:
- First Launch Count = Count Events(Launched) for Install Days = 0
- Sing Count = Count Events(Sing)
- When creating a cohort, select the saved formula, then choose the condition and time window.
- Execute helps you verify the cohort definition by showing how many users match the criteria for a test time period.
- Once satisfied, you need to Save the cohort by giving it a name and description.
- The date range is for validation purposes and won't be saved into the cohort definition.
- When a cohort is used in Groupby, it will apply the date range from the formula itself.
- Saved cohorts can be found in Dictionary - Cohort
- Note: since cohorts are referenced by other formulas, they can't be updated or renamed after creation (they are immutable). Deleting a cohort effectively disables it for future use.
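Conceptually, combining cohort criteria with AND/OR amounts to intersecting or uniting sets of users. The sketch below illustrates that idea with a simple "at least N events" condition; the helper name, thresholds and sample data are assumptions for this example, not Kubit's cohort engine.

```python
from collections import Counter


def users_matching(events, event_name, min_count):
    """Users with at least min_count occurrences of event_name."""
    counts = Counter(user for user, name in events if name == event_name)
    return {user for user, n in counts.items() if n >= min_count}


# Hypothetical (user_id, event_name) pairs within the chosen time window.
events = [
    ("u1", "Launched"), ("u1", "Sing"), ("u1", "Sing"),
    ("u2", "Launched"), ("u2", "Sing"),
    ("u3", "Sing"), ("u3", "Sing"), ("u3", "Sing"),
]

launched = users_matching(events, "Launched", 1)  # Launched count >= 1
singers = users_matching(events, "Sing", 2)       # Sing count >= 2

print(sorted(launched & singers))  # criteria combined with AND
print(sorted(launched | singers))  # criteria combined with OR
```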
Kubit helps diagnose the root cause of analytics issues through machine-learning-based automation.
For every analysis which is a time series, Kubit's anomaly detection engine automatically calls out outliers as red dots on the chart.
- Behind the scenes, the machine learning algorithm builds the model based on historical data to consider seasonalities, skip known incidents and reduce false alarms.
- Mouse-over shows the Z-score of the incident data points compared to the predicted values.
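To give a feel for what a Z-score expresses, here is a deliberately simplified sketch. It uses the plain mean and standard deviation of recent history as the "predicted" value, which is an assumption for illustration only; it does not model seasonality or known incidents and is not Kubit's detection algorithm. The threshold of 3 is likewise an arbitrary choice.

```python
from statistics import mean, stdev


def z_score(history, observed):
    """How many standard deviations observed lies from the historical mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0
    return (observed - mu) / sigma


def is_anomaly(history, observed, threshold=3.0):
    """Flag a point whose absolute Z-score exceeds the threshold."""
    return abs(z_score(history, observed)) > threshold


history = [100, 102, 98, 101, 99]  # hypothetical daily values

print(round(z_score(history, 110), 2))
print(is_anomaly(history, 110))
print(is_anomaly(history, 101))
```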
For any KPI, the Diagnose action automatically runs dozens of queries to slice-n-dice the KPI by all known dimensions and calls out the anomalies. This reduces the labor-intensive effort of repeatedly checking different reports, and it also ensures coverage during an investigation.
The diagnosis is presented as an evidence pool for users to review and find insights.
Discussing analytics insights where you find them makes much more sense than doing so through email or Slack.
A workspace is a place to put all the insights together and collaborate to find the answer visually.
Every analysis/chart can be added to a Workspace through the "Add to Workspace" button.
Workspace feels like Slack for Analytics.
- In the center is the Board, where different charts are laid out together ("Edit Workspace" allows you to drag-n-drop charts to arrange them in rows and columns).
- For every chart, others can see the formula definition (mouse over the info icon), so there is no guesswork about how a number was computed.
- The message channels are on the right side. The default channel is public, and you can click on a user's avatar to talk in a private channel.
- Every workspace can be linked to a Slack channel so you can get notified without keeping the browser window open for Kubit.
One picture is worth a thousand words. Click on the little camera button on the board, select an area on the screen, then start drawing on the image to mark your points. You can also add text to make your point. This is much easier than describing a chart in an email.
Kubit also helps you to be more effective through automation.
Every analysis or dashboard can be scheduled to run repeatedly and email you the result.
If the marketing automation feature is enabled, you can schedule a one-time or daily cohort import to partners like Braze. The cohorts can be used to segment users and target them with marketing campaigns on the Braze side. For details, check out Braze Integration.
Besides saving Measures, Filters and Cohorts for others to build formulas with, there are other forms of sharing in Kubit.
A dashboard is a place where multiple charts are refreshed daily so people can get a quick overview of many metrics together.
Most analyses/charts can be added to a Dashboard through the "Add to Dashboard" menu item in the context menu of the chart.
- Only limited operations are supported on each chart in the dashboard. For in-depth analysis, "Show in Formula" leads you to the definition for further exploration.
- The layout of the dashboard can be easily changed using "Edit Dashboard" then using drag-n-drop.
Any Query or Funnel can be promoted to a KPI. Besides being listed on the KPI tab, every KPI also supports Automated Diagnostics.
Kubit helps manage and publish organization-wide events such as releases, marketing promotions, user acquisition campaigns or outages.
These events can be displayed (using the calendar icon above) on every chart along with the analysis result so known issues or incidents can be exposed upfront.
- User: your end-user/customer. Each user should have a unique identifier (User ID) to differentiate them. When the user is anonymous, some form of UUID can be used.
- Event: a data structure generated when a user takes a certain action (eg view a page, click a button, log in, sing a song etc) or when something happens to a user (eg send a push notification, subscription renewal, app made an API call). Events are the most critical data for behavioral and product analytics.
- Property: every event can have multiple properties associated with it to capture the context information when the event happens. For example: Timestamp, User ID, Country, Device, Gender, Age, App Version and Marketing Campaign etc. These properties can be used to filter events, breakdown metrics or aggregate into measures in analytics.
- Filter: a condition applied to some properties to filter events or users. For example: Age > 18, Country = US, Date between 2/1/2020 and 2/29/2020.
- Function: a computation method on events or their properties. For example: Count Events(Login), Unique Users(), Sum(Subscription Event’s Amount Property).
- Measure: a computed value from events and properties using functions and filters. A Compound Measure is composed of other measures in a math formula (eg X+Y, X/Y).
- Dimension: a special property that is usually used for breakdown or lookup purposes. For example: the Country dimension is a property called Country on all Events; the Campaign dimension is a separate table which contains detailed information about every campaign, while on Events there is only a campaign_id property which you can look up (join) in the Campaign dimension table.
- Breakdown: a way to slice a measure into groups based on dimension(s). Sometimes it is called Slice-and-Dice, or Group By. For example: breakdown by Age, Country.
- Cohort: a group of users who match certain criteria. For example, “New Users” means those who installed in the last 7 days; “Frequent Singers” means users who sang more than five songs every day.
- Formula: a generic term for any analytics question. It includes Query, Funnel, Path, Retention, Prediction and Effect.
- Query: a question to get some measure, optionally with filters, breakdowns or cohorts.
- Funnel: a special analysis to visualize the remaining users at every step in order to measure conversion rates.
- Path: a special analysis which uses a Sankey diagram to visualize users’ journey during all the steps between a starting event and ending event.
- Retention: a special analysis which measures how users come back with a returning event over time since the occurrence of the starting event.
- Prediction: a special analysis to find how strongly different events predict whether users move from a source cohort to a target cohort.
- Effect: a special analysis showing the impact of a trigger event on an affected event by measuring the changes in event count, frequency or percentage of active users.
- Analysis: a detailed examination of some formula (an instance after executing a formula). Usually it is associated with a visualization view (chart).
- Metric: an analytics view of some measure.