00:00:07:04 - 00:00:09:22
In this tutorial, we'll take a look
at how to link data
00:00:09:22 - 00:00:13:18
within your standalone data tables
within your platform.
00:00:13:20 - 00:00:16:21
For example, in this case
we wish to display
00:00:16:21 - 00:00:19:21
the name of each product
that's been ordered.
00:00:19:27 - 00:00:21:16
And the next to that we want a column
00:00:21:16 - 00:00:25:12
which has the customer name or customer
email that ordered it.
00:00:25:14 - 00:00:29:17
Now, these two pieces of data
do not exist in the same dataset,
00:00:29:19 - 00:00:33:19
so we need to link
data sets to actually achieve this
00:00:33:21 - 00:00:36:08
combination of data.
00:00:36:08 - 00:00:39:13
And this example would require the linking
of four data sets.
00:00:39:21 - 00:00:41:25
So we're going to have order,
00:00:41:25 - 00:00:50:19
we have order line customer and product.
00:00:50:22 - 00:00:53:04
So we'll start off
by making a new question
00:00:53:04 - 00:00:56:26
from within Metabase.
00:00:56:28 - 00:01:01:03
And then I'm going to
use one of my raw data sets
00:01:01:06 - 00:01:02:04
and I'm going to start off
00:01:02:04 - 00:01:06:06
just with my order line information.
00:01:06:08 - 00:01:07:16
So this will show me information
00:01:07:16 - 00:01:12:19
on all of the individual order lines.
00:01:12:21 - 00:01:15:27
Now we want to refine the displayed
column list.
00:01:16:00 - 00:01:18:01
So we're using this little arrow
icon here.
00:01:18:01 - 00:01:22:22
You can choose the columns of this dataset
to be displayed in the final visual.
00:01:22:25 - 00:01:26:08
This helps to reduce the sheer
amount of data that's being displayed,
00:01:26:10 - 00:01:29:10
particularly
when you start to link datasets together
00:01:29:15 - 00:01:33:01
as you start to get a vast amount
of columns and repeated data
00:01:33:01 - 00:01:38:09
a lot of the time as well.
00:01:38:12 - 00:01:40:21
We're then going to click on the Join Data
00:01:40:21 - 00:01:44:10
Icon so we can select another dataset
to link to our already
00:01:44:10 - 00:01:48:00
selected dataset.
00:01:48:03 - 00:01:56:02
And in this case,
I'm going to go for product.
00:01:56:05 - 00:01:59:01
The next
step is to identify matching columns.
00:01:59:01 - 00:02:03:14
So when joining two datasets, you
will need to identify a column in each set
00:02:03:16 - 00:02:05:09
which has matching data.
00:02:05:09 - 00:02:09:25
So in this case, for example, and
it's done it automatically, in this case
00:02:09:28 - 00:02:12:26
we can link
the product ID column of the order
00:02:12:26 - 00:02:16:23
line table to the ID
column of the product table.
00:02:16:29 - 00:02:19:29
Now, those are all the same
bits of information in both tables.
00:02:20:00 - 00:02:23:20
And by linking those together,
it knows how to relate one set of data
00:02:23:23 - 00:02:29:12
to the other sets of data.
00:02:29:15 - 00:02:31:13
Again, with our newly joined dataset,
00:02:31:13 - 00:02:35:19
we can choose the information
that we wish to display.
00:02:35:22 - 00:02:42:25
In this case,
I might just want the product name.
00:02:42:28 - 00:02:43:17
And then again,
00:02:43:17 - 00:02:47:21
I'm going to join another dataset so
multiple datasets can be linked together.
00:02:47:22 - 00:02:50:15
You aren't just limited to
00:02:50:17 - 00:02:52:12
the same process.
00:02:52:12 - 00:02:55:07
This time I'm going to go for my order
00:02:55:07 - 00:02:59:27
information,
00:02:59:29 - 00:03:02:29
look at the data
I want to display for the order
00:03:03:03 - 00:03:06:09
and then make sure I've got a link
between appropriate
00:03:06:09 - 00:03:13:05
columns.
00:03:13:08 - 00:03:16:18
Now, the order table is useful
in this respect as we're able to identify
00:03:16:18 - 00:03:20:21
the link between the order line
and the order in which it was part of.
00:03:20:24 - 00:03:23:15
And this will eventually help us
identify the customer
00:03:23:15 - 00:03:33:01
which is associated with the order itself.
00:03:33:03 - 00:03:35:29
Going to join one final dataset.
00:03:35:29 - 00:03:38:19
This is going to be our customer dataset.
00:03:38:19 - 00:03:41:09
Now again,
I'm going to repeat the process.
00:03:41:09 - 00:03:44:04
You can see this time
that it's not been able to actually
00:03:44:04 - 00:03:47:16
automatically define
a link between the datasets.
00:03:47:18 - 00:03:49:29
So what I need to do
is to manually pick these up.
00:03:49:29 - 00:03:52:16
So I'm starting off with
00:03:52:18 - 00:03:56:02
my previous
results and something to link to customer.
00:03:56:02 - 00:03:58:23
So I'm going to need
customer ID from somewhere.
00:03:58:23 - 00:04:01:16
Now I know this customer ID
is not in the order
00:04:01:16 - 00:04:05:05
line dataset,
which is identifying by default.
00:04:05:07 - 00:04:08:00
I'm going to collapse that list,
00:04:08:00 - 00:04:10:14
find my appropriate dataset.
00:04:10:14 - 00:04:13:14
So order ID
00:04:13:19 - 00:04:23:19
then in here I should be able to find
00:04:23:22 - 00:04:27:11
customer ID.
00:04:27:13 - 00:04:30:13
Then within the customer table
it's simply the ID number.
00:04:30:13 - 00:04:34:22
And then we've formed our link
between the datasets.
00:04:34:25 - 00:04:37:09
Again, I'm going to control
what information is displayed.
00:04:37:09 - 00:04:39:20
I don't need all of my columns
in this case.
00:04:39:20 - 00:04:47:20
I just want my email
00:04:47:23 - 00:04:48:13
and that's good.
00:04:48:13 - 00:04:50:24
So that's all the data joining.
We need to do.
00:04:50:24 - 00:04:53:17
The final thing I'm going to do in
my example is the same I've done in
00:04:53:17 - 00:04:56:06
previous
ones is just to cut down the results.
00:04:56:06 - 00:05:00:04
So we're only seeing results
from one individual storefront.
00:05:00:06 - 00:05:04:01
This is not a necessary step
if you are wanting to see data
00:05:04:01 - 00:05:07:26
from all of the storefronts.
00:05:07:28 - 00:05:08:05
And I'm
00:05:08:05 - 00:05:11:25
just going to go ahead and click Visualize
00:05:11:27 - 00:05:15:00
and what you can see there
because of all that joined up data,
00:05:15:00 - 00:05:20:07
we have a single table displayed combining
data from those multiple datasets.
00:05:20:09 - 00:05:25:20
So we have the product name,
which is coming from the product table,
00:05:25:22 - 00:05:30:12
and we're associating that with the email
by going through those alternate dataset.
00:05:30:14 - 00:05:35:06
So from products we've gone to order
line from order line we're going to order.
00:05:35:08 - 00:05:37:09
And then for order we've gone to customer.