Creating Snowflake schema / Link 2 Cubes

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Creating Snowflake schema / Link 2 Cubes

Bart
This post was updated on .
Hi there!

I've read that I can't do actually the snowflake schema. However, in my study research I need to do the snowflake schema which I present you with the image.

Schema

It's all about sailing the books. The only fact table here is "Sprzedaz_ksiazki" ("The fact of sailing the book").  All around dimensions are easy to do, but... Above I have "Ksiazka" which means "Book". Then I have a connection to "Autorstwo" which means "who wrote the book" (the book could be written by more than one author) and then the "Autor" dimension that is "Author" dimension.

Is there any way to do such a schema with available version of Kylin? Do I have to connect them directly to the fact and forget about the snowflake schema, or may I build them like in the image, or do I have to build 2 cubes and connect them in a different way?

I would be grateful if you could help me. I tried to search it here and on the stackoverflow, but there is no answer.

Best regards,
Bart
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
OK.
I see that Kylin 2.0 will resolve my problem. I'm gonna wait :)
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Billy Liu
Please download the latest 2.0 beta2, and have a try. The feedback is very
welcomed.

2017-04-08 19:32 GMT+08:00 Bart <[hidden email]>:

> OK.
> I see that Kylin 2.0 will resolve my problem. I'm gonna wait :)
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Creating-Snowflake-schema-Link-2-Cubes-tp7602p7603.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
This post was updated on .
i'm really sorry, but I can't see where I could add those extra dimensions to build a snowflake.  Is there any printsreen or tutorial to help?

To add - when I "merged into" files from conf/*, I don't know if my Kylin hadn't crushed down. I actually have a problem while creating a projekt - with json. Katalina and so on.
When I wanted to come back to my 1.6 version - the problem was still the same. But after a while - it went away. But I've lost my loaded hive tables into projects... nonono, I did sth wrong :/
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
OK, guys. I've tried everything I could to build a snowflake.

I have a database looking like this:

Date (DataData) -> Fact (FaktSprzedazy) <- Book (Ksiazka) <- Author (Autor)

I've created a new model for that. Do I have to put a lookup tables? I deduced it by the descriptions that are next to that section. Then I tried to build my cube. Here is my error;



Probably I added a wrong lookup table (the third one on the list). But... I can't see other option to join two dimensions together which are the schema of snowflake (like a database above). Should I do it on my model level or building cube level?
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

shaofengshi
Hi Bart,

Could you share the JSON of your data model and cube definition? The
picture you shared is unclear and we couldn't get the detail information.
The error message (in problem.jpg) is complaining for a rare case (rowkey
columns not matched with dimension columns); Did you remove a dimension
which previously be added?

Snowflake model is supported since 2.0 (beta now). You need specify each
table's type (fact table or lookup table) and their join conditions when
creating the model.

2017-04-10 20:25 GMT+08:00 Bart <[hidden email]>:

> OK, guys. I've tried everything I could to build a snowflake.
>
> I have a database looking like this:
>
> Date (DataData) *->* Fact (FaktSprzedazy) *<-* Book (Ksiazka) <- Author
> (Autor)
>
> I've created a new model for that. Do I have to put a lookup tables? I
> deduced it by the descriptions that are next to that section. Then I tried
> to build my cube. Here is my error;
>
> <http://apache-kylin.74782.x6.nabble.com/file/n7614/problem.jpg>
>
> Probably I added a wrong lookup table (the third one on the list). But... I
> can't see other option to join two dimensions together which are the schema
> of snowflake (like a database above). Should I do it on my model level or
> building cube level?
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Creating-Snowflake-schema-Link-2-Cubes-tp7602p7614.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



--
Best regards,

Shaofeng Shi 史少锋
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Yang
Kylin comes with a diagnosis tool that helps people to reproduce your
problem. Attach a diagnosis pack will speed up troubleshooting greatly.

http://kylin.apache.org/blog/2016/06/10/diagnosis-tool/

Cheers
Yang


On Tue, Apr 11, 2017 at 10:44 PM, ShaoFeng Shi <[hidden email]>
wrote:

> Hi Bart,
>
> Could you share the JSON of your data model and cube definition? The
> picture you shared is unclear and we couldn't get the detail information.
> The error message (in problem.jpg) is complaining for a rare case (rowkey
> columns not matched with dimension columns); Did you remove a dimension
> which previously be added?
>
> Snowflake model is supported since 2.0 (beta now). You need specify each
> table's type (fact table or lookup table) and their join conditions when
> creating the model.
>
> 2017-04-10 20:25 GMT+08:00 Bart <[hidden email]>:
>
> > OK, guys. I've tried everything I could to build a snowflake.
> >
> > I have a database looking like this:
> >
> > Date (DataData) *->* Fact (FaktSprzedazy) *<-* Book (Ksiazka) <- Author
> > (Autor)
> >
> > I've created a new model for that. Do I have to put a lookup tables? I
> > deduced it by the descriptions that are next to that section. Then I
> tried
> > to build my cube. Here is my error;
> >
> > <http://apache-kylin.74782.x6.nabble.com/file/n7614/problem.jpg>
> >
> > Probably I added a wrong lookup table (the third one on the list).
> But... I
> > can't see other option to join two dimensions together which are the
> schema
> > of snowflake (like a database above). Should I do it on my model level or
> > building cube level?
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.
> > nabble.com/Creating-Snowflake-schema-Link-2-Cubes-tp7602p7614.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
In reply to this post by shaofengshi
I think I know what I did wrong. I tried "derived columns" on everything beyond the fact table.

If I have a model like above and take a "join between dimensions" column as normal, it built a cube.

I'm gonna check it once more, because I had no chance to do that - free time.

I will let you know!
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
I have a problems all the time with snowflke schema...

I will attach a link to diagnosis of my process.

https://www.dropbox.com/s/rbhl9y9ywob54na/project_2017_04_27_10_50_29.zip?dl=0

Wish you could help me with that...
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Bart
I can see that the Kylin has problems with counting cardinality with more than 3-4 dimensions. I just took the data source to my project, counting cardinality and... 3 out of 8 are staying at 5%. If I count one by one - it works...

Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Yang
Hi Bart,

With the diagnosis pack, the problem is much clearer.


   - Seems you are using a CDH quickstart sandbox which is very limited in
   resource. In that case, it is expected that running 2 or 3 table stats at
   the same time could hang due to resource competition (thus deadlock) in
   YARN. The same thing happens in my HDP sandbox btw.
   - Also the cube Ksiegarski_cube contains 30 dimensions, which is
   challenging even for some Kylin expert. Suggest start with a simpler cube
   with 20 dimensions or less. And use hierarchy / joint / mandatory more
   heavily to control the overall cuboid number.


Cheers

Yang



On Fri, Apr 28, 2017 at 2:32 AM, Bart <[hidden email]> wrote:

> I can see that the Kylin has problems with counting cardinality with more
> than 3-4 dimensions. I just took the data source to my project, counting
> cardinality and... 3 out of 8 are staying at 5%. If I count one by one - it
> works...
>
>
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Creating-Snowflake-schema-Link-2-Cubes-tp7602p7803.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

LeReunionais

I see that in Bart's use case, there was a one to many relationship between books and author (one book is written by many authors). Looking at json model definition he posted, I can only see a one to one relationship.

Could you explain how this one to many relationship is archived?
Reply | Threaded
Open this post in threaded view
|

Re: Creating Snowflake schema / Link 2 Cubes

Yang
We could create multiple models and cubes like
- Model 1: Book (alone)
- Model 2: Author (alone)
- Model 3: Book join Author

Depending on the incoming query, Kylin will choose the right model with
best performance.




On Tue, Jun 13, 2017 at 12:23 PM, LeReunionais <[hidden email]> wrote:

>
> I see that in Bart's use case, there was a one to many relationship between
> books and author (one book is written by many authors). Looking at json
> model definition he posted, I can only see a one to one relationship.
>
> Could you explain how this one to many relationship is archived?
>
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/Creating-Snowflake-schema-Link-2-Cubes-tp7602p8220.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>