sql语句中当in的列表超过一定数据量的时候性能突然变差,如何解决?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

sql语句中当in的列表超过一定数据量的时候性能突然变差,如何解决?

wangzy24
如下两个sql,差别是第二个hotel_code in的列表少一个元素(如红框标注),但性能差别很大:



个人觉得是in的元素过多,转换成rowkey比较多,导致kylin直接对表进行全部扫描,如果想不管rowkey多少,始终想通过rowkey进行查询该如何做?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sql语句中当in的列表超过一定数据量的时候性能突然变差,如何解决?

shaofengshi
It might be related with storage layer cache. You can make more tests to
see the differences. Besides, if you can provide more logs when executing
these two queries, that would be good for analysis.

在 2017年8月3日 下午4:26,wangzy24 <[hidden email]>写道:

> 如下两个sql,差别是第二个hotel_code in的列表少一个元素(如红框标注),但性能差别很大:
> <<a href="http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%">http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%
> AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162007.png>
> <<a href="http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%">http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%
> AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162100.png>
>
> 个人觉得是in的元素过多,转换成rowkey比较多,导致kylin直接对表进行全部扫描,如果想不管rowkey多少,
> 始终想通过rowkey进行查询该如何做?
>
>
> --
> View this message in context: http://apache-kylin.74782.x6.
> nabble.com/sql-in-tp8630.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
>



--
Best regards,

Shaofeng Shi 史少锋
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sql语句中当in的列表超过一定数据量的时候性能突然变差,如何解决?

mahongbin
take a look at kylin.storage.hbase.max-fuzzykey-scan
in org.apache.kylin.common.KylinConfigBase#getQueryScanFuzzyKeyMax

2017-08-03 20:25 GMT+08:00 ShaoFeng Shi <[hidden email]>:

> It might be related with storage layer cache. You can make more tests to
> see the differences. Besides, if you can provide more logs when executing
> these two queries, that would be good for analysis.
>
> 在 2017年8月3日 下午4:26,wangzy24 <[hidden email]>写道:
>
> > 如下两个sql,差别是第二个hotel_code in的列表少一个元素(如红框标注),但性能差别很大:
> > <<a href="http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%">http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%
> > AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162007.png>
> > <<a href="http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%">http://apache-kylin.74782.x6.nabble.com/file/n8630/%E5%BE%
> > AE%E4%BF%A1%E5%9B%BE%E7%89%87_20170803162100.png>
> >
> > 个人觉得是in的元素过多,转换成rowkey比较多,导致kylin直接对表进行全部扫描,如果想不管rowkey多少,
> > 始终想通过rowkey进行查询该如何做?
> >
> >
> > --
> > View this message in context: http://apache-kylin.74782.x6.
> > nabble.com/sql-in-tp8630.html
> > Sent from the Apache Kylin mailing list archive at Nabble.com.
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>



--
Regards,

*Bin Mahone | 马洪宾*
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: sql语句中当in的列表超过一定数据量的时候性能突然变差,如何解决?

wangzy24
Found by looking at the source code by kylin.query.scan.fuzzykey.max attribute control, the default is 200, when conditions combined with more than 200, it will range scan, if less than this value will be according to the rowkey to obtain, in kylin. If you want to modify the properties of adding kylin.query.scan.fuzzykey.max attribute
Loading...