
Hive from Beginner to Master - Part 4

Published in: 大数据的那些坑 - BigDataBugs · 2017-8-4

hive

———-

1. Transaction support

Hive transactions have three prerequisites (a minimal table DDL sketch follows the SET statements below):

1) The storage format must be ORC (Optimized Row Columnar, an optimized columnar file format); the default stored as textfile does not support transactions.

2) The table must be a bucketed table and be created with the transactional table property.

3) Enable transaction support in the session or configuration:

SET hive.support.concurrency = true;

SET hive.enforce.bucketing = true;

SET hive.exec.dynamic.partition.mode = nonstrict;

SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

SET hive.compactor.initiator.on = true;

SET hive.compactor.worker.threads = 1;
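With the settings above, a transactional table is declared as a bucketed ORC table with the transactional property. A minimal sketch (the table name, bucket column, and bucket count are illustrative, not from the original notes):

create table tx_demo(id int, name string)        -- hypothetical demo table
clustered by (id) into 2 buckets                  -- transactions require bucketing
stored as orc                                     -- transactions require ORC
tblproperties('transactional'='true') ;

insert into table tx_demo values(1,'tom') ;
update tx_demo set name = 'tomas' where id = 1 ;
delete from tx_demo where id = 1 ;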

Complex types

————-

1. array: an ordered collection of elements of the same type.

array<string>

a. Create the table

create table arrtest(id int,hobbies array<string>,name string)

row format delimited

fields terminated by ' '

collection items terminated by ','

lines terminated by '\n'

stored as textfile ;

b. Prepare the data

1 football,basketball,tabletennis tom

2 music,movie tomas

c. Load the data

load data local inpath '/home/centos/arr.dat' into table arrtest ;

d. Query

select * from arrtest ;

select id,hobbies[0] from arrtest ;        // access an array element by zero-based index

select id,array('xx','yy') from arrtest ;  // build an array literal with the array() function

e. Passed.
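A couple of additional array queries that may be useful here; size() and array_contains() are standard Hive built-ins, and this is only a sketch against the arrtest table above:

select id, size(hobbies) from arrtest ;                              -- number of elements in the array
select id, name from arrtest where array_contains(hobbies,'music') ; -- filter rows by array membership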

2.struct<>

A struct is analogous to a POJO // plain old Java object

a) Create the table

create table stru1(id int,addr struct<province:string,city:string,street:string>)
row format delimited
fields terminated by ' '
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile ;
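The notes stop at the DDL; a minimal load/query sketch, assuming the sample stru1 row listed at the end of these notes is saved to a local file (the path below is an assumption). Struct fields are read with dot notation:

load data local inpath '/home/centos/stru.dat' into table stru1 ;   -- hypothetical local file with the sample row

select id, addr.province, addr.city, addr.street from stru1 ;       -- dot notation reads struct fields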

3.map<key,value>

a) Create the table

create table map1(id int,scores map<string,int>)
row format delimited
fields terminated by ' '
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
stored as textfile ;

b) Prepare the data (sample rows are listed at the end of these notes)
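A minimal load/query sketch for the map table; the local path is an assumption, and 'huaxue' is one of the keys in the sample rows at the end of these notes:

load data local inpath '/home/centos/map.dat' into table map1 ;             -- hypothetical local file with the sample rows

select id, scores['huaxue'] from map1 ;                                     -- look up a value by key
select id, map_keys(scores), map_values(scores), size(scores) from map1 ;   -- map built-in functions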

Multi-row functions

———–

Input   Output
1       1        single-row (scalar) function
1       n        multi-row function (table-generating function)
n       1        aggregate function (max | min | avg)
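For concreteness, one example of each category against the arrtest table created earlier (upper() and count() are standard built-ins; explode() is covered next):

select upper(name) from arrtest ;        -- single-row: one output row per input row
select explode(hobbies) from arrtest ;   -- table-generating: several output rows per input row
select count(*) from arrtest ;           -- aggregate: one output row for many input rows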

Table-generating functions

————-

explode    // expands an array (or a map) into one row per element

select explode(array(1,2,3)) ;

select explode(hobbies) from arrtest ;
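Hive does not allow explode() to be combined with other select columns in the same projection; LATERAL VIEW is the usual way to keep id next to each exploded element. A sketch against arrtest:

select id, hobby
from arrtest
lateral view explode(hobbies) t as hobby ;   -- t: lateral view alias, hobby: column alias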

Implementing word count with Hive

————————-

1. Prepare the data (sample wc.txt lines are listed at the end of these notes)

2. Create the table

create table wc(line array<string>) 

row format delimited 

fields terminated by '|' 

collection items terminated by ' ' 

lines terminated by '\n' 

stored as textfile  ;

3. Load the data

load data local inpath '/home/centos/wc.txt' into table wc ;

4. Count words with a nested subquery plus the table-generating explode() function.

select a.w , count(a.w) cc from (select explode(line) w from wc) a group by a.w order by cc desc ;

Sample data for the examples above

—————————–

stru1 (struct example):

1 province:hebei,city:handan,street:remin

map1 (map example):

1 phisic:80,huaxue:78,shuxue:99
2 music:80,history:78
3 tiyu:80,song:78

wc.txt (word count example):

hello world tom
world hello tom
tom world hello

Query the 200 most frequently used passwords (duowan table)

—————————–

select password,count(password) cc from duowan group by password order by cc desc limit 200 ;


Runtime comparison on the duowan data (roughly 400+ MB, 8+ million rows)

——————-

Hive (MapReduce): 131.784 s

MySQL: 164.26 s

Computing the NCDC maximum temperature with Hive

——————————-

1. Data file: 1901.gz (NCDC weather records, one record per line; the gzipped text is read with TextInputFormat and decompressed transparently)

2. Create the Hive table

create table ncdc(data string) ;

3. Load the data into Hive
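The load statement itself is not shown in the notes; a sketch, assuming the gzipped file sits at a local path like the one below:

load data local inpath '/home/centos/1901.gz' into table ncdc ;   -- the path is an assumption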

4. Sanity check: select * from ncdc limit 100 ;

5. Aggregation query

select a.year,max(a.temp) from 

(select substring(data,16,4) year , cast(substr(data,88,5) as int) temp ,substr(data,93,1) from ncdc where cast(substr(data,88,5)as int) != 9999 and substr(data,93,1) rlike '[01459]') a 

group by a.year ;

Getting help on a function

————–

$>desc function substr ;

UDF

————–

1. Create the function class (a Java class, e.g. com.it18zhang.myhive210.udf.UDFAdd below, whose evaluate() method implements the logic)

2. Export the jar and put it on the Hive classpath

[temporary]

$hive>add jar /x/x/x/xxx.jar 

[permanent] hive-site.xml:

hive.aux.jars.path=/shared/

3. Register the function

[temporary function]

CREATE TEMPORARY FUNCTION add AS 'com.it18zhang.myhive210.udf.UDFAdd';

[permanent function]

CREATE FUNCTION add AS 'com.it18zhang.myhive210.udf.UDFAdd' USING JAR 'file:///shared/bigdata/data/myhive210-0.0.1-SNAPSHOT.jar';
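Before using it, the registration can be checked with standard Hive commands ('add' is the name registered above):

show functions ;               -- permanent functions are listed db-qualified, e.g. default.add
desc function extended add ;   -- extended output includes the implementing class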

4. Use the function

$>select add(1,2) ;

5. Drop the function

DROP TEMPORARY FUNCTION IF EXISTS toUpper;   // a permanent function is removed with DROP FUNCTION IF EXISTS

hbase

———————-
