实践成功
demo.txt:
I hava a dog
He has a Dog
RDD写法:
[('I', 1), ('hava', 1), ('a', 2), ('dog', 1), ('He', 1), ('has', 1), ('Dog', 1)]
[('a', 2), ('I', 1), ('hava', 1), ('dog', 1), ('He', 1), ('has', 1), ('Dog', 1)]
DF写法:
[Row(word='dog', count=1), Row(word='He', count=1), Row(word='Dog', count=1), Row(word='I', count=1), Row(word='a', count=2), Row(word='hava', count=1), Row(word='has', count=1)]
[Row(word='a', count=2), Row(word='I', count=1), Row(word='Dog', count=1), Row(word='hava', count=1), Row(word='dog', count=1), Row(word='has', count=1), Row(word='He', count=1)]
从启动到出结果,DF写法速度要比rdd慢。
展开