Returning & Using Multiple Values from a HIVE UDF

One of the typical problems faced while implementing User Defined Functions (UDFs) in HIVE is how to return multiple values from a UDF, and how to use those values (columns) in the HIVE select statement.

In our case, the requirement was to return multiple float values (columns) from our UDF as part of a select query.

This post assumes that you are familiar with UDFs in HIVE. If not, please follow these links:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

https://cwiki.apache.org/confluence/display/Hive/HivePlugins

While the above links explain the basics of creating a UDF, the following Java program shows how to return multiple values from a UDF.

Create the UDF in the following manner –


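import java.util.ArrayList;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;

public final class MultiUDF extends UDF {
    // Hive calls evaluate() once per row; the returned list is exposed to Hive as an array.
    public ArrayList<FloatWritable> evaluate(final LongWritable input) {
        if (input == null) {
            return null; // NULL input yields NULL output
        }
        final ArrayList<FloatWritable> result = new ArrayList<FloatWritable>();
        // function logic: add one FloatWritable to result for each value to return
        return result;
    }
}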

Here we have created a UDF which can return multiple values. The idea is to use a generic ArrayList of type FloatWritable. The element type can also be IntWritable, Text, etc. (any type which Hadoop can serialize).
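To make the idea concrete, here is a minimal plain-Java sketch of what the body of evaluate might look like. It is only an illustration under stated assumptions: Float and long stand in for FloatWritable and LongWritable so that it compiles without Hadoop on the classpath, and the class name MultiValueSketch and the "half and double" computation are hypothetical, not part of our actual UDF.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the UDF above: Float replaces FloatWritable and
// long replaces LongWritable, so no Hadoop or Hive jars are needed.
final class MultiValueSketch {

    // Hypothetical logic: returns half and double of the input as two values.
    static List<Float> evaluate(final long input) {
        final List<Float> result = new ArrayList<>();
        result.add(input / 2.0f); // first value, read as ret[0] in the query
        result.add(input * 2.0f); // second value, read as ret[1] in the query
        return result;
    }

    public static void main(String[] args) {
        final List<Float> ret = evaluate(4L);
        // Each list position corresponds to one output column.
        System.out.println(ret.get(0) + "\t" + ret.get(1)); // prints 2.0	8.0
    }
}
```

In the real UDF each value would be wrapped as new FloatWritable(value) before being added to the list; the order in which values are added determines the index used to read them back in the select statement.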

Now the important point is how to use the returned elements of the ArrayList in the HIVE select statement.

A simple statement like the following is not enough:

Select MultiUDF([input]) from myTable;

Output = [1.2 , 2.8]

The above returns the whole serialized array as a single column, not the individual values (columns) which you expect to be returned.

Instead, wrap the UDF call in a subquery and index into the returned array:

select ret[0], ret[1] from (select MultiUDF(nId) as ret from myTable) bar;

Output = 1.2       2.8

Hope this helps you use UDFs as per your requirements. The GenericUDF type is also worth reading about for related solutions.
