Pig UDF for ISO to yyyy-mm-dd hh:mm:ss.000

I want to convert an ISO timestamp to the yyyy-mm-dd hh:mm:ss.SSS format, but I have not been able to get the conversion to work. I am new to Pig and am trying to write a UDF to handle the conversion from ISO format to yyyy-mm-dd hh:mm:ss.SSS.

Note that I tried Pig's built-in functions (FORMAT, DATE_FORMAT), but I could not convert the data to the required format.

Current data format: 2013-08-22T13:23:18.226220+01:00

Required data format: 2013-08-22 13:23:18.226

import java.io.IOException;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.joda.time.DateTime;
import org.joda.time.format.*;
import org.joda.time.format.DateTimeFormatter;
import org.joda.time.format.DateTimeFormatterBuilder;

public class test extends EvalFunc<String> {
    public String exec(Tuple input) throws IOException {
        if ((input == null) || (input.size() == 0))
            return null;
        try {
            String time = (String) input.get(0);
            DateFormat dt = new SimpleDateFormat("yyyy-mm-dd hh:mm:ss.SSS");
            Date d_t = dt.parse(time);
            String timedt = getTimedt(d_t);
            return timedt;
        } catch (ParseException e) {
            return null;
        }
    }

    private String getTimedt(Date d_t) {
        DateTimeFormatterBuilder formatter = new DateTimeFormatterBuilder();
    }
}

How can I handle this date conversion in Pig?

+6
3 answers

With Pig 0.11.1, a UDF is not required to convert from ISO 8601 format to yyyy-mm-dd hh:mm:ss.SSS format. The following sample code shows how to convert a column of ISO 8601 dates to yyyy-MM-dd HH:mm:ss.SSS strings.

convert_dates = FOREACH input_dates GENERATE ToString(date, 'yyyy-MM-dd HH:mm:ss.SSS') AS date:chararray;


Note:

I don't think the ToString function is documented. I worked out this usage from this Google Summer of Code proposal:

http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/zjshen/21002

which mentions the following function as one to be converted from a piggybank UDF to a built-in:

 String ToString(DateTime d, String format) 

My guess is that it was converted but has not yet made it into the main documentation. Here is the class documentation for the built-in ToString:

http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/builtin/ToString.html

But the ToString function is missing from the Apache Pig documentation here:

http://pig.apache.org/docs/r0.11.1/func.html

+7

2013-08-22T13:23:18.226220+01:00 is in the XSD dateTime format, and it should be parsed this way:

 XMLGregorianCalendar xc = DatatypeFactory.newInstance().newXMLGregorianCalendar("2013-08-22T13:23:18.226220+01:00"); 

From the XMLGregorianCalendar you can get a GregorianCalendar and then a java.util.Date:

 GregorianCalendar gc = xc.toGregorianCalendar();
 Date date = gc.getTime();

Note that 226220 is a fractional second. If you try to parse it using SimpleDateFormat with SSS, it will be read as 226220 milliseconds, that is, 226 seconds and 220 ms, instead of 0.226220 seconds.
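The two steps above can be put together into a small, self-contained sketch that also produces the format asked for in the question. The class name XsdDateTimeDemo and the helper reformat are hypothetical names chosen for illustration, and using SimpleDateFormat with the calendar's own time zone for the output step is an assumption of this sketch, not something the answer specifies.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class XsdDateTimeDemo {
    // Hypothetical helper: parse an XSD dateTime string and re-emit it
    // as yyyy-MM-dd HH:mm:ss.SSS.
    static String reformat(String isoText) throws DatatypeConfigurationException {
        XMLGregorianCalendar xc =
                DatatypeFactory.newInstance().newXMLGregorianCalendar(isoText);
        GregorianCalendar gc = xc.toGregorianCalendar();
        Date date = gc.getTime();

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS", Locale.ROOT);
        // Assumption: keep the wall-clock time of the input's own offset
        // (+01:00 in the sample) rather than the JVM's default zone.
        out.setTimeZone(gc.getTimeZone());
        return out.format(date);
    }

    public static void main(String[] args) throws Exception {
        // prints 2013-08-22 13:23:18.226
        System.out.println(reformat("2013-08-22T13:23:18.226220+01:00"));
    }
}
```

The fractional second is cut to millisecond precision when the calendar is built, which is why the output ends in .226.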
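To make the pitfall concrete, here is a minimal sketch that parses the sample value with SimpleDateFormat and SSS and prints the shifted result. For contrast it also shows java.time (Java 8 and later), which is not mentioned in this thread and is offered only as one possible alternative. The class and method names are hypothetical, and both formatters are pinned to UTC purely to keep the demonstration independent of the machine's default zone.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.TimeZone;

public class FractionPitfallDemo {
    static final String OUT_PATTERN = "yyyy-MM-dd HH:mm:ss.SSS";

    // Lenient SimpleDateFormat reads all six digits as milliseconds,
    // so 226220 ms (226 s 220 ms) is added to 13:23:18.
    static String viaSimpleDateFormat(String isoText) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT);
        SimpleDateFormat out = new SimpleDateFormat(OUT_PATTERN, Locale.ROOT);
        TimeZone utc = TimeZone.getTimeZone("UTC");
        in.setTimeZone(utc);   // the trailing +01:00 is not in the pattern and is ignored
        out.setTimeZone(utc);
        return out.format(in.parse(isoText));
    }

    // Assumed alternative (Java 8+): java.time treats the digits as a
    // fraction of a second and truncates to three places on output.
    static String viaJavaTime(String isoText) {
        return OffsetDateTime.parse(isoText)
                .format(DateTimeFormatter.ofPattern(OUT_PATTERN, Locale.ROOT));
    }

    public static void main(String[] args) throws Exception {
        String sample = "2013-08-22T13:23:18.226220+01:00";
        System.out.println(viaSimpleDateFormat(sample)); // 2013-08-22 13:27:04.220
        System.out.println(viaJavaTime(sample));         // 2013-08-22 13:23:18.226
    }
}
```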

+1
DateFormat dffrom = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
DateFormat dfto = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
//TimeZone zone = TimeZone.getTimeZone("America/Los_Angeles");
//dfto.setTimeZone(zone);
Date date = dffrom.parse("2013-08-22T13:23:18.226220+01:00"); //2013-08-22T13:23:18.226220+01:00
String s = dfto.format(date);
System.out.println(s);
0

Source: https://habr.com/ru/post/953311/

