Creating a hierarchical data structure (nodes) with the hts package in R

I am trying to create a node structure using the hts package in R. The documentation for the nodes is sparse, so working out how to arrange the node structure is difficult, and having a second hierarchy adds another layer of complexity. I am trying to create two hierarchies, as follows:

(Hierarchy 1 - Geography: An Example of the State of Delaware and its Districts)

=> 10000 => 10001 => 10003 => 10005 => 10999 

(Hierarchy 2 - Industry: Simplified)

 => 10 => 11 => 12 => 21 => 22 => 31 ... => 99 

Edit 2 - Corrected hierarchies and further clarification

Thus, each time series will have a geography code and an industry code. The geography codes belong to one hierarchy and the industry codes to the other (see above).

I am trying to figure out how to specify the "nodes" argument to represent the relationships of both hierarchies (only one hierarchy is shown in the documentation example).

When the two hierarchies interact, we get additional levels. To simplify, assume there are only 2 industries: 11 and 12. The time series identified by (10001.11) and (10001.12) should aggregate up to (10001.10); likewise, (10001.11) ... (10999.11) should aggregate up to (10000.11), and so on. Again, these are simplified hierarchies; the real data has more levels.

The question is: what does the "nodes" argument look like for the two hierarchies? I hope this clarifies things.

1 answer

Your notation (which may not be your choice) makes this very confusing. It seems that the same numerical sequence can refer to either a county or an industry.

However, the basic idea is clear enough: you have two hierarchies, and you want both types of aggregation to be taken into account. Here is an example, using my own notation to make it clearer.

Suppose there are two states with four and five counties, respectively, and two industries with three and two sub-industries, respectively. Thus, there are 9x5 = 45 series at the most disaggregated level (county x sub-industry combinations). I will call the states A and B, and the counties A1, A2, A3, A4 and B1, B2, B3, B4, B5. I will call the industries X and Y, with sub-industries Xa, Xb, Xc and Ya, Yb, respectively. Suppose you have the bottom-level series (the most disaggregated level) in the matrix y, one column per series, with the columns in the following order:

  County A1, industry Xa
  County A1, industry Xb
  County A1, industry Xc
  County A1, industry Ya
  County A1, industry Yb
  County A2, industry Xa
  County A2, industry Xb
  County A2, industry Xc
  County A2, industry Ya
  County A2, industry Yb
  ...
  County B5, industry Xa
  County B5, industry Xb
  County B5, industry Xc
  County B5, industry Ya
  County B5, industry Yb

So that we have a reproducible example, I will create y randomly:

 y <- ts(matrix(rnorm(900),ncol=45,nrow=20)) 

Then we can construct labels for the columns of this matrix as follows:

 blnames <- paste(c(rep("A",20),rep("B",25)),     # State
                  rep(1:9,each=5),                # County
                  rep(c("X","X","X","Y","Y"),9),  # Industry
                  rep(c("a","b","c","a","b"),9),  # Sub-industry
                  sep="")
 colnames(y) <- blnames

For example, the first series in the matrix has the name "A1Xa" , meaning state A, county 1, industry X, sub-industry a.
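As a quick sanity check on the labels, the sketch below rebuilds them in base R and splits each one back into its four parts. The `parts` data frame is a name introduced here for illustration only. Note that because the county digit runs from 1 to 9 across both states, the counties of state B come out labelled 5 to 9 in the code, rather than B1 to B5 as in the description above; the labels are still unique, which is what matters.

```r
# Rebuild the labels exactly as in the answer.
blnames <- paste(c(rep("A", 20), rep("B", 25)),
                 rep(1:9, each = 5),
                 rep(c("X", "X", "X", "Y", "Y"), 9),
                 rep(c("a", "b", "c", "a", "b"), 9),
                 sep = "")

# Split every label into its four one-character parts
# ('parts' is an illustrative helper, not part of hts).
parts <- data.frame(
  state       = substr(blnames, 1, 1),
  county      = substr(blnames, 2, 2),
  industry    = substr(blnames, 3, 3),
  subindustry = substr(blnames, 4, 4),
  stringsAsFactors = FALSE
)

length(blnames)         # one label per column of y
anyDuplicated(blnames)  # 0 means every label is unique
head(blnames, 5)        # labels for the first county
unique(parts$county[parts$state == "B"])  # county digits used in state B
```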

Then we can easily create a grouped time series object using

 gy <- gts(y, characters=list(c(1,1),c(1,1))) 

The characters argument indicates that there are two hierarchies (two elements in the list); the first hierarchy is given by the first two characters of each label, and the second hierarchy by the last two characters.

A somewhat more complicated but analogous example (with labels taking more than one character each) is given in the help file for gts in v4.3 of the hts package.
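To make the segmentation concrete, here is a base-R sketch of how I read the segment widths: each list element covers a consecutive run of characters in the label, and each width within it adds one more level to that hierarchy. This only mimics the idea with `substr`; it is not the package's internal code, and `widths`, `starts` and `levels_by_hierarchy` are names made up for the illustration.

```r
blnames <- paste(c(rep("A", 20), rep("B", 25)),
                 rep(1:9, each = 5),
                 rep(c("X", "X", "X", "Y", "Y"), 9),
                 rep(c("a", "b", "c", "a", "b"), 9),
                 sep = "")

# Same value as passed to the characters argument above.
widths <- list(c(1, 1), c(1, 1))

# Position in the label where each hierarchy starts (1 and 3 here).
starts <- cumsum(c(1, head(sapply(widths, sum), -1)))

# For each hierarchy, take successively longer prefixes of its own
# characters; each prefix length corresponds to one level.
levels_by_hierarchy <- lapply(seq_along(widths), function(h) {
  ends <- starts[h] + cumsum(widths[[h]]) - 1
  lapply(ends, function(e) unique(substr(blnames, starts[h], e)))
})

levels_by_hierarchy[[1]]  # states, then state+county prefixes
levels_by_hierarchy[[2]]  # industries, then industry+sub-industry prefixes
```

Under this reading, the first hierarchy has 2 states and 9 counties, and the second has 2 industries and 5 sub-industries, which matches the setup described earlier.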

You can specify the grouping structure without using column labels. Then you need to specify a groups matrix that determines which aggregations are of interest. In the above example, the groups matrix is given by

 gps <- rbind(
   c(rep(1,20),rep(2,25)),                      # State
   rep(1:9,each=5),                             # County
   rep(c(1,1,1,2,2),9),                         # Industry
   rep(1:5, 9),                                 # Sub-industry
   c(rep(c(1,1,1,2,2),4),rep(c(3,3,3,4,4),5)),  # State x industry
   c(rep(1:5, 4),rep(6:10, 5)),                 # State x Sub-industry
   rep(1:18, rep(c(3,2),9))                     # County x industry
 )

Then

 gy <- gts(y, groups=gps) 
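Since hand-written group rows are easy to get wrong, one way to check them is to derive the same rows from the column labels and compare. The sketch below does this in base R only; the helper `code` and the matrix `derived` are hypothetical names introduced here, and the comparison assumes the labels and row order used earlier in this answer.

```r
blnames <- paste(c(rep("A", 20), rep("B", 25)),
                 rep(1:9, each = 5),
                 rep(c("X", "X", "X", "Y", "Y"), 9),
                 rep(c("a", "b", "c", "a", "b"), 9),
                 sep = "")

state    <- substr(blnames, 1, 1)
county   <- substr(blnames, 1, 2)
industry <- substr(blnames, 3, 3)
subind   <- substr(blnames, 3, 4)

# Integer code for each distinct value, numbered by first appearance.
code <- function(x) match(x, unique(x))

# One row per aggregation of interest, in the same order as gps.
derived <- rbind(
  code(state),
  code(county),
  code(industry),
  code(subind),
  code(paste(state, industry)),   # State x industry
  code(paste(state, subind)),     # State x Sub-industry
  code(paste(county, industry))   # County x industry
)

# The hand-written matrix from the answer.
gps <- rbind(
  c(rep(1, 20), rep(2, 25)),
  rep(1:9, each = 5),
  rep(c(1, 1, 1, 2, 2), 9),
  rep(1:5, 9),
  c(rep(c(1, 1, 1, 2, 2), 4), rep(c(3, 3, 3, 4, 4), 5)),
  c(rep(1:5, 4), rep(6:10, 5)),
  rep(1:18, rep(c(3, 2), 9))
)

all(derived == gps)  # TRUE when both constructions agree
```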

It is much simpler to use the column-names approach with characters, as creating all these rows for the cross products by hand can get confusing.


Source: https://habr.com/ru/post/970770/

