Reading and building a table from a multi-column data frame using R

I would like to read this output file (gibbs_samples) in R and create a table between the number of iterations (1- 44) and deviations. The summary table should have 10 columns with each column:

"iter","va1","cova12","va2","vc1","covc12","vc2","ev1","cove12","ev2" 

which should be divided by empty space:

 1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297 2 . . 44 

Note:

i) The first 10 lines should be ignored

ii) The number 9 in the second column to be deleted, and every three rows should fall on one row,

  example: 1 9 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297 

should form the first line:

 1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297 

iii) Between the left margin and the numbers

there is empty space

iv) Some columns (mainly columns 1 and 2) do not have empty space between them

Example:

  21 9 0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01 0.2136E-01 0.1439 22 9 0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01 0.1946E-01 0.1424 

Below is the output file "gibbs_samples":

  -1 9 4 1 6 6 1 1 2 6 6 1 2 3 6 6 2 2 4 7 7 1 1 5 7 7 1 2 6 7 7 2 2 7 0 0 1 1 8 0 0 1 2 9 0 0 2 2 1 9 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297 2 9 0.2315E-01 0.1457E-02 0.8491E-02 0.3538E-01 0.1838E-01 0.9397E-01 0.6191E-01 0.1684E-01 0.1538 3 9 0.2311E-01 0.1363E-02 0.8228E-02 0.3032E-01 0.1593E-01 0.7850E-01 0.6831E-01 0.1998E-01 0.1450 4 9 0.2282E-01 0.1120E-02 0.7982E-02 0.2935E-01 0.1425E-01 0.7046E-01 0.6993E-01 0.1987E-01 0.1411 5 9 0.2263E-01 0.1138E-02 0.7893E-02 0.2935E-01 0.1524E-01 0.6388E-01 0.7037E-01 0.1909E-01 0.1439 6 9 0.2202E-01 0.1260E-02 0.7649E-02 0.3002E-01 0.1776E-01 0.6507E-01 0.7196E-01 0.2067E-01 0.1429 7 9 0.2229E-01 0.9052E-03 0.7424E-02 0.3015E-01 0.1945E-01 0.6771E-01 0.7075E-01 0.2017E-01 0.1438 8 9 0.2163E-01 0.7266E-03 0.7327E-02 0.3211E-01 0.2199E-01 0.6600E-01 0.7120E-01 0.1876E-01 0.1458 9 9 0.2134E-01 0.6320E-03 0.7375E-02 0.3316E-01 0.1930E-01 0.6214E-01 0.7083E-01 0.2038E-01 0.1465 10 9 0.2188E-01 0.8527E-03 0.7504E-02 0.2835E-01 0.1504E-01 0.6133E-01 0.7096E-01 0.1839E-01 0.1458 11 9 0.2111E-01 0.9058E-03 0.7598E-02 0.2629E-01 0.1543E-01 0.6452E-01 0.7105E-01 0.2034E-01 0.1450 12 9 0.2129E-01 0.7306E-03 0.7787E-02 0.2844E-01 0.1472E-01 0.6338E-01 0.7284E-01 0.1831E-01 0.1452 13 9 0.2163E-01 0.7417E-03 0.7484E-02 0.2881E-01 0.1532E-01 0.6110E-01 0.7104E-01 0.1979E-01 0.1454 14 9 0.2200E-01 0.4375E-03 0.7549E-02 0.3004E-01 0.1674E-01 0.6364E-01 0.7125E-01 0.1999E-01 0.1432 15 9 0.2189E-01 0.2382E-03 0.7774E-02 0.2962E-01 0.1681E-01 0.6627E-01 0.7203E-01 0.1894E-01 0.1419 16 9 0.2155E-01 0.2874E-03 0.7529E-02 0.3123E-01 0.1612E-01 0.6113E-01 0.7190E-01 0.1877E-01 0.1485 17 9 0.2167E-01 0.2198E-03 0.7418E-02 0.3035E-01 0.1574E-01 0.6015E-01 0.7060E-01 0.1976E-01 0.1484 18 9 0.2257E-01 0.4865E-04 0.7392E-02 0.3082E-01 0.1619E-01 0.5738E-01 0.7020E-01 0.1838E-01 0.1443 19 9 0.2184E-01 0.1929E-03 0.7315E-02 0.3363E-01 0.1854E-01 0.6592E-01 0.7136E-01 0.1890E-01 0.1466 20 9 0.2214E-01 0.7481E-04 0.7492E-02 0.2906E-01 0.1682E-01 0.6001E-01 0.7087E-01 0.2037E-01 0.1469 21 9 0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01 0.2136E-01 0.1439 22 9 0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01 0.1946E-01 0.1424 23 9 0.2366E-01-0.1304E-03 0.7536E-02 0.2996E-01 0.1942E-01 0.5751E-01 0.7112E-01 0.2063E-01 0.1442 24 9 0.2353E-01-0.1806E-03 0.7412E-02 0.3136E-01 0.2238E-01 0.6733E-01 0.7275E-01 0.1907E-01 0.1425 25 9 0.2278E-01-0.3747E-03 0.7351E-02 0.3003E-01 0.1832E-01 0.6088E-01 0.7126E-01 0.2140E-01 0.1469 26 9 0.2259E-01-0.3012E-03 0.7219E-02 0.2732E-01 0.1631E-01 0.5692E-01 0.6851E-01 0.1875E-01 0.1447 27 9 0.2231E-01-0.2277E-03 0.7038E-02 0.2828E-01 0.1892E-01 0.5589E-01 0.6876E-01 0.1963E-01 0.1413 28 9 0.2229E-01-0.2523E-03 0.6896E-02 0.3183E-01 0.1796E-01 0.5565E-01 0.6928E-01 0.1921E-01 0.1425 29 9 0.2152E-01-0.3977E-03 0.7060E-02 0.3026E-01 0.1822E-01 0.5779E-01 0.7268E-01 0.1956E-01 0.1445 30 9 0.2109E-01-0.2360E-03 0.6998E-02 0.3183E-01 0.1643E-01 0.5578E-01 0.7322E-01 0.2124E-01 0.1492 31 9 0.2135E-01-0.2020E-04 0.6787E-02 0.2956E-01 0.1830E-01 0.5778E-01 0.7114E-01 0.2004E-01 0.1447 32 9 0.2154E-01 0.1254E-03 0.6829E-02 0.3317E-01 0.2162E-01 0.6202E-01 0.7061E-01 0.1795E-01 0.1412 33 9 0.2199E-01-0.6386E-04 0.6709E-02 0.2947E-01 0.1854E-01 0.5679E-01 0.7039E-01 0.1803E-01 0.1458 34 9 0.2184E-01 0.8950E-04 0.6689E-02 0.3075E-01 0.1846E-01 0.5174E-01 0.7162E-01 0.1975E-01 0.1443 35 9 0.2168E-01 0.4526E-04 0.6550E-02 0.3146E-01 0.1977E-01 0.5322E-01 0.7290E-01 0.1834E-01 0.1470 36 9 0.2153E-01 0.2621E-03 0.6705E-02 0.2954E-01 0.1959E-01 0.5310E-01 0.7244E-01 0.1869E-01 0.1441 37 9 0.2234E-01 0.2712E-03 0.6856E-02 0.3043E-01 0.2318E-01 0.6494E-01 0.7143E-01 0.1889E-01 0.1426 38 9 0.2217E-01 0.2001E-03 0.7079E-02 0.2945E-01 0.2101E-01 0.6468E-01 0.7113E-01 0.1815E-01 0.1429 39 9 0.2235E-01 0.2269E-03 0.7234E-02 0.2940E-01 0.1930E-01 0.6252E-01 0.7006E-01 0.2083E-01 0.1450 40 9 0.2280E-01 0.2957E-04 0.7126E-02 0.2971E-01 0.1971E-01 0.5912E-01 0.7027E-01 0.1819E-01 0.1493 41 9 0.2296E-01 0.1358E-03 0.6955E-02 0.2877E-01 0.1901E-01 0.5768E-01 0.6981E-01 0.1956E-01 0.1449 42 9 0.2302E-01-0.2015E-03 0.7196E-02 0.3145E-01 0.1959E-01 0.5772E-01 0.7115E-01 0.2065E-01 0.1445 43 9 0.2277E-01-0.2184E-03 0.6900E-02 0.3119E-01 0.2150E-01 0.5739E-01 0.6990E-01 0.2023E-01 0.1435 44 9 0.2300E-01-0.1301E-03 0.6936E-02 0.2865E-01 0.1844E-01 0.5872E-01 0.6876E-01 0.1887E-01 0.1478 

Your help will be greatly appreciated!

+1
source share
2 answers

The R database has a read.fwf function that reads files of a fixed width. It has the ability to read multi-line file formats by passing list the widths argument.

Here is the starting point for your data (if your data file is called "raw"):

 dat <- read.fwf(raw, skip=11, widths=list( c(8, 8), c(11, rep(11, 6)), c(11, 11) ), stringsAsFactors=FALSE, colClasses="character" ) 

Convert columns to numeric. (It seems that something is odd with the alignment of the columns of row 19 of your data. Usually the next step is not required, as the digital conversion happens automatically.)

 dat <- as.data.frame(lapply(dat, as.numeric)) 

Results:

 str(dat) 'data.frame': 44 obs. of 11 variables: $ V1 : num 1 2 3 4 5 6 7 8 9 10 ... $ V2 : num 9 9 9 9 9 9 9 9 9 9 ... $ V3 : num 0.0232 0.0232 0.0231 0.0228 0.0226 ... $ V4 : num 0.00117 0.00146 0.00136 0.00112 0.00114 ... $ V5 : num 0.00902 0.00849 0.00823 0.00798 0.00789 ... $ V6 : num 0.0355 0.0354 0.0303 0.0294 0.0294 ... $ V7 : num 0.0154 0.0184 0.0159 0.0142 0.0152 ... $ V8 : num 0.186 0.094 0.0785 0.0705 0.0639 ... $ V9 : num 0.0517 0.0619 0.0683 0.0699 0.0704 ... $ V10: num 0.0154 0.0168 0.02 0.0199 0.0191 ... $ V11: num 0.23 0.154 0.145 0.141 0.144 ... 

The only remaining task is to remove the second column, which is trivial.

+2
source

R is not very convenient for data processing. You must do this kind of manipulation differently before loading it into R for statistical analysis. You can use the ETL tool (Extract, Transform, and Download) for this task. I am currently familiar with one ETL tool (Integration Services that ships with SQL Server), but I'm sure others are available.

Alternatively, you can write a script that does your manipulation.

It depends on your technical capabilities and skills ... :)

Good luck.

-1
source

Source: https://habr.com/ru/post/1380296/


All Articles