[Forensics] How to retroactively document awk code [uses a regex for "$n"]

Copy some code that may need some explainin' and paste it into a file called ChunkOfawkCode2Document:

Bash:

linux> cat ChunkOfawkCode2Document | nl
       
     1    x1=`sed '1d' $KAG_extract | sed 's/\"//g' | awk -F, -v e=$KAG_EmissionAccount '($4=="create_account") && ($11==e) {sum_10 += $10} END {printf "%-10.2f\n", sum_10}'`
     2    x2=`sed '1d' $KAG_extract | sed 's/\"//g' | awk -F, -v e=$KAG_EmissionAccount '($4=="payment") && ($13==e) {sum_15 += $15} END {printf "%-10.2f\n", sum_15}'`
     3    x3=`sed '1d' $KAG_extract | sed 's/\"//g' | awk -F, -v x=$KAG_HotWalletAccount -v y=$KAG_EmissionAccount '( ($4=="payment") && (($13==x) && ($14==y)) ) {sum_15 += $15} END {printf "%-10.2f\n", sum_15 }'`
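A side note on numbering, since it trips people up. By default `nl` (`-b t`) numbers only non-empty lines, whereas `grep -n`, used in the next step, counts every line. That is presumably why lines 1 to 3 here come back as 5 to 7 below: the file would then start with a few blank lines. Here is a small sketch on made-up input. The two blank lines and the "line A"/"line B" text are purely illustrative:

```shell
# Made-up input: two blank lines followed by two lines of text
sample() { printf '\n\nline A\nline B\n'; }

sample | nl               # default -b t: blank lines are not numbered, "line A" gets 1
sample | nl -ba           # -b a: every line is numbered, "line A" gets 3
sample | grep -n 'line'   # grep -n counts every line too: 3:line A, 4:line B
```

So if you want `nl` and `grep -n` to agree, use `nl -ba`, as is done further down for the header.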

Extract the numbers preceded by "$" (awk's field references).
To see more about regex, you could look here.

Bash:

linux> cat ChunkOfawkCode2Document | grep -n -Eo '\$[0-9]+' 
5:$4
5:$11
5:$10
6:$4
6:$13
6:$15
7:$4
7:$13
7:$14
7:$15

In reality, that snippet of regex took me well over an hour to figure out. I may have stolen it from elsewhere (very likely), but I no longer have any idea where from, so I can't attribute the author. But that's not the point. With regex ye shall suffer: they are rarely obvious, and lots of trial and error is required.
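For the record, here is what each piece of that regex does. The sketch below tries it on a single made-up line modelled on line 2 of ChunkOfawkCode2Document. The function name `extract_field_refs` is just an illustrative wrapper. Note that `$KAG_extract` is not picked up, because `\$` must be followed by at least one digit:

```shell
# grep options: -E = extended regex (so + needs no backslash),
#               -o = print only the matched text, one match per line
# regex:        \$ = a literal dollar sign, [0-9]+ = one or more digits
extract_field_refs() { grep -Eo '\$[0-9]+'; }

# Quoting the EOF delimiter stops the shell from expanding $4, $KAG_extract, etc.
extract_field_refs <<'EOF'
x2=`sed '1d' $KAG_extract | awk -F, '($4=="payment") && ($13==e) {sum_15 += $15}'`
EOF
```

This prints `$4`, `$13` and `$15`, one per line. `sum_15` is skipped because it has no `$`.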

Remove the $ sign
Drop the line numbers (no -n this time)
Sort numerically and drop duplicates (sort -n -u)

Write the result to temporary file t

Bash:

linux> cat ChunkOfawkCode2Document | grep -Eo '\$[0-9]+' | sed 's/\$//g' | sort -n -u
4
10
11
13
14
15
linux> cat ChunkOfawkCode2Document | grep -Eo '\$[0-9]+' | sed 's/\$//g' | sort -n -u > t
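Why `-n` matters: without it, `sort` compares the numbers as text, character by character, so 10 lands before 4. Here is a quick check on a made-up list of references. `tr -d '$'` is used as an equivalent of the `sed 's/\$//g'` step:

```shell
# Made-up field references, including a duplicate $4
refs() { printf '%s\n' '$4' '$11' '$10' '$4' '$15'; }

refs | tr -d '$' | sort -u      # text order:    10 11 15 4
refs | tr -d '$' | sort -n -u   # numeric order: 4 10 11 15
```

The same text-versus-numeric ordering is what upsets `join` further down.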

Each of those hard-coded $4, $10 ... $15 field references corresponds to a column number in the input file.
It turns out the input file comes with header descriptions in the first record.
Let's data-wrangle that (I have already used this technique at least a dozen times here on this forum).

Bash:

linux> sed -n '1p' $KAG_extract 
"id","transactionSuccessful","sourceAccount","type","typeI","createdAt","transactionHash","account","into","startingBalance","funder","assetType","from","to","amount","signerKey","signerWeight","masterKeyWeight","lowThreshold","medThreshold","highThreshold"

Transpose the header line (one column name per line) and prefix each name with its line number:

Bash:

linux> sed -n '1p' $KAG_extract | sed 's/,/\n/g' | nl -ba
     1    "id"
     2    "transactionSuccessful"
     3    "sourceAccount"
     4    "type"
     5    "typeI"
     6    "createdAt"
     7    "transactionHash"
     8    "account"
     9    "into"
    10    "startingBalance"
    11    "funder"
    12    "assetType"
    13    "from"
    14    "to"
    15    "amount"
    16    "signerKey"
    17    "signerWeight"
    18    "masterKeyWeight"
    19    "lowThreshold"
    20    "medThreshold"
    21    "highThreshold"
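The `\n` in the sed replacement is a GNU sed feature. Some other seds (older BSD/macOS sed, for example) may not turn it into a newline. Below are two alternatives, sketched on a shortened copy of the header above. `tr` does the same comma-to-newline swap portably. awk can also number the fields itself, since `NF` holds the number of fields on the line:

```shell
# Shortened copy of the header line (first five columns only)
header='"id","transactionSuccessful","sourceAccount","type","typeI"'

# Option 1: tr swaps every comma for a newline; nl -ba numbers every line
printf '%s\n' "$header" | tr ',' '\n' | nl -ba

# Option 2: awk splits on commas (-F,) and prints each field number and name
printf '%s\n' "$header" | awk -F, '{ for (i = 1; i <= NF; i++) print i, $i }'
```

Option 2 prints `4 "type"` on its fourth line, matching row 4 of the `nl -ba` listing.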

Write that to temporary file u
List t and u side by side with paste:

Bash:

linux> sed -n '1p' $KAG_extract | sed 's/,/\n/g' | nl -ba > u
linux> paste t u
4         1    "id"
10         2    "transactionSuccessful"
11         3    "sourceAccount"
13         4    "type"
14         5    "typeI"
15         6    "createdAt"
         7    "transactionHash"
         8    "account"
         9    "into"
        10    "startingBalance"
        11    "funder"
        12    "assetType"
        13    "from"
        14    "to"
        15    "amount"
        16    "signerKey"
        17    "signerWeight"
        18    "masterKeyWeight"
        19    "lowThreshold"
        20    "medThreshold"
        21    "highThreshold"

How to match 'em up?
How about a JOIN?
It turns out join is a bit rude to us:

Bash:

linux> join t u
join: t:2: is not sorted: 10
4 "type"
join: u:10: is not sorted: 10 "startingBalance"
join: input is not in sorted order
linux> join <(sort t) <(sort u)
10 "startingBalance"
11 "funder"
13 "from"
14 "to"
15 "amount"
join: /dev/fd/62:11: is not sorted: 1 "id"
4 "type"
join: input is not in sorted order
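Why so rude? join wants both inputs sorted as text on the join field, with leading blanks ignored. t was sorted numerically, so 10 comes after 4. u's numbers also carry the leading blanks that nl pads them with. A plain sort looks at the whole line, padding included, while join skips leading blanks, so the two can disagree on the order. The GNU coreutils manual suggests `sort -k 1b,1` for exactly this case. Below is a sketch on made-up, shortened stand-ins for t and u, built the same way as above. The final `sort -n` puts join's text-ordered output back into numeric order:

```shell
# Made-up stand-ins for t and u, in a throwaway directory
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 4 10 11 > "$dir/t"
printf '%s\n' '"id"' '"transactionSuccessful"' '"sourceAccount"' '"type"' \
  '"typeI"' '"createdAt"' '"transactionHash"' '"account"' '"into"' \
  '"startingBalance"' '"funder"' | nl -ba > "$dir/u"

# Sort each file on field 1, skipping leading blanks (the "b"), as join expects
sort -k 1b,1 "$dir/t" > "$dir/t.sorted"
sort -k 1b,1 "$dir/u" > "$dir/u.sorted"

# join prints the key, then the rest of the matching u line;
# sort -n restores numeric order (join's output follows the text order)
join "$dir/t.sorted" "$dir/u.sorted" | sort -n
```

This should print `4 "type"`, `10 "startingBalance"` and `11 "funder"`. In bash, the same thing fits on one line with process substitution: `join <(sort -k 1b,1 t) <(sort -k 1b,1 u) | sort -n`.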

So we revert to tried and trusted methods:

Bash:

linux> for i in `cat t`; do grep $i u; done | sort -k1n -u
     4    "type"
    10    "startingBalance"
    11    "funder"
    13    "from"
    14    "to"
    15    "amount"

Ladies and gentlemen, we are DONE
That's our documentation

This relatively simple sequence of shell commands tells us exactly which fields the awk code pulls to calculate the CoinInCirculation data for KAG.

As the format of the KAG file may change (or perhaps it was the KAU file that did), we are now armed to take any future changes in our stride.
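If the layout does change, the whole chain can be rerun in one go. Below is a sketch of it as a function. `document_fields` is a hypothetical name, not something from the post. It reads the field numbers from the code file and looks them up directly in the header line of the data file. It assumes, as the awk code itself does after stripping quotes, that no field contains an embedded comma:

```shell
# Hypothetical helper (name and layout are illustrative, not from the post):
#   document_fields CODE_FILE DATA_FILE
# Prints "N name" for each $N in CODE_FILE, with names taken from the
# header (first) line of DATA_FILE. Assumes plain comma separation, i.e.
# no quoted field contains a comma.
document_fields() {
  # 1) $N references -> bare numbers, numeric order, no duplicates
  # 2) awk reads those numbers from the pipe ("-") into "want", then prints
  #    the wanted columns of the data file's header line and stops
  grep -Eo '\$[0-9]+' "$1" | tr -d '$' | sort -n -u |
    awk -F, 'NR == FNR { want[$1]; next }
             { for (i = 1; i <= NF; i++) if (i in want) print i, $i; exit }' - "$2"
}
```

Something like `document_fields ChunkOfawkCode2Document "$KAG_extract"` should then list the same six columns as above, just without nl's padding.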

Some more superfluous info:

Half a dozen
is all ya need

Bash:

linux> for i in `cat t`; do grep $i u; done | sort -k1n -u | wc -l
6

Going back to the full header list, keep only columns 4 to 15, the span our code touches.
Delete from the end first
('cos deleting lines 1-3 first would shift the line numbers the second sed sees)

Bash:

linux> cat u | sed '16,$d' | sed '1,3d'
     4    "type"
     5    "typeI"
     6    "createdAt"
     7    "transactionHash"
     8    "account"
     9    "into"
    10    "startingBalance"
    11    "funder"
    12    "assetType"
    13    "from"
    14    "to"
    15    "amount"
linux> cat u | sed '16,$d' | sed '1,3d' | wc -l
12
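The delete-from-the-end trick is needed because the second sed numbers its own input afresh. Within a single sed, addresses always refer to the original input lines, so the order stops mattering. Printing the wanted range with `sed -n '4,15p'` is another way to say the same thing. Below, `seq 21` stands in for the 21-line u:

```shell
# seq 21 stands in for u: 21 lines, numbered 1 to 21
seq 21 | sed '1,3d' | sed '16,$d'   # wrong order: keeps 4..18 (15 lines)
seq 21 | sed '1,3d;16,$d'           # one sed, original numbering: 4..15
seq 21 | sed -n '4,15p'             # the same 12 lines, printed directly
```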

We used 6 of 12.
Half a dozen
Half a shilling's worth, so to speak.
Hey, this is a precious metals forum, innit!

ZürichGnome
