Command Line Hacking Example 2019-12-17

I am developing a program to import CSV files from my banks and convert them into ledger format for import into my accounting. The first program imports the BBVA Compass credit card data. Wrote the program. Included tests for the basic operation. Included a few very simple integration tests. All good.

Then it was time to create a program to import CSV files from the BBVA Compass checking account. Of course I simply copied the first program to the second so that I would start with a working example and then modify it to work for the checking account. The original was named bbva-import-cc and the fork I named bbva-import-bank.

Next I needed to fork the tests so that there would be tests for the original bbva-import-cc and new tests for bbva-import-bank. Of course I had not named the original tests uniquely. Therefore I first needed to rename those files to associate them with the cc program.

cd ../t/
rwp@angst:~/src/bank-import-stuff/bbva-import/t$ ls -1 *.t
help.t
invalid-options.t
load.t
transactions.t
version.t

The rename command is perfect for this. It takes a sed-like substitution expression (rename is really a perl script, so the expression is Perl and gets full Perl regular expression syntax). This is the perl-based rename found on Debian and similar systems; the rename from util-linux on some other systems takes different arguments. Let's build up the rename by trying things without doing the rename and crafting the command in place until it does what we want. Use the -v option to show what it is doing. Use the -n option to say not-really, don't actually do it, just show what would be done. Try this and see what would result without actually doing anything.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ rename -v -n s/.t/-cc.t/ *.t
rename(help.t, help-cc.t)
rename(invalid-options.t, invalid-o-cc.tions.t)
rename(load.t, load-cc.t)
rename(transactions.t, transa-cc.tions.t)
rename(version.t, version-cc.t)

Nope. Missed. The . in a regular expression matches any single character. Therefore ".t" matched the "pt" in "options" and the "ct" in "transactions". Let's backslash quote it so that it must match a dot.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ rename -v -n s/\.t/-cc.t/ *.t
rename(help.t, help-cc.t)
rename(invalid-options.t, invalid-o-cc.tions.t)
rename(load.t, load-cc.t)
rename(transactions.t, transa-cc.tions.t)
rename(version.t, version-cc.t)

Nope. Missed. Same result as before. The expression was not quoted, so the shell took the backslash as quoting meant for itself, removed it, and passed a plain . along to rename. Must quote it. Here it does not matter if we use single or double quotes, since inside double quotes a backslash in front of a . is left alone. As there is no other expansion desired I used single quotes.
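A quick way to see the difference is to let printf show exactly what the shell hands to a command in each case. This is only a small sketch; the variable names are made up for the illustration.

```shell
# What does a command actually receive for each quoting style?
unquoted=$(printf '%s' s/\.t/-cc.t/)    # the shell removes the backslash
single=$(printf '%s' 's/\.t/-cc.t/')    # the backslash survives
double=$(printf '%s' "s/\.t/-cc.t/")    # the backslash survives here too

printf 'unquoted: %s\n' "$unquoted"
printf 'single:   %s\n' "$single"
printf 'double:   %s\n' "$double"
```

Only the unquoted form loses the backslash, which is why the second attempt behaved exactly like the first.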

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ rename -v -n 's/\.t/-cc.t/' *.t
rename(help.t, help-cc.t)
rename(invalid-options.t, invalid-options-cc.t)
rename(load.t, load-cc.t)
rename(transactions.t, transactions-cc.t)
rename(version.t, version-cc.t)

That now looks correct. Therefore removed the -n option so that it will now actually do the rename.
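One aside before running it for real. The pattern is not anchored, so it rewrites the first ".t" it finds anywhere in the name. That is harmless for these five files, but a name with an earlier ".t" in it would be caught. The sketch below uses sed on plain strings, so no files are touched, and a made-up name, notes.tar.t, to compare the pattern as used here with one anchored to the end of the name with $.

```shell
# notes.tar.t is a hypothetical file name, used only for illustration.
name=notes.tar.t
unanchored=$(printf '%s\n' "$name" | sed 's/\.t/-cc.t/')
anchored=$(printf '%s\n' "$name" | sed 's/\.t$/-cc.t/')

printf 'unanchored: %s\n' "$unanchored"   # notes-cc.tar.t
printf 'anchored:   %s\n' "$anchored"     # notes.tar-cc.t
```

The same $ anchor works in the rename expression. For a one-time job on a directory I can see in full, the unanchored version is good enough.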

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ rename -v 's/\.t/-cc.t/' *.t
help.t renamed as help-cc.t
invalid-options.t renamed as invalid-options-cc.t
load.t renamed as load-cc.t
transactions.t renamed as transactions-cc.t
version.t renamed as version-cc.t

Looks like the renames were correct. Check it.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ ls -1 *.t
help-cc.t
invalid-options-cc.t
load-cc.t
transactions-cc.t
version-cc.t

Perfect! All of the files are now named for the cc program. Now let's fork those into versions for the new bank program. Let's run a for loop on the command line on the files we want to copy. Just echo them out initially so that we can see where we are starting. Then hack it into what we want.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ for f in *-cc.*; do echo $f; done
help-cc.t
invalid-options-cc.t
load-cc.t
transactions-cc.t
version-cc.t

Yes. Matches the source files we want to copy. Let's modify the file names from cc to bank. Again sed is perfect for this modification.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ for f in *-cc.*; do n=$(echo $f | sed s/-cc/-bank/); echo cp $f $n; done
cp help-cc.t help-bank.t
cp invalid-options-cc.t invalid-options-bank.t
cp load-cc.t load-bank.t
cp transactions-cc.t transactions-bank.t
cp version-cc.t version-bank.t

Looks exactly like what we want to do. Do it. Remove the echo from the cp command so that the copy is actually invoked. Since it is nice to have some feedback that it is doing something I am going to add -v to the cp command so that it will show us what it is doing here. But really since we previewed it above this is not needed as we know exactly what it is going to do. The -v option is not part of POSIX cp, so it is not guaranteed to be available everywhere, although the GNU and BSD versions both have it. But here we are on the command line and will know right away if it works or does not.
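As an aside, the shell can build the new name by itself with parameter expansion, and quoting the variables keeps the loop safe for names containing spaces. This is a sketch of that alternative, not what I ran. It works in a scratch directory with stand-in files created just for the illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
touch "$scratch/help-cc.t" "$scratch/load-cc.t"

for f in "$scratch"/*-cc.t; do
    n="${f%-cc.t}-bank.t"     # strip the -cc.t suffix, then append -bank.t
    cp "$f" "$n"
done

copied=$(cd "$scratch" && ls -1)
printf '%s\n' "$copied"
rm -rf "$scratch"
```

That lists help-bank.t, help-cc.t, load-bank.t and load-cc.t. For the five well-behaved names here the sed version does the same job.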

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ for f in *-cc.*; do n=$(echo $f | sed s/-cc/-bank/); cp -v $f $n; done
'help-cc.t' -> 'help-bank.t'
'invalid-options-cc.t' -> 'invalid-options-bank.t'
'load-cc.t' -> 'load-bank.t'
'transactions-cc.t' -> 'transactions-bank.t'
'version-cc.t' -> 'version-bank.t'

Looks good. Check it.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ ls -1 *.t
help-bank.t
help-cc.t
invalid-options-bank.t
invalid-options-cc.t
load-bank.t
load-cc.t
transactions-bank.t
transactions-cc.t
version-bank.t
version-cc.t

Perfect! Now we just need to modify the contents of the *-bank files. The content of them calls over to the bbva-import-cc program and we want them to call over to the new bbva-import-bank instead.

There are endless different ways to do this. Do not get hung up on one perfect way. This is a one-time throw away sequence of command lines to get this one-time task done. It doesn't need to be perfect. It doesn't need to be best. It just needs to be good enough. Afterwards we will have the result we need and how we got here is not important.

First I like to check the matches of the pattern I am going to use to review and verify that what I am changing is what I want to change. Let's grep the file contents and double check our matches.

Here I am using "-cc" with the starting "-" because that is more specific and will avoid matching accidental occurrences of "cc" in other places. But "-cc" looks like an option to grep. Therefore I need to use the grep -e PATTERN option to tell grep that "-cc" is a pattern and not an option. "grep -e -cc" says to grep for "-cc" as the pattern.
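A small sketch of two ways to pass such a pattern, using a scratch file made up for the illustration. The second form, with -- to mark the end of options, is a common alternative to -e.

```shell
scratch=$(mktemp -d)
printf '%s\n' 'perl bbva-import-cc.pl --help' 'nothing here' > "$scratch/sample.t"

# -e says the next argument is the pattern, even though it starts with "-".
with_e=$(grep -e -cc "$scratch/sample.t")

# "--" ends option parsing, so the next argument is taken as the pattern.
with_dashes=$(grep -- -cc "$scratch/sample.t")

printf '%s\n' "$with_e" "$with_dashes"
rm -rf "$scratch"
```

Both forms print the one line containing "-cc".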

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ grep -e -cc *-bank.t
help-bank.t:my $output = `perl $srcdir/../src/bbva-import-cc.pl --help`;
invalid-options-bank.t:my $output = `perl $srcdir/../src/bbva-import-cc.pl --foo 2>&1`;
invalid-options-bank.t:$output = `perl $srcdir/../src/bbva-import-cc.pl -x 2>&1`;
load-bank.t:require_ok("$srcdir/../src/bbva-import-cc.pl");
transactions-bank.t:my $output = `perl $srcdir/../src/bbva-import-cc.pl transactions1.csv`;
version-bank.t:my $output = `perl $srcdir/../src/bbva-import-cc.pl --version`;
version-bank.t:$output = `perl $srcdir/../src/bbva-import-cc --version`;
version-bank.t:like($output,qr/bbva-import-cc \d/, "version number is a number");

I review the matches. Looks like there are no confusing strings that I need to worry about. Looks like a very simple task to edit those from cc to bank.

Let's use sed to change all occurrences of "-cc" to "-bank" in those files. Let's preview the change first. I like previewing actions before doing them when hacking something together on the command line. Here let's replace "-cc" with "-bank" and let's use the 'g'lobal option to change it every time it appears on a line. The basic command would be this:

sed 's/-cc/-bank/g'  # do not really run this

But if we previewed that with the files we wanted to change:

sed 's/-cc/-bank/g' ./*-bank.t  # do not really run this

That would produce a lot of output. It would stream the entire file contents to the terminal output. Everything would scroll off. So let's not do the above. grep that output for just the lines we want to see. This we can do and it will be small.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ sed 's/-cc/-bank/g' ./*-bank.t | grep -e -bank
my $output = `perl $srcdir/../src/bbva-import-bank.pl --help`;
my $output = `perl $srcdir/../src/bbva-import-bank.pl --foo 2>&1`;
$output = `perl $srcdir/../src/bbva-import-bank.pl -x 2>&1`;
require_ok("$srcdir/../src/bbva-import-bank.pl");
my $output = `perl $srcdir/../src/bbva-import-bank.pl transactions1.csv`;
my $output = `perl $srcdir/../src/bbva-import-bank.pl --version`;
$output = `perl $srcdir/../src/bbva-import-bank --version`;
like($output,qr/bbva-import-bank \d/, "version number is a number");

As an aside I could be fancy and do the sed+grep all in one sed command. But that would require modifying the sed command. I would need to use -n to tell sed not to print lines and add the 'p'rint flag to the substitution. Then I would need to remove both again before the real run. Filtering through a later grep command instead allows the sed part, which we want to reuse, to be exactly as it needs to be.
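For the curious, that all-in-one form might look like the sketch below. It runs against a scratch file created just for the illustration. The -n option suppresses the default output and the p flag prints only the lines where a substitution happened.

```shell
scratch=$(mktemp -d)
printf '%s\n' 'use Test::More;' \
    'my $output = `perl bbva-import-cc.pl --help`;' > "$scratch/help-bank.t"

# -n: do not print every line; the trailing p prints only changed lines.
changed=$(sed -n 's/-cc/-bank/gp' "$scratch/help-bank.t")
printf '%s\n' "$changed"
rm -rf "$scratch"
```

Only the rewritten line comes out. One small difference from the grep filter: this shows lines that sed changed, while grep shows every line containing "-bank", including any that had it before the edit.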

And so now let's use the GNU sed --in-place extension to edit those files in place.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ sed --in-place 's/-cc/-bank/g' ./*-bank.t

And then check the results.

rwp@angst:~/src/bank-import-stuff/bbva-import/t$ grep -e -bank ./*-bank.t
./help-bank.t:my $output = `perl $srcdir/../src/bbva-import-bank.pl --help`;
./invalid-options-bank.t:my $output = `perl $srcdir/../src/bbva-import-bank.pl --foo 2>&1`;
./invalid-options-bank.t:$output = `perl $srcdir/../src/bbva-import-bank.pl -x 2>&1`;
./load-bank.t:require_ok("$srcdir/../src/bbva-import-bank.pl");
./transactions-bank.t:my $output = `perl $srcdir/../src/bbva-import-bank.pl transactions1.csv`;
./version-bank.t:my $output = `perl $srcdir/../src/bbva-import-bank.pl --version`;
./version-bank.t:$output = `perl $srcdir/../src/bbva-import-bank --version`;
./version-bank.t:like($output,qr/bbva-import-bank \d/, "version number is a number");

Perfect! Exactly what we wanted.

As another aside, when I originally did this for real I didn't use sed. Instead I used perl, since perl has always had the -i edit-in-place option and perl is available everywhere too. Really it is more portable than using the GNU sed --in-place option. But I was worried that if I mentioned perl people would get scared off! I didn't want that. The same command as above is very simply done more portably in perl like this. The first previews the action and the second adds -i to edit the files in place.

perl -p -e 's/-cc/-bank/g' ./*-bank.t | grep -e -bank
...
perl -pi -e 's/-cc/-bank/g' ./*-bank.t
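And if neither perl nor GNU sed were at hand, plain sed with a temporary file gets to the same place. This is a sketch in a scratch directory with a made-up file, assuming mktemp is available; it is not what I ran.

```shell
scratch=$(mktemp -d)
printf '%s\n' 'require_ok("bbva-import-cc.pl");' > "$scratch/load-bank.t"

for f in "$scratch"/*-bank.t; do
    # Write the edited copy first; replace the original only if sed succeeded.
    sed 's/-cc/-bank/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$scratch/load-bank.t")
printf '%s\n' "$result"
rm -rf "$scratch"
```

The file now reads require_ok("bbva-import-bank.pl"); just as with the in-place edits.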

Pretty cool! In just a moment of command line hacking we have forked the files that we needed and edited them to fork the content.

Maybe later if I add a third bank I will parameterize out the bank part for this basic part of the framework. But for simple tests such as testing invalid options and version output, the tests are always the same, and the overhead of parameterizing makes them harder to understand and harder to keep updated than simply copying them.

I am sure, however, that even though this program did not exist 30 minutes ago, and now exists and solves a problem that I needed solved, 10 minutes from now I will hear all kinds of comments about how I am doing it wrong! For one, I shouldn't be using a copy-paste anti-pattern. Which I do agree with. But one should not let the perfect be the enemy of the good. It's a process of continuous improvement. Remember the Rule of Optimization: Prototype before polishing. Get it working before you optimize it.