This was pretty straightforward – which was a relief after the difficulties and ultimate failure with Day 9. I also got to use a cool new trick with the GREL cross function that I had literally just learnt about. I made a silly mistake by not reading the instructions carefully enough, which led to an “off by one” error, and I faffed about a bit getting the sum (as an aside, it’s a shame that so many of these problems need a sum across rows just to give a simple number to type in as the answer – this isn’t OpenRefine’s strength and it isn’t really the crux of the problem).
Overdue Ideas
Ideas linking Libraries, Computing, E-learning, and anything else that springs to mind.
Author Archives ⇒ ostephens
Advent of Code with OpenRefine Day 9
My first failure (for now at least) 🙁 I’ve found this one intractable. The setup is that as one end of a rope (the head) moves around, you have to track the path taken by the other end of the same rope (the tail), following a set of rules that determine how the tail moves in relation to the head.
The input for the problem was a list of instructions telling you how the head of the rope was being moved; you had to calculate which points the tail went through as a result.
The rules themselves are a little complicated, but I think I made progress in writing some GREL that would correctly output the next location of the tail… however, to predict the next location of the tail I first needed to know where the tail currently was. The first move is fine – we can treat the starting point as (0,0) on a 2D grid, and if we know where the head of the rope has moved we can work out where the tail will end up. However, to work out where the tail ends up after the second move, we need to have already worked out where it ended up after the first move. That essentially limits us to calculating one row of data at a time, so with a list of 2000 instructions it would mean adding one value at a time to the ‘tail location’ column: 2000 separate actions to update the column. I could do that manually, or perhaps find a way of automatically generating the same operation repeated 2000 times to fill out an operation history JSON, but it all feels really messy.
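To make that sequential dependency concrete, here is a minimal Python sketch of the simulation, deliberately done outside OpenRefine. It assumes the puzzle's rule as I understand it: a tail that is no longer touching the head (including diagonally) steps one square towards it on each axis. It also assumes instructions of the form `R 4`. The names `follow` and `tail_positions` are hypothetical. Each new tail position is computed from the previous one, and that is what makes this awkward to express as a single column-wide transform.

```python
def sign(x):
    # -1, 0 or 1: the tail moves at most one square along each axis per step
    return (x > 0) - (x < 0)

def follow(head, tail):
    # Assumed rule: if head and tail are still touching (including diagonally
    # or overlapping), the tail stays put; otherwise it steps towards the head.
    dx, dy = head[0] - tail[0], head[1] - tail[1]
    if max(abs(dx), abs(dy)) <= 1:
        return tail
    return (tail[0] + sign(dx), tail[1] + sign(dy))

# Assumed direction letters, mapped to unit steps on the grid
STEPS = {"R": (1, 0), "L": (-1, 0), "U": (0, 1), "D": (0, -1)}

def tail_positions(instructions):
    head = tail = (0, 0)          # both ends start at the origin
    visited = {tail}
    for line in instructions:
        direction, count = line.split()
        sx, sy = STEPS[direction]
        for _ in range(int(count)):
            head = (head[0] + sx, head[1] + sy)
            # The new tail depends on the *previous* tail: a running fold,
            # not an independent per-row calculation.
            tail = follow(head, tail)
            visited.add(tail)
    return visited
```

For example, `len(tail_positions(["R 4"]))` is 4: the tail trails one square behind the head and visits (0,0) through (3,0).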
So, several days later, I’ve decided it’s time to move on and declare this one a loss for me (and OpenRefine), at least for now. Others might be able to work out a better way of using OpenRefine to tackle this, and maybe I’ve missed a trick in the problem that would let me solve it, but I’ve hit my limit on this one and I feel it simply reflects the limitations of OpenRefine. Of course I could always write a Python program inside OpenRefine to solve this – that’s always a possibility, but it’s not the spirit of what I’m trying to do here 🙂
Advent of Code with OpenRefine Day 8
This should have been relatively straightforward, and it really was much simpler than Day 7, but my brain got stuck on some of the logic for the second part and then I made some stupid mistakes applying transformations. Eventually I solved the problem with the example data from the puzzle description and checked I got the correct answer – then I was able to extract the transformation history and apply it to the larger project, which was a nice use of that functionality in OpenRefine 🙂
The video below skips the actual code I used to work out part 2 so if you’re interested it looked a bit like this:
```
with(rowIndex - row.record.fromRowIndex, position,
  filter(
    forEachIndex(
      row.record.cells.treeHeight.value.slice(position + 1), i, v,
      if(value <= v, i + 1, null)
    ),
    j, j != null
  )[0]
)
```
This code takes a record of tree heights (one record per row or column of the grid) and finds the trees on one side of the current position (this is what the ‘slice’ in the middle of the GREL does; the version shown looks at the trees after the current one). It then records the position, counting along from the current tree, of each tree that is the same height or taller, and extracts the first of those positions – which gives the number of trees you can see along the row, or up or down the column, before you reach a tree the same size or bigger.
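For readers more at home outside GREL, here is a rough Python translation of the same logic. It is a sketch, not the code from the video. `heights` stands in for the record's `treeHeight` values, `position` stands in for the row's offset within the record (`rowIndex - row.record.fromRowIndex`), and the function name is hypothetical.

```python
def viewing_distance(heights, position):
    # Equivalent of value.slice(position + 1): the trees after the current one
    current = heights[position]
    following = heights[position + 1:]
    # Equivalent of forEachIndex + filter: 1-based offsets of trees that are
    # at least as tall as the current tree
    offsets = [i + 1 for i, v in enumerate(following) if current <= v]
    # Equivalent of [0]: the first blocking tree. If nothing blocks the view
    # there is no first offset; this sketch returns None and leaves that
    # case to be handled separately.
    return offsets[0] if offsets else None
```

For example, `viewing_distance([3, 0, 3, 7, 3], 0)` returns 2, because the tree of height 3 two places along is the first one at least as tall. Looking the other way is the same call on the reversed list with the mirrored position.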
Advent of Code with OpenRefine Day 7
This was a really tricky one – it took me ages to work out how to approach it, and even then, after a dry run to get the answer, I made several mistakes, as you can see if you take a look at the mammoth video (maybe watch this one at 2x speed?!).
This is definitely not the right kind of problem for OpenRefine to solve, but it turned out it was possible – just about. I wonder if I missed a trick somewhere along the line and there could have been a simpler approach. After all, the simpler question of “how big was each directory?” was trivial in OpenRefine; it was having to calculate the path to each directory from the list of ‘change directory’ commands that made it difficult.
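The awkward part is carrying the current path from one line of terminal output to the next. Outside OpenRefine, that is just a running stack, sketched here in Python. The sketch assumes the puzzle's terminal format: `$ cd name`, `$ cd ..` and `$ cd /` for changing directory, plus file lines that start with a size. The function name is hypothetical.

```python
def directory_sizes(lines):
    path = []    # current directory as a list of names; [] is the root
    sizes = {}   # "/a/e" -> total size of all files at or below that directory
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["$", "cd"]:
            target = parts[2]
            if target == "/":
                path = []
            elif target == "..":
                path = path[:-1]
            else:
                path = path + [target]
        elif parts[0].isdigit():
            size = int(parts[0])
            # A file counts towards its own directory and every ancestor
            for depth in range(len(path) + 1):
                key = "/" + "/".join(path[:depth])
                sizes[key] = sizes.get(key, 0) + size
        # "$ ls" and "dir x" lines need no action in this sketch
    return sizes
```

Once every file line carries its full path, the per-directory totals are just sums grouped by path prefix, which is the part that was easy in OpenRefine.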
Advent of Code with OpenRefine Day 6
I think this might have been the quickest so far, although I’m not sure it’s a particularly efficient solve and the GREL was a bit ugly, but it worked immediately.
Advent of Code with OpenRefine Day 5
This was a fun one to do. It took me a while to work out how to approach it, but once I’d realised I could generate an OpenRefine operation history in JSON from the provided instructions and apply it to the data, it worked really well (barring a few silly mistakes on my part, which you can see me make in the screencast!)
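The generated JSON isn't shown, so here is a minimal Python sketch of the general idea rather than the history actually used. It assumes instructions like `move 3 from 1 to 2` and turns each one into a `core/text-transform` operation. The field layout follows a text-transform entry in an exported OpenRefine history. The column name and the GREL `template` are hypothetical placeholders, because the real expression depends on how the data is laid out in the project. The resulting JSON array is the kind of thing you can paste into Undo / Redo → Apply….

```python
import json
import re

# Assumed instruction format, e.g. "move 3 from 1 to 2"
INSTRUCTION = re.compile(r"move (\d+) from (\d+) to (\d+)")

def text_transform(column, expression):
    # Field layout modelled on a "core/text-transform" entry in an exported
    # OpenRefine operation history; compare with your own export.
    return {
        "op": "core/text-transform",
        "engineConfig": {"facets": [], "mode": "row-based"},
        "columnName": column,
        "expression": "grel:" + expression,
        "onError": "keep-original",
        "repeat": False,
        "repeatCount": 10,
        "description": f"Text transform on cells in column {column} "
                       f"using expression grel:{expression}",
    }

def history_from_instructions(lines, column, template):
    # `template` is a hypothetical GREL expression with {count}, {src} and
    # {dst} placeholders (literal braces would need doubling for str.format).
    ops = []
    for line in lines:
        match = INSTRUCTION.fullmatch(line.strip())
        if match is None:
            continue  # skip blank or non-instruction lines
        count, src, dst = match.groups()
        expression = template.format(count=count, src=src, dst=dst)
        ops.append(text_transform(column, expression))
    return json.dumps(ops, indent=2)
```

Generating one operation per instruction keeps the JSON mechanical, and the same approach would also produce the "same operation repeated many times" history mentioned for Day 9.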
Advent of Code with OpenRefine Day 4
So this puzzle was definitely more straightforward from an OpenRefine perspective. Once I’d got the logic worked out, it all just needed to be applied to each row in turn, and using one of my favourite tools – the custom text facet – the answer was easy to read off…
Advent of Code with OpenRefine Day 3
The way I solved this puzzle makes nice use of value.split(//) to get an array of letters from a string, as well as:
That’s quite a list of functions for something I thought was relatively straightforward!
There’s also a more appropriate use of OpenRefine’s Record mode in the second part (although I still revert to the horrible hack used in Day 2 to add together all the numbers in a column). Creating the records in the second part includes a neat hack to group together a set number of rows, which works easily because of a quirk in the way OpenRefine handles numbers that I completely fail to explain properly in the screen capture!
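The two tricks above have simple Python counterparts, shown here as a hedged sketch. `list(value)` gives one item per character, much like `value.split(//)`. The numeric quirk used for grouping isn't spelled out, so the integer-division key below (`index // size`) is an assumption about one way to give every block of n consecutive rows a shared key, not a description of the GREL in the video. The function names are hypothetical.

```python
def letters(value):
    # Python counterpart of GREL value.split(//): one list item per character
    return list(value)

def group_rows(rows, size):
    # Assumed grouping key: integer division of the row index, so rows 0..size-1
    # share key 0, the next `size` rows share key 1, and so on.
    groups = {}
    for index, row in enumerate(rows):
        groups.setdefault(index // size, []).append(row)
    # Dicts keep insertion order, so groups come back in row order
    return list(groups.values())
```

With `size` set to 3, seven rows become two full groups of three plus one leftover row.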
Advent of Code with OpenRefine Day 2
The Day 2 challenge led me to some manipulation of the cross function in GREL, and a horrible hack to add up all the numbers in a column (really not recommended in real life! If only I’d got around to doing some work on this suggested new “Statistical” numeric facet, there would be a much better solution!)
Advent of Code with OpenRefine
In case you’ve not come across it… Advent of Code is:
… an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. People use them as interview prep, company training, university coursework, practice problems, a speed contest, or to challenge each other.
My son had started working on it this year, and inspired me to have a go as well. However, rather than following the usual route of solving the puzzles by writing computer programs, I’ve decided to try to solve each one using OpenRefine, software designed for working with large data sets.
After a few days I can already tell you OpenRefine is definitely not always the right tool for this kind of job… but that’s OK. This is just a fun challenge, and the point (for me) of doing it in OpenRefine isn’t to find the best or most efficient way of solving the problem, but rather to have fun, challenge my brain, and explore the abilities, and limitations, of OpenRefine along the way.
Each day there are two challenges, and I’m going to try to post a short video for each day I attempt, to see how far I get with the two challenges in OpenRefine. I’m playing catch-up at the moment and have already solved the first few days, but I’m only just getting round to recording a video… so here is Advent of Code 2022, Day 1, solved using OpenRefine