To add detail to the article:
When you're modifying the code in memory (as opposed to, say, a bootable disk or flash drive) there are a few things you have to watch out for. This is especially true when you have no physical access to the device, as in the case of Voyager 1.
So if you've identified that some part of the memory is unusable, you need to ensure nothing will ever try to use it.
Phase I: Setup
Step 1: Allocate a chunk of memory for a temporary region that will do nothing more than simply exist.
Step 2: Fill that region with some non-operational instructions (NOOPs), followed by an instruction that returns execution to a known-working part of the code.
Phase II: Re-vector
Step 3: Change all instructions that send execution to the bad memory so they now go to the good region.
Step 4: Change some of the NOOPs to log information, so you can tell you're now executing new-region stuff, not old-region stuff.
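To make Steps 1 through 4 concrete, here's a toy model in Python. Nothing here resembles Voyager's actual instruction set: the opcodes (NOP, LOG, JMP, RET), the addresses and the tiny interpreter are all made up for illustration.

```python
# Toy memory model: each cell holds an (opcode, operand) tuple.
# The opcodes and addresses are hypothetical, not a real instruction set.

def run(memory, pc, max_steps=100):
    """Execute from pc until RET; return (logged markers, visited addresses)."""
    logs, visited = [], []
    for _ in range(max_steps):
        op, arg = memory[pc]
        visited.append(pc)
        if op == "JMP":
            pc = arg
        elif op == "RET":
            return logs, visited
        else:
            if op == "LOG":
                logs.append(arg)
            pc += 1
    raise RuntimeError("did not return within max_steps")

BAD = range(20, 24)               # addresses assumed to be damaged
memory = [("NOP", None)] * 40
memory[0] = ("JMP", 20)           # original code jumps into the bad region
memory[10] = ("RET", None)        # known-working return point

# Steps 1-2: a stub region at 30..33: NOPs, then a jump back to good code.
STUB = 30
memory[30:34] = [("NOP", None), ("NOP", None), ("NOP", None), ("JMP", 10)]

# Step 3: re-vector every jump that targets the bad region.
for addr, (op, arg) in enumerate(memory):
    if op == "JMP" and arg in BAD:
        memory[addr] = ("JMP", STUB)

# Step 4: turn one NOP into a log marker so the stub announces itself.
memory[31] = ("LOG", "stub-entered")

logs, visited = run(memory, 0)
```

After this runs, `logs` contains the stub's marker and `visited` never touches an address in `BAD`, which is the evidence you're after before moving on.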
At this point you have logs showing that the bad memory is not being used. You know the new region is being used, but it's not large enough to hold the real code. So now you have to put the code that existed in the damaged memory somewhere else, split across several places because there's not enough room in any one place, and then jump to it:
Phase III: Recreate the code
Step 5: Allocate new regions of memory. Fill each of them with NOOPs and a branch (or jump, AKA JMP) to the next chunk.
Step 6: Add a new region of memory with a new routine that walks the stuff you set up in Step 5, to ensure that if something WERE to execute there, it would go sequentially through all those new regions and return just fine.
Step 7: Run Step 6, and if it doesn't pass, fix it.
Step 8: Replace the NOOPs in the new regions with the instructions from the damaged original areas of memory.
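Steps 5 through 8 can be sketched the same way. This is a minimal model, assuming memory is a dict of address to (opcode, operand) and that each region reserves its last cell for the jump. The function names, opcodes and addresses are all invented for the example.

```python
def make_chain(memory, regions, return_to):
    """Step 5: fill each region with NOPs and end it with a jump to the next."""
    for i, (start, size) in enumerate(regions):
        for addr in range(start, start + size - 1):
            memory[addr] = ("NOP", None)
        nxt = regions[i + 1][0] if i + 1 < len(regions) else return_to
        memory[start + size - 1] = ("JMP", nxt)

def verify_chain(memory, regions, return_to):
    """Steps 6-7: walk the NOP chain; True only if every region is entered
    in order and execution ends up at return_to."""
    expected = [start for start, _ in regions]
    budget = sum(size for _, size in regions) + 1
    pc, entered = expected[0], []
    for _ in range(budget):
        if pc == return_to:
            return entered == expected
        if pc in expected:
            entered.append(pc)
        op, arg = memory.get(pc, ("MISSING", None))
        if op == "JMP":
            pc = arg
        elif op == "NOP":
            pc += 1
        else:
            return False          # wandered into something unexpected
    return False

def install(memory, regions, code):
    """Step 8: overwrite the NOP slots, in order, with the recovered code."""
    slots = [a for start, size in regions
             for a in range(start, start + size - 1)]
    if len(code) > len(slots):
        raise ValueError("not enough room in the new regions")
    for addr, instr in zip(slots, code):
        memory[addr] = instr

regions = [(100, 3), (200, 2), (300, 4)]   # (start, size), hypothetical
memory = {}
make_chain(memory, regions, return_to=10)
ok_before = verify_chain(memory, regions, return_to=10)

if ok_before:                               # only proceed on success
    recovered = [("WORK", n) for n in range(1, 6)]
    install(memory, regions, recovered)
```

Note that `verify_chain` only understands NOP and JMP, so it's meant to be run before Step 8, while the regions are still empty scaffolding.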
Effectively at this point you've replaced the original code, but instead of one region of memory you're using multiple regions tied together with branch instructions. Note that I'm using generic terms here like "branch instruction" where, if we were doing machine code in the 1970s, it would be a JMP or a JSR or BEQ or whatever. That's not important to the concept. The important thing is that IF your steps are successful, at each point your system is fully recoverable. IF a step fails, it is still recoverable. You only go to the NEXT step upon success of the preceding step.
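The "only go to the NEXT step upon success" rule is easy to express in code. Here's a rough sketch, assuming each step is a hypothetical (name, change, check) triple and that a change produces a new state rather than mutating the old one. A real spacecraft doesn't get a free undo like this; the point is only the gating.

```python
def apply_steps(state, steps):
    """Apply (name, change, check) steps in order; stop at the first failure.

    Each change returns a NEW state, so the last verified state is always
    still around to fall back to.
    Returns (last good state, names of completed steps, failed step or None).
    """
    done = []
    for name, change, check in steps:
        candidate = change(state)
        if not check(candidate):
            return state, done, name     # keep last good state, report failure
        state = candidate
        done.append(name)
    return state, done, None

# Hypothetical steps; the third one is deliberately broken.
steps = [
    ("stub",       lambda s: {**s, "stub": True},     lambda s: s["stub"]),
    ("revector",   lambda s: {**s, "vector": "stub"}, lambda s: s["vector"] == "stub"),
    ("bad-upload", lambda s: {**s, "code": None},     lambda s: s["code"] is not None),
    ("activate",   lambda s: {**s, "vector": "new"},  lambda s: True),
]

state, done, failed = apply_steps({"vector": "bad"}, steps)
```

Here the run stops at "bad-upload": `state` is still the one left by "revector", and "activate" is never attempted.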
Phase IV: Activate the new code
Step 9: Vector the original jump instructions from Step 3 to now point to the new code from Step 5.
Step 10: Lose two days of sleep waiting for success.
You can shortcut a lot of these steps if you have physical reset access. Not an option here. You can shortcut a lot of these steps if you have A/B memory (also now used on Android devices and immutable operating systems). Not an option here either. That means in anything you do, you should leave the device in a state where it is still usable enough to fix what you broke. That's why you need 10 steps.
But hey, what if you had A/B?
1. Copy A to B.
2. Reallocate regions of memory so B can operate in areas where the physical memory is undamaged. Add branches (jumps) to make a bunch of little regions act as one big region. A jump is one of the least-intensive CPU operations: it loads the program counter with a specific address (to go execute code at) instead of merely incrementing it. In pseudocode, LOAD PC=new-address is about as simple as LOAD PC=PC+current-instruction-size, so the extra jumps cost very little. (Some people use "length" instead of size. Whatever. 1970s octets were all the same size and length... this was no TOPS-10/20 system with 36-bit weirdness.)
3. Boot up on B. If that fails, reboot on A. This requires a bootloader equivalent (BIOS on DOS, Fastboot on Android, UEFI on newer systems, etc.). Not an option here.
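For comparison, the A/B flow is short enough to sketch in a few lines. This is a toy, assuming an "image" is just the list of addresses it occupies and that the self-check only asks whether any of them land in damaged memory. `BAD`, `self_check` and `boot` are names invented for the example.

```python
BAD = {20, 21, 22, 23}            # addresses assumed to be damaged

def self_check(image):
    """Pass only if the image avoids every damaged address."""
    return not (set(image["addresses"]) & BAD)

def boot(slots, order=("B", "A")):
    """Try B first, fall back to A; return the name of the slot that booted."""
    for name in order:
        if self_check(slots[name]):
            return name
    raise RuntimeError("no bootable slot")

# A is the known-good image (pretend it lives at 0..15).
slots = {"A": {"addresses": list(range(0, 16))}}

# 1. Copy A to B.
slots["B"] = {"addresses": list(slots["A"]["addresses"])}

# 2. Relocate B. First attempt is botched: 16..31 overlaps the bad memory.
slots["B"]["addresses"] = list(range(16, 32))
first = boot(slots)               # 3. B fails its check, so we land on A

# Fix the relocation and try again.
slots["B"]["addresses"] = list(range(32, 48))
second = boot(slots)
```

The botched relocation costs nothing but a retry, because A is never touched. That's the whole appeal.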
Well, and after all that's done, what do you do to clean up?
Step 11: Have Voyager 1 send you a new data dump of all of memory so you have a new clear copy.
Step 12: Put in some pseudo-reset options, so if this happens again you have SOME of the capabilities of uploading code without having to do it step by step.
Step 13: Put in logging so that it's more verbose, doesn't chew up very limited bandwidth on the downlink, and doesn't fill up the very limited storage.
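One common way to get "more logging, fixed storage" is a ring buffer of small event codes. This is only a sketch of that general idea, not anything known about Voyager's software. The `EventLog` class and the one-byte codes are assumptions for illustration.

```python
from collections import deque

class EventLog:
    """Fixed-size log of one-byte event codes; oldest entries get overwritten."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)
        self.dropped = 0          # how many old events were pushed out

    def record(self, code):
        if not 0 <= code <= 255:
            raise ValueError("event codes must fit in one byte")
        if len(self._events) == self._events.maxlen:
            self.dropped += 1
        self._events.append(code)

    def downlink(self):
        """Pack the retained events into a compact byte string."""
        return bytes(self._events)

log = EventLog(capacity=4)
for code in [1, 2, 3, 4, 5, 6]:
    log.record(code)
packet = log.downlink()
```

Storage never grows past `capacity` bytes, and the `dropped` counter at least tells you how much history you lost between downlinks.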
If you're thinking any of this is easy:
- You're working with less memory than is found in your car's radio
- You're working with less bandwidth than is available on an analog FM radio channel
- You're working with 50-year-old hardware exposed to cosmic radiation that is SUCH A ONE-OFF thing there are no comprehensive plans for it
- Round-trip latency of about two days means the equivalent of "type a key on the keyboard and wait two days to see it display" except of course there's no keyboard and no display and you "batch" up the commands and hope you got the steps lined up in the right order so you don't brick the device.
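If you want to check the latency figure yourself, it's just distance divided by the speed of light. The distance below (about 163 AU) is my rough assumption for Voyager 1 around 2024. Plug in a current number if you have one.

```python
C_KM_PER_S = 299_792.458          # speed of light
AU_KM = 149_597_870.7             # one astronomical unit in km

def one_way_hours(distance_au):
    """Light travel time in hours for a given distance in AU."""
    return distance_au * AU_KM / C_KM_PER_S / 3600

VOYAGER_1_AU = 163                # assumed approximate distance, not exact
one_way = one_way_hours(VOYAGER_1_AU)
round_trip = 2 * one_way
```

That works out to a bit under a day each way, so close to two days before you see the result of anything you sent.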
The good news - NASA has top people who have done tremendously well to get this partially fixed. I have no doubts they will get it fully working... until it breaks again. The 1970s brought us hippies and tie dye... but not really good micro-electronics.