This isn't difficult. Ask an LLM; in this case, I asked Gemma running in LM Studio:
"using lmstudio how can I prepare the context. In otherwords, I want a standard document which lists rules I want the llms to follow when answering my questions. For example, I don't want it to provide any information without also providing reference links. I often get responses like "There is a research paper named..." and I want the link to the paper and don't want to search for it."
The response it provides is long and detailed, and really quite good. If you follow the steps, the results are far more reliable than the constant hallucinations you'd otherwise get.
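To give a feel for what you end up with, here's a rough sketch of pinning a rules document like that as the system prompt through LM Studio's OpenAI-compatible local server (it listens on port 1234 by default). The model name and the exact rules are my own placeholders, not what Gemma handed me:

```python
# Rough sketch only: pin a "rules" document as the system prompt through
# LM Studio's OpenAI-compatible local server (default http://localhost:1234/v1).
# The model name and the rule wording below are placeholders.
from openai import OpenAI

RULES = """Follow these rules in every answer:
1. Never claim a paper, case, or article exists without a direct link to it.
2. If you cannot find a link, say so explicitly instead of inventing one.
3. Label speculation as speculation; keep it separate from sourced facts.
"""

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

reply = client.chat.completions.create(
    model="gemma-3-12b-it",  # assumption: whatever model you have loaded locally
    messages=[
        {"role": "system", "content": RULES},
        {"role": "user", "content": "What research exists on retrieval-augmented generation?"},
    ],
)
print(reply.choices[0].message.content)
```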
If you want it to work like a champion, then ask it:
"Is there a way to keep an llm up to date? It would be amazing if I could tell the llm that later today I intend to ask it more information on a specific topic. Do some research while I'm gone. And then the llm would search the internet. It would be even cooler if it could chat on message forums and then check the answers for validity afterwards"
That will help you set up a RAG.
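Here's a rough sketch of the retrieval half of that RAG, assuming the sentence-transformers package; the chunks and the question are made-up placeholders:

```python
# Rough sketch of the retrieval half of a RAG, assuming sentence-transformers.
# The chunks and question are made-up placeholders, not real Alaska statutes.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In the scenario above, these chunks would come from whatever the model
# fetched during its "do some research while I'm gone" pass.
chunks = [
    "Excerpt from an Alaska statute on limitation periods...",
    "Excerpt from a court opinion discussing equitable tolling...",
    "Excerpt from a law review article on retrieval-augmented generation...",
]
chunk_vecs = embedder.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embedder.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q  # cosine similarity, since the vectors are normalized
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved text then gets prepended to the prompt sent to the local model,
# alongside the rules document from the earlier sketch.
print(retrieve("What is the limitation period for tort claims in Alaska?"))
```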
It sounds like Lawyer dude started his project way too early. And I get it: if he hadn't rushed in and started early, someone else would have. It also sounds like he got to the point where the customer expected better results, and if he didn't reach version 1.0 pretty soon, not only would they ditch him, it would probably slam the doors shut for everyone else. And finally, he probably chased a rabbit down the wrong hole for far too long and delivered a shit product.
I think, to run a project like this if I were starting today, I would talk to Google (I'd prefer Ali these days, but the whole US government/China thing is an issue) and ask to license Gemma as the base of my own LLM, then extend on that. After all, training your own model from scratch isn't just insanely expensive, it's impressively stupid. Let someone else burn a few gazillion GPU hours laying down the base weights and dealing with all the other training annoyances.
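For what "extend on that" might look like in practice, here's a hypothetical sketch of slapping a LoRA adapter on top of an open-weight base with transformers + peft; the model ID and target modules are assumptions on my part (and the Gemma weights are gated on Hugging Face):

```python
# Hypothetical sketch of "extending" a licensed open-weight base with a LoRA
# adapter instead of training from scratch (transformers + peft). The model ID
# and target modules are assumptions; the Gemma weights are gated on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "google/gemma-2-2b"  # assumption: any open-weight base you can license
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Only the small adapter matrices get trained; the gazillion-GPU-hour base
# weights stay frozen.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # usually well under 1% of the base model
```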
But, again, he sounds like he did a great job suckering some investors into giving him money and now he's trying to convince everyone else that it's not worth their effort to make a competing project because it's really hard to do.
Honestly, cutting a deal with ANY of the mainstream LLM vendors, uploading the entire legal library of Alaska as RAG data, and writing a context-rules document that constrains the answers to verifiable facts with linked references would have been far cheaper and far more effective.
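To make the "verifiable facts with linked references" bit concrete, here's a toy sketch of how the retrieved legal text and the rules document could be stitched into a single prompt; the chunk IDs, URLs, and excerpts are placeholders, not real Alaska citations:

```python
# Toy sketch of stitching retrieved legal text and the rules into one prompt so
# every claim can point back at a linked source. The chunk IDs, URLs, and
# excerpts are placeholders, not real Alaska citations.
RULES = "Answer only from the context below. Cite every claim as [chunk-id](url)."

retrieved = [
    ("AS-09.10.070", "https://example.com/alaska-statutes/09.10.070",
     "Excerpt from an Alaska statute on limitation periods..."),
    ("smith-v-jones", "https://example.com/opinions/smith-v-jones",
     "Excerpt from a court opinion discussing equitable tolling..."),
]

def build_prompt(question: str) -> str:
    """Combine rules, linked context chunks, and the question into one prompt."""
    context = "\n\n".join(f"[{cid}]({url})\n{text}" for cid, url, text in retrieved)
    return f"{RULES}\n\nContext:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the limitation period for tort claims in Alaska?"))
```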
Of course, at the current rate of progress of LLMs, I expect that by 2030 there won't even be a need for RAG for things like legal references. But this might end up only being possible on Chinese computing systems, since OpenAI just killed all Western AI research. After all, we spent $32,000 a card on 340 H200 cards last year. They have 141GB each, which is way too small to run decent LLMs on current-generation tech. I speculate we'll see a breaking point closer to 512GB, and I don't think we'll see 512GB from anyone but the Chinese until there are A LOT more RAM factories up and running.