The unlisted aspect only comes through in the SS7 (PSTN) or SIP (VoIP/IMS) protocol headers, as a flag indicating whether the account is private, alongside the number paying for the call, the number to display, the originating number, and so on. This metadata can also change during a call if the call is rerouted mid-stream, headers arrive late, etc. It gets even more complicated for reverse-billed numbers (800), where the originating number is XXX, the billing number is YYY, the display number is ZZZ, and sometimes an interlink number ends up in there too. (And as we found out last month from our call logs, some numbers carry yet another header for virtualized/multi-ring setups, which has to be taken into account lest the "wrong" number be displayed.)
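To make the pile of numbers above a bit more concrete, here is a rough Python sketch of how one call's metadata might be modelled. Everything in it is hypothetical: the field names, the `apply_update` helper, and especially the rule in `number_to_show` (prefer the virtual/multi-ring number when present) are my assumptions for illustration, not how any real switch or SS7/SIP stack decides it.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CallMetadata:
    """Hypothetical per-call view of the header fields described above."""
    originating: str                # number the call actually came from
    billing: str                    # number paying for the call
    display: str                    # number meant to be shown to the callee
    is_private: bool = False        # the "unlisted" flag carried in the headers
    virtual: Optional[str] = None   # assumed extra virtualized/multi-ring number


def apply_update(meta: CallMetadata, **changed) -> CallMetadata:
    """Return a new record with late or changed headers applied (e.g. after a reroute)."""
    return replace(meta, **changed)


def number_to_show(meta: CallMetadata) -> Optional[str]:
    """Pick the number to display; None means 'withhold it'.

    Assumption: a virtual/multi-ring number, when present, wins over `display`.
    """
    if meta.is_private:
        return None
    return meta.virtual or meta.display
```

For a reverse-billed call you would build something like `CallMetadata(originating="XXX", billing="YYY", display="ZZZ")` and then call `apply_update(...)` each time a delayed header shows up, which is exactly why the "right" display number can change mid-call.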
Now, legally, we are required to keep the originating number, timestamp, and length of call.
For billing and interconnect agreements, we keep the billing number as well.
Since on the Engineering side we always have full internal access to the raw protocol data, the legal siphon (done at the switch level) just skims off the legally required data and puts it in long-term storage (not a DB). Even this minimal data set runs to GBs a day.
A separate process then takes each session and generates a [display phone number, timestamp] DB holding 90 days of call logs for users to look up (or, depending on jurisdiction, to meet the legal requirement to itemize chargeable calls on bills).
Under no circumstances have we ever kept the "is unlisted" status of the call, as it's never been a datum required for any business logic, ever.
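As a rough illustration of the two paths described above (the legal skim and the 90-day lookup log), here is a minimal Python sketch. The session dict layout, the field names, and the helper functions are all made up for the example; the only things taken from the description are which fields get kept and the 90-day window. Note that the privacy flag simply never makes it into either output.

```python
from datetime import datetime, timedelta

# Fields kept by the legal skim plus the billing number (names are hypothetical).
LEGAL_FIELDS = ("originating", "timestamp", "duration_s", "billing")
RETENTION = timedelta(days=90)


def skim_legal(session: dict) -> dict:
    """Keep only the required fields; everything else, including any
    privacy flag, is dropped on the floor."""
    return {key: session[key] for key in LEGAL_FIELDS}


def display_row(session: dict) -> tuple:
    """Row for the user-facing call log: (display number, timestamp)."""
    return (session["display"], session["timestamp"])


def prune(rows: list, now: datetime) -> list:
    """Drop call-log rows older than the 90-day window."""
    cutoff = now - RETENTION
    return [row for row in rows if row[1] >= cutoff]
```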
And when handling millions of calls daily, relying on switches to read and dump data for secondary systems to process in real time is a space- and time-sensitive process, so only the absolute minimum is kept to prevent buffer overruns in the data-processing phase.
But since retrieving data for a given time range is a semi-manual process, I can understand their request to honor "all my metadata" as well.
Limited time ranges, as requested by law enforcement, are easier to obtain:
- fetch the raw hourly dump files for the time range requested
- run the script that goes through the files and formats a CSV output for any matches of the search phone number
- this takes hours to run for a week's worth of data when the range falls outside the 90-day "fresh" window, since it churns through TBs of text files. Inside that window the data is stored in a more processed state (which isn't kept any longer, as it's a lot of data to store for no company benefit). Most law enforcement requests only cover the last 30 days of calls, so that particular process is more streamlined.
- it would be entirely unrealistic to do for the lifetime of a given customer.
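For a feel of what the "run the script" step amounts to, here is a simplified Python sketch. The dump format is an assumption on my part (one pipe-delimited record per line: timestamp, originating number, billing number, duration); the real files and the real script will look different. It also makes the cost obvious: every line of every file in the range gets read, which is why TBs of text take hours.

```python
import csv
from pathlib import Path
from typing import Iterable, TextIO

# Assumed record layout of one line in an hourly dump file (hypothetical).
FIELDS = ("timestamp", "originating", "billing", "duration_s")


def search_dumps(paths: Iterable[Path], number: str, out: TextIO) -> int:
    """Scan the dump files line by line and write matching records as CSV.

    Returns the number of matching records written (header row excluded).
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    matches = 0
    for path in sorted(paths):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                parts = line.rstrip("\n").split("|")
                if len(parts) != len(FIELDS):
                    continue  # skip malformed or truncated records
                record = dict(zip(FIELDS, parts))
                if number in (record["originating"], record["billing"]):
                    writer.writerow(parts)
                    matches += 1
    return matches
```

Usage would be along the lines of `search_dumps(Path("dumps").glob("*.txt"), "5550001", sys.stdout)`, with the glob narrowed to the hourly files for the requested time range.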
One point to take away from this is that many telecom companies have no interest in keeping your data. It's expensive: each item of data adds substantially more cost, overhead, and resources to manage its storage. It also adds significantly more liability, as more people now have access to it internally, and safeguards and resources must be used to manage that. That is why the legally required information is captured automatically at the switching level, dumped in an unprocessed state, then processed, stored, and intentionally kept difficult to access. We do not want the liability that comes with storing it, or with making it easily available to even a subset of internal employees. Each person who has access adds more risk.
Storing users' metadata, at least in the telecom world, is not wanted in the slightest, and we only do the absolute minimum to meet government regulations. Sadly, this also means that under the current laws the data is not easily accessible, nor is it in a state that can be released to a private individual without substantial legal risk.