Anon writes: I work for a project that is digitising a very large (circa 18,000 hours) collection of audio. The audio is being digitised now and sent to cataloguers, who will:
Break the longer sections into MP3 tracks.
Provide a one-sentence summary.
Provide a full summary.
Fill in a keyword section.
There are also vague plans afoot to try to fit the data into categories (e.g. religion, songs, tales, etc.) and then to split these into smaller subcategories. However, this is eating into a lot of project time and causing a lot of headaches, because many items will defy categorisation.
My question is: in the age of the semantic web, is forced categorisation of any value, or would the data be more accessible through searches based purely on the summaries?
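To make the two options concrete, here is a minimal sketch of what a catalogue record and a summary-based search might look like. Everything in it is an assumption for illustration: the `Track` fields, the language codes, the sample summaries and the `build_index` and `search` helpers are hypothetical and are not the project's actual schema or tooling. Categories are modelled as an optional set, so an item that defies categorisation simply has none.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Track:
    """One catalogued MP3 track (hypothetical field names)."""
    track_id: str
    language: str                      # placeholder code for one of the 3 languages
    short_summary: str                 # the one-sentence summary
    full_summary: str
    keywords: List[str] = field(default_factory=list)
    categories: Set[str] = field(default_factory=set)  # optional, may stay empty


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(tracks: List[Track]) -> Dict[str, Set[str]]:
    """Map every word in the summaries and keywords to the tracks containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for track in tracks:
        text = " ".join([track.short_summary, track.full_summary, *track.keywords])
        for word in tokenize(text):
            index[word].add(track.track_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of tracks that contain every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Invented sample data, purely for illustration.
tracks = [
    Track("t1", "lang-a", "A fisherman's song.",
          "A work song sung while hauling nets, ending with a short prayer.",
          keywords=["song", "fishing", "prayer"],
          categories={"songs", "religion"}),   # fits two categories at once
    Track("t2", "lang-b", "A tale about a seal.",
          "A storyteller recounts a tale of a seal woman.",
          keywords=["tale", "seal"]),           # left uncategorised
]
index = build_index(tracks)
print(search(index, "prayer song"))
print(search(index, "seal"))
```

In this sketch the uncategorised track is still found through its summary and keywords, while the first track carries two categories instead of being forced into one. A real system would also need to deal with word stemming and with searching across the three languages, which this toy index ignores.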
How would you organise data on this scale?
P.S. The audio is in three languages.