Washington State Archives Go Digital
prostoalex writes "USA Today and dozens of others report that the Washington state archives went online. Over the past two years project participants scanned 1 million documents issued by state and county authorities. The archive is located at my alma mater, Eastern Washington University (go Eagles!). The 800 terabyte storage system was developed by Microsoft and EDS."
Well, (Score:5, Insightful)
Although, it has to be said, I hope they make everything accessible for *everyone*, regardless of OS and browser. No doubt a lot of researchers would be using OS X/Linux/Firefox.
Re:Well, (Score:1)
Re:Well, (Score:5, Informative)
As for OS X and Linux users, a plug-in is needed to view the content, but they report that they support OS X and "UNIX". The plug-in is called DjVu [lizardtech.com] and has an open source equivalent at sourceforge [djvuzone.org] (with RPMs, OS/2 and even Cygwin support).
Re:Well, (Score:2)
Re:Well, - data lost (Score:2)
So it is not all that it is cracked up to be.
Re:Well, - data lost (Score:2, Informative)
-Joe
Re:Well, - data lost (Score:2)
Oh well my family must not exist.
Re:Well, (Score:2)
They would? How do you know this?
WWW address (Score:5, Informative)
Re:WWW address (Score:5, Informative)
Just in case someone actually wanted the address for the archives it's http://www.digitalarchives.wa.gov/
FYI. Turn on cookies or you receive this extremely helpful error message:
Otherwise, it's pretty cool.
How many terabytes in the archive ... (Score:2, Funny)
Re:How many terabytes in the archive ... (Score:1)
Re:How many terabytes in the archive ... (Score:2)
I hate the twenty second waiting period
Hurrah (Score:3, Funny)
I feel safer already.
Re:Hurrah (Score:2)
hm, according to this [eds.com] link, their CEO is a guy called Michael Jordan
this name seems to attract money a lot better than mine
Just another link (or two) (Score:5, Informative)
From the NYSE Site [nyse.com]
NB Archives (Score:5, Informative)
Re:NB Archives (Score:1, Interesting)
Search capabilities (Score:4, Insightful)
As far as I have tried it out in these few minutes, the search strategy is good... there are separate searches that researchers can use to find historical data and the like... This is great.
drive letters (Score:2, Funny)
How would windows have enough drive pointers to be able to access this? Would there be a drive AG:?
-Pete
Re:drive letters (Score:1, Informative)
While there is nothing to stop an NTFS partition being 800 TB, it is far more likely that some sort of nearline hierarchical storage is being used, the sort of system that is used the world over in workflow/image systems.
Privacy (Score:5, Insightful)
Re:Privacy (Score:4, Insightful)
Re:Privacy (Score:2)
Re:Privacy (Score:2)
Re:Privacy (Score:2, Insightful)
Absolutely. Making "public" records available universally is a different meaning to "public" in public records in situ. Although the word "public" was used, it really meant the local community. When you change that to "everyone in the world with internet access" you change the context in which the data resides... and for data, context is everything. For one thing, it narrows the scope to a small portion of the population so that accurate identification (or, conversely, less mistaken identity) is facilitated
the previous method wasn't great (Score:2)
If you really want to stop abuse, you'll have to make them completely private, not just "private but inconvenient to get to".
Re:Privacy (Score:2, Interesting)
Re:Privacy (Score:2)
Property (real property) records are already public domain--as they should be. There's no good reason for the government not to tell you who owns what land. Whether you find out at the county tax assessor's office or on the Internet is irrelevant.
Aside from property tax information, I don't foresee other tax information being released to the public. Knowing t
Re:BSOD? (Score:1, Funny)
Re:BSOD? (Score:2)
Digital twilight. (Score:5, Interesting)
And I'm still ignoring the fact that machines grow old and have to be replaced. It's a known fact that disks break, so you'll need backups, but how long can you keep an old storage solution around? Sooner or later you'll have to migrate old backup data to newer media.
Note that I don't think that this is a bad idea, moving everything online, but there are consequences that I don't think everyone has thought of.
Where I live one can go into the royal library and find (and read) an official document written by someone in the 16th century, but can we be sure that 100 or even 50 years from now someone can read a DLT300 tape?
Re:Digital twilight. (Score:5, Insightful)
hard drives can easily be replaced (assuming it's some sort of RAID with hot swap)
sql will also stay around for a really long time, and if not there will be at least a gazillion tools to convert to a new format (it is quite certain that the data will be stored on a sql server)
and as long as the data is safely stored the access mechanism shouldn't be a problem but that's just my
Re:Digital twilight. (Score:2)
Yes, you are correct about this but scale it up a bit. If you have to change media every 10-15 years then the data migration becomes a full time job for someone.
sql will also stay around really long, and if not there will be at least a gazillion tools to convert to a new format (it is quite sure that the data will be stored on a sql server)
and as long as the data is safely stored the access mechanism shouldn
Re:Digital twilight. (Score:2)
some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.
additionally if this system is heavily used (i.e really everything enters it) it will grow and with it the hardware will need upgr
Re:Digital twilight. (Score:2)
some form of a database will be around almost forever, if a new form of storage is invented which makes databases obsolete there will be enough tools around to move the data from a sql database to the new system, simply for the fact that too much stuff is already stored in various databases to let go of it.
But this presumes that someone cares enough to do the migration while there are still "enough tools around". If no one cares for enough technology generations the data format will be so old that no one
Re:Digital twilight. (Score:2)
Ah, yes. This is a good point. What if the vendor... of the frontend, or backend, or any of the systems, goes out of business? Then they will be screwed!
Unless, perhaps, they we
Re:Digital twilight. (Score:2)
What should an archive be (Score:1)
An archive should be a Write one, read many file system with Active on-disk (not hierarchical on tape) information with multiple copies preferably at multiple sites (depending on how valuable the data is), with programs for active file validation (you need to be sure the file is still there, and still th
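The "active file validation" idea above is often done with checksums. A minimal sketch, assuming a SHA-256 manifest is recorded once and rechecked periodically; the function names (`build_manifest`, `verify`) are hypothetical, and nothing here describes what the Washington archive actually runs:

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large scans never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return a list of problems: files that vanished or whose bytes changed."""
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"changed: {name}")
    return problems
```

Running `verify` against each copy at each site tells you which copy to restore from when one of them fails the check.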
Re:Digital twilight. (Score:1)
Re:Digital twilight. (Score:2)
Yes, I am aware that these records will be destroyed eventually but it has survived more than 500 years of storing without any intervention. I seri
Re:Digital twilight. (Score:2)
Washington State's doing just that experiment. Back in 1992, they created a time capsule using the latest and greatest storage technology: CD-ROMs. The plan is to add new material every 25 years, and in 2492,
Re:Digital twilight. (Score:2)
Paper is not really permanent either. If someone wants to get rid of paper documents, all he needs to do is burn them. Eventually, in an "accidental fire".
Re:Digital twilight. (Score:1)
Re:Digital twilight. (Score:1)
As another poster in this thread rightly points out, long term digital archiving requires cycles of storage migration. But while this is onerous, it's not the biggest challenge. The biggest challenge is the format that data is
Re:Digital twilight. (Score:1)
Reckon on all your digital data being unreadable (through software obsolescence, mostly) in no more than 50 years' time without some form of active intervention. We have paper that has easily survived for hundreds of years with no special requirements.
Matt Palmer
Digital Preservation Department
UK National Archives.
no maps? (Score:5, Insightful)
Re:no maps? (Score:3, Interesting)
Re:no maps? (Score:2)
Re:no maps? (Score:1, Flamebait)
Maps & blueprints? What are you, some kind of terrorist!!?!?
Re:no maps? (Score:1)
System Spec (Score:4, Funny)
Microsoft was able to confirm the system is expandable, and contrary to previous rumours, will in fact have enough disk space to install Longhorn.
They do, however, state that to do anything actually useful, more upgrades will be required.
Thanks for the F'ing Popups (Score:4, Informative)
http://www.digitalarchives.wa.gov/ [wa.gov]
All that work, and.. (Score:2, Funny)
If only someone had told them about Kinko's [kinkos.co.uk].
Re:All that work, and.. (Score:2)
Re:All that work, and.. (Score:2)
Perfect (Score:2, Interesting)
Would you trust a known pedophile to give your kids a bath? If not, then why trust a convicted monopolist who is on the record for perjury with critical documents?
Re:Perfect (Score:2)
Re:Perfect (Score:1)
How 800TB can be locked forever (Score:1)
cheaper linux based product (Score:1)
linux based, postgres db.
Not in full release, not free, but very open.
http://www.archivas.com/ [archivas.com]
Not 800 Terabytes, & using DjVu (Score:5, Informative)
An interesting point is that they're delivering the documents using DjVu by LizardTech [lizardtech.com], which is GPLd, and developed by the creators of DjVu in conjunction with LizardTech (after a period of LT not-getting-it). The DjVuLibre home page is here [djvuzone.org]. LizardTech still have the best encoders for the format.
Re:Not 800 Terabytes, & using DjVu (Score:2)
Re:Not 800 Terabytes, & using DjVu (Score:4, Insightful)
Unit conversion (Score:3, Funny)
EDS???? (Score:1)
FTA -- "If you mention NMCI, there is an automatic groan," he says. "I think the phrase is, 'I've been NMCI'd.' "
The Article [govexec.com]
Re:EDS???? (Score:2)
Of course, I can't bash them too hard. I'm hoping they'll hire me
teh EDS sux (Score:1)
Actually, the only reports that I have found
Bug Farm (Score:2, Funny)
Run and hide. If there was ever a combination of resources destined to fail it's Windows and EDS. If it works at all I'll be surprised. If it keeps working I'll be amazed.
reboot (Score:1)
Microsoft and EDS (Score:2)
I'm awake, really... (Score:1)
Go Eagles! (Score:2, Interesting)
Re:Go Eagles! (Score:2, Interesting)
Re:Go Eagles! (Score:1)
It's not only a problem with the provider and the infrastructure, but also the management (EWU Department of Housing and Residential Life) not having a fsckin' clue how to manage a network of this size. I've looked into it on several occasions (from both the end-user and systems design perspectives). It's not pretty. Pretty fsckin' ugly, actually.
I don't have enough space h
Re:Go Eagles! (Score:1)
Re:Go Eagles! (Score:2)
The digital archives is a big step for my university. Five years ago we were facing a hostile takeover by the drunken WSU, now Eastern is the fastest growing university in the state.
Yes, it's amazing the strides Eastern has made. I was there in the early-to-mid 90s when they sucked in just about every possible way there is to suck. Now, if my other plans don't pan out, I might actually consider going back there to finish my degree. It is true, however, that EWU is still
Re:Go Eagles! (Score:1)
In the Computer Science Department (where I am now a grad student, having gotten my BS degree here last fall), students are taught Java as the base language for the programming classes.
After that, we offer electives in C++, C, Python, and a handful of other languages.
We also have quite a few L
Admin login (Score:2)
https://www.digitalarchives.wa.gov/WADAAdm
It appears not to be susceptible to a common IIS/ASP script injection bug: ' or 0=0 --
Good work.
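For anyone wondering why that string is the usual probe, here is a minimal sketch. It uses Python and an in-memory SQLite table instead of the site's IIS/ASP stack, whose code is not known here, and the table and function names are hypothetical. A login query built by string concatenation lets the probe rewrite the WHERE clause, while a query with bound parameters treats it as plain data.

```python
import sqlite3

PROBE = "' or 0=0 --"


def make_db() -> sqlite3.Connection:
    """Hypothetical one-user table, only for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # BAD: user input is pasted straight into the SQL text. With PROBE as the
    # name, the clause becomes  name = '' or 0=0  and "--" comments out the
    # password check, so every row matches.
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # Bound parameters: the driver passes the values separately from the SQL,
    # so quotes and comment markers in the input are just characters.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0
```

With this setup, `login_vulnerable(conn, PROBE, "x")` succeeds without a password and `login_safe(conn, PROBE, "x")` does not.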
Apparently I don't exist (Score:1)
Re:Apparently I don't exist (Score:1)
Re:Apparently I don't exist (Score:1)
Re:Apparently I don't exist (Score:1)
Re:Apparently I don't exist (Score:1)
If that were the case why do they allow you to search for birth records? They even return results too (mainly from Spokane County)... just not my personal record.
Re:Apparently I don't exist (Score:1)
Overkill? (Score:2)
What about 1000 Terabytes? (Score:1)
Re:What about 1000 Terabytes? (Score:1)
Re:What about 1000 Terabytes? (Score:2)
http://www.pcsndreams.com/Pages/Articles/Megaby
I'd never heard of a Brontobyte before, just Yottabyte.
Re:What about 1000 Terabytes? (Score:2)
http://www.pcsndreams.com/Pages/Articles/Megabyte
"Go Digital"??? (Score:2)
What's the date format (Score:3, Interesting)
Has anybody figured out the date formats? I'm seeing a lot like this: "02001987". OK, it's either mmddyyyy or ddmmyyyy. But what does 00 mean for month or day? Unknown? It's hard to imagine that they don't have an exact date of death for someone who died as recently as 1987. Or is it a zero-based counting system (00 = Jan, 01 = Feb, ...)?
It's interesting that the death records include Social Security Numbers. Anybody want to harvest a few thousand inactive SSNs?
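One way to read those values, sketched under the first guess above: the layout is mmddyyyy and 00 means "unknown". That is an assumption, not something the archive documents, and `parse_partial_date` is a hypothetical helper.

```python
from typing import NamedTuple, Optional


class PartialDate(NamedTuple):
    year: int
    month: Optional[int]  # None when the record stores 00
    day: Optional[int]    # None when the record stores 00


def parse_partial_date(raw: str) -> PartialDate:
    """Parse an 8-digit mmddyyyy string, treating 00 as an unknown field."""
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"expected 8 digits, got {raw!r}")
    month, day, year = int(raw[0:2]), int(raw[2:4]), int(raw[4:8])
    if month > 12 or day > 31:
        raise ValueError(f"not a plausible mmddyyyy value: {raw!r}")
    # "x or None" maps 0 to None and leaves real values untouched.
    return PartialDate(year, month or None, day or None)
```

Under this reading "02001987" is February 1987 with the day not recorded. If the zero-based theory were right instead, months would run 00 to 11 and a value of 12 should never appear, which would be easy to check against the search results.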
Search for first name "Bill", last name "Gates" (Score:2)
Size is out of whack (Score:2, Interesting)
There are 1 million documents in this database? And it's 800 terabytes? So each doc is 800 MB in size?
800 MB EACH? That's freaking huge. Even if the thing is only 8 TB in size (far more reasonable), each doc is still 8 MB in size. Again, pretty massive.
Is this like that time MSFT bragged about their 1 TB DB of geological data, and then Oracle built the same database, with the same content, using only 300 GB of space?
Inefficiency is not
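The arithmetic in the comment above checks out if you assume decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and assume the whole 800 TB is actually occupied by the 1 million documents, which the story does not confirm. A quick sketch, with `average_size_mb` as a hypothetical helper:

```python
TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def average_size_mb(total_tb: float, documents: int) -> float:
    """Average document size in MB if total_tb were split evenly."""
    return total_tb * TB / (documents * MB)


print(average_size_mb(800, 1_000_000))  # 800.0
print(average_size_mb(8, 1_000_000))    # 8.0
```

If the 800 TB figure is the capacity of the storage system and not the size of the current holdings, the per-document number says nothing about how efficiently the scans are stored.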
Exhaustive Database? (Score:1)
Re:How to view the records? (Score:2, Informative)
DjVu is a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consume less client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of acade