More PDF Blackout Follies

georgewilliamherbert writes "The latest installment of "As the PDF Blackouts Turn" hit today, with a U.S. government apparently releasing a redacted version of their court filing in the Balco grand jury leak case which merely stuck a black line over the text, which remains available in the document. As with prior documents, entering text cut/paste mode in a normal PDF browser such as Acrobat allows a reader to access the concealed text. Previous incidents include an AT&T filing in the NSA case." This works with Xpdf and KPDF, too; for KPDF, use the selection tool (under the Tools menu) around the redacted section, copy to clipboard, then paste into the text-manipulator of your choice.

  • by Deep Fried Geekboy ( 807607 ) on Thursday June 22, 2006 @10:41AM (#15582413)
    You can open them directly in Safari and cut/paste into TextEdit too.
  • by nweaver ( 113078 ) on Thursday June 22, 2006 @10:43AM (#15582428) Homepage
    Redacting electronic documents right is HARD. See, for example, The NSA's guide to redacting word documents as PDF [].
  • Cache (Score:4, Informative)

    by Rob T Firefly ( 844560 ) on Thursday June 22, 2006 @10:44AM (#15582437) Homepage Journal
    Coral cache of the PDF []

    Anyone into mirroring it?
  • PDF Redaction (Score:4, Informative)

    by Fedallah ( 25362 ) on Thursday June 22, 2006 @10:44AM (#15582440) Homepage
    This is pretty ridiculous. Products have existed for years to take care of this sort of thing, such as /redaction.php [].

    How does this keep happening?
  • by hey! ( 33014 ) on Thursday June 22, 2006 @11:00AM (#15582566) Homepage Journal
    Like .doc, .pdf, and AFAIK the opendoc format.

    It's the same old story as with operating systems or anything else: features are usually either a plus or a "don't matter", except when serious security issues are involved, in which case you can't always predict what is benign, whether in and of itself or in combination with other features. Adobe tried to position PDF for all kinds of other things like portable forms and collaboration, but obviously their users are running into the same problems as MS Word users have with leaking sensitive information.

    What there should be is a standard document format for outside release of legal or sensitive documents, that doesn't have any features that could be inadvertently used. Maybe it is RTF or a stripped down PDF; but something where you can tell the intern to release this press release, and not count on him being smart enough to check for hidden comments and workflow information. It should be WYSIAYG -- what you see is ALL you get -- and any additional features, other than possibly a small and well defined set of metadata, should parse as an error.
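To make the "anything extra should parse as an error" idea concrete, here is a minimal sketch in Python. It is not an existing format: the dictionary layout, the `ALLOWED_METADATA` whitelist, and the names `parse_release_document` and `ReleaseFormatError` are all made up for illustration.

```python
# Hypothetical "what you see is ALL you get" release format:
# a document is visible body text plus a tiny whitelist of metadata.
ALLOWED_TOP_LEVEL = {"body", "metadata"}
ALLOWED_METADATA = {"title", "author", "date"}  # assumed whitelist


class ReleaseFormatError(ValueError):
    """Raised when a document carries anything beyond the allowed fields."""


def parse_release_document(doc):
    """Return the body text, or raise if the document has extra features."""
    extra = set(doc) - ALLOWED_TOP_LEVEL
    if extra:
        # Comments, tracked changes, hidden layers, etc. are all rejected.
        raise ReleaseFormatError(f"unexpected features: {sorted(extra)}")

    body = doc.get("body")
    if not isinstance(body, str):
        raise ReleaseFormatError("body must be plain text")

    metadata = doc.get("metadata", {})
    if not isinstance(metadata, dict):
        raise ReleaseFormatError("metadata must be a key/value mapping")
    unknown = set(metadata) - ALLOWED_METADATA
    if unknown:
        raise ReleaseFormatError(f"unexpected metadata: {sorted(unknown)}")

    return body
```

The point of the design is that the reader program fails loudly instead of silently ignoring what it does not display, so the intern cannot ship a file with leftovers in it.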
  • by The Only Druid ( 587299 ) on Thursday June 22, 2006 @11:08AM (#15582629)
    "Redacted" is a legal term of art (i.e. it has a special meaning in the legal context).

    For lawyers/courts/etc., redacted (per Black's Law Dictionary) means:
    redaction, n. 1. The careful editing of a document, esp. to remove confidential references or offensive material. (Cases: Criminal Law 663; Federal Civil Procedure 2011; Trial 39. C.J.S. Criminal Law 1210-1211; Trial 148-153.) 2. A revised or edited document. -- redactional, adj. -- redact, vb.

    The lesson here is this: if you see a word used in a legal context (or any professional context) and it sounds entirely wrong...ask yourself first whether it might have a special meaning before complaining.
  • They're correct. (Score:5, Informative)

    by Kadin2048 ( 468275 ) on Thursday June 22, 2006 @11:09AM (#15582640) Homepage Journal
    Their use of redact is completely correct.

    If I am releasing a document for publication and decide to remove information from it, this is redaction. It's editing for publication, which can include the removal of information. It could also include the addition of new information, but that's not what typically happens. Redaction can be a form of self-censorship, but it's not always the same.

    Censorship is when a third party, generally a person in authority, suppresses information which is considered objectionable. The 'authority' can be the same as the author (e.g. 'self-censorship'), or the suppression can be indirect -- it need not be editing per se.

    It's my understanding that "redact" is used only in reference to written documents that are being edited, while 'censor' is more general and can refer to anything. The terms are closely related, especially in their typical use, but they're not exactly the same. "Redact" is actually a more specific and precise word for what's going on in this instance. We can argue about whether censorship is also going on, but redaction definitely is.

    Anyway, arguing about definitions by citing dictionaries is always a bit pedantic, since dictionaries are not authoritative except as a historical reference: they can tell you what a word meant at the time the dictionary was written, but not what it means right now, since a word's definition is determined by its usage. All language is inherently arbitrary: they're just sounds we make or things we write down in order to convey ideas, and the relationship between the sounds/characters and ideas is not fixed, but infinitely variable. If everyone were to decide tomorrow that 'redaction' meant the same thing as 'censorship,' that's what it would mean, and next year's dictionaries would have to be updated to reflect that.
  • Congratulaitons. (Score:5, Informative)

    by sammy baby ( 14909 ) on Thursday June 22, 2006 @11:14AM (#15582672) Journal
    Congratulations, Slashdot! The FBI will be along shortly to raid your offices on suspicion of violating the DMCA, the Patriot Act, and probably some other bullshit piece of legislation we don't even know about.

    Oh, yeah - it's a no-knock warrant, so put your pants on now.
  • by RobertB-DC ( 622190 ) * on Thursday June 22, 2006 @11:18AM (#15582700) Homepage Journal
    Redacting electronic documents right is HARD. See, for example, The NSA's guide to redacting word documents as PDF []

    At least it's obvious that the folks who know what they're doing, know that MS products aren't the best solution. From the doc:
    Microsoft Word XP/2003: Microsoft has attempted to remedy certain issues with Metadata in Office XP and up by including a menu option to remove personal information (metadata). There is also a tool available for free from MS, Remove Hidden Data 1.0 (for XP) and 1.1 (for Office 2003), hereafter referred to as RHD, that allows batch removal information from Word documents. None of these will remove sensitive information from the main document; neither will they remove all metadata of possible concern. And RHD 1.0 suffered from stability issues. Reliance of these tools may give a false sense of security.

    The fact that MS tools are in use at all in these situations -- as opposed to free, open-source solutions that can be customized for high security applications -- may show the ineptitude of whatever management keeps signing off on their purchase.
  • by Sir Codelot ( 830933 ) on Thursday June 22, 2006 @11:20AM (#15582710)

    i hate the new acrobat reader. some claim it calls home to the mothership(Adobe) which i dont approve of either (spyware)...

    Then you should try Foxit Reader []. Apart from being free, light-weight and best for everyday use, it also has got a 'Fox' in its name. :)

  • Re:Congratulaitons. (Score:2, Informative)

    by botlrokit ( 244504 ) on Thursday June 22, 2006 @11:26AM (#15582766)
    The FBI will be along shortly to raid your offices on suspicion of violating the DMCA, the Patriot Act, and probably some other bullshit piece of legislation we don't even know about.

    /. doesn't host with AT&T [], so no worries.

  • by Pendersempai ( 625351 ) on Thursday June 22, 2006 @11:30AM (#15582805)
    It's not hard; people just have to manually delete (not obscure) data they want redacted. Then all outgoing Word files should be scrubbed of metadata. There are commercial packages, included in many groupware suites, that do this automatically. At the law firm where I work, every single Word file that gets emailed to an address outside the firm is automatically scrubbed of metadata by the server. If you try to save a document with Track Changes enabled, a dialog box warns you. If you try to email a document with Track Changes enabled, several layers of dialog boxes confirm that this is actually what you wanted to do.

    The procedure you link to has people scrubbing the metadata by copying all the content of the document and pasting it into a new document. This puts too much trust in the user and does not clear some types of metadata anyway.
  • by Noksagt ( 69097 ) on Thursday June 22, 2006 @11:57AM (#15583007) Homepage
    CLI programs are REALLY useful to look at "hidden" content.

    'pdftotext' comes with xpdf & is even available natively on windows.

    Similarly, for MS Word documents, you may use 'antiword' [], 'catdoc' [], and 'wv' [].

    These programs are quite nice in that they can easily batch-process a lot of documents & then you can go grepping through them for interesting tidbits.

    (On the GUI front, evince [] deserves a plug. It uses the same poppler [] backend as xpdf and kpdf. I used to use tiny & fast xpdf for most of my pdf viewing, but evince has a few nice features which xpdf lacks & has become my personal favorite pdf viewer.)
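The batch-and-grep workflow described above could look roughly like this in Python. It is a sketch under assumptions: it expects the `pdftotext` program from xpdf to be on the PATH, and the helper names (`pdf_to_text`, `grep_texts`, `grep_pdfs`) are invented here.

```python
import re
import subprocess
from pathlib import Path


def pdf_to_text(pdf_path):
    """Run `pdftotext <file> -`, which writes the extracted text to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        text=True,
        check=True,  # raise if pdftotext reports an error
    )
    return result.stdout


def grep_texts(texts, pattern):
    """Return (name, line_number, line) for every line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in texts.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, number, line.strip()))
    return hits


def grep_pdfs(directory, pattern):
    """Extract text from every *.pdf in a directory, then search it."""
    pdfs = sorted(Path(directory).glob("*.pdf"))
    texts = {pdf.name: pdf_to_text(pdf) for pdf in pdfs}
    return grep_texts(texts, pattern)
```

Something like `grep_pdfs("filings", r"grand jury")` would then list every matching line, whether or not a black box is drawn over it on screen.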
  • Re:Maybe (Score:4, Informative)

    by Doctor Faustus ( 127273 ) <Slashdot@William ... d.Org minus poet> on Thursday June 22, 2006 @11:58AM (#15583018) Homepage
    All you need to do is attempt to convince them that white text is still text, or that black text on a black background is still text. Either way, the text is still there.

    This is a confusion over the way the Adobe Imaging Model works, not white-on-white or black-on-black. In Adobe's model, you start with a blank page, and you essentially paint on it; newly drawn things cover previously drawn things. Basically, despite what the previous commenter said, it really is like a Sharpie.

    When you physically draw over something with a black marker, the previous text may be impossible to see, but it's still there. In the PDF, you'd only have to skip the instruction drawing the box to get the text out. Even if Acrobat didn't let you get at the text by cutting-and-pasting, someone familiar with the PDF format could still get to it with some work.
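A toy illustration of that painting model, in Python. The content stream below is hand-written and heavily simplified (real PDFs usually compress their streams and may encode text through font tables), and the function names are invented for this sketch. It only shows that the text-drawing instruction and the box-drawing instruction are separate, so the text survives whether or not the box is drawn.

```python
import re

# Simplified, uncompressed page content: two lines of text are painted,
# then a filled black rectangle is painted over the second one.
CONTENT_STREAM = """\
BT /F1 12 Tf 72 720 Td (Public sentence.) Tj ET
BT /F1 12 Tf 72 700 Td (Concealed sentence.) Tj ET
0 0 0 rg
70 696 150 14 re f
"""

# `(string) Tj` is the operator that shows a text string.
TEXT_SHOW = re.compile(r"\((.*?)\)\s*Tj")
# `x y w h re f` builds a rectangle and fills it.
FILLED_RECT = re.compile(r"\s*(?:[-\d.]+\s+){4}re\s+f\s*")


def painted_text(stream):
    """Return every string drawn with Tj, in painting order."""
    return TEXT_SHOW.findall(stream)


def drop_filled_rectangles(stream):
    """Skip the instructions that paint filled boxes over the page."""
    kept = [line for line in stream.splitlines()
            if not FILLED_RECT.fullmatch(line)]
    return "\n".join(kept)
```

Calling `painted_text(CONTENT_STREAM)` returns both sentences: the box only changes what is visible on the rendered page, not what is stored in the file.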
  • by Namlak ( 850746 ) on Thursday June 22, 2006 @12:02PM (#15583046)
    The industry at large (Microsoft being a big offender) has been trying to get us to this magical place where everything is system and location independent and this is where we end up:

    1) FTP sites in Windows Explorer look like regular Windows folders. People expect them to work like regular folders. I had a field sales force try to "share" an Excel spreadsheet expecting the others to get a "Read Only" copy just like would happen on a local network share. Overwriting madness ensued. You can't blame them, there was no indication that it would work differently. Asking them to understand FTP is like accounting expecting me to fully understand the accounting rules behind my IT purchases.

    2) A manager where I used to work had an Excel spreadsheet with payroll data for the entire company. He wanted to send each department their subset of the data. So he filtered his spreadsheet and sent the filtered lists to each department not knowing that he was sending each department the whole list under the covers. Luckily, the file was 30MB and choked in the mail server and I was able to bail him out of that huge mistake. But you really can't blame him - he saw something on the screen and sent "it". There should be an indication of underlying data. BTW, doing a cut and paste special made each file about 25k or so.

    Same thing with this PDF error. If your file shows certain information, it should contain that information only or indicate (or warn) otherwise.

    By "simplifying" everything, nobody knows what's really going on. A couple times per week I have to explain some type of issue to some user about how "It's really more complicated than that, see Windows (or an app) hides this from you." Users roll their eyes as their simple task has become obscurely complicated - all in the name of making things "easier" to understand, ironically.

    If something works different, it should be displayed different - that at least gives the user a chance to question what they are doing.
  • by blackstripe ( 635857 ) on Thursday June 22, 2006 @12:14PM (#15583146)
    Assuming the original document was in Word format, I'm surprised they didn't use Microsoft's freely available redaction add-in [].
  • by MrCopilot ( 871878 ) on Thursday June 22, 2006 @12:17PM (#15583172) Homepage Journal
    17 Pages. Note to NSA.

    There is a much Simpler Solution.

    1.)Print Document.
    2.)Locate and uncap Sharpie.
    3.) Blackout Text.
    4.) Scan to DocRedacted.pdf
    Wow less than the average government paragraph. Seems like the way they have been doing it for years why change now?

  • by Bill Hayden ( 649193 ) on Thursday June 22, 2006 @01:57PM (#15583871) Homepage
    Using Evince, GNOME's document viewer, you don't even have to copy to another document. Merely selecting the "redacted" text shows the actual text.
  • by GWTPict ( 749514 ) on Thursday June 22, 2006 @02:04PM (#15583920)
    Nah, he just can't spell.
  • Re:Maybe (Score:3, Informative)

    by Gilmoure ( 18428 ) on Thursday June 22, 2006 @04:45PM (#15584972) Journal
    Sharpie def []
  • Re:Maybe (Score:3, Informative)

    by TubeSteak ( 669689 ) on Thursday June 22, 2006 @04:59PM (#15585045) Journal
    I un-redacted the PDF file, as an example for others. Instead of stripping out the black mask, I turned it red & lowered the opacity. It now highlights the 'redacted' portions quite nicely (and any underlining they used).

    Skip to pages 6-16 of the PDF for the not-so-hidden goods poena_sfchronicle_unredacted.pdf.html []

    P.S. I did it in FoxIt PDF Editor Pro, which I wouldn't really recommend to anyone
  • Er... pdftotext...? (Score:3, Informative)

    by digital photo ( 635872 ) on Thursday June 22, 2006 @05:54PM (#15585397) Homepage Journal
    Okay... this is what is considered secured??

    Using a STANDARD pdf handling tool:
    % pdftotext BALCO_quash_subpoena_sfchronicle.pdf

    From the PDF->TXT file:

    [snipped to first line before the "blacked out section"]

      C. Movants' Efforts to Obtain the Secret Grand Jury Transcripts

    [beginning of first blacked out section]

    Prior to the return of the Balco indictments, the lead defendant, Victor Conte ("Conte"), began to correspond via e-mail with Movants. (See Ex. 1 to Donnelan Aff.). Neither Movants nor Conte attempted to keep their relationship confidential, as the e-mail correspondence routinely was reported by Movants.2 (Exs. 1, 2, 3, and 11 to Donnelan 1

    [... snipped for brevity ...]

    On June 23, 2004, Fainaru-Wada sent an e-mail to Conte indicating that he (Fainaru-Wada) was busy working on some stories that may be "up on the web soon. Hope you like them." (Ex. T to Hershman Decl.). Conte responded that he was looking forward to seeing the article and that his lawyer would be available for comment. (Id.).

    [end of first blacked out section]

    D. Disclosure of the Montgomery Grand Jury Transcript On June 24, 2004

    [more, but why post it when you can read it yourself!?]

    Okay... WTF!? Doesn't ANYONE check this stuff before it goes out the door!?

    OMG! Wonder if this is how our private documents are "made safe"....
