
Comment It Depends (Score 1) 235

I have been involved in experimental science ranging in scale from preliminary survey of the support variable space to rigorously designed (as in design of experiment = "DoX") production support runs. The short answer to your question is: It depends. Mostly it depends on two things:
  • How similar is the space of experiments you are performing?
  • What sorts of questions do you intend to answer from your data?

As an example of the former: The patches of experiment space containing "measure the lifetime of the bottom quark" and "estimate the average length of 5 year old blue whales" are strongly disjoint and there is essentially no description reduction scheme that can handle such a broad range of inputs. Equivalently, "estimate the resistivity of the population of salt bridges I've experimented with" and "estimate the total data production of Earth in 2010" are questions drawn from experiments that are too different to have a unified data reduction description. I've led programs to address this range of problems in several ways:

  • Don't bother with links. Like any other "two representations of the data" problem, they WILL go out of sync as soon as something is reorganized.
  • Data goes in the leaf folders. Subsequent processing, folding, spindling, mutilating, and hand-waving with statistics occurs in parent folders. This typically includes interim reports and similar information. This leads to a strong visual model of data being hoisted from lower directories to higher directories by means of the data analysis tools that are *in* those directories. (This pins the version of the analysis tool that was used, so that the analysis can be replicated together with whatever oddities in processing were in that version of the tools.)
  • For back-of-the-envelope experiments (preliminary support variable space surveys), we tend to store the data in single directories named for the category of experiment. Distinct instrumental data streams are stored in folders by instrument name, and files from one "run" are named identically. (Yeah, yeah, I know, that sounds transverse, but it solves any number of "process all the spectrometer data the same way" sorts of problems, because all the spectrometer data is together in one place instead of requiring a solution to a potentially intractable programmatic data format recognition problem.) For small support spaces, the variable values are logged right in the file names. For medium experiments (typically too many variables to make workable filenames), a metadata file is created. This file either has a rigorous layout of support variable information separated by known section boundaries, or uses a form of pidgin markup that's not really too complicated (it only brackets unformatted strings) but makes automatic parsing of the metadata file feasible. The markup is required for, for example, optical filter stacks, where a not-previously-specified number of filters may be electrical-taped in a stack.
  • For medium sized experiments, with a specific ending condition (makes more sense in the context of items a couple of bullets down), the pidgin metadata file can be used, but it tends to transmogrify into a *real* (strict) markup language. I'm not pushing XML, but there are plenty of tools out there for automatically parsing XML. However, most of them are broken in that they require loading the *entire* XML file before they start parsing. Oddly, for large experiments (next item), the metadata can be oppressively large.
  • For large experiments, the strict markup metadata file tends to transmogrify into an (actual) relational database. It really doesn't matter which one you use, they are all equally inaccessible to your data analysis tools. You will find yourself writing an export or report routine that dumps the database into something like the strict markup metadata file just so your other tools can access it. This is especially true for large DoX runs, with data gathering occurring in parallel in multiple labs where management wants to see something like a burndown chart showing how much time is left until the first meaningful result is obtained.
  • For medium sized, continuing experiments, a typical example being production statistical process control, there are a number of advantages. The experimental process is *exactly the same* every time, and the same things are measured in the same ways. We've found that dated folders make sense. The most recent example for me is a filestructure of \yyyymm\ddbbb\ where "bbb" is "today's batch number". Really try to estimate the rate at which you will create files. Try to avoid having more than ~100 directories at a level since . When a batch or a day or another recognizable periodic event occurs, have the users post their data to the "I am done with these" script that extracts their metadata, posts that metadata and extracted statistics to a database, then extracts the database (or more often just updates with the newly posted data) into a CSV so that all of your processing tools can get at it again.
  • For large, continuing experiments, additional metadata goes into the directory structure, such as which station, which lab, which satellite, whatever. Typically there will be local datastores that are updated as the data comes off the instruments, then the "I am done with these" script posts those files to the central repository (or mirrors to multiple repositories) and updates one replica of the running database.
  • For ad hoc experiments, the best method we've found is to perform the experiments and arrange their data and analyses as above, then post the experiment reports, interim reports, and survey reports to a wiki, with direct references from the articles to the locations in the filesystem where the supporting information resides. It's imperfect in that transverse queries across datasets that have not previously been renormalized to support such a query are not particularly supported, but *typically* you can find out which piles of data actually contain results relevant to your question. We've also been experimenting with the Semantic Wiki extensions to mediawiki and have found that we can perform queries against abstracted metadata (that has been properly marked up) on the articles, which can allow a more refined picture of which piles of data contain the numbers you want. Of course, at this point, we're back to a pidgin markup, just at a higher level of abstraction about the data.
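
To make two of the conventions above concrete, here is a minimal Python sketch of the dated batch folder (yyyymm/ddbbb) and of logging support variable values right in the file name. The function names, the `name=value` pattern joined by underscores, and the `csv` extension are all assumptions made for illustration; they are not the scheme we actually used.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def batch_dir(day, batch):
    """Build the yyyymm/ddbbb folder for a day and that day's batch number."""
    if not 0 <= batch <= 999:
        raise ValueError("batch number must fit in three digits")
    return PurePosixPath(f"{day:%Y%m}") / f"{day:%d}{batch:03d}"


def encode_name(instrument, variables, ext="csv"):
    """Log the support variable values in the file name (hypothetical layout)."""
    parts = [f"{key}={value:g}" for key, value in sorted(variables.items())]
    return "_".join([instrument, *parts]) + "." + ext


# Matches name=value pairs such as T=295 or P=1.5 inside a file name stem.
_PAIR = re.compile(r"([A-Za-z]+)=([-+0-9.eE]+)")


def decode_name(name):
    """Recover the variable values from a name made by encode_name.

    Assumes the name ends in an extension, which is stripped first.
    """
    stem = name.rsplit(".", 1)[0]
    return {key: float(value) for key, value in _PAIR.findall(stem)}


if __name__ == "__main__":
    print(batch_dir(date(2011, 3, 7), 12))
    name = encode_name("spectrometer", {"T": 295.0, "P": 1.5})
    print(name, decode_name(name))
```

Sorting the variable names keeps files from the same kind of run named consistently, which is the property that lets a tool in a parent folder sweep up everything below it without guessing at formats.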

I'm sure other people's mileage has varied. But that's what's worked for us. I could also tell long, involved stories about what *didn't* work. And while stories about train wrecks can be fun, that's not what you asked about.

Comment Synergy (Score 1) 460

I use Synergy ( http://synergy2.sourceforge.net/ ) to share one keyboard and mouse among several computers' displays. This should allow you to share one keyboard and mouse among multiple X servers running on your machine (and provide the opportunity for future expansion). It can even be used to do nonintuitive things like placing the "screen" of a VM (visible in a window on one of your screens) on an edge of one of your physical screens. (I'm still not sure that was a good idea.)

Comment Re:Everyday - Scams (Score 4, Interesting) 366

If you know where this magic software is that knows almost every useful property of almost every known material, I and my employer would pay huge amounts of money for it. Because the reality is:

* Most materials haven't had any meaningful measurements made for any property that is actually interesting.

* Most measurements are crap. Many published measurements are crap. The amount of practice and control necessary to make useful measurements is outlandish.

* Published data for any but the most lavishly studied materials range wildly. What's the vapor pressure of, for example, RDX at STP? Checking the published sources, you'll find answers ranging over 6 orders of magnitude. So, ..., where does "somewhere between 1 millisquat and 1 nanosquat" fall on this sorted list?

This idea that there's a giant database of materials properties that contains accurate and precise data for all technologically interesting properties of most materials is bunk.

And then, ..., what's hair? Since when did hair become a specific material? Thick hair? Thin hair? Oily hair? Dry hair? Which property were you asking about? Is the hair split? Follicle attached? Old and desiccated? New and slightly less desiccated?

Yes, I think the claim made in the article is bunk. And I bet no one here can provide a single (real) citation to a source for the current-voltage relationship for hair.

Comment Language choices (Score 1) 997

I recommend C or C++ as a first Linux programming language, not for the reasons given above, but because there is a large ecosystem of tools to act on C/C++ source, object files, libraries, headers, and executables. There is nothing wrong with learning the *rest* of the GNU toolchain as well.

In addition and outside the scope of your question, develop a preference for two languages -- the "guts" implementation language and the "pretty" UI language. They *can* be the same language, but I haven't found that this is a workable situation. The tools that you've used on Windows make it possible to write UIs and nuts and bolts in the same language, but even there it's somewhat easier to write "big", "complicated", and/or "performance" code in VC++ and then paste up a UI in VB. I would strongly recommend developing this separation -- always having a text-only interface and additional interfaces that interact with the core through the command line and/or over a network socket. This is one of the few aha/gotcha issues that I was very happy to figure out and sad that no one mentioned it previously.
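
A toy sketch of that separation, in Python only to keep it short. The `core` and `pretty_ui` names and the mean-of-numbers task are made up for illustration. In a real program the front end would launch the core as a separate process or talk to it over a socket; here it just hands over an argument list in-process as a stand-in.

```python
import argparse


def core(argv):
    """Text-only interface: take command-line style arguments, return a line of text."""
    parser = argparse.ArgumentParser(prog="core")
    parser.add_argument("numbers", type=float, nargs="+")
    args = parser.parse_args(argv)
    mean = sum(args.numbers) / len(args.numbers)
    return f"mean={mean:g}"


def pretty_ui(values):
    """Stand-in for a GUI: it only builds the argument list and decorates the reply."""
    reply = core([str(v) for v in values])
    key, _, val = reply.partition("=")
    return f"[ {key.upper()} : {val} ]"


if __name__ == "__main__":
    print(core(["1", "2", "3"]))
    print(pretty_ui([1, 2, 3]))
```

The point is that the "pretty" layer never touches the computation. Anything it can do, a shell script or a test harness can also do through the same text interface.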

Finally, learn some "odd" languages: Lojban, Malbolge, Scheme, Erlang, Prolog, m4, INTERCAL, and so Forth, for many reasons, including becoming familiar with some of the other tools potentially in the toolchest.

Comment Some other books (Score 2, Informative) 418

I'd recommend that you start with Sagan, Boundary and Eigenvalue Problems in Mathematical Physics. II.1 The Vibrating String (with derivation from principles). II.2 The Vibrating Membrane (with derivation). II.3 The Equation of Heat Conduction and the Potential Equation (with derivations).

I'd also include Crank, The Mathematics of Diffusion. You have to get all the way to eqn. 1.9 on p. 5 before starting to treat anisotropic media. This derives from and extends Carslaw and Jaeger, Conduction of Heat in Solids.

You will want to eventually read (but not during your class), Frankel, The Geometry of Physics. Bridging the gap between the Exterior Calculus and what you will see in a PDE class is too much work. However, much like the algebra-based-physics student taking differential calculus realizing how many equations he could have *not* memorized if only he had known how to take a derivative, realizing how much second order differential physics follows directly from the properties of certain forms/bundles/etc. is very enlightening (although somewhat opaque at first).

Running my finger down my math/phys shelf (and skipping those that won't provide much physical basis for the setups):
Jackson, Classical Electrodynamics
White, Fluid Mechanics
Ozisik, Boundary Value Problems of Heat Conduction
Segel, Mathematics Applied to Continuum Mechanics
Shankar, Principles of Quantum Mechanics
Boon and Yip, Molecular Hydrodynamics
Hayes and Probstein, Hypersonic Inviscid Flow
and a seemingly endless supply of books by Greiner.

Misner, Thorne, and Wheeler, Gravitation is probably more index gymnastics than you want to try to absorb for PDE. But it's a fun read, is all about PDEs, and they more than completely ground their derivations in the physics.

You might also want to thumb through Brouwer, Studies In Logic And The Foundations Of Mathematics: The Axiomatic Method With Special Reference To Geometry And Physics, Part II.

Security

Submission + - SSL on IPv6

Fuzzy Eric writes: Shockingly, my boss has asked to duplicate some of our servers to IPv6. We run a couple of SSL-ed websites. My Google-fu appears to be weak because I am not finding much help or warning about using SSL certificates for IPv6 hosts. Do Slashdotters have pointers, stories, or insight about setting up certificates so it doesn't matter over which network layer users reach our sites? ... or will potentially mismatched host address types automagically work?
