User Journal

Journal Journal: Transcribing WW1 biography 5

My great great grandmother wrote a biography of her three brothers killed in WW1. I'm typing it all into a LaTeX editor and will be adding a family tree along with a sketched outline of their lives and newspaper clippings.

A best-seller it ain't, but it may interest a few here as these guys show autistic traits and are geeks from just over a century ago.

User Journal

Journal Journal: Review: Bird of Prey 4

TL;DR version: 80s dystopian techno-horror geekfest with relatively accurate portrayal of cryptography and hacking.

Long version: Pretty much the same as above. It's a low budget BBC production that scores highly on accuracy of methods, exploits and technology of the era, insofar as TV ever gets.

The premise: a low-rank civil servant, tracking down bank fraud, discovers a trail of blackmail, corruption by intelligence services, deliberate weaknesses in security and criminal gangs operating with impunity.

By season 2, he's keeping himself alive the same way the Wikileaks journalists did, his wife has what we would call severe PTSD and the body count isn't slowing down.

Given trauma was barely understood in the 80s, the portrayal there and the bouts of temporary insanity are extremely close to what happens, again allowing for this being TV drama and not a psychological documentary.

The storyline deals with cryptography, surveillance society, backdoors and institutional corruption. All hot button issues of today. It even covers the inevitable issues of DIY security.

The conspiracy aspect is a trifle OTT but, again, it's TV. It has to be to have a program.

It's geared to nerds, geeks and dystopia lovers, though, rather than the mainstream. I saw more reviews in computer journals than in TV guides.

It's the sort of show that would really need updating to be watchable by modern audiences, but fans of older shows would likely enjoy it.

It wasn't unusual for the time, which is the great thing.

The 80s were a time for really bleak geek television - Codename Icarus (for the younger viewers), Edge of Darkness, Terry Nation's Survivors, Threads - all productions in this decade.

(Even late 70s had some dark stuff, Blake's 7, The Omega Factor, Day of the Triffids, and ABC/Central's Sapphire & Steel were not light watching. You have to go back to the start of the decade and Doomwatch to see a plausible contemporary dystopia.)

The stuff of a thousand bad dreams, these shows.

User Journal

Journal Journal: Teaching history via RPGs 3

There's a new RPG pack under development, called Carved In Stone. Well, it's called an RPG pack, but basically it's a fairly comprehensive history lesson about the Picts that can be used in roleplaying games. This is quite a neat idea and it got me wondering.

There were, at one point, quite a few historical wargames (Britannia, Decline and Fall, etc) but they were mostly about large-scale strategy rather than the history itself (which was mostly an excuse for blowing up other people's counters). History lessons via roleplaying games sounds quite an interesting approach and could be used to cover all kinds of events.

The expansion pack isn't out yet (it's still on Kickstarter) but there's enough information about it to get a good feel for how much depth there is in there. If it's done well, it could be very effective in the same way "...and then the Huns came and beat the sh*t out of the Romans before leaving again" isn't. Unless you're a Hun.

I'd like to get people's views on the use of roleplaying games and which system would be best for such gaming. Rolemaster? Call of Cthulhu? The ever-present Dungeons and Dragons? ("My 20th level mage casts a fireball at the fleeing Scots" sounds ahistorical.)

User Journal

Journal Journal: Consumer Genetics, the current state of play

Ok, so let's start by defining a few terms, as it is obvious from Facebook genetic genealogy groups that people are truly ignorant on the subject. (Not that I believe this is common on Slashdot, where we're all much more knowledgeable.)

First off, most genetic testing is NOT carried out by sequencing all of your DNA, a widespread belief that resulted in outrage on one Facebook group when I pointed that out.

The vast majority of consumer testing is done by SNP genotyping. They look at very specific genetic markers and see if those markers have changed from one base pair to another. That's the only type of mutation looked for and they typically look at only a few.

So we've our first way to group companies: sequencing vs genotyping.

SNPs (single nucleotide polymorphisms) are, as mentioned above, one type of mutation. Another is called STR (short tandem repeat), where a block of DNA is duplicated.

FamilyTreeDNA does both STR and SNP testing, STRs mostly for the Y chromosome. Both can be used for family history.

Most labs, though, use only SNP tests. It's quicker and cheaper than counting repeats but with many of the more interesting ones covered by patents or kept private by other means, there's a lot more secrecy involved.

(Note: This has doubtless led to a lot of unnecessary deaths, as genetic markers indicating a high probability of getting certain forms of cancer are being milked by private companies for profit. Few people get more than one test, so most people won't know if they carry such markers and can't take action in advance.)

So the second piece of jargon is SNP vs STR.

Finally, we come to the different areas of DNA. There are regions that are especially good for ancestral research (mostly non-coding DNA), then there's the exome (which is where most of the protein coding takes place), you've telomeres (sacrificial buffers at the ends of chromosomes, which have a function in longevity), and so on. I won't list them all.

The Y chromosome is particularly good for ancestry, but only has 9 coding genes left in it. It's possible it will vanish in time, but it seems to be fairly stable for right now.

Most companies test only DNA that is good for ancestral research in the autosomal regions (atDNA, the regions outside the sex chromosomes). This allows you to identify anyone who is genetically connected, but you get (on average) just under 50% of your DNA from each parent (remember, there are mutations in each generation, and that DNA comes from neither parent), so the share you inherit from any one ancestor roughly halves with each generation back. The distance you can track therefore depends on how many markers are tested (very few). Reliability falls off sharply.

YDNA (Y chromosome DNA) tests only test for paternal ancestry, but if two people have a common paternal-line ancestor, it's a lot more precise once you're past about second cousins. It's popular with anthropologists as it's very good for tracking how men have migrated.

mtDNA (mitochondrial DNA) is only inherited through the maternal line. Again, it's very popular, this time for tracking how women have migrated. There are certain forms of mtDNA that are linked to health benefits and others to genetic diseases, so this one tends to be the most controversial of the ancestral DNA tests. It also changes very slowly, so you don't get high resolution on population movements.

These two (YDNA and mtDNA) tests can tell you a lot about whether societies are open or closed, and whether it was men who travelled to find partners, women, or both. So we can know something of the culture of even long-extinct societies.

The data I have been able to find is for 2019. It shows: MyHeritage tests for 702,442 autosomal SNPs, AncestryDNA for 637,639, FTDNA for 612,272 and 23andMe for 630,132. This is out of a total of 3 billion base pairs. So the best test that year looked at about 0.023% of the genome.
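The percentage is easy to check for each company. This is only a back-of-envelope sketch: the SNP counts are the 2019 figures quoted above, the round 3 billion base pairs is the same approximation used in the text, and the names `snpCounts` and `coveragePercent` are made up for illustration.

```javascript
// Autosomal SNP counts per company (2019 figures quoted in the text).
var snpCounts = {
    "MyHeritage": 702442,
    "AncestryDNA": 637639,
    "FTDNA": 612272,
    "23andMe": 630132
};

// Assumption: a round 3 billion base pairs for the whole genome.
var GENOME_BASE_PAIRS = 3e9;

// Percentage of the genome examined when `snps` single positions are tested.
function coveragePercent(snps) {
    return (snps / GENOME_BASE_PAIRS) * 100;
}

// Print one line per company, four decimal places.
for (var company in snpCounts) {
    console.log(company + ": " + coveragePercent(snpCounts[company]).toFixed(4) + "%");
}
```

Every company lands in the same ballpark, a few hundredths of one percent, which is the point of the comparison.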

ISOGG produced a chart as well, but it's far older. Their chart is dated around 2013.

Since you inherit a random 50% from each parent, the assumption that this is statistically meaningful for such a small fraction of the DNA is questionable. It seems to work adequately, but I'm not sure what the error bars are.

FTDNA also tests up to 111 STRs on regular tests and 600+ STRs for their "BigY" (it depends on the quality of the genetic sample).

Companies that do sequencing sometimes offer partial kits (in the order of tens of millions of SNPs) or full sequencing (which is what the name suggests). These are rarer and more expensive.

Most DNA companies allow you to access the raw data, some only allow it if you pay vast sums of money, and some don't allow you to at all. Always check in advance.

When you download your own data, you can use public databases to search for matches (either for relatives or genetic conditions). The quality of public databases is less controlled, both in terms of privacy and quality of data. However, corporate databases will usually be smaller for both types of data and will also usually not contain data from rivals. If you want broad data sets, public databases are the way to go.

I've only tested with 23&Me, FamilyTreeDNA, CRI Genetics and Nebula Genomics, so can't tell you anything much about the quality of the other companies.

(Ok, I also tested with uBiome, a microbiome testing company in the US, but they had their computers seized some time back due to fraud. I have no idea what happened to my data on there, or whether there's a way to access it.)

The quality seems to be reasonable for all four.

FTDNA is the most expensive for a lot of things, but has less of a sticker shock than Nebula and gets more data than 23&Me. It looks like there are a few companies that are better for ancestry but it's one of the best and the one the Genographic Project used. They're the only ancestral company that gives you STRs AFAIK and they give you a much more detailed evaluation of haplogroups than anyone else I've tested with.

Nebula does up to medical grade (100x oversampling) DNA testing, so if you want results a hospital will trust, that's where you part with a vast amount of money.

23&Me is good for a lot of medical stuff and if you want to help with research is probably the best.

CRI Genetics produces a lot of data with much higher reliability than most of the others, but you can't access the raw data and their databases won't be as extensive. However, because you can't access the raw data, you have to test with them to compare against their database.

User Journal

Journal Journal: Tea 7

I have now passed the total of 30 different black teas. Not fruit, not spice, not herbal, not even green, white or red tea. Just black teas. No, blends like PG Tips and Yorkshire Gold don't count either.

Why so many? Aside from being my current monomania, it's because I'm fascinated by how different they are.

I couldn't tell you the chemistry that makes that difference, nor could I tell you what difference it makes in terms of the various compounds affecting alertness or sedation (it contains both), in terms of health benefits, or even in the simplest terms of how water is retained in the body.

But I'm determined to find out at least some of this. It'll have to be on my own, as essentially no research is being done on the subject, and I've no idea of what that'll require beyond a very good gas spectrometer (I'm going to have to count molecules, not atoms).

But I think it would be fun to find out, and definitely worth doing as long as I can figure out how to (a) control the parameters, and (b) afford said piece of gear.

User Journal

Journal Journal: It's surprising to me how vicious atheists on /. are....

I mentioned my faith as a thing that guided me and there was a maelstrom of haters. WTF? If you don't have a religion, you don't hear me trying to convert you. I don't give a Parson's fart if you do. And I don't really care what you think about me. It's just amazing that atheists want to spend so much energy to mock other people.

Kinda fucked up, really.

User Journal

Journal Journal: Still can't believe the NSA is as deep as it is.

The width and breadth of our govt's spying abroad was never in doubt: our abilities are impressive, but not shocking. The shock comes from deciding to turn the lens inward. Executive Order 12333 says CIA and NSA look "out" and not domestically at US persons.

It's a slippery slope to start down. Without 4th Amendment protections, we could all be fucked.

User Journal

Journal Journal: Continuation on education 13

Ok, I need to expand a bit on my excessively long post on education some time back.

The first thing I am going to clarify is streaming. This is not merely distinction by speed, which is the normal (and therefore wrong) approach. You have to distinguish by the nature of the flows. In practice, this means distinguishing by creativity (since creative people learn differently than uncreative people).

It is also not sufficient to divide by fast/medium/slow. The idea is that differences in mind create turbulence (a very useful thing to have in contexts other than the classroom). For speed, this is easy - normal +/- 0.25 standard deviations for the central band (ie: everyone essentially average), plus two additional bands on either side, making five in total.

Classes should hold around 10 students, so you have lots of different classes for average, fewer for the bands either side, and perhaps only one for the outer bands. This solves a lot of timetabling issues, as classes in the same band are going to be interchangeable as far as subject matter is concerned. (This means you can weave in and out of the creative streams as needed.)

Creativity can be ranked, but not quantified. I'd simply create three pools of students, with the most creative in one pool and the least in a second. It's about the best you can do. The size of the pools? Well, you can't obtain zero gradient, and variations in thinking style can be very useful in the classroom. 50% in the middle group, 25% in each of the outliers.

So you've 15 different streams in total. Assume creativity and speed are normally distributed and that the outermost speed streams contain one class of 10 each. Start with speed. For simplicity, I'll forgo the calculations and guess that the upper/lower middle bands would then have nine classes of 10 each and that the central band will hold 180 classes of 10.

That means you've 2000 students, of whom the assumption is 1000 are averagely creative, 500 are exceptional and 500 are, well, not really. Ok, because creativity and speed are independent variables, we have to have more classes in the outermost band - in fact, we'd need four of them, which means we have to go to 8000 students.

These students get placed in one of 808 possible classes per subject per year. Yes, 808 distinct classes. Assume 6 teaching hours per day x 5 days, making 30 available hours, which means you need no fewer than 27 simultaneous classes per year (808 / 30, rounded up). Across all 19 years of schooling, that's 513 classrooms in total, fully occupied in every timeslot, and we're looking at just one subject. Assuming 8 subjects per year on average, that goes up to 4104. Rooms need maintenance and you also need spares in case of problems. So, triple it, giving 12312 rooms required. We're now looking at serious real estate, but there are larger schools than that today. This isn't impossible.

The 8000 students is per year, as noted earlier. And since years won't align, you're going to need to go from first year of pre/playschool to final year of an undergraduate degree. That's a whole lotta years. 19 of them, including industrial placement. 152,000 students in total. About a quarter of the total student population in the Greater Manchester area.
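For anyone who wants to fiddle with the numbers, the room and headcount arithmetic can be laid out as a short sketch. Every figure comes from the text above; the function name `schoolSize` and the field names are my own labels, nothing more.

```javascript
// Re-run the classroom arithmetic with the figures given in the text.
function schoolSize() {
    var classesPerSubjectPerYear = 808; // distinct classes per subject per year
    var hoursPerWeek = 6 * 5;           // 6 teaching hours x 5 days
    var yearGroups = 19;                // playschool through undergraduate
    var subjectsPerYear = 8;
    var spareFactor = 3;                // maintenance and spares
    var studentsPerYear = 8000;

    // Classes that must run at the same time to fit into the week.
    var simultaneous = Math.ceil(classesPerSubjectPerYear / hoursPerWeek);
    var roomsOneSubject = simultaneous * yearGroups;
    var roomsAllSubjects = roomsOneSubject * subjectsPerYear;

    return {
        simultaneous: simultaneous,                      // 27
        roomsOneSubject: roomsOneSubject,                // 513
        roomsAllSubjects: roomsAllSubjects,              // 4104
        roomsWithSpares: roomsAllSubjects * spareFactor, // 12312
        totalStudents: studentsPerYear * yearGroups      // 152000
    };
}

console.log(schoolSize());
```

Changing `spareFactor` or the teaching hours shows quickly how sensitive the real estate requirement is to those guesses.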

The design would be a nightmare with a layout from hell to minimize conflict due to intellectual peers not always being age peers, and neither necessarily being perceptual peers, and yet the layout also has to minimize the distance walked. Due to the lack of wormholes and non-simply-connected topologies, this isn't trivial. A person at one extreme corner of the two dimensional spectrum in one subject might be at the other extreme corner in another. From each class, there will be 15 vectors to the next one.

But you can't minimize per journey. Because there will be multiple interchangeable classes, each of which will produce 15 further vectors, you have to minimize per day, per student. Certain changes impact other vectors, certain vector values will be impossible, and so on. Multivariable systems with permutation constraints. That is hellish optimization, but it is possible.

It might actually be necessary to make the university a full research/teaching university of the sort found a lot in England. There is no possible way such a school could finance itself off fees, but research/development, publishing and other long-term income might help. Ideally, the productivity would pay for the school. The bigger multinationals post profits in excess of 2 billion a year, which is how much this school would cost.

Pumping all the profits into a school in the hope that the 10 uber creative geniuses you produce each year, every year, can produce enough new products and enough new patents to guarantee the system can be sustained... It would be a huge gamble, it would probably fail, but what a wild ride it would be!

User Journal

Journal Journal: Letter frequencies in URLs

Doing some maintenance on a few squid cache servers, I decided to look into the letter frequency distributions for URLs, and how they match normal written text.
Four caches were scanned for the URLs of currently cached content only, constituting around 1.5 million URLs.

In short, the results have some of the same characteristics as normal text, but with notable exceptions. You don't get an etaoin shrdlu; there are a lot of h, t, p, colons and slashes in URLs which skew the results. I'm also surprised that w scored so low, given all the URLs that start with www.

If anyone else finds a use for this, here is the data. Each character in the URL is followed by the number of times it was used in each cache, plus the total for all four caches.

/: 83198 130244 3028097 2929538 6171077
t: 73026 99729 2727455 2641930 5542140
e: 52801 95537 1746624 1753865 3648827
.: 35317 60175 1478231 1467006 3040729
o: 40941 86873 1423124 1448453 2999391
a: 43075 72450 1408451 1384211 2908187
c: 36078 64921 1308435 1295986 2705420
s: 41946 76684 1251987 1278493 2649110
p: 28248 44907 1214805 1190698 2478658
m: 29609 45768 1168769 1195505 2439651
h: 22543 41992 1029463 1019494 2113492
i: 37846 58586 974977 994693 2066102
n: 30006 51596 815477 795344 1692423
r: 26958 53239 801514 774606 1656317
g: 23689 57734 666533 790131 1538087
d: 23304 36637 746244 697523 1503708
:: 15442 27059 639115 649013 1330629
w: 25563 41061 622672 629215 1318511
1: 9697 12580 577523 561429 1161229
l: 21855 32824 560110 542960 1157749
2: 9890 13516 492565 514385 1030356
u: 11878 15246 440808 431176 899108
0: 10333 13106 404229 445998 873666
v: 7450 8415 328991 292590 637446
b: 9980 26743 280533 285767 603023
3: 6296 6905 299391 272352 584944
f: 9866 25830 265685 266037 567418
4: 4738 5931 273161 244104 527934
k: 4202 5641 235501 230456 475800
5: 5957 6920 212941 235172 460990
7: 6497 7333 230677 200956 445463
9: 4327 5215 206613 195295 411450
8: 5363 6697 210689 178565 401314
6: 5761 6487 209092 175203 396543
x: 3853 5755 168401 144265 322274
-: 3516 11325 124398 133481 272720
y: 4348 5272 114803 96971 221394
_: 2301 2683 87749 80901 173634
j: 4436 5058 89043 72567 171104
=: 1555 1437 37342 35214 75548
q: 1494 1538 32910 37861 73803
z: 741 907 29563 30037 61248
,: 3282 2848 21099 14688 41917
&: 493 413 12558 9222 22686
%: 220 460 9640 11420 21740
;: 2878 2254 8281 8281 21694
?: 322 294 4796 9264 14676
+: 45 35 1333 1758 3171
~: 31 7 996 735 1769
$: 0 0 425 670 1095
^: 6 0 420 228 654
*: 27 10 187 188 412
!: 0 2 282 122 406
[: 0 0 292 23 315
]: 0 0 272 23 295
|: 8 8 77 167 260
@: 10 0 113 38 161
(: 0 0 75 55 130
): 0 0 69 55 124
{: 0 0 75 0 75
\: 0 0 6 4 10
': 0 0 1 1 2
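A count like the one above could be reproduced along these lines. This is a sketch rather than the script actually used: it assumes the URLs have already been pulled out of the cache into an array, and it folds everything to lower case, which is a guess based on the table listing no capital letters. The function names are hypothetical.

```javascript
// Count how often each character occurs across a list of URLs.
function countCharacters(urls) {
    var counts = {};
    for (var i = 0; i < urls.length; i++) {
        // Assumption: case is folded, since the table shows only lower case.
        var url = urls[i].toLowerCase();
        for (var j = 0; j < url.length; j++) {
            var ch = url.charAt(j);
            counts[ch] = (counts[ch] || 0) + 1;
        }
    }
    return counts;
}

// Return the characters sorted from most to least frequent.
function rankCharacters(counts) {
    return Object.keys(counts).sort(function (a, b) {
        return counts[b] - counts[a];
    });
}

// Tiny made-up sample; a real run would feed in the cached URLs.
var sample = ["http://a.com/", "http://b.com/"];
var sampleCounts = countCharacters(sample);
var ranking = rankCharacters(sampleCounts);
for (var k = 0; k < ranking.length; k++) {
    console.log(ranking[k] + ": " + sampleCounts[ranking[k]]);
}
```

Running it once per cache and summing the per-character counts would give the per-cache columns and the total.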

Does it have any practical use?
Perhaps. In proxy.pac files, a common method of load balancing based on URLs, known as the Sharp Superproxy script, is to sum the ASCII values of the characters in the URL, and mod the sum by the number of servers, to pick a server to use. .pac files are JavaScript, and the script does not rely on a built-in method to return the ASCII value for a character (modern engines offer charCodeAt). So what's generally used is a function like:

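function atoi(charstring) {
    // Map a single character to its ASCII code, one comparison at a time.
    if (charstring == "a") return 0x61;
    if (charstring == "b") return 0x62;
    if (charstring == "c") return 0x63;
    if (charstring == "d") return 0x64;
    //..... (remaining characters follow the same pattern)
}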
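To make the sum-and-mod idea concrete, here is a minimal sketch of that kind of load balancing. It is not the Sharp script itself: it assumes an engine where `charCodeAt` is available instead of the `atoi` table, and the proxy host names, `urlHash` and `pickProxy` are placeholders invented for the example.

```javascript
// Hypothetical list of caches; substitute real hosts and ports.
var proxies = [
    "PROXY cache1.example.com:3128",
    "PROXY cache2.example.com:3128",
    "PROXY cache3.example.com:3128",
    "PROXY cache4.example.com:3128"
];

// Sum the character codes of the URL.
// Assumption: charCodeAt exists in the engine running the .pac file.
function urlHash(url) {
    var sum = 0;
    for (var i = 0; i < url.length; i++) {
        sum += url.charCodeAt(i);
    }
    return sum;
}

// The same URL always maps to the same cache.
function pickProxy(url) {
    return proxies[urlHash(url) % proxies.length];
}

// Standard PAC entry point.
function FindProxyForURL(url, host) {
    return pickProxy(url);
}
```

Because the hash depends only on the URL, a given object is always fetched through the same cache, which is what keeps the caches from duplicating each other's content.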

This can be sped up by ordering the list in the order of frequency, starting with "/", "t", "e", ".", "o", "a". Just moving those few to the front reduces the latency of the script significantly.
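A rough way to see the effect is to count how many comparisons a chain of `if` tests would make in each ordering. The sketch below does that with a string standing in for the chain: `BY_FREQUENCY` is the totals column from the table read top to bottom, `BY_CODE` is the same characters in plain character-code order, and `comparisons` is a made-up helper. It counts comparisons only and measures no real latency.

```javascript
// Characters from the table, most frequent first (totals column).
var BY_FREQUENCY = "/te.oacspmhinrgd:w1l2u0vb3f4k57986x-y_j=qz,&%;?+~$^*![]|@(){\\'";

// The same characters sorted by character code, as a naive chain might be.
var BY_CODE = BY_FREQUENCY.split("").sort().join("");

// Number of comparisons an if-chain in the given order needs for a URL.
// A character not in the chain costs a full pass over it.
function comparisons(order, url) {
    var total = 0;
    for (var i = 0; i < url.length; i++) {
        var pos = order.indexOf(url.charAt(i));
        total += (pos === -1) ? order.length : pos + 1;
    }
    return total;
}

// Made-up example URL.
var url = "http://www.example.com/";
console.log("frequency order: " + comparisons(BY_FREQUENCY, url));
console.log("code order:      " + comparisons(BY_CODE, url));
```

Letters sit behind all the digits and most of the punctuation in code order, so the frequency-ordered chain needs far fewer tests on ordinary URLs.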

Also, hashing in URL history handling can be sped up if the most prevalent buckets are created. This could also be useful for other URL collections, like AV software URL matching. I am unaware of any that work directly with character based lookups, but it is certainly one way to do it.

Other uses?
In pen testing, having a frequency table like this can greatly aid in URL discovery speed.

But all in all, it was a fun exercise. Note that the variations may be great, especially for the bottom half of the list. Also note that the low count for the letter 'x' in the URLs might not match your users.

Books

Journal Journal: History books can be fun (but usually aren't and this is a Bad Thing) 2

Most people have read "1066 and all that: a memorable history of England, comprising all the parts you can remember, including 103 good things, 5 bad kings and 2 genuine dates" (one of the longest book titles I have ever encountered) and some may have encountered "The Decline and Fall of Practically Everybody", but these are the exceptions and not the rule. What interesting - but accurateish - takes on history have other Slashdotters encountered?

User Journal

Journal Journal: Nothing changes: different decade, same consumer fraud.

I just looked back at a journal post I wrote almost a decade ago here on /. It mentions all the same issues of arbitration and attempts by Big Biz to screw the Common Man with contract clauses.

I am not surprised that, with so much money at stake for corporate America to steal, they have kept such a tight leash on consumer contract arbitration provisions. These keep a consumer from suing in court, requiring instead a forum more favorable to the Corp. Almost all consumer contracts have been changed to preclude class actions.

In my lawsuit against Sony in August over the PSN data breach, I was immediately faced with Sony changing its contract to preclude class actions. The same for my lawsuit against Citibank in their data breach.

An Australian friend put it this way: "Different dog, same leg action."

User Journal

Journal Journal: hmmm 1

hmmm
