User Journal

Journal Journal: Keystone XL Pipeline Will Raise Gas Prices for US Consumers

Unintended consequences:

http://www.consumerwatchdog.org/sites/default/files/resources/keystonexl_cwd.pdf

There is no shortage of available crude oil, domestic or imported, in the
United States, and for the last few years there has been a glut at the nation's
largest crude oil terminal in Cushing, Oklahoma. Canadian tar sands oil
would be processed for greater use in the U.S. only as other imported or domestic
sources are reduced. Replacing Mexican oil with Canadian oil would
only trade the closer source for the more distant.

User Journal

Journal Journal: Letter frequencies in URLs

While doing some maintenance on a few squid cache servers, I decided to look into the letter frequency distributions for URLs, and how they compare with normal written text.
Four caches were scanned for the URLs of currently cached content only, constituting around 1.5 million URLs.

In short, the results have some of the same characteristics as normal text, but with notable exceptions. You don't get an etaoin shrdlu; there are a lot of h, t, p, colons and slashes in URLs which skew the results. I'm also surprised that w scored so low, given all the URLs that start with www.

If anyone else finds a use for this, here is the data. Each character in the URL is followed by the number of times it was used in each cache, plus the total for all four caches.

/: 83198 130244 3028097 2929538 6171077
t: 73026 99729 2727455 2641930 5542140
e: 52801 95537 1746624 1753865 3648827
.: 35317 60175 1478231 1467006 3040729
o: 40941 86873 1423124 1448453 2999391
a: 43075 72450 1408451 1384211 2908187
c: 36078 64921 1308435 1295986 2705420
s: 41946 76684 1251987 1278493 2649110
p: 28248 44907 1214805 1190698 2478658
m: 29609 45768 1168769 1195505 2439651
h: 22543 41992 1029463 1019494 2113492
i: 37846 58586 974977 994693 2066102
n: 30006 51596 815477 795344 1692423
r: 26958 53239 801514 774606 1656317
g: 23689 57734 666533 790131 1538087
d: 23304 36637 746244 697523 1503708
:: 15442 27059 639115 649013 1330629
w: 25563 41061 622672 629215 1318511
1: 9697 12580 577523 561429 1161229
l: 21855 32824 560110 542960 1157749
2: 9890 13516 492565 514385 1030356
u: 11878 15246 440808 431176 899108
0: 10333 13106 404229 445998 873666
v: 7450 8415 328991 292590 637446
b: 9980 26743 280533 285767 603023
3: 6296 6905 299391 272352 584944
f: 9866 25830 265685 266037 567418
4: 4738 5931 273161 244104 527934
k: 4202 5641 235501 230456 475800
5: 5957 6920 212941 235172 460990
7: 6497 7333 230677 200956 445463
9: 4327 5215 206613 195295 411450
8: 5363 6697 210689 178565 401314
6: 5761 6487 209092 175203 396543
x: 3853 5755 168401 144265 322274
-: 3516 11325 124398 133481 272720
y: 4348 5272 114803 96971 221394
_: 2301 2683 87749 80901 173634
j: 4436 5058 89043 72567 171104
=: 1555 1437 37342 35214 75548
q: 1494 1538 32910 37861 73803
z: 741 907 29563 30037 61248
,: 3282 2848 21099 14688 41917
&: 493 413 12558 9222 22686
%: 220 460 9640 11420 21740
;: 2878 2254 8281 8281 21694
?: 322 294 4796 9264 14676
+: 45 35 1333 1758 3171
~: 31 7 996 735 1769
$: 0 0 425 670 1095
^: 6 0 420 228 654
*: 27 10 187 188 412
!: 0 2 282 122 406
[: 0 0 292 23 315
]: 0 0 272 23 295
|: 8 8 77 167 260
@: 10 0 113 38 161
(: 0 0 75 55 130
): 0 0 69 55 124
{: 0 0 75 0 75
\: 0 0 6 4 10
': 0 0 1 1 2
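
For anyone who wants to repeat the count on their own caches, here is a minimal sketch of the tallying step. It is not the script used for the numbers above; `countChars` and the sample URLs are made up for illustration, and folding everything to lower case is an assumption based on the table listing no upper-case letters.

```javascript
// Hypothetical helper: tally how often each character occurs in a list of URLs.
function countChars(urls) {
    const counts = new Map();
    for (const url of urls) {
        // Assumption: fold case, since the table above has no upper-case entries.
        for (const ch of url.toLowerCase()) {
            counts.set(ch, (counts.get(ch) || 0) + 1);
        }
    }
    // Return [character, count] pairs, most frequent first, like the table above.
    return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample input; a real run would read the URLs out of the cache index.
const sampleUrls = ["http://www.example.com/", "http://example.org/a"];
for (const [ch, n] of countChars(sampleUrls)) {
    console.log(ch + ": " + n);
}
```

Running it once per cache and adding up the per-character counts would give the five columns shown above.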

Does it have any practical use?
Perhaps. In proxy.pac files, a common method of load balancing based on URLs, known as the Sharp Superproxy script, is to sum the ASCII values of the characters in the URL and take that sum modulo the number of servers to pick a server to use. .pac files are JavaScript, and older JavaScript engines did not have an easy method to return the ASCII value for a character (current engines provide charCodeAt for this). So what's generally used is a function like:

function atoi(charstring) {
    if (charstring=="a") return 0x61; if (charstring=="b") return 0x62;
    if (charstring=="c") return 0x63; if (charstring=="d") return 0x64;
//.....
}

This can be sped up by arranging the comparisons in order of frequency, starting with "/", "t", "e", ".", "o", "a". Just moving those few to the front reduces the latency of the script significantly.
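
A rough sketch of the frequency-ordered lookup together with the sum-and-mod server choice follows. The names `charValue` and `pickProxy` and the proxy host names are invented for illustration; only `FindProxyForURL` is the standard PAC entry point. To keep it short, only the six most frequent characters are spelled out and everything else falls through to `charCodeAt`, which assumes an engine that has it; a script for older engines would continue the if-chain in table order instead.

```javascript
// Hypothetical lookup: test the most frequent URL characters first.
function charValue(ch) {
    if (ch == "/") return 0x2f;
    if (ch == "t") return 0x74;
    if (ch == "e") return 0x65;
    if (ch == ".") return 0x2e;
    if (ch == "o") return 0x6f;
    if (ch == "a") return 0x61;
    // Assumption: charCodeAt is available; otherwise continue the chain here.
    return ch.charCodeAt(0);
}

// Sum the character values of the URL and take the sum modulo the server count.
function pickProxy(url, proxies) {
    var sum = 0;
    for (var i = 0; i < url.length; i++) {
        sum += charValue(url.charAt(i));
    }
    return proxies[sum % proxies.length];
}

// Standard PAC entry point; the proxy names below are placeholders.
function FindProxyForURL(url, host) {
    var proxies = ["cache1.example.net:3128", "cache2.example.net:3128"];
    return "PROXY " + pickProxy(url, proxies);
}
```

The same URL always sums to the same value, so it always lands on the same cache, which is the point of hashing on the URL rather than picking a server at random.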

Also, hashing in URL history handling can be sped up if the buckets for the most prevalent characters are created up front. This could also be useful for other URL collections, like AV software URL matching. I am unaware of any that work directly with character-based lookups, but it is certainly one way to do it.

Other uses?
In pen testing, having a frequency table like this can greatly aid in URL discovery speed.

But all in all, it was a fun exercise. Note that the variations may be great, especially for the bottom half of the list. Also note that the low count for the letter 'x' in the URLs might not match your users.

User Journal

Journal Journal: Energy Industry Under Attack from Green Terrorists 5

"Just the other day, Duke Energy CEO Jim Rogers said, 'If the cost of solar panels keeps coming down, installation costs come down and if they combine solar with battery technology and a power management system, then we have someone just using [the grid] for backup.' What happens if a whole bunch of customers start generating their own power and using the grid merely as backup? The EEI report warns of 'irreparable damages to revenues and growth prospects' of utilities."

Solar Panels Could Destroy U.S. Utilities

User Journal

Journal Journal: nosupportlinuxhosting fail

I decided to try out nosupportlinuxhosting but they appear to have suspended hyperlogos.org without notice or explanation. I sure hope I don't have to issue a chargeback, that would be stupid. I'm already waiting for Amazon to process two returns that have been sitting around at their facilities for weeks.

User Journal

Journal Journal: This website has gone to shit 1

I'm not talking about the content, that has always been shit
I'm talking about what happens when I try to modify my relationships.

OK

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, admin@slashdot.org and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.
Apache/2.2.3 (CentOS) Server at slashdot.org Port 80

Of course I reported the error. Of course I never heard back. Now I'm seeing it again.
Slashdot is basically unusable without friends/foes lists.

User Journal

Journal Journal: Users opt in, but they don't opt out

Google will tell you that their collection of "anonymous" location statistics is opt-in. That is, technically, true. However, on Gingerbread it is not opt-out. Once turned on, there is no option to disable it; that was added later. That, or I have forgotten all I once knew about google-fu. I can find how to change your mind about this on ICS, but not on GB.

User Journal

Journal Journal: Keep it coming, mod trolls 4

http://slashdot.org/comments.pl?sid=3436185&cid=42803807
http://slashdot.org/comments.pl?sid=3436175&cid=42803841
http://slashdot.org/comments.pl?sid=3430839&cid=42803897
http://slashdot.org/comments.pl?sid=3436185&cid=42804399

I can afford it.

User Journal

Journal Journal: I'm sorry, did I break your concentration? 2

Modtrolls troll mod. Film at eleven.

http://slashdot.org/comments.pl?sid=3428941&cid=42771455
http://slashdot.org/comments.pl?sid=3428941&cid=42771437
http://slashdot.org/comments.pl?sid=3426877&cid=42770781
http://slashdot.org/comments.pl?sid=3427061&cid=42770777
http://slashdot.org/comments.pl?sid=3427083&cid=42770769

Pretty sure this time I offended an iFanboi.

Too bad you can't tag things "slashdot"

User Journal

Journal Journal: Bring it on, bitches 6

Every time I get a modtroll I get validation. Here are the five comments in question, all downmodded in a row by someone too lazy to even spread out their punitive moderation.

http://slashdot.org/comments.pl?sid=3411745&cid=42715053
http://slashdot.org/comments.pl?sid=3412111&cid=42714885
http://slashdot.org/comments.pl?sid=3412141&cid=42714871
http://slashdot.org/comments.pl?sid=3412263&cid=42714843
http://slashdot.org/comments.pl?sid=3412263&cid=42714801

Social Networks

Journal Journal: Internet Bravery 5

I should very much like to demand satisfaction from people who are badly in need of correction. Namely, those who use insults against me in lieu of argument. (I am not above answering insult with insult, of course.) When are we going to get internet VR boxing, so that I can realistically challenge some random dipshit who talks a lot of shit to a boxing match and announce his cowardice should he decline?

User Journal

Journal Journal: Moderation works? 8

Well, some mod douchewaffle spent all his points on me because I'm anti gun control. I heart slashdot.
