Bow Tie Theory: Researchers Map The Web

Paula Wirth, Web Tinker writes "Scientists from IBM Research, Altavista and Compaq collaborated to conduct the most intensive research study of the Web to date. The result is the development of the "Bow Tie" Theory. One of the initial discoveries of this ongoing study shatters the number one myth about the Web ... in truth, the Web is less connected than previously thought. You can read more about it."
  • ...and how does this work? Do they count the number of links on each page, and the number of pages pointing to it?! If a page is not connected, how do they know about it? Do they scan the entire range of valid IP addresses, attempting to connect to port 80? What if someone has a web server on port 8000/8080?

    Now that they have shown that the web is not as connected as many people have speculated, do they have any solutions?! What are possible solutions?

  • 1) This is not really "news for nerds", in the sense that it is not about Linux or open source issues.

    OK, first off, I'm really getting sick of this. Nerds don't have to be interested in Linux or Open Source to be nerds. That's how you define yourself, and therefore you think that all nerds should be like you, right? Just because this doesn't interest you doesn't mean that you have to troll the article, or that it's not important to other people who consider themselves nerds...

    I have been using the Internet almost since it started, and can even remember the pre-web technologies like "gopher", "wais" and "veronica". If I need to find a web page, I can always use one of the major search engines like googal and altavista. It doesn't matter to me if Joe Average's page is not linked, since it is probably something he hacked together one evening and put up at geocities, and has not updated it for over a year.

    Wow, how often do you use "googal"? I mean, if you can't even spell google right, why should we believe that you have any knowledge of the internet? And the typical "Joe Average" doesn't have that page up there for you, it's for family and friends. Very few personal pages have a target audience larger than people they know or people that want information on them.

    I find all these "personal" pages on the web are a major irritant, as they seldom contain useful information, and they clog up the search engines with non-relevant crap, by polluting the search space.

    If I want to know what Joe Sixpack in Assmunch Arizona called his dog, or to see pictures of his pickup truck, I would ask him. But I don't.

    Have you heard of logical searches? If you know how to search the web properly, you should be able to find just about anything you want within the first 5 hits. Know which search engines to use for what you want and how to use the logical operators to filter out that "non-relevant" crap.

    It is about time that us "geeks" re-claimed our Internet from the dumbed down masses. We should return to the days of ARPA, when only people with a legitimate requirement could get net access. The "democratization" (i.e. moronification) of the web has gone too far and is responsible for the majority of problems us "original internet users" are seeing. The flood of newbies must not only be stopped, it needs to be REVERSED. These non-tech-savvy people need cable TV, and not something as sophisticated and potentially dangerous as the Internet.

    Perhaps a new more exclusive "elite" (in the good sense of the word) Internet should be set up, running only IPv6. Then we could capture some of the community spirit of the pre-AOL "good old days". And maybe these spammers, skript kiddies and trolls would back off.

    Ooh, just what the web needs, more "elite" people like you. Dammit, the web is about information, equality and business. It's not just for you "31337 H@X0RZ" anymore. Grow up! Most of the technology that you're using today was developed because of the popularity of technology. You try to "reclaim" technology for your little group and you'll shrink the market for it so much that companies won't bother with it.

    kwsNI

  • 1) How many corporate pages EVER link outside their site?

    I would say it would generally be a bad idea for them to link outside, as the ultimate goal is to keep visitors inside as long as possible. Linking outside is an absolute no-no in corporate web page design.

    __________________________________________

  • The assertion that around 10% of the "web" is disconnected is interesting. Just how can the existence of a disconnected page be inferred? And as has been pointed out, what about password-controlled areas and dynamic content? However, leaving that problem aside for the moment...

    I'm sure that within these "Dark Matter" pages you would find a large number of "unsavoury activities" being carried out. Perhaps it would be valuable for an ISP to consider scanning web server assets (HTML, etc) to determine how much of it is unlinked - and to make a ToS/AUP that bars sites with too high a Dark Matter percentage. Especially if the server logs show traffic hitting the dark pages. Prime indication that something "non-connected" is going on - which if it isn't outright illegal is probably at least non-PC, webwise.

  • I don't know about the rest of you, but this bow tie theory is how I always imagined the internet to be. Of course, I've been online since the early 90's and thus have had a long-term relationship with my computer and the internet, and have used it and seen it grow into what it is today.

    I think the author of this article miswrote its title; it should have been something like... Dumbasses who believe everything they read and hear about the Internet and cannot think logically for themselves should all be shot and refused Internet access. Researchers at IBM, Compaq and AltaVista have done a study to show what anyone with any intelligence would already know.

    But perhaps that was a bit too long.

  • The VC's get their money back from the infrastructure companies. Remember back in the gold rush days, the miners didn't make much money overall. The folks who made the gazillions were the folks selling shovels and building railroads. Today's methodology is no different. Yeah, they make money in the long run from the end consumer, but the big payoff is in value trades from the infrastructure parties. They have all the power and they know it. Power is the commodity and the currency, not the electronic dollar.

    The good ones are expecting their consumers to pay them back, the bad ones are trying to IPO.

    What does this mean? Do the Ciscos of the world expect to stay in business by having end consumers repay their VC debt a penny at a time? I wouldn't think so. And only bad companies IPO? That's a rather shallow view, isn't it?

  • One project that I've wanted to see done for a while now is a total mapping of the internet, to find every website. Of course, trying every possible domain name won't work (too many possibilities), and it'll miss web servers that don't have DNS entries. What if we could map every IP address, finding out which ones had web servers on them?

    Computer A connects to 123.45.12.34
    Computer A sends http request to 123.45.12.34
    If request is replied to, add IP to list of web server addresses
    If request is ignored, increment IP by one and repeat

    Trouble is, we need a way to scan through these addresses in a way that doesn't take too long, preferably less time than the half-life of a web server. What good is a list of web servers if half the entries are invalid? It might make a great starting point for a webspider, but it's no Encyclopaedia Internautica.

    4 billion addresses, 1 hundred computers dedicated to searching, 5 seconds spent on each IP address. ETA: 6.8 years. A joke.

    4 billion addresses, 10 thousand computers in a distributed network, 10 seconds spent on each IP address. ETA: 49.7 days. Doable.

    We need a Seti@Home approach to this. Naturally, IPv6 presents a new problem. :)
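    A minimal sketch of the probe loop described above, assuming a plain HTTP request on port 80 with a short timeout; the address range, port and timeout are illustrative only, and (as noted elsewhere in this discussion) servers on ports like 8000/8080 or behind name-based virtual hosting would still be missed:

    import socket

    def has_web_server(ip, port=80, timeout=5.0):
        """Return True if anything answers an HTTP request on ip:port."""
        try:
            with socket.create_connection((ip, port), timeout=timeout) as s:
                s.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
                return bool(s.recv(128))  # any reply at all counts as a hit
        except OSError:
            return False

    # Walk a tiny, hypothetical slice of the IPv4 space. Scanning all 2^32
    # addresses at 5 seconds each on 100 machines is the ~6.8 years quoted above.
    found = [ip for ip in ("123.45.12.%d" % i for i in range(1, 255))
             if has_web_server(ip)]
    print(found)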
  • 1) How many corporate pages EVER link outside their site?


    There's actually a very good reason for this: include a link on your page, and there's a non-zero chance that the viewer will follow it. If the web page in question is essentially an ad (which many pages are, these days), having someone follow a link off-site is like watching them change the channel when your commercial comes on the TV. Why provide them with the out?
  • He does need his wife in front of the camera to do the news, since he's behind it and on the control board.

    Heh. What happens when they have a row...? Does he get insulted in front of the whole community ??

    Good evening, and here is the news. Today I caught my husband leering at the neighbour's teenage daughter...

  • by Animats ( 122034 ) on Tuesday May 16, 2000 @07:41AM (#1069983) Homepage
    Given the criteria they picked, there have to be four groups. The binary-valued criteria are "has links to it" and "has links from it". There are then four possible combinations. All four exist in practice, which is to be expected. Big deal.
  • by Anonymous Coward
    What effect might XML-ish multiway links have on this type of network? Someone made a comparison to "Hollywood Society" in an article @ ZD over the weekend. (Top actors/moguls surrounded by the Pantheon of "others" in the social network.) Human societal structures are almost always made up of multidirectional linkages; is there any reason to think multi-directional links will change the "shape" of the Net?
  • You did forget one important subdivision of the web.
    • p0rn
    It has to be on its own, Slashdot won't link to it, and we do generally care to count it. Someone had to say it.

    Devil Ducky
  • That thing (bow tie) looks like an amoeba http://www.nature.com/nature/journal/v405/n6783/fig_tab/405113a0_F1.html [nature.com] - a nucleus with protoplasm and tentacles.

    We all knew that the Internet was in its early development stage, but I thought it was already closer to some multicelled trilobite [magma.ca] than a single-celled organism!
  • Well, unstructured code has always been compared to spaghetti [tuxedo.org]; that must be a compliment for a hypertext corpus (since hypertext is, after all, supposed to be unstructured). :)
  • The study does not mention the impact of secure sites. For that matter, any site that the search engine couldn't crawl. For instance, what about PHP or other script-generated sites? Do these show up as "outs" or strongly connected or dead links? I have not seen anything in the paper addressing this. I have to imagine there are a fairly large number of scripted pages and secure websites.
  • by Anonymous Coward

    Yes, I agree entirely with you and have been considering the issues you raise for some time. The most urgent needs facing the internet today are a) to get rid of current users who aren't capable of using it, and b) preventing further users from accessing it in the future. I have some ideas on how to proceed with these two points.

    IMHO, the solution is to stop letting everyone access the web. There are two ways this should be implemented. Firstly, anyone under the age of 18 (or 16 or 21) should not be allowed on the web at all. Until they are adults, they cannot be trusted to handle the large amounts of dangerous information which the web can provide to them, and during this vulnerable stage in their life they can be swayed by rhetoric and promises. Doing this will immediately stop the market for censorware and filtering, since if only responsible adults can use the net, then they can handle what is seen there, and get rid of pirates, script-kiddies and trolls who are almost exclusively under 16.

    Secondly, access to the web should be dependent on some kind of examination process, whereby people who want to use the web have to take a test to determine their suitability. In this way we can weed out the undesirables from the net and make sure that the content on it is of uniformly high quality. Rather than having sites dedicated to racist hate, terrorist manifestos and anti-Christian diatribes we can have decent sites which educate and enlighten readers, like we had before open access.

    Now, I know these comments will offend some /.ers, but try to look beyond your liberal hand-waving for a minute and think about these proposals. The net is becoming a cesspool, and this is the only way to clean it up.

  • Of COURSE there are a bunch of 'dead end' and non-connected sites. There are a thousand web rings for Leonardo DiCraprio just languishing, having been abandoned for whoever is hot now... argh...
    I love Google, but lately when I search I get more results consisting of dead links and posts to message boards than any useful info. I've been on the mailing list for the Search Engine Watch [searchenginewatch.com] newsletter for a couple years now, and while there's a lot being done to weed through all the fluff, IMHO the fluff is growing at too high a rate for the technology to keep up with presently.
    Anybody currently active in the industry got an insight into how search engines are combatting all this expired flotsam?

    The Divine Creatrix in a Mortal Shell that stays Crunchy in Milk
  • ...which is why so many corporate web sites suck.

    They haven't realized that people want information from more than one source. They also haven't realized that providing links to those alternate sources will improve their credibility.

  • It's a shame search engines don't more clearly separate corporate and personal websites, like the telephone directory.

    Having said that I guess some pages would be hard to categorise.

  • by Anonymous Coward
    All opinions are my own - until criticized

    Just so you know, I'm not here to criticize your opinions. I'm here to criticize your sig. The first, most obvious problem, is that you are missing a period at the end of your sentence. Please fix this. Secondly, you should not have hyphenated that sentence. That's just wrong in so many ways. In mid-sentence -- like this -- you use a dash. In plain ASCII text, a dash is two -- count them -- two hyphens. There are other characters available, but those fall outside the 7-bit range and therefore, they cannot be trusted. Not that any of this matters because you should have used parentheses (the little round things on either side of this little comment here) or an ellipsis...

    Here are some samples of what your sig should look like:

    All opinions are my own (until criticized).

    or

    All opinions are my own ... until criticized.

    You must please understand sigs are very important. Unlike comments, you can change your sig and fix it and make it look pretty. Anybody that criticizes spelling in a post is an elitist and a hypocrite, but sigs can be changed. You can make a difference!

    That said, it's time you got yourself a new sig. Thank you.
  • All this tells me is that developers are selective in what they link to. Some tend to get together and link to each other.

    And these, the knot in the bowtie, are the meat of the Internet: real content with links elsewhere to cover what's not on that site. One example would be Slashdot.

    Some tend to link only to themselves.

    Your average corporate site...

    Some want to be noticed so they provide lots of links, but aren't truly interesting, so nobody links to them.

    ...and your average home page, which more often than not contains a 100K list of bookmarks.

    And finally, we have the disconnected pages linked to originating sites, which are linkless homepages and other contentless cruft.

    None of this is particularly surprising; we've all seen examples of each type, although the ratio might have been a bit of a surprise -- it seems to be about a 50-50 split between commercial and non-commercial. What would be even more interesting is a traffic analysis: how much of the Web's traffic is in that compact 30% core? I'd wager around 90%.

    Cheers,
    -j.

  • I'm sure there are a lot of sites with no possible access points. If some kid buys 10 megs of web space for $5 a month and sticks four pictures of himself and a message saying he's l33t, unless he personally hands you the link, you'll probably never find it.

    I didn't say that was bad.
  • Sorry, but "World Wide Bow Tie" just doesn't do it for me. Plus, we'd have to rename W3C to W2BTC, and that would just screw things up...

    kwsNI
  • I'm experiencing deja vu.
  • Somehow they managed to catalogue 44 million pages which "were not linked to from anywhere".

    How did they do this? They used Altavista.

    So their entire theory of "bow-tie connectedness" conveniently forgets that Altavista exists. Fortunately for us web users, Altavista (insert your favorite search engine) does exist, and its existence seems to invalidate their hypothesis.

    So it's an interesting idea, but if it ignores the existence of search engines it doesn't really hold much meaning.

  • by joshv ( 13017 ) on Tuesday May 16, 2000 @02:40AM (#1069999)
    This was shortly followed by announcements from the W3C of the 'Angel-Hair', 'Fucilli', and 'Linguini' web theories.

    -josh
  • A bowtie? No style...how about a noose? Or one of the single-surface solids?

    I wonder if pr0n sites map out to a 'dirty picture and paper towel' image?

  • It doesn't do you any good, but think of all the money to be gained from selling nifty graphical representations of the web now that we know what it is shaped like.

    IBM has hit a gold mine. Think of all the things they could sell!
    • Bow ties!
    • Pictures of bow ties!
    • more bow ties!
  • I find all these "personal" pages on the web are a major irritant, as they seldom contain useful information, and they clog up the search engines with non-relevant crap, by polluting the search space.

    Really I think you should be blaming the search engines for that, not the web itself. It's the search engines who index it, after all.

    The most convenient way to fix this problem would simply be for all your favourite sites to use meta information properly. This is exactly what it was designed for. Unfortunately there are too many lazy designers around that don't bother to implement it properly, so it's no wonder that search engines have trouble indexing and promoting it appropriately. Most geocities users who don't update their homepage for a year won't know or care about how to use meta info effectively, and it would quite easily demote their pages by default.

    I don't want to sound too boring, but one of the best things about the net in its (mostly) unregulated state today is its openness and how it lets information be distributed so easily. Sometimes this information is unreliable, but the same mechanism can't prevent open debate about the same information, either.

    Personal homepages are simply documents that somebody has placed on a server indirectly attached via a network to your own. If you don't like them, disconnect your computer from that network. If you want a censored system, then by all means design it, patent it, and only sell the rights to the people you want to use it.

  • only if you want super gee wizz stuff

    the bleeding edge will always be messy as new technologies race to be ahead and others fall down

    it's ALWAYS been like that in virtually every field of study in computing and no doubt humanity

    ride the wave or swim back to shore, your choice


    .oO0Oo.
  • by ballestra ( 118297 ) on Tuesday May 16, 2000 @03:16AM (#1070004) Homepage
    Slashdotters, and especially Everything [everything2.com] noders, are good at including relevant links in their posts, and presumably on their own pages. The problem is that most of the content being created for the web is written the same way as traditional magazine or newspaper copy. It's the old 90/10 rule: 90% of the eyeballs are viewing 10% of the available content, and that 10% is generally on commercial sites one or two clicks away from the Yahoo [yahoo.com], Netscape [netscape.com], MSN [msn.com], or AOL [aol.com] main pages.

    Look at the money going into streaming media [windowsmedia.com]. A large segment of the business world still sees the internet as just another medium for TV [den.net] or radio [realnetworks.com] broadcasting. By its very nature, broadcasting is not interconnected; it's passive and linear.

    Tim Berners-Lee [w3.org] wrote in his book, Weaving the Web [w3.org] that the main obstacle to the web being a true information web of shared knowledge is that content is controlled by too few. He was upset that browsers were developed which could not edit web pages like his original browser/editor [w3.org].

    The silver lining to this, IMHO, is the "weblog" phenomenon, including sites like Slashdot, where ordinary users can contribute their ideas, especially in html format so that they can contribute links. I really believe that some day soon the conventional media sites will be forced to give this kind of capability to their readers, or else risk losing all those eyeballs to Slash-like sites.

    "What I cannot create, I do not understand."

  • Amen

    Netscape and Microsoft have market shares enough that their "features" are used, but none are big enough to set a de facto standard.

    Wouldn't it be nice if *one* browser had a flawless implementation of the W3C standard?

  • the abstract was far more informative and nerdish than the magazine style of the linked story

    lower those common denominators

    coming soon :

    !!London Bus found on the Moon!!
    !!new Yeti! pictures!!!!
    **win win win**
    .oO0Oo.
  • by Shotgun ( 30919 ) on Tuesday May 16, 2000 @03:19AM (#1070007)
    All this tells me is that developers are selective in what they link to. Some tend to get together and link to each other. Some tend to link only to themselves. Some want to be noticed so they provide lots of links, but aren't truly interesting, so nobody links to them.

    This makes complete sense. If every page had links to every other page, you would never be able to find anything. Each page would have too many links. The way the web is developing, you start looking for info within the IN group (usually a search engine or someone's index page). This leads to the SCC, which eventually points you to a leaf node in the OUT group that has the truly interesting information.

    I find this structure to be efficient and elegant.

  • Calm down, he was trolling you.
  • Remember the days when you would go to a web page and every sentence had at least one link? Even corporate sites weren't shy about doing off-site linking.

    Of course the web was atrocious, but if you found a dumb page (take my old one [skylab.org] for example) there was always something linked to that WAS moderately interesting.

    Wiki [c2.com] pages are awfully reminiscent of the "old web". (Of course that one is centered around eXtreme Programming and kinda boring, IMO, but it's the principle!)

    Oh well. The corporatization of the web has brought lots of cool things, too; they're just harder to find now.
  • Interesting. I have on occasion uploaded a Mb or so of holiday pix to my ISP-provided web space and given the "dark" URL to a few friends.

    > bars sites with too high a Dark Matter percentage

    Um. Censorship, especially that not based on an analysis of the actual content is like, real bad.
  • "I'm not arguing for linking to random information just because you can, but informative linking is why hypertext has the hyper."

    And I'm not arguing the opposite. We shouldn't just link every single word to Everything2.com just because we can, and, God Forbid, our site would not be linked enough to the core if we did not. The content has to be weighed. What frustrates me even more than a page with absolutely no applicable links (when it would be useful) is a site which has big blue glaring links all over the place and I can't find /anything/ relevant. An example would be Microsoft's knowledge base/help (sic) site. Try to find something there.
  • "If you know who's linked to you, then perhaps you know your content is valuable. (You might say) 'Hey, let's throw up a royalty, a fee for pointing to me,' " he said.

    This is coming from an IBM spokesperson. Is anyone else upset? Charging for linking?

  • Oops, forgot to give you the source. It's from a ZDNet article on this story. [zdnet.com]
  • Flamebait perhaps, offtopic undoubtedly but troll?

    Can we get some kind of IQ test going before allowing people to meta-moderate? Something along the line of:

    Q. You have five apples and you eat one, do you have
    1.More
    2.Less
    3.My butt itches

    Rich

  • Endless? I doubt it. Start a business. Chances are it will go belly up in the first year or so. Take a job. You will be working for someone else and giving them the fruits of your labor while they hand out a small pittance to you (not bad sometimes but at least realize it). Vote? Doesn't really count unless you organize a large group of people to vote like you do (special interest groups). How about something simple like build a shed on property you own with your tools, made with wood you bought (or made if you can do that)? Sure you can do that. Assuming you have the permission of the locality you live in. Or you could just build it and have the locality order you to tear it down and get permission first.

    Let's be honest now and drop the sarcasm. In America we are free... to a point. We live under a mass of laws that have been enacted over time to appease one group or another. Some of this is good... some of it is bad or just downright unenforceable in and of itself. As for standing a better chance here than anywhere else, I think you would have a pretty good chance in Canada, Great Britain, or several other countries. America does not have a lock on success in this world. We happen to just be the most arrogant about it (unfortunately).

  • by BoLean ( 41374 ) on Tuesday May 16, 2000 @03:21AM (#1070016) Homepage
    The study does not mention the impact of secure sites. For that matter, any site that the search engine couldn't crawl. For instance, what about PHP, ASP or other script-generated sites? Do these show up as "outs" or strongly connected or dead links? I have not seen anything in the paper addressing this. I have to imagine there are a fairly large number of scripted pages and secure websites. What about ad-click links; do these make the web appear more connected just because ads appear on an otherwise dead-end page? There may be reasons to question the validity of any such research done using web-bots, since the nature of web content has rapidly changed over the last few years.
  • How about having a 'Linked by' page on every site as a matter of good style? That would close the loop...
  • It is about time that us "geeks" re-claimed our Internet from the dumbed down masses.

    OK, so you're trolling, but I almost agree with you anyway. Back in the late '80s, I was wishing more people were connected to the net. It was a great place to be. Now they're all here, I occasionally find myself wishing they weren't. The problem is that there's no quality control. If only people with half a brain were allowed Internet access, we wouldn't have the AOL syndrome. But real life isn't like that. For better or worse (overall, I think it's for better, despite the problems it causes), the unwashed masses do contribute to the essence of the net. For every 1000 AOL lusers, the general population gives us a Rob Malda or an Iliad. Not an ideal ratio, but better than nothing.

  • The web is broke. We're not using it properly

    Is there a proper way of using the web? I don't think so. The web is many things to many people. That's what makes it so alive and so interesting.

    there are too many poorly done corporate sites

    Sure, so what? See Sturgeon's Law (90% of everything...). A badly designed corporate site tends to be its own punishment.

    We need more of these research projects to help us figure out what needs to be changed.

    Seems like you want to impose good taste and proper programming practices on the web. Thank you very much, I'll pass. I don't want the web to be Martha-Stewartized.

    Kaa
  • by Quazi ( 3460 )
    The internet is similar to the universe and everything in-between -- especially the theories about their respective shapes! I remember when the big theory was that the universe is saddle-shaped. Then I got theories that conflicted that by saying that the universe has far too many dimensions to have a shape. Then the first guys said that only the three-dimensional part looks like a saddle.. Now there are people saying that the Internet looks like a bowtie. I say that if you redraw the graph with dots that represent site locations and put them in roughly the same areas (incoming, outgoing, central area, tendrils, etc.), you won't actually see individual tendrils. Instead, you'll see a big hazy mass. To the untrained eye, it will look like a blarb of shit. Here's the big question: Is the Big Blarb of Shit collapsing in on itself, or will it expand forever?
  • Look, I wouldn't have risked my +1 bonus if was just trying to go for points. You say the links to the portals weren't necessary, but actually none of my links were necessary. I was trying to show an extreme example of maximal linking which is common on most nodes at everything2.com. I wouldn't plug /. if it weren't important to the point I was making, which is that unlike most major web sites, it allows every visitor to contribute not only comments, but links as well.

    BTW, if you hadn't pissed away your karma, you might have been able to moderate me down.

    "What I cannot create, I do not understand."

  • Yeah, that's normal. Didn't mean to be uptight or anything.
  • There are essentially two kinds of sites

    1) Sites that have their own domain.
    2) Sites that share their domain with others.

    Sites with their own domains can be found simply by searching domains. I would think that NSI and other registries would be willing to part with their zone data for research purposes by a reputable organization.

    Sites that share a domain are harder. These could probably be estimated by finding the ratio between pages reachable from the domain's home page and those not reachable for known sites and extrapolating.

    Another useful source, if they can get it, is Alexa's data. Alexa tracks pages as the user visits them. As a result, any page that any user (using the Alexa plugin) visits, Alexa can catalog. I have caught Alexa crawling pages of mine that were deliberately set up to not have any links to them.

  • Amen. I'm tired of hearing about how e-business is the next big step in the web (but I doubt anybody here buys that sort of media hype anyway).

    The web is where it is today because it allows people to publish to the world for peanuts, period. The next big step in the web is making it easy for novice home users to publish content without using costly third party hosting. These are the things required for this to happen:

    • Broadband, dedicated access available to every home at consumer prices. This is already happening.. for example, here in Canada we have a choice of cable or DSL in most major cities.
    • Hassle free static IPs and domain names. Domains seem to be getting cheap now with the NSI monopoly busting and OpenSRS but the IP shortage makes it almost impossible for a typical consumer to get a static IP. Maybe when hell freezes over and everybody switches to IPv6, this will change.
    • Web server/authoring software targeted at novice users. MS Frontpage and Personal Web Server don't count :). My suggestion would be something that looks like a web browser but lets you edit everything you see. Something like this probably exists already, I'm not sure.

    I know this will probably lead to the web being further polluted with poorly designed pages and midi music, but it's worth it since this would probably double online content overnight.. and we all know content is what matters, right?

  • damn straight! synth-dorks and physix-weenies have as much place on /. as *nix gods, code monkeys and Trolls(tm)
  • Adamic and Huberman (1) 99. L. Adamic and B. Huberman. The nature of markets on the World Wide Web [xerox.com], Xerox PARC Technical Report, 1999.

    Adamic and Huberman (2) 99. L. Adamic and B. Huberman. Scaling behavior on the World Wide Web, [xerox.com] Technical comment on Barabasi and Albert 99.

    Aiello, Chung, and Lu 00. W. Aiello, F. Chung and L. Lu. A random graph model for massive graphs, [xerox.com] ACM Symposium on Theory of Computing, 2000.

    Albert, Jeong, and Barabasi 99. R. Albert, H. Jeong, and A.-L. Barabasi. Diameter of the World Wide Web, Nature 401:130-131, Sep 1999.

    Barabasi and Albert 99. A. Barabasi and R. Albert. Emergence of scaling in random networks, Science, 286:509-512, 1999.

    Barford et. al. 99. P. Barford, A. Bestavros, A. Bradley, and M. E. Crovella. Changes in Web client access patterns: Characteristics and caching implications, in World Wide Web, Special Issue on Characterization and Performance Evaluation, 2:15-28, 1999.

    Bharat et. al. 98. K. Bharat, A. Broder, M. Henzinger, P. Kumar, and S. Venkatasubramanian. The connectivity server: fast access to linkage information on the web, [digital.com] Proc. 7th WWW, 1998.

    Bharat and Henzinger 98. K. Bharat, and M. Henzinger. Improved algorithms for topic distillation in hyperlinked environments, [digital.com] Proc. 21st SIGIR, 1998.

    Brin and Page 98. S. Brin, and L. Page. The anatomy of a large scale hypertextual web search engine, [stanford.edu] Proc. 7th WWW, 1998.

    Botafogo and Shneiderman 91. R. A. Botafogo and B. Shneiderman. Identifying aggregates in hypertext structures, Proc. 3rd ACM Conference on Hypertext, 1991.

    Carriere and Kazman 97. J. Carriere, and R. Kazman. WebQuery: Searching and visualizing the Web through connectivity [uwaterloo.ca] , Proc. 6th WWW, 1997.

    Chakrabarti et. al. (1) 98. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associated text, [decweb.ethz.ch] Proc. 7th WWW, 1998.

    Chakrabarti et. al. (2) 98. S. Chakrabarti, B. Dom, D. Gibson, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation, [ibm.com] Proc. ACM SIGIR workshop on Hypertext Information Retrieval on the Web, 1998.

    Chakrabarti, Gibson, and McCurley 99. S. Chakrabarti, D. Gibson, and K. McCurley. Surfing the Web backwards, Proc. 8th WWW, 1999.

    Cho and Garcia-Molina 2000 J. Cho, H. Garcia-Molina Synchronizing a database to Improve Freshness [stanford.edu] . To appear in 2000 ACM International Conference on Management of Data (SIGMOD), May 2000.

    Faloutsos, Faloutsos, and Faloutsos 99. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power law relationships of the internet topology, ACM SIGCOMM, 1999.

    Glassman 94. S. Glassman. A caching relay for the world wide web [digital.com] , Proc. 1st WWW, 1994.
    Harary 75. F. Harary. Graph Theory, Addison Wesley, 1975.

    Huberman et. al. 98. B. Huberman, P. Pirolli, J. Pitkow, and R. Lukose. Strong regularities in World Wide Web surfing, Science, 280:95-97, 1998.

    Kleinberg 98. J. Kleinberg. Authoritative sources in a hyperlinked environment, [cornell.edu] Proc. 9th ACM-SIAM SODA, 1998.

    Kumar et. al. (1) 99. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for cyber communities, Proc. 8th WWW , Apr 1999.

    Kumar et. al. (2) 99. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large scale knowledge bases from the Web, Proc. VLDB, Jul 1999.

    Lukose and Huberman 98. R. M. Lukose and B. Huberman. Surfing as a real option, Proc. 1st International Conference on Information and Computation Economies, 1998.

    Martindale and Konopka 96. C. Martindale and A. K. Konopka. Oligonucleotide frequencies in DNA follow a Yule distribution, Computers & Chemistry, 20(1):35-38, 1996.

    Mendelzon, Mihaila, and Milo 97. A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web [toronto.edu], Journal of Digital Libraries 1(1), pp. 68-88, 1997.

    Mendelzon and Wood 95. A. Mendelzon and P. Wood. Finding regular simple paths in graph databases [toronto.edu], SIAM J. Comp. 24(6):1235-1258, 1995.

    Pareto 1897. V Pareto. Cours d'economie politique, Rouge, Lausanne et Paris, 1897.

    Pirolli, Pitkow, and Rao 96. P. Pirolli, J. Pitkow, and R. Rao. Silk from a sow's ear: Extracting usable structures from the Web [acm.org] , Proc. ACM SIGCHI, 1996.

    Pitkow and Pirolli 97. J. Pitkow and P. Pirolli. Life, death, and lawfulness on the electronic frontier, Proc. ACM SIGCHI, 1997.

    Simon 55. H.A. Simon. On a class of skew distribution functions, Biometrika, 42:425-440, 1955.

    White and McCain 89. H.D. White and K.W. McCain, Bibliometrics, in: Ann. Rev. Info. Sci. and Technology, Elsevier, 1989, pp. 119-186.

    Yule 44. G.U. Yule. Statistical Study of Literary Vocabulary, Cambridge University Press, 1944.

    Zipf 49. G.K. Zipf. Human Behavior and the Principle of Least Effort, Addison-Wesley, 1949.
    ___

  • I think the Bow Tie fits just fine into my Astral/Ethereal Plane model... the AP model fits over everything, so what's inside doesn't really matter as much.

    The internet is like the Astral Plane. We access it from the "real" world all the time. Time and space obey different rules there. You can leap from place to place, no matter the distance, in the same amount of time, and from any location. You can even be in more than one place at once. Matter can be changed and manipulated at will. Cyberspace and alternate dimensions have a lot in common.
  • Don't forget the much more common "Elbow", and "Shell" theories (both with cheese).

    That's a spicy meat-a-ball!

    :wq!

  • by Stiletto ( 12066 ) on Tuesday May 16, 2000 @03:31AM (#1070029)
    I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths...

    Then don't! That's one thing many "web authors" still don't get... The WWW is a text-oriented medium. It's a page of text that has links to other pages of text. Everything else is just cruft.

    HTML doesn't define how a web site should look to the pixel, and this is one of its strong points. It's up to the user to decide how to view a site. If the user doesn't want images, your site should look just fine without them.

    The minute you start checking to make sure your site looks the same on all browsers, you should re-think your entire site. Why do you want it to look the same on all browsers (it won't by the way...)? This usually indicates that you are focusing too much on presentation and not enough on content.

    The web is broke. We're not using it properly

    I agree with your second statement. The web isn't broke... people just aren't using it properly. There are so many corporate sites that look like brochures. It's sickening. My previous job was to set up a web page for a small business, and all they wanted me to do was scan each page of their brochure into GIF's, put them up on the web, and put "forward" and "backward" buttons on the bottom to navigate between pages. I said, WTF!?!? The concept of actually including text information and links to other resources was totally absurd to my boss.

    These kinds of people think of the web only as a marketing tool, and thus can't take advantage of the power it has to offer.

  • That's 'pr0n' actually, and it was a bad-ass movie back in the day. My favorite part was the lightcycle race.

  • IBM et al. probably know, or else they couldn't have calculated the numbers.

    Is it just me, or does anybody else have this mental image of a machine trying every single possible URL (and I'm not just talking domain names, or even just index.html files) and filing them, then going around to see how many could be found in a search engine, and then seeing how many linked to a search engine?

    My mind boggles how IBM et al managed to find all the 'unlinked' sites to get their figures.

    Unless, of course, they guessed...


    Richy C. [beebware.com]
    --
  • Read the article, it's quite good:

    In general, the AltaVista crawl is based on a large set of starting points accumulated over time from various sources, including voluntary submissions. The crawl proceeds in roughly a BFS manner, but is subject to various rules designed to avoid overloading web servers, avoid robot traps (artificial infinite paths), avoid and/or detect spam (page flooding), deal with connection time outs, etc. Each build of the AltaVista index is based on the crawl data after further filtering and processing designed to remove duplicates and near duplicates, eliminate spam pages, etc. Then the index evolves continuously as various processes delete dead links, add new pages, update pages, etc. The secondary filtering and the later deletions and additions are not reflected in the connectivity server. But overall, CS2's database can be viewed as a superset of all pages stored in the index at one point in time. Note that due to the multiple starting points, it is possible for the resulting graph to have many connected components.
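    For anyone curious what "roughly a BFS manner" looks like in practice, here is a minimal breadth-first crawl sketch; the seed URL, the link-extraction regex and the page cap are made up for illustration, and a real crawler would also need the robot-trap, spam and politeness handling the quote mentions:

    import re
    import urllib.request
    from collections import deque
    from urllib.parse import urljoin

    def bfs_crawl(seeds, max_pages=50):
        """Crawl outward from the seed URLs in breadth-first order."""
        seen, queue, graph = set(seeds), deque(seeds), {}
        while queue and len(graph) < max_pages:
            url = queue.popleft()
            try:
                html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # dead link, timeout, non-HTML content, etc.
            links = [urljoin(url, h) for h in re.findall(r'href="([^"#]+)"', html)]
            graph[url] = links  # adjacency list: page -> outgoing links
            for link in links:
                if link.startswith("http") and link not in seen:
                    seen.add(link)
                    queue.append(link)
        return graph

    # e.g. graph = bfs_crawl(["http://www.example.com/"])  # hypothetical seed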

  • by Hard_Code ( 49548 ) on Tuesday May 16, 2000 @03:38AM (#1070033)
    I hope people don't use this paper to promote arbitrary linkage to other sites. I mean /why/ do things have to be more connected? When I'm on my web page I don't want or need one click access to every other part of the web. That's why there are portals and search engines. Islands I understand. But we wouldn't necessarily /want/ those two sections of 24%, origin and termination, to be arbitrarily linked more to the core. We'd just end up with the whole web being a humongous hairball of a core in which each page linked to many other pages in the core. What a mess. People put indices in one place, at the BACK of a book, for a reason.
  • Sorry mate but your post was a quite beautiful example of trolling at work. I guess the old troll gets all of us from time to time but that _was_ a beauty :-)

  • Wouldn't it be nice if *one* browser had a flawless implementation of the W3C standard?

    I'd settle for NS going far far away. I'm so sick of seeing work that passes through the W3C validator only to be mangled by Nutscrape.

    Wasn't NS supposed to be saved/fixed by open source? When is that going to happen?

  • Considering that you can have multiple sites on the SAME IP, differentiated only by the "Host:" header in your browser, and compounded by the fact that a webserver can be placed on ANY port in the 1-65535 range, using that theory to find a server is impossible.
    Remember: an infinite number of possible hosts on 65535 possible ports, on billions of IP addresses.
    Thought I'd throw out that theory :)
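    A quick illustration of the "Host:" header point, with a made-up IP and host names; under HTTP/1.1 name-based virtual hosting, two requests to the same address and port can return completely different sites:

    import urllib.request

    ip = "123.45.12.34"  # hypothetical address
    for host in ("www.site-one.example", "www.site-two.example"):  # made-up names
        req = urllib.request.Request("http://%s/" % ip, headers={"Host": host})
        try:
            body = urllib.request.urlopen(req, timeout=5).read()
            print(host, len(body), "bytes")  # same IP, potentially a different site
        except OSError as err:
            print(host, "no answer:", err)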
  • I've often wondered how many sites aren't linked to by any other site and have never been scanned by a search engine. Chances are we'll never know.

    Although almost by definition, if a site isn't linked, it isn't part of the world wide web, the same as if web pages are on private networks.

    An interesting thought is that what if the world-wide web somehow ended up in two halves with no links between the two? Would there be two world wide webs or would the bigger one prevail or would we still choose to think of it all as the one world wide web?

    Rich

  • The results of this study have the most poetic conclusion - every part of the internet is taking part in a big BOW TIE.

    Sure, some parts of the bow tie aren't as well connected as others, but hey, that's their place. It takes millions and millions of sites to put together this universal-intergalactic bow tie, and not everyone can be at the center of it.

    This reminds us that we should each make an effort to make our portion of the bow tie a bit nicer. Refraining from talking about petrified-natalie portman-grits really does pay off in the long run, making the heavenly Bow Tie a nicer place to surf.

    Fellow Linux users, do not insult Windows and Mac users, as they are part of the great Bow Tie as well.

  • The managers of web developers are.

    They want it to have the latest wizz-bang 'features' of a half-dozen different browsers.

    They want it to contain ALL 'pertinent' information on the front page, but be clear, concise and readable at a glance.

    They want it to PRINT cleanly on an 8.5x11 sheet - or worse yet, on an A4 sheet.

    They want it to be secure, and robust and stable, but only if they can have it done TODAY!
  • I do not think that the original poster was saying that he wants a web without graphics, but rather that he wants a web with text. For example, take a look at my college's home page, http://www.ius.indiana.edu/ [indiana.edu] *with graphics turned off*. Now try to get somewhere specific. Can you do it?

    The web allows for many different media, and yet too often people only use graphics (and the occasional text).

    I think that browsers such as the newest IE for the Mac (as much as I despise MS) and iCab (a very compliant browser, even showing HTML mistakes) are the future of browsers: compliant.

    Other people on this thread are correct: Clients are the problem. Try showing them what their web page looks like in Lynx, or in iCab. "Do you want people coming to your web page to see this?"

    Then we come to my pet-peeve: JavaScript. I cannot stand pages that depend on JS-support. It is fine to use, but as soon as I get to a page that is impossible to navigate without JS, I leave.

    My final point: People need to realize that the web allows for a relatively seamless media environment. This does not mean, however, that you should pick one form of media and rely on it exclusively on your page.

  • This is research? A scientific study of links? Gimme a break. Somebody, somewhere please find these people a date.

  • ..at least the drinks'll be str0ng and allah my friends'll be there...
  • Most interesting, I think, were the tunnels, connecting IN to OUT but bypassing the core. It would seem that such tunnels indicate weaknesses in the makeup of the core, which is to say paths of connected interest that for some reason are not included in the core. These, I think, would be worth looking at to see if they grow or diminish. If a tunnel grew to a similar size to the core, it would make an interesting model where IN and OUT have more than one major connecting network.

    I'd be surprised to see that happen... the way the tunnel would grow would be because more and more links are added. If 30% of sites on the web are in the core, then assuming that the links are to random sites, after typically 4 links have been added it will be connected to the core and therefore fall into the IN group.

    Of course, sites aren't added randomly, but since the core by definition contains often-linked-to sites, chances are that more than 30% of newly added links in the tunnel will point to the core.

    [TMB]
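    A quick sanity check of the "typically 4 links" figure, assuming (as above) that about 30% of pages sit in the core and that each added link lands on a uniformly random page:

    # Chance that at least one of k random links hits the core (p = 0.3 per link).
    for k in range(1, 7):
        print(k, round(1 - 0.7 ** k, 2))
    # Prints roughly 0.3, 0.51, 0.66, 0.76, 0.83, 0.88 -- so by about the fourth
    # link a page is far more likely than not to be tied into the core, and the
    # expected wait for the first core hit is 1/0.3, about 3.3 links.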

  • And we can't forget that Slashdot really is a bunch of tendrils leading from a core bundle of posts comprised of equal parts of:
    • Grits posts
    • OOG posts
    • First post posts
    • I'm gonna kick your ass guy posts
  • How can I embed sound/midi files into a web page which is W3C compliant and readable by Netscape? Another one: how can I get the effect of this page [crosswinds.net] without graphics?
  • Why not do the obvious thing? Make each page one giant GIF or JPEG image. You can use an imagemap to let people navigate.

    Why don't you do THIS for your customers? It gives them exactly what they want, they can get pixel-level control of how their site looks. They can even digitize paper brochures and do it that way. And by ignoring all of the cruft with crappy HTML 4 different browsers on 5 different platforms, you can probably do the site cheaper and easier this way too.

    HTML and pixel-accurate renderings are MUTUALLY EXCLUSIVE. HTML isn't, wasn't and should NOT be designed for that. If you want something better, either design it yourself, or piggyback it on something which works and can be done today. JPEG or GIF or PNG.

    If your customers want to look like idiots on the web, then I'm sure they'll like this. Not only this, you should ADVERTISE this advantage. ``The only web design company whose sites look EXACTLY how you want, on every browser on every OS.''
  • I visit pages that have content I want. I bookmark and revisit pages that lead me to content I want. If Palm's website doesn't link to palmgear, or their development section doesn't link to the GCC cross-compiler tools, I won't visit them.

    I still remember the HP48 websites circa 4 years ago... 95% of them were crap, just links to another site that was full of links to other crap sites. Had ONE of those sites had a categorized set of links, I would have bookmarked it in an instant: ``links to development software sites'' ``links to fan sites'' ``links to shareware sites'' ``links to math sites'' ....
  • It is about time that us "geeks" re-claimed our Internet from the dumbed down masses. We should return to the days of ARPA, when only people with a legitimate requirement could get net access. The "democratization" (i.e. moronification) of the web has gone too far and is responsible for the majority of problems us "original internet users" are seeing. The flood of newbies must not only be stopped, it needs to be REVERSED. These non-tech-savvy people need cable TV, and not something as sophisticated and potentially dangerous as the Internet.

    While my wife and I often joke about the sentiment of this statement (at least once a day, one of us will point at a website or an email and say, "Yet Another person who really shouldn't be on the Internet"), we also know that actually believing it is horribly shortsighted thinking. Yes, there's a lot of no-content fluff out there on the web. But people have to start somewhere. I wouldn't expect a person's first web page to be more meaningful than "here's my house, my family, and my pets" any more than I'd expect a 6-year-old's first two-wheeler to be a Harley.

    Granted, some folks never get past the "training wheels" stage. (Okay, make that "a lot of folks" these days.) But the Internet has long passed the days when it was a tool for a select group of people. If the S/N ratio is dropping precipitously, well, then, improve your noise filters. Make it a habit to include things like "-url:aol.com" in every search if you need to. You're one of the "tech-savvy" crowd (directed at the original AC who posted); use your tech knowledge! If all you can do is bitch about the fluff on the web without using readily-available tools to cut through it, maybe you're not as tech-savvy as you'd like us to believe.

    "I shouldn't have to" isn't a valid response, either. In any information search, irrelevant data will turn up, and you're going to have to sort through it anyhow.


    Aero

  • The minute you start checking to make sure your site looks the same on all browsers, you should re-think your entire site. Why do you want it to look the same on all browsers (it won't by the way...)? This usually indicates that you are focusing too much on presentation and not enough on content.

    Clearly, you're not a developer. For those of us who do this for a living, it's about presentation and content. And we're not necessarily designing our *own* websites, we're designing for clueless clients who refuse to be convinced of certain practices/standards no matter HOW MUCH we pound them into their puny skulls.

    Web Developers/Designers have the most *clueless* clientele of most any industry, and we have to develop for them, not us. Believe it or not, graphics occasionally look good on a website, and people WANT THEM. And considering how differently IE and NS handle tables and alignment, we DO have difficulties making it look the same in both browsers.

    For those of us who design, we know that this is a perpetual, neverending headache.



    Quidquid latine dictum sit, altum viditur. (Whatever is said in Latin sounds profound.)
  • What they're calling a "bow-tie" is more properly called a "clip-on"... probably a rental, at that. A bowtie does not have two little straps that head around to the back as they have shown. A bowtie is a strap that comes around from the back and ties in the front.

    But getting the analogy wrong just reflects the simplicity of what they've done. Their categorizations based on number of links in and number of links out could have been made a priori. They did measure the size of each group which was somewhat interesting.

    A much more interesting study would be of the paths that people actually follow. Who cares what links the author put up if nobody clicks on them? But the paths that people take would tell us a lot. Do people start at their bookmarks? Do they start at portals? Can they be categorized? And the real question: what paths do the people who buy stuff take?

  • This is currently being discussed [kuro5hin.org] at Kuro5hin (pronounced "corrosion").
  • "In fact, I've run into web developers who have never HEARD of the w3c."

    <<**SHUDDER**>>
  • If you consider Ramsey theory, then you'll know that any two-coloring of a graph will give a group of vertices that are strongly interconnected (a clique) and/or a group where none of the vertices is connected to any other (an anti-clique).

    For example, coloring a complete 6-vertex graph will either give a clique or an anti-clique of three vertices. In a social context, this means that in a group of 6 people there will be a group of at least 3 people who either do not know anyone else in the group or know everyone else in the group. Using a theorem by Erdos tells us that the web probably does not have a clique or anti-clique of size greater than 1+log n (here log = log base 2), where n is the number of web sites. Another result says that there is guaranteed to be a clique or anti-clique that is at least as large as the fourth root of n, where n is the number of web pages.
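    For anyone who wants the six-vertex example spelled out, the standard pigeonhole argument (textbook material, nothing specific to this study) runs as follows:

    Fix a vertex $v$ of $K_6$. Of the $5$ edges at $v$, at least
    $\lceil 5/2 \rceil = 3$ get the same color, say red, joining $v$ to
    vertices $a$, $b$, $c$. If any of the edges $ab$, $ac$, $bc$ is red,
    it closes a red triangle with $v$; otherwise all three are blue and
    $\{a, b, c\}$ is a blue triangle. So every 2-coloring of $K_6$ contains
    a clique or anti-clique of size 3, i.e. $R(3,3) \le 6$.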

  • Perhaps they should put a list of these unconnected websites up on the web, then they could quickly and easily disprove their own research.
  • Study or no, isn't it rather obvious that an awful lot of the web isn't accessible (linked) unless you know exactly where the resource you're looking for is? Ask anyone who does web searches routinely - you'll get vastly different results looking with different engines, and you can rest assured you'll miss lots of information that you might find useful.

    Case in point - recently I needed to find some information about a specific company. Now this company's name is virtually the same as a popular Unix variant (no, not Linux). Searching for the company name, once all the Unix links were weeded out (this was a chemical company) led to some federal documents about the company, but not much other information at all.

    As it turns out, the company in question HAS a web site (and has had one for some time) - it just wasn't linked from anywhere I could access on the common search engines.

    Still, it's nice to have some data on this...
  • by VSc ( 30374 ) on Tuesday May 16, 2000 @02:46AM (#1070068) Homepage
    ...is that in fact it consists of two distinctive bodies:
    • Slashdot
    • The rest
    'The rest' can be further subdivided into 3 parts:
    • News sites Slashdot links to
    • Non-news sites which get slashdotted
    • News sites talking about Slashdot
    • The other category would be "None of the above " but in that case we don't really care to count, do we.
    And of course I have to mention that this study of mine is highly unbiased, openminded, and generally guaranteed to be 100% completely accurate.

    __________________________________________

  • by spiralx ( 97066 ) on Tuesday May 16, 2000 @02:48AM (#1070069)

    Our analysis reveals an interesting picture (Figure 9) of the web's macroscopic structure. Most (over 90%) of the approximately 203 million nodes in our crawl form a single connected component if hyperlinks are treated as undirected edges. This connected web breaks naturally into four pieces. The first piece is a central core, all of whose pages can reach one another along directed hyperlinks -- this "giant strongly connected component" (SCC) is at the heart of the web.

    In graph theory, a strongly connected component is an equivalence class of mutually reachable vertices in a graph - i.e. a group in which every vertex can be reached from every other by following directed edges.

    What's interesting is that the four groups mentioned in this article are all approximately the same size, with the SCC group being only slightly larger than the others, which are:

    • IN - Pages that link into the SCC but aren't linked back from it.
    • OUT - Pages that are linked to from the SCC but don't link back to it, e.g. corporate websites with only internal links.
    • TENDRILS - Pages hanging off IN or OUT that can neither reach the SCC nor be reached from it.

    So what they're saying is that really only about a quarter of the web is the core that is strongly connected to the rest of it. Which is interesting in itself, because I'd have thought it was a lot higher.
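
    A minimal, self-contained sketch (plain Python on a made-up toy link graph; the page names and the choice of "a" as a known core page are purely illustrative) of how pages split into these pieces: intersect forward and backward reachability from a core page to get its SCC, then classify everything else by whether it can reach the core, be reached from it, or neither.

        from collections import defaultdict, deque

        # Hypothetical toy link graph: page -> pages it links to.
        links = {
            "a": ["b"], "b": ["c"], "c": ["a", "d"],  # a, b, c are mutually reachable (the core)
            "d": ["e"], "e": [],                      # d, e are reachable only FROM the core
            "x": ["a"], "y": ["x"],                   # x, y can reach the core, nothing links back
            "p": ["q"], "q": [],                      # p, q have no connection to the core
        }

        def reachable(graph, source):
            """Breadth-first search: every page reachable from source by following links."""
            seen, queue = {source}, deque([source])
            while queue:
                for nxt in graph.get(queue.popleft(), []):
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
            return seen

        def reversed_graph(graph):
            """Flip every link, so reachability in the result means 'can reach' in the original."""
            rev = defaultdict(list)
            for src, dsts in graph.items():
                for dst in dsts:
                    rev[dst].append(src)
            return rev

        core_page = "a"                                         # assume one known core page
        forward = reachable(links, core_page)                   # everything the core can reach
        backward = reachable(reversed_graph(links), core_page)  # everything that can reach the core

        scc = forward & backward                  # SCC: mutually reachable with core_page
        out_part = forward - scc                  # OUT: reached from the core, no way back
        in_part = backward - scc                  # IN: reaches the core, not linked back from it
        all_pages = set(links) | {d for ds in links.values() for d in ds}
        other = all_pages - scc - in_part - out_part  # tendrils / disconnected pieces

        print("SCC:", sorted(scc))        # ['a', 'b', 'c']
        print("IN:", sorted(in_part))     # ['x', 'y']
        print("OUT:", sorted(out_part))   # ['d', 'e']
        print("other:", sorted(other))    # ['p', 'q']

    The study presumably uses proper large-scale graph algorithms rather than anything this naive, but the classification it reports is the same idea applied to a crawl of roughly 200 million pages.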

  • > I find all these "personal" pages on the web are a major irritant, as they seldom contain useful information, and they clog up the search engines with non-relavent crap, by polluting the search space.

    Has it ever occurred to you that Slashdot was once Rob's personal page?

  • Actually, there is one. It's called Amaya and it's been developed by the W3C for use as both a browser and an editor. It's not the prettiest of things, but it's not designed to be. It's designed to be a fully standards-compliant web browser. Head to the W3C's page and take a look at it. It's available for most platforms and really quite useful.

    -Jason
  • This was indeed a very interesting survey.

    Most interesting, I think, were the tunnels connecting IN to OUT but bypassing the core. It would seem that such tunnels indicate weaknesses in the makeup of the core, which is to say paths of connected interest that for some reason are not included in the core. These, I think, would be worth watching to see whether they grow or diminish. If a tunnel grew to a size similar to the core's, it would make an interesting model where IN and OUT have more than one major connecting network.

    Most of the media coverage of this was saying that every company wanted to be in the core, but I think that's a very crude take on it. I didn't especially see anything in this study that indicated that interconnectivity was closely linked to traffic, much less relevant traffic.


  • Truly, having to cater to clueless clients is what makes all jobs difficult :)

    I still reject the (for some people) dogmatic view that a site needs to look the same in all browsers.
  • by TheTomcat ( 53158 ) on Tuesday May 16, 2000 @03:53AM (#1070075) Homepage
    The WWW is a text-oriented medium. It's a page of text that has links to other pages of text.

    What you've just described is gopher with links.
    I've said this on Slashdot before, and I'll say it again: the web is NOT Gopher. The web is a multimedia platform, including graphics, animation, video, sound, and any other funky stuff people want to throw up on it. The whole "the web should be text; graphic elements are clutter" mentality makes me sick. I agree 100% that a site should NOT be DEPENDENT on graphics or other 'specialty media' to get content across. That's what good consideration for text-based users and ALT tags are for. But a web without graphics is merely Gopher tunneled over HTTP.

    Why do you want it to look the same on all browsers (it won't by the way...)?

    It's pretty simple: clients don't understand the web. They want all that pretty crap. They REQUIRE it to look the same wherever they see it. They expect things as low level as kerning and leading to be the same, universally.

    Like I said in my first post, we (as in everyone) need to recognize that the web is a new medium. Traditional media conventions don't apply.

  • I hope this is a troll that I'm falling for.

    The web is supposed to be linked together. That's why you put something on the web instead of publishing it in a 'zine or a book [loompanics.com] or any other form of printed material. Just because you don't want or need one-click access to relevant information doesn't mean it shouldn't be there. Would you still visit Slashdot if it didn't link to the articles it talked about? Would Suck [suck.com] be any good without links?

    I'm not arguing for linking to random information just because you can, but informative linking is why hypertext [xanadu.net] has the hyper.
  • Then don't! That's one thing many "web authors" still don't get... The WWW is a text-oriented medium. It's a page of text that has links to other pages of text. Everything else is just cruft.

    I have to disagree with you there. Undoubtedly, the web started out as, and was designed for, a text-oriented medium of information propagation. However, it is also true that it has outgrown its original design. How else do you explain "IMG" tags? Why would they be required in a text-only medium?

    Yes, there are limitations originating from its design goals that generate a sense of awkwardness when implementing graphics-oriented pages. However, there are principles of web page design which can be followed to minimize the awkwardness. Graphics are now very much part of the web: deal with them as best you can. Closing your eyes and hoping they will go away is not a good solution.

    I have no solution for the original problem posed regarding programming for multiple browsers. This is indeed a bitch. But the one about multiple resolutions is much more easily fixed: design your web pages for a fixed resolution. I contract at IBM, and IBM's standard is that a web page must be displayable at 640x480 resolution without having to scroll. There are exceptions to this rule, of course, but those sites need to get approval for the exceptions from higher up.

  • 2. It is not our/your internet! It is everyone's Internet! If the internet has "dumbed down" then it is just appealing to the masses.

    It is not everyone's internet. The internet was funded by business, for business, and it is supported and enhanced by business, for business. You are an invited guest here; mind your manners.

    The dumbing down is done by the masses, but it is neither wanted nor promoted. The internet gets its legs from the billions in capital that businesses (mostly US) provide for their own benefit, not yours. Pr0n, Joe Sixpack's dog pics and AOL crap are just unwanted byproducts.

  • I have to disagree with you there. Undoubtedly, the web started out as, and was designed for, a text-oriented medium of information propagation. However, it is also true that it has outgrown its original design. How else do you explain "IMG" tags? Why would they be required in a text-only medium?
    You're confusing "text oriented" with "text only". "Text oriented" doesn't mean that images, animations, sounds, etcetera aren't present; it just means that text is primary.

    How do people find your pages? How do indexes and search engines work? It's all based on the textual content. Google doesn't do OCR on your GIFs of scanned brochures, or voice recognition on your MP3s of your radio spots. Even if images or sounds are the focus of your site, you'd best have plenty of text that indexes and describes that content.

    What loads fastest, giving surfers the information they're looking for in the least time? Text.

    What can be displayed in the user's choice of colors and fonts, so that it's legible in any situation? Text.

    What can be rendered on a PDA, or read by a text-to-speech converter for the blind? Text.

    What should web designers do when clients don't understand these issues? Apply the clue stick. Gently and with respect, but firmly, make it clear that you know more about the internet and the WWW than they do (that's why they are paying you), and that if they want a rhinestone-encrusted, illegible and unusable site that takes three days to load over a 28.8k PPP link, then they can hire a 12-year-old who's just finished reading HTML for Dummies instead of a professional - and then spend the next few months wondering why they bother having a web site, since it's done fsck all for their business.

  • Hmmm, isn't it possible that the number of disconnected sites is much greater, due to the fact that you can't find all the non-linked sites... since there is nowhere to find them from? Anybody have any info from the study?
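
    That blind spot is built into how a crawler works: a link-following spider can only ever discover pages that something it has already fetched links to, so a page with no inbound links never shows up at all, no matter how long the crawl runs. A minimal, stdlib-only crawler sketch to make that concrete (the seed URL and the tiny page limit are arbitrary choices for illustration):

        from collections import deque
        from html.parser import HTMLParser
        from urllib.parse import urljoin
        from urllib.request import urlopen

        class LinkParser(HTMLParser):
            """Collect the href attribute of every <a> tag on a page."""
            def __init__(self):
                super().__init__()
                self.links = []
            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        def crawl(seed, max_pages=10):
            """Breadth-first crawl: only pages linked from already-fetched pages are ever found."""
            seen, queue, fetched = {seed}, deque([seed]), []
            while queue and len(fetched) < max_pages:
                url = queue.popleft()
                try:
                    html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
                except Exception:
                    continue  # unreachable, non-HTML, or otherwise unfetchable; skip it
                fetched.append(url)
                parser = LinkParser()
                parser.feed(html)
                for href in parser.links:
                    absolute = urljoin(url, href)
                    if absolute.startswith("http") and absolute not in seen:
                        seen.add(absolute)
                        queue.append(absolute)
            return fetched

        # Example run with an arbitrary seed; a page nobody links to can never appear here.
        for page in crawl("http://example.com/"):
            print(page)
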
  • by Seth Finkelstein ( 90154 ) on Tuesday May 16, 2000 @02:54AM (#1070092) Homepage Journal
    1) How many corporate pages EVER link outside their site?

    2) Advertisers and news sites link into corporate pages

    3) Personal home pages are highly likely to link into popular sites, but are unlikely to be linked to themselves

    Applying these ideas, and others like them, leads to the "bowtie".

  • by coaxial ( 28297 ) on Tuesday May 16, 2000 @06:29AM (#1070098) Homepage

    > The web is broke. We're not using it properly

    I agree with your second statement. The web isn't broke... people just aren't using it properly. There are so many corporate sites that look like brochures. It's sickening. My previous job was to set up a web page for a small business, and all they wanted me to do was scan each page of their brochure into GIFs, put them up on the web, and put "forward" and "backward" buttons on the bottom to navigate between pages. I said, WTF!?!? The concept of actually including text information and links to other resources was totally absurd to my boss.

    These kinds of people think of the web only as a marketing tool, and thus can't take advantage of the power it has to offer.


    Look at news sites. How many times do you come across articles that are taken word-for-word from the printed page? (Almost to the point where it says, "continued on page 3C".)

    The worst part is the page-turning. You know, the "next page" links at the bottom of articles. That right there is a sign that your site is broken. You're using a static and linear approach in a dynamic and nonlinear medium.

    Break the story up. Link, God damn it! If a company gets mentioned, link to it, not to one of those pathetic stock-quote drivel pages that news sites make. If some person made a speech, don't just quote one or two sentences; link to the speech.

    I'm convinced that the web is going to suck until our children ascend to power. Look at television. In the early days, the late 40s and 50s, everything was very rigid. You basically had radio programs being done in front of a camera. Only after a generation had been raised on television did you actually get programs that started to take advantage of the medium. Compare how news was done in 1950 to how it's done today. Look at educational television. Before, you had the monotone droning voice of an old man; now you have Sesame Street. The same thing is going to happen to the web.
  • I must comment
    1. Are nerds only interested in linux and open source? NO
    2. It is not our/your internet! It is everyone's Internet! If the internet has "dumbed down" then it is just appealing to the masses.
    3. Buy a domain like elitegeek.net and create your own net if you want with search engines that only have the data you want in them. The very freedoms that allowed the explosive growth of the net allows you to carve off a little section if you want.
    4. By what arrogance do you believe you have a right to join your IPv6 net where others do not? Who is the test body?
    5. It scares me to death to read a post like this when I have to wonder if you could actually mean it!
    Your post smacks of the argument for removing the right to vote because everyone keeps voting in the same dumb bastards.
  • by TheTomcat ( 53158 ) on Tuesday May 16, 2000 @02:58AM (#1070103) Homepage
    I'm a web developer. I've always loved the potential of the web until recently. Now I don't like working with it. I can't stand developing for 3 different browsers on 4 different platforms, 12 screen resolutions, 3 color depths, and design templates that came from a print artist who thinks that the web is one big brochure.

    The web is broke. We're not using it properly; there are too many poorly done corporate sites, contributing to insecurity, poor usability and incompatibility.

    Many clients we work with are dead set against sending anyone away from their site. I don't think they realize that links are what the web is made of. This contributes to the unreachable part of the bowtie. These corporate folk are afraid that by linking away from their site they will lose a viewer, and that the user won't find their way back. They don't realize that the web is a pull technology, and that if the user was looking for certain information, the user will come back if the site is the best source of such info. The back button is one of the browser's most-used features.

    We need more of these research projects to help us figure out what needs to be changed. The W3C is a start, but it's expensive to join [w3.org] and it's rare that you find a website that conforms to the standards. In fact, I've run into web developers who have never HEARD of the W3C.

    The web is a new, completely different medium. It's not a CDROM, it's not a brochure, it's not TV. We can't keep treating it like these other media.

  • Well, according to the abstract [ibm.com],

    The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution.
    So it has immediate practical use if you're writing spiders, and so on. I'm not sure whether "insight into... the sociological phenomena which characterize [the Web's] evolution" counts as something which does you any good, but you never know where the resulting studies might lead.

    (Anyhow, who says research has to do anyone any good?)

  • I've often wondered how many sites aren't linked to by any other site and have never been scanned by a search engine. Chances are we'll never know.

