
Finding New Code 158

tabandmountaindew writes "Too much time is wasted re-implementing code that someone else has already written, for the sole reason that it's faster than finding the other code. Previous source code search engines, such as Google Code Search and Krugle, only considered individual files on their own, leading to poor-quality results and making them useful only when the time needed to re-implement was extremely high. According to a recent NewsForge article, a fledgling source-code search engine, All The Code, is aiming to change all of this. By looking at code not just on its own but also at how it is used, it is able to return more relevant results. This seems like just what we need to unify the open-source community, leading to an actual common repository of unique code and ending the cycle of unnecessary reimplementing."
This discussion has been archived. No new comments can be posted.

  • I wonder how fast we will see other types of code.
  • by analog_line ( 465182 ) on Monday February 05, 2007 @11:13AM (#17889618)
    I'm not a coder, but my impression of the vast majority of coders is that they reinvent the wheel because they believe that everyone screwed up their wheel implementation and if no one is going to do it right, they should.
    • by introp ( 980163 ) on Monday February 05, 2007 @11:19AM (#17889688)

      There's an old adage in the racing business: if you're building a parade float, buy your wheels. If you're racing a 300 kph Formula One car, consider building your own. If you place very high demands on a component because it is at the core of what you do, the stock component may not be good enough.

      So, in software, if the "wheel" is at the core of your product, you may have to re-invent it to get exactly what you need. This is not because everyone else screwed it up, just that the stock "wheel" serves 90% of the features for cheap. Look at the Mac iPhone's OS, Cisco's move to (cheaper, memory-wise) VxWorks on the WRT54, etc.

      • Re: (Score:2, Insightful)

        The iPhone isn't really a good example; other mobile OSes (e.g. Linux) could have done the job just as well, if not better. There are only two reasons for Apple to use their own OS: branding and vendor lock-in.
      • by cant_get_a_good_nick ( 172131 ) on Monday February 05, 2007 @01:32PM (#17891698)
        There was a Joel On Software post [joelonsoftware.com] a while back that explained this.

        Most applicable quote:

        If it's a core business function -- do it yourself, no matter what.
        • by Raenex ( 947668 )
          Funniest quote:

          The only exception to this rule, I suspect, is if your own people are more incompetent than everyone else, so whenever you try to do anything in house, it's botched up. Yes, there are plenty of places like this. If you're in one of them, I can't help you.
    • Re: (Score:1, Interesting)

      by Anonymous Coward
      What I tend to find is that a lot of existing code is not sufficiently abstracted from the original implementation, or depends too heavily on the rest of the original codebase, or has external dependencies that I am either unable or unwilling to satisfy. If I have to import 3000 lines of structure definitions, macros and constants and modify my code just to be able to re-use a couple of functions, it's likely going to be less work and effort in the long term to write my own implemen
    • by ajole ( 132756 )
      The first thing you have to do is get people to READ code! How many times have you pushed a door that had a big sign that said "Push"?

      Go Python!
      • by ajole ( 132756 )
        er.... a sign that said "Pull".

        It's early here :)
        • by NotQuiteReal ( 608241 ) on Monday February 05, 2007 @11:34AM (#17889886) Journal
          I often Push on the Pull doors, just to see if they work.

          Often the Push/Pull sign is just some control freak placing arbitrary rules on things. So what if you clock a little old lady on the other side once in a while.

          Freedom to swing both ways has its price!

          • I often Push on the Pull doors, just to see if they work.

            Often the Push/Pull sign is just some control freak placing arbitrary rules on things.
            You're modded funny (and that might be your intention), but I do the exact same thing. Arbitrary rules piss me off. What annoys me more, though, is handles (that are easily pulled) on the side of the door you can only push. Talk about poor user interface design!
          • Re: (Score:3, Insightful)

            by lhand ( 30548 )
            "I often Push on the Pull doors, just to see if they work...."

            Yup. Same here.

            I once walked up to the doors in a parking garage. The sign above the door knob said turn and push. So I did a pirouette and gave the door a shove.

            It opened. You should have seen the looks on the faces of the people around me.

            Anyway, back to the main topic.

            I suspect the "not invented here" problem is why these code repositories aren't used. Most of us programmers don't want to use someone else's code because we have to first fi
          • Standard QA, am I right?
      • by Lazerf4rt ( 969888 ) on Monday February 05, 2007 @12:06PM (#17890320)

        I don't think people are giving programmers enough credit for having common sense, and this project to reduce code re-implementation sounds pretty idealistic. I don't know if I call bullshit on it, but I smell a few flawed assumptions.

        First flawed assumption seems to be that the hard part in re-using code is simply to find it. But that's crap. When code is in the form where it can even be re-used, it's called a module, and a big chunk of this code is the module's interface. The interface is what lets you re-use it. But there are huge differences between interfaces. There are different calling conventions, different parameter orderings, different limitations on thread re-entry, different permissions on order of things you're allowed to call, and entirely different approaches to specifying the interface. A streaming library can have a public function Read() or it can have a pair of public functions BeginRead()/EndRead(), and there are many valid reasons for both cases.

        Point being, you often have to refactor a module's interface before you can fit it into your project, and depending on the size and purpose of that module, the refactoring might be just as slow as writing a new implementation.
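
        A minimal sketch of that interface mismatch, in Python for brevity. Everything here is hypothetical: `SplitReadStream` stands in for a found library that only offers a BeginRead()/EndRead() pair, and `ReadAdapter` is the glue you would have to write if your project expects a single Read() call. Real libraries add threading and error-handling rules that make the glue much larger.

```python
import io


class SplitReadStream:
    """Hypothetical third-party stream with a two-step read interface."""

    def __init__(self, data: bytes) -> None:
        self._buffer = io.BytesIO(data)
        self._pending = None  # size requested by begin_read, if any

    def begin_read(self, size: int) -> None:
        if self._pending is not None:
            raise RuntimeError("a read is already in progress")
        self._pending = size

    def end_read(self) -> bytes:
        if self._pending is None:
            raise RuntimeError("end_read called without begin_read")
        size, self._pending = self._pending, None
        return self._buffer.read(size)


class ReadAdapter:
    """Wraps the two-step interface in the single read() our project expects."""

    def __init__(self, stream: SplitReadStream) -> None:
        self._stream = stream

    def read(self, size: int) -> bytes:
        self._stream.begin_read(size)
        return self._stream.end_read()


adapter = ReadAdapter(SplitReadStream(b"hello world"))
first = adapter.read(5)    # the first five bytes
rest = adapter.read(100)   # whatever is left in the stream
```

        The adapter is trivial here only because the toy stream is synchronous and single-threaded; the differences in re-entry and call ordering listed above are where the real cost would sit.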

        Second flawed assumption is that developers aren't even able to find the code which they need to find today. But that's crap too because there are a lot of great, re-useable libraries that programmers already commonly know about, or can easily find through Google or Sourceforge. First of all, the standard C/C++ libraries give you a lot. Then there's zlib, curl, glut, Allegro, and hundreds (thousands, depending on your standards) more, depending on what you're doing.

        Come to think of it, when you really want to re-use code, you look for libraries, not source code. Searching for source code mainly helps people who want to learn programming.

        I know that the site linked in the summary only contains Java code right now, and I'm mainly focusing on C/C++ here but I think a lot of what I said applies. (Don't tell me that Java's automatic turning of each class into a monitor solves the thread re-entry problem, because that really just substitutes one problem for another.)

        • Searching for source code mainly helps people who want to learn programming.

          Ugh.... That makes me sick just thinking about it. Because there is no hyperlinking, there is no way to build human quality judgments into the search. The search results will be of average quality, which is poor.

          That means that programmers with good taste will refuse to wade through the sludge returned by code searches. They will much prefer to wade through lots of good, irrelevant code looking for something relevant than t

        • First flawed assumption seems to be that the hard part in re-using code is simply to find it.

          Second flawed assumption is that developers aren't even able to find the code which they need to find.


          I guess the third flawed assumption was mine, that the first and second assumptions were going to be different?
    • I don't think that is true in all cases. Often others' code has subtle bugs, or implementation details that cause problems later on. While you might get a boost in productivity using someone else's code, that can quickly be lost when you have to go back and either fix your code due to some side effect or fix the other person's code as a result of bugs, short cuts, etc.

      Moreover, with differences in coding styles, it can be hard to wrap your head around someone else's code. A solution is to use standard idiom
      • Often others' code has subtle bugs, or implementation details that cause problems later on.

        I hate to be the one to tell you this, but your code does too.

    • by j00r0m4nc3r ( 959816 ) on Monday February 05, 2007 @11:25AM (#17889764)
      I think a system like this might work if it has a user feedback system, where particular authors get a good reputation by positive feedback. So you know that an implementation is probably good if that author has good ratings. Think about system libraries. Nobody (well, mostly nobody) writes their own implementation of system libs, because they trust (usually) the implementation provided by the OS or the compiler. Why do they trust? Why is a Microsoft routine more trustworthy than S00p3rC0d3r's? Just because those functions have been tested by users over and over again. And they're (usually) well-documented.
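
      One way such a feedback score could be computed, as a rough sketch only. The `Author` record, the vote counts, and the smoothing constant are all made up for illustration; nothing here describes how any existing code search site ranks authors.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str
    positive: int  # positive feedback votes received
    negative: int  # negative feedback votes received


def reputation(author: Author, prior_votes: int = 2) -> float:
    """Share of positive feedback, pulled toward 0.5 when votes are few.

    The prior acts like `prior_votes` imaginary votes split evenly, so one
    lucky rating cannot outrank a long track record.
    """
    total = author.positive + author.negative
    return (author.positive + prior_votes / 2) / (total + prior_votes)


# Hypothetical authors with invented vote counts.
authors = [
    Author("newcomer", positive=1, negative=0),
    Author("unrated", positive=0, negative=0),
    Author("veteran", positive=95, negative=5),
]

ranked = sorted(authors, key=reputation, reverse=True)
ranked_names = [a.name for a in ranked]
```

      With a raw positive/total ratio the newcomer (1 of 1) would beat the veteran (95 of 100); the smoothing is what makes "tested by users over and over again" count for something.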
      • I would like to point out that system libraries can contain coding errors themselves. One example is Apple's malloc(). It opened a file for debugging with the program's permissions, leading to a security hole in all setuid programs that need to allocate memory. source [apple.com]
        • Unlike Programmer Bob's reimplemented malloc, which contains 12 new security holes, degrades to O(n) performance and fragments memory badly.

          The point is: system libraries are ALMOST always better than anything you will be able to write yourself, unless you already specialise in writing such libraries.
          • by epine ( 68316 )

            The point is: system libraries are ALMOST always better than anything you will be able to write yourself, unless you already specialise in writing such libraries.

            If it were that simple, theory and practice would not be sleeping in separate beds.

            I recall an interview where Theo claimed that a substantial number of bugs discovered in the OpenBSD code audit were created because the programmer failed to understand the library or system API they were programming against. We're talking about simple libraries here,

    • Re: (Score:2, Informative)

      Coders typically reinvent the wheel because it is usually easier to rewrite something than it is to learn how the existing code works. That actually leads into my main problem with using code search like this to try to promote more code reuse, namely, trust. The search engine is going to need to provide some way for me to make a judgment on how well written and bug free I can expect the code to be. Joel Spolsky of Joel On Software has said that if you are writing an application that is mission critical t
    • by tb()ne ( 625102 )

      I'm not a coder, but my impression of the vast majority of coders is that they reinvent the wheel because they believe that everyone screwed up their wheel implementation and if no one is going to do it right, they should.

      I've seen that happen numerous times but it is often just an excuse to cover the fact that they are too lazy to read and comprehend someone else's code.

      • by pla ( 258480 )
        I've seen that happen numerous times but it is often just an excuse to cover the fact that they are too lazy to read and comprehend someone else's code.

        Although I agree with you in essence, I find it odd to call someone "lazy" for doing more work...

        The real problem here comes from how much time it takes to really understand someone else's non-trivial code. A good coder, regardless of some mythical level of "laziness", needs to ask whether it will take longer to verify someone else's code, or to roll on
      • The origin of "hacking your own" starts with the most characteristic aspect of software development: unfamiliarity with the task and ignorance of the parts needed. Developers start with a small, limited idea of what they need and can't afford to sink a bunch of time into learning a large library or framework which might not suit their purposes. At this point, they're often accused of laziness. "You know you need to log stuff, so just download a logging library and use it." Yet it's too soon to do that
    • I've never really coded anything overly complicated outside of a school environment, but when I do I usually like to avoid using someone else's code. If I do, I usually only ever use it for reference purposes, so I know if I'm on the right track. Probably not the best way to do it, but for me, it's not enough to just take someone's code and integrate it into my project; I need to understand it. It's considerably easier for me to debug my own code than to try and debug someone else's, especially when
    • Re: (Score:3, Interesting)

      by wizbit ( 122290 )
      I agree, but with a slightly less hostile view to coders, since I am one:

      I like to code. I'm a programmer because I like to write code. If given the choice between learning and re-implementing somebody else's solution - for non-trivial tasks, this is usually the best way to go, but also takes a long time - and writing my own solution, I tend to gravitate toward the latter. Why? It usually is faster development-wise. Probably not in the long-run, since well-known code is also usually well-tested and my solut
    • I'm not a coder,

      So you're out of your domain.

      It was UNIX itself that came up with the concept of "Software Tools" [amazon.com].

      It's the goal of any good programmer to make simple tools that one can reuse over and over. But few can actually do it, and do it well.

    • Here are some of the top reasons why I end up rejecting use of open source code:

      1. No description on the web site of features & system requirements.

      2. No documentation for how to install or use it.

      3. No stable release and/or still alpha.

      4. Written in PHP.

      Finding the code is the least of the problems.
    • by arivanov ( 12034 )
      You are misplacing your blame here, mate. The blame is with pseudomanager wannabes, not coders.

      In order to use someone else's code you have to perform some research, evaluate it and even do a few proof of concept runs. Once this has been done you generally have to freeze the third party code as a part of your platform and develop on the frozen version. You also have to reevaluate this third party code before planning every release cycle. Research, evaluation, poc and maintenance of said third party code t

    • Obviously, you are not a coder. There are some coders who will reimplement everything. That is the classic NIH syndrome (back in the late 80's, early 90's, HP had it real bad).

      Others will ask will it work for me? If it will not, then they will redo one (interestingly, many of them will "borrow" design and code from L?GPL version. I have seen it in 3 places now).

      If it will work AND is not the core slowdown, then they will use.
      If it works, but it is too slow due to being too general, then they will redevel
  • by knightmad ( 931578 ) on Monday February 05, 2007 @11:18AM (#17889672)
    1) "Java Only for now, more coming soon!"
    2) "Alpha"
    3) The linked article is a "product announcement" on Newsforge

    This is slashvertisement for a vaporware product. Although this is promising, there is nothing concrete there to call it "what we need to unify the open-source community", not even an alternative to Google codesearch.

    Btw, is alpha the new beta?
    • No, alpha is the old beta.

      An unworking game should never have been called alpha. When companies beta test software it tends to be what early alpha testing is. When companies release software that's what used to be called beta testing (and in the old days there was gamma testing in addition).

      However beta (and gamma) took time and money... and what company wants to spend that.
  • by hacker ( 14635 ) <hacker@gnu-designs.com> on Monday February 05, 2007 @11:22AM (#17889728)

    If we create this grand, uber code-searching portal, which can search the context of the code, aren't we making it easier for commercial entities to go ahead and pick and choose those bits of code to use in their products, knowing full well that they're going to violate the GPL (or other OSS licensing models) by doing so?

    I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish."

    The licensing differences between "freeware" and "free software" seem to escape them. Just google around and you'll see thousands of FOSS projects listed on sites like TUCOWS, download.com and others, as "freeware" and not the proper "free software" that they are. There are also people who think "free software" means just that (lowercase "F" there).

    Let's be sure that if we have a search engine that lets brainless developers look like experts by cutting and pasting bits of OSS code from here and there together to make their software work, that they know what the license is and that they must be in compliance with it to use it.

    Please?

    • by gravesb ( 967413 ) on Monday February 05, 2007 @11:34AM (#17889872) Homepage
      We shouldn't ignore a good idea just because it makes it easier for someone to do something illegal. There are laws to protect the code. I think the benefits will outweigh any loss from commercial companies stealing the code. I hope this does work out as well as it looks like it could.
    • I'm curious about these companies because in my experience, companies take licenses very seriously. Were these small companies or large companies? Were they clueless idiots in general, or just on licenses? Was this done out of malice or lack of understanding? Did you press the issue to them? If so, what was their reaction?

      I would love to hear more.
      • by Raenex ( 947668 )

        I'm curious about these companies because in my experience, companies take licenses very seriously.

        In my experience, it's up to the developers to be license-aware. I've never been told about the importance of respecting licenses or to be careful about using third-party software.

    • by Anonymous Coward
      If these companies release public binaries, and you feel what they are doing is morally wrong, you could consider anonymously blowing the whistle on them to people who are capable of analyzing the binaries and who can take it from there legally speaking. FSF or some similar organization would seem to be a prime candidate. Some people seem to be very skilled at recognizing open-source code even in binaries - mostly, I believe, by looking for embedded strings and similar - but they probably would be more effe
    • So wait, you're concerned about possible copyright violation due to internet distribution, and want to make sure that if the work is distributed on the internet, it has the appropriate precautions to make sure people don't violate the license? ... because that's exactly why the record industry supports DRM.
    • Re: (Score:3, Informative)

      by Kjella ( 173770 )
      I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish." The licensing differences between "freeware" and "free software" seem to escape them.

      A $150,000 lawsuit in RIAA style (multiply by number of source files as separate 'works' if you like) and a cease-and-desist forcing them to stop shipping their product, along with
      • I got the impression that the companies weren't using the code entirely, just lifting bits and pieces here and there (e.g., not a whole image library, but maybe the code to read/write JPEGs). Which is just as illegal, but might be a bit harder to prove...

        Of course, it's Monday, so what do I know...
    • by Wildclaw ( 15718 )
      Seeing commercial companies violating licenses like that sucks. It is not unexpected, given that corporatism is a breeding ground for greed and selfishness, and business laws often fail to protect the rights of those who aren't in the business of making big money themselves.

      Categorizing "free software" under freeware however isn't really incorrect. The only license part of "free software" that could cast into doubt the validity of "free software" being freeware is the redistribution part. "Free so
    • Koders.com [koders.com] lets you search by license.
    • I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish."

      <shrug> That's what you get for calling your software free when it's not. </shrug>

      Note to mods about to mark me down because they don't agree: would you be thinking the same if I were objecting to the phrase "intellectual property theft"?

  • Dependency Rejection (Score:4, Interesting)

    by aprilsound ( 412645 ) on Monday February 05, 2007 @11:24AM (#17889752) Homepage
    I'm of the opinion that most frameworks and libraries don't reimplement enough. In the FOSS Java world, everyone has a dependency on some version of Apache Commons Lang. For what? Just so we don't have to write StringUtils.isEmpty?

    Don't get me wrong, if you're developing a standalone project that won't be a dependency for someone else, then you absolutely want to rewrite as little code as possible. Let someone else maintain as much of your codebase as possible. But if you are writing something that other projects will be using as a dependency, don't you dare make me download four other libraries just to run your code. Write your own dang StringUtils or, if you're lazy and your project is GPLed, just copy the code.

    • Re: (Score:3, Interesting)

      Or better yet, ship a "minimal dependency library". That has all of the minimal sets of dependencies inside of it (either of real code or replacement code). Thus if you are using "Apache Commons Lang" already, it'll pick and use that. If not, you can use the drop in replacement code.
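
      A small sketch of that "use it if present, otherwise fall back" idea, written in Python rather than Java. The module name `commons_strings` is hypothetical and merely stands in for a shared utility library such as Apache Commons Lang; the fallback mirrors what an isEmpty-style helper does (true for a missing or zero-length string).

```python
try:
    # `commons_strings` is a hypothetical shared library; if the host project
    # already ships it, reuse its implementation instead of ours.
    from commons_strings import is_empty  # type: ignore
    USING_FALLBACK = False
except ImportError:
    USING_FALLBACK = True

    def is_empty(text):
        """Drop-in replacement: True for None or a zero-length string."""
        return text is None or len(text) == 0


# Callers use is_empty() without caring which implementation was picked.
checks = [is_empty(None), is_empty(""), is_empty(" "), is_empty("abc")]
```

      The cost of this design is that the fallback has to stay behavior-compatible with the real library, which is its own small maintenance burden.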

      Kirby

    • by anarxia ( 651289 )
      Most apache projects use maven so dependencies are handled automatically for them. The learning curve for maven is steep but it's well worth it.
  • Phew (Score:1, Funny)

    by Anonymous Coward
    real programmers use CPAN
  • by Timesprout ( 579035 ) on Monday February 05, 2007 @11:28AM (#17889802)
    I just ran a search for "the 500,000 lines of code I need to finish by friday all the stupid extra features the PHB wanted after we had set a deadline based on the original spec".

    0 results, rather disappointingly.
  • So what if a particular solution has been implemented before? Continuously writing code should keep people on their feet, and ready for when they have to respond to a unique situation, right?
    • Re: (Score:3, Insightful)

      by tb()ne ( 625102 )

      So what if a particular solution has been implemented before? Continuously writing code should keep people on their feet, and ready for when they have to respond to a unique situation, right?

      Except that most people have jobs with deadlines. Besides, there's always more code to write. Even if I pull available code out of a repository, I'll still be continuously writing code. Furthermore, there's value in seeing how other people implement a solution because it is probably not exactly the way you would have implemented it and you just might learn something from their solution.

  • Now we'll see the One True Wheel (tm) which will fit on all cars.
    Screw different tires for different conditions, screw differing performance.
    The One Wheel will rule us all.
    • No, not quite.

      1. Use masking tape (generic) to fit The One True Wheel(tm) to your axle (proprietary)
      2. Use a knife (kitchen) and some cement (Home Depot) to give The One True Tire(tm) the correct grooves for the road condition.
      3. Profit!
  • I am an embedded engineer. Various firms I have worked for have tried to implement some kind of "reuse" code store. But every time real-time considerations and platform specifics have derailed it (thankfully) in the early stages. At the low level, so-called "code reuse" is (IMHO) nothing short of a right royal pain in the neck. It looks good on paper, managers like the concept - but it is impossible to implement without large amounts of hardware abstraction. Maybe it makes more sense further up the SW t
  • The reason people often roll their own "trivial bits" of software is not so they get the best quality. It is so the developers have full comprehension of the code and that it follows the guidelines they use for their projects.

    There are enough faulty assumptions about code you developed internally. Why should one keep trying to cram the square peg into the round hole. Sure you can shave it down, refactor the block all you want. But it may well prove to be faster and better to craft the piece you need.

    Div
    • by RonBurk ( 543988 )
      Exactly what Dareth said! And "follows the guidelines they use for their projects" does not just refer to how the code is indented/commented. It can mean substantive issues like: pluggable into a test harness, memory management policy, thread safety, error handling policy, avoiding recursion, etc.
  • Licensing (Score:5, Insightful)

    by maxwell demon ( 590494 ) on Monday February 05, 2007 @11:35AM (#17889894) Journal
    I think in order to be really useful for not reinventing the wheel, it should allow intelligent searching by license. That is, it should let you restrict your search to code under certain licenses, or even better, to code under a license compatible with any given license (or set of licenses).

    For example, if you are working on code which you want to release as BSD, it's not much help if you find code licensed under the GPL, even if that code on its own is great. Likewise, if you are writing GPLed code, you are not interested in code under licenses incompatible with the GPL (like e.g. the MPL).

    Of course, the search engine cannot make a guarantee that the license will fit your needs, but then, it cannot guarantee that about the code's functionality either.
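
    A minimal sketch of such a license-aware filter. The compatibility table is an illustrative assumption that only encodes the examples from this comment (BSD projects taking BSD code; GPL projects taking GPL and BSD-style code but not MPL); the result records and project names are invented, and a real engine would need a properly vetted table.

```python
# Illustrative only: license of the project being written -> licenses whose
# code it could incorporate. Not a complete or authoritative mapping.
COMPATIBLE_SOURCES = {
    "BSD": {"BSD"},
    "GPL": {"GPL", "BSD"},
}


def filter_by_license(results, target_license):
    """Keep only search hits whose license suits the target project."""
    allowed = COMPATIBLE_SOURCES.get(target_license, set())
    return [hit for hit in results if hit["license"] in allowed]


# Hypothetical search hits.
hits = [
    {"name": "fast_sort", "license": "GPL"},
    {"name": "tiny_sort", "license": "BSD"},
    {"name": "moz_sort", "license": "MPL"},
]

for_bsd_project = [h["name"] for h in filter_by_license(hits, "BSD")]
for_gpl_project = [h["name"] for h in filter_by_license(hits, "GPL")]
```

    An unknown target license yields no results here, which errs on the side of hiding code rather than suggesting something that may not be usable.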
    • You can still find out how the GPL guy did something that may be hard for you to do and then reimplement it under whatever license you want.
      • Re: (Score:1, Funny)

        by Anonymous Coward
        When I worked at SCO we used to do that all the time, works great... I can specifically recommend the linux kernel, it's just so well documented. (Yes, I am joking and probably flamebait)
  • Well.... (Score:4, Insightful)

    by VE3OGG ( 1034632 ) <`VE3OGG' `at' `rac.ca'> on Monday February 05, 2007 @11:37AM (#17889924)
    While the article mentions that too much time is spent re-implementing new code, I disagree that this is necessarily a bad thing (tm). Re-inventing the wheel can often cause evolution of code, as opposed to the stagnation that can occur if something remains static. Now, of course people will say that this is GPL code, and people can then modify it -- this is of course true, but modification on that level seldom equates to evolution per se, sometimes because the changes are specific to the application, sometimes because you are trying to do something with code that it simply wasn't designed for (I guess you could equate it to trying to run a web server from Windows 95).
    • How can code be "evolving" if you're throwing it away and doing it again?

      Perhaps you mean that people can look at existing code and take the good ideas and do it better.

      But that's not evolution. And the problem is that thinking you can do it better, doesn't actually mean you can. At most, it will probably be "more suitable for the way I think and the constraints I do/don't care about".
  • Too much time? (Score:3, Insightful)

    by jmagar.com ( 67146 ) on Monday February 05, 2007 @11:37AM (#17889934) Homepage
    I wonder if writing it yourself is a time saver. Some reuse is okay, IF you have intimate knowledge of the code in question... but finding code that "might" fit the current problem is risky as the Dev is unable to truly assess whether or not the code is appropriate. The undocumented bugs, or system assumptions, will lead to using code and then countless hours debugging problems you didn't expect. Now you are debugging code you didn't write, and likely taking more time understanding than it would have taken to build it yourself.

    There is probably a point at which the system complexity of the reused code becomes great enough that the re-use is valuable. But how big, and how mature the reusable codebase, affects this decision.

    • Re: (Score:2, Informative)

      by tppublic ( 899574 )
      I wonder if writing it yourself is a time saver.

      Like most questions, the appropriate answer is "it depends". Take an example: I just spent yesterday rewriting a single class to fit into a standardized library. After 20 minutes of coding, 1 hour of documenting, and 2 hours of writing tests, I actually have something that meets the library standards. Could I have used the original class? Sure. But it had problems and inconsistencies. The main problem is that most open source code goes through the codi

  • I tried searching for more general level stuff.

    "Radiosity"

    gave me several pages of "Radix Sort".

    For the inexperienced reader: these are not related at all. One is a general sorting algorithm in computer science and the other is a lighting algorithm used in computer graphics and in games like Quake.

    So my guess is that it just does a fuzzy search and yields more results. Getting more results that are not the ones you want won't help you one bit. Useless for me.
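
    To make the guess concrete, here is one naive kind of fuzzy matching that would produce exactly this confusion. This is purely an assumption for illustration; how the search engine actually matches queries is not known from this thread, and `naive_fuzzy_match` with its prefix threshold is invented.

```python
def shared_prefix_length(a: str, b: str) -> int:
    """Count how many leading characters two strings share, ignoring case."""
    count = 0
    for x, y in zip(a.lower(), b.lower()):
        if x != y:
            break
        count += 1
    return count


def naive_fuzzy_match(query: str, title: str, min_prefix: int = 4) -> bool:
    """Hypothetical matcher: a hit whenever the first few letters agree."""
    return shared_prefix_length(query, title) >= min_prefix


prefix = shared_prefix_length("Radiosity", "Radix Sort")   # both start "radi"
false_hit = naive_fuzzy_match("Radiosity", "Radix Sort")
no_hit = naive_fuzzy_match("Radiosity", "Quicksort")
```

    Matching on surface spelling like this raises the hit count without raising relevance, which is the complaint above.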
  • ...GCC is. A Compiler and a complete set of libraries for most mainstream languages, along with the source. You have everything you need to write your application.

    No need to reinvent basic_string or itoa()

    But of course this is little help to those in my profession who insist upon wrapping up (obfuscating) everything they do in layers of 'abstraction'.

    Most of their stuff is legacy code before it is even released. Me, I stick to the KISS philosophy. That way, when the FNG comes along he can understand
    • by hoggoth ( 414195 )
      > But of course this is little help to those in my profession who insist upon wrapping up (obfuscating) everything they do in layers of 'abstraction'.

      Several years ago I had my 'real world' introduction to object oriented programming when I worked on a financial management application developed and used by a large Wall St. brokerage house.

      The C++ code had classes 15-20 layers deep, each layer adding just one tiny bit of functionality to the layer it inherited from. It was a nightmare, taking 15-20 minute
  • This system would be a great feature of SourceForge: finding all the common components in different projects so they can be factored out and shared instead. I was always disappointed in SourceForge's lack of intelligence about the related contents of its different projects. This thing could find relevant code and import it into the integrated navigation.
  • It all just seems too task-intensive.

    I'm committed to the better solution being better languages. The likes of Python, Ruby, and Boo (no Lisp-related debates please) add better features, making coding denser and faster. Better tools and, to some extent, GUI widget components mean there is less rework.

    The big gain is not having to search for what has been done already, learn it, tailor it.


  • 'binary diff', 'levenshtein distance' -- no hits.

    'morphological analyser' -- 1 hit (inappropriate)

    It's completely useless. Am I jumping to conclusions? Mm... no, I don't think so, it really is utterly useless.
  • One thing that I find disappointing with all the code search engines is that they treat source files as regular text files, more or less.
    None of them seem to make an effort at understanding the code syntax.

    That's why a few years ago I wrote one for C/C++ code called http://csourcesearch.net/ [csourcesearch.net]

    I did it just as an experiment, in my spare time and using all open source software, but I think having the ability to syntactically know the difference between a comment, a function, a structure, etc. makes a big differen
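To illustrate the idea of syntax-aware indexing described above, here is a minimal sketch. It is not how csourcesearch.net works; it assumes Python source rather than C/C++ and uses only the standard-library `tokenize` and `ast` modules. The names `index_source`, `search`, and `SAMPLE` are hypothetical and exist only for this example.

```python
import ast
import io
import tokenize


def index_source(source):
    """Split one source file into separately searchable fields."""
    # Comments are only visible at the token level, not in the AST.
    comments = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.COMMENT
    ]
    # Function and class names come from the parsed syntax tree.
    tree = ast.parse(source)
    functions = [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    classes = [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)]
    return {"comments": comments, "functions": functions, "classes": classes}


def search(index, term, field):
    """Case-insensitive substring search restricted to one field."""
    return [entry for entry in index[field] if term.lower() in entry.lower()]


# Hypothetical input: the word "radiosity" appears only in a comment.
SAMPLE = '''# Not related to radiosity; this is a sort.
class Bucket:
    pass

def radix_sort(items):
    return sorted(items)  # placeholder body
'''

index = index_source(SAMPLE)
print(search(index, "radiosity", "functions"))
print(search(index, "radiosity", "comments"))
```

Because comments and definitions are kept in separate fields, a query can be limited to function names, so a word that appears only in a comment no longer counts as a hit for a function.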
  • Just because you find code doesn't mean that it works as advertised, doesn't have memory leaks, buffer overruns and so on.

    Code also isn't tagged with environmental issues: how much memory is used, STL or Boost requirements, what system functions are needed, library dependencies, etc.

    Code is often layered on top of custom libraries. Sure, here's code to render HTML, but it needs a dozen custom data structure modules from the Netscape code base, for example.
  • What code search engines don't tell us is sometimes the code you get just sucks ... or is poorly adapted to your need.

    I usually search for a function to, say, validate a string for phone ... or ... homemade browser or ... whatever, just a function to help me do what I'm trying to do without reinventing the wheel, because I know it's been done already.

    But most of the time when you do find it, chances are it won't be adapted to your need, so much so that you end up writing your own based on what you've seen. which expla
    • What code search engines don't tell us is sometimes the code you get just sucks ... or is poorly adapted to your need.
      And what text search engines don't tell us is sometimes the text you get just sucks ... or is only remotely relevant to your need.

      Which doesn't mean those search engines are not useful.
      • Yes, I know, but the point of a text search engine is information in general, not to allow you to plug what you read into your term paper as-is.

        Whereas the point of a code search engine is to allow you to re-use code that's already been written, so that you don't have to paraphrase what you've just seen.

        I know it's hard, or even unfeasible, to build code that answers everybody's needs, let alone to expect someone else to write code that'll work for his project AND my project.

        I'm just saying, if we're gonna talk about eng
  • So far, every search I have made, from names of people, companies, class names, etc. yielded results that didn't even contain the search term(s).

    Fuzzy searching is one thing, but at least TRY to get an exact match and let me know when you're just taking the first 2 characters of a search term.
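The behaviour asked for above (try an exact match first, and say so when falling back to fuzzy matching) could be sketched as follows. This is only an illustration, not how any of the search engines discussed here work. It uses the standard-library `difflib.get_close_matches`; the function name `search_names`, the corpus, and the 0.8 cutoff are assumptions made up for the example.

```python
import difflib


def search_names(term, names, cutoff=0.8):
    """Return (kind, matches), where kind is 'exact', 'fuzzy', or 'none'."""
    lowered = term.lower()
    # Step 1: exact (substring) matches always win.
    exact = [name for name in names if lowered in name.lower()]
    if exact:
        return "exact", exact
    # Step 2: fall back to fuzzy matching, and label the result as such
    # so the caller can tell the user the term itself was not found.
    fuzzy = difflib.get_close_matches(lowered, names, n=5, cutoff=cutoff)
    if fuzzy:
        return "fuzzy", fuzzy
    return "none", []


# Hypothetical index of lowercase identifier names.
CORPUS = ["radix_sort", "radiosity_solver", "levenshtein_distance", "binary_diff"]

print(search_names("radiosity", CORPUS))
print(search_names("levenstein_distance", CORPUS))  # misspelled on purpose
print(search_names("morphological", CORPUS))
```

Returning the match kind alongside the results lets the interface say "no exact match, showing similar names" instead of silently presenting loosely related hits. A fairly strict cutoff also keeps names that merely share a prefix out of the fuzzy results.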
  • Two points.

    First, in some cases taking a chunk of somebody else's code is a great time saver. It can be problematic, however, when the code doesn't do what you want and then you have to jump into it. Is there anything more painful than trying to understand somebody else's code? On rare occasions it's easy - good comments, good style, good structure. In most cases it's a pain in the ass, even if they've taken the effort to write good comments and nice code. In these cases one has to make the call on whe
  • I worked for a company that wrote its own distributed computing system (in Java/XML). It sucked awfully by all measures (latency, CPU load, memory requirements, bandwidth), but they would not dump it in favor of PVM [ornl.gov] or one of the MPI [anl.gov] implementations because:

    1. We don't know who wrote that PVM thing or how to support it.
    2. The guys who wrote our own system are both really nice, and dumping their work would offend them...

    This is such a common problem that there is a term [wikipedia.org] for it...

  • Rather than use this as a product announcement, they should have quietly rolled it out. Gotten more than the Java repository going, then announced.

    Rating: Meh

  • by simm1701 ( 835424 ) on Monday February 05, 2007 @12:14PM (#17890438)
    Is the code easy to find? Will a quick search of sensible key words take me to a short list of results with high accuracy? No point in spending an hour wading through results that may or may not be useful when I can implement and test it myself in 2 hours.

    Is the license clear? I may eventually want to release as open source or commercially use something I write. If I include someone else's code/library I have to make a note (hopefully in the LICENSE file provided with the code or in the top of the code comments) on what the license is. Is it BSD, GPL, public domain, not stated or some commercial license that lets me look at the code but not use it myself?

    Is the code self-contained? This generally means: does it come as a library? I don't like copying and pasting code into my code, especially if it doesn't follow the same coding practices as my own. (This comes back to licenses above: if it's self-contained and has an incompatible license, at least I can rip and replace it later if I need to.)

    Is the code well known? Is it the de facto standard for doing this type of thing (STL, Perl core, glibc)? Or is it one of several well-known options for the same thing (gtk, qt, kde)? Or is it an unknown? This will help you know how well this code is field tested already - I don't like signing up to be someone else's beta tester for free!

    Is the code still maintained? Is this an active company with a project? Or a group on source forge? Are the developers still around and the forums active? If I need a new feature further down the line is there chance of support? I don't usually want to pick up the whole dead weight of supporting unsupported code that I didn't write if I can avoid it.

    Can I use it as is? It frequently takes longer to modify an "almost there" module to do what you need than it would have taken to reinvent the wheel, as it were, and write it yourself the first time. Writing it yourself will at least make future debugging easier, assuming you have a good memory for design and good coding practices.

    Is there documentation? The old comparison between documentation and sex applies: when it's good it's very, very good, and when it's bad it's better than nothing. I don't want to have to read someone else's uncommented, undocumented code just to evaluate whether it might work for me. I want to be able to read a good overview of the library, its functions, methods, attributes, errors and exceptions - CPAN is an excellent (in most cases) example of what I mean.

    That's a pretty hard list of requirements to meet - true, it shouldn't be, but this is the real world. If those requirements are not met, then odds are it will be less effort in the long run, for more reward, for me to implement it myself.
  • ... sounds to me like a thoroughly bad idea.

    There's a reason for things like specifications, documentation, source control, testing, etc.

    Maybe you'd rather google for popular home remedies rather than consult a professional doctor?

    I'd google for *algorithms* if I was at a loss, but I'd certainly want to implement them myself.

    What is useful in terms of code reuse is more controlled coherent collections of code that are highly tested, documented and generally controlled, such as the C++ Boost collection, but
  • More languages Summer of 2007?

    You announce the product in Feb 2007, and expect people to remember to "check back with you" 6 months later? Let me know how that works for you. This crowd will have forgotten you by then because someone else will have done it properly.

    Languages to get up RIGHT NOW and screw "Summer of 2007":

    • JavaScript
    • Perl
    • Python
    • SQL
    • MDX
    • C (++ and #)
    • VB (.net and legacy)
    • COBOL
    • FORTRAN
    • And many others where large libraries already exist...

    The problem is, this outfit is already trying to be in

  • 1st of all, it's a dupe. Approx. a year old, I suspect.
    2nd: Frameworks, standardised open source application stacks and comprehensive documentation are what speed up coding and development. Copy/pasting foreign code rarely helps.
    3rd: It's only for Java.
  • I find it somewhat ironic that there are Google Adwords Ads all over this "new open source coded" search engine... I mean don't they aim to take Google down?
  • by pbhogan ( 976384 ) on Monday February 05, 2007 @01:12PM (#17891312)
    While I certainly would welcome anything that could help me find code, the reason I'd want it is to find reference code, not reusable code. I've been programming for, oh, two decades now and one thing I find myself doing constantly is finding a bunch of libraries or bits of code and coming to the conclusion that I should just write it myself because of one of the following:

    1. The library/code is good, but doesn't quite work the way I want it to
    2. The library/code is close, but getting it to work the way I want is painful
    3. The library/code is bad
    4. The documentation is bad/nonexistent
    5. The license is prohibitive or annoying (i.e. it's not LGPL or BSD or the like)
    6. I enjoy writing code and sometimes I feel I could do it more elegantly, or efficiently (I might just want a very specific and optimized part of it)

    More often than not, though, I just enjoy coding and I love learning to code by writing new code. The black box thing... eh... I like to tinker under the hood and find out how things work.

    But my point is that finding code is not that hard. It's finding code that fits *exactly* what we want. Code is usually just not quite as modular as we'd like to believe and, if we're honest, as programmers we have a certain vanity about writing code so it does things My Way. :)
  • This is the same thing, except this one is developed by students: http://sourcerer.ics.uci.edu/ [uci.edu]. It crawls SourceForge and parses the code, the creator, etc. It's in beta stages of course, but it has great potential.
  • And documentation can be searched quite nicely on Google. What's the running time of this algorithm? What are the limits on the inputs? How to change traversal of the data structure to different order? Without a clear answer, maintaining the code may well be more difficult than writing it from scratch.
  • .... reasons.

    Source is written in various languages, modifying existing code to fit a new project can be more prone to creating bugs than doing it fresh, code might be useful in a way it was not originally thought of being used (creating another aspect of search...), there may be licensing issues, etc.

    Ultimately what we need is a higher level abstraction machine BUT disconnected from actual code. A level of abstraction that can be used to define the objective and constraints required and apply it to a code fra
  • Ignoring the slashvertisement, I think the real reason people tend not to reuse code is because any code they find will be either (1) broken, or (2) not up to the specific task and also broken. With rare exceptions, all code is broken to some degree, including yours (and including mine). Newer code tends to be slightly less broken than older code, as more people find new ways to break things; but however much some CS professors like to go on about OOP or whatever the latest fad is, the art of software en
