Comment Email response (Score 5, Interesting) 161
> Here's a laundry list of why your O'Caml program in inefficient:
>
> 1. You use lists. Lists aren't designed to be fast (computationally)
> to use. They're designed to be fast (programmatically) to use. You'll
> be hard pressed to find a production, speed-sensitive Lisp or O'Caml
> program that uses lists.
Okay... but here's my point: Every single example that shows how elegant Haskell and OCaml are uses lists. The 4-line Quicksort example for Haskell uses lists. All of the code that demonstrates easy reuse of functions and functions taken as arguments uses lists (like how easy it is to implement quite complicated algorithms using only map and filter, for example).
So, proponents say "Everyone should use functional languages because they can express complicated problems in elegant ways and result in cleaner, more reusable code."
But what you're saying in #1 above is that in "production," speed-sensitive code, no one is using lists... this would mean that no one is using map, filter, or any other pieces of reusable primitive code. So, they are instead all using mutable data structures... I.e., they are programming with side-effects and loops (random access instead of recursion, even when ever element of an array/list needs to be accessed/processed).
That was my point exactly. If you write elegant OCaml code using all of the lovely (and I mean lovely, really) tricks that they present when they demonstrate why OCaml is cool, you end up with code that is too slow to use in the real world.
I would say that my C++ (or most would call it C) implementation is elegant enough... easy to understand... no messy optimization tricks. Sure, I'm not using objects and templates everywhere, but these structures are hardly needed to solve this simple problem.
> 2. Practically none of your functions are written tail-recursively.
Good point.
> 2.5. You use a list append (@) inside a loop (generateStates).
> List.append is O(m), where m is the length of its first argument. If
> you write an implementation, you'll see why. It probably doesn't make
> much of a difference here (generateStates is only called once) but it's
> something to watch out for.
Of course, as you point out, generateStates has almost no effect on the running time. However, I wonder how you might implement that in an elegant way in OCaml without @. In C, I just looped over all numbers between 0 and 2^stateLength and converted the bit representations for the numbers to cell on/off states.
> 3. For Pete's sake, man, you're using an association list for your
> memos! Surely you know that lookup in an association list is O(n) in
> the size of the list.
I simply Googled for "memoization Ocaml" and found that code:
http://www.emeraldtiger.net/modules.php?op= modload &name=News&file=article&sid=9
The author pointed out how "sweet" polymorphism is... one block of code that can be used to memoize any function. Sweet indeed, and it certainly sped up my OCaml code a lot (without memoization, it was so slow as to be intractable for anything larger than about 4x4).
So... maybe you can re-write higher-order memoization code using more efficient data structures? I would love to see that code, and I'm sure the OCaml community would benefit from having that in their toolbox.
I agree that the memoization code is probably the problem in the OCaml version. However, this code came directly from the OCaml community and was the *only* example of memoization in OCaml that I could find.
For Haskell, I used an infinite list of results that was filled in lazily as the results were needed. This also sped up the algorithm dramatically. However, I cannot get a Haskell compiler to compile itself on my platform, so I was testing all code in the Hugs interpreter, which made it too slow to be practical. Isomorphic compiled OCaml code was hundreds of times faster, as I recall. Of course, memoized Hugs beats unmemoized OCaml (and memoized OCaml beats unmemoized c++)---the exponential factor completely drowns out any effects that can be attributed to implementation details.
> Your program would likely be faster if you
> *didn't* memoize, since your lookups (once those lists get to be any
> significant size) are probably more expensive than calculating the
> value again.
Trust me.... memoization, though it might be slow, takes you from an O(2^(m*n)) algorithm to an O(2^(m)) algorithm. It makes a HUGE difference. The un-memoized OCaml algorithm is basically intractable (for a 10x10 grid, it is doing exhaustive search on a 2^100 state space... what is that about particles in the known universe?) So, memoization (even slow memoization) is necessary. And, I wrote the OCamle code without memoization first and tested it extensively for correctness. I added memoization later and noticed a huge speedup.
> It's worth noting that your C++ memo is just 3 constant-time
> indirections. No wonder it mops the floor with your O'Caml code.
Agreed.
> 4. You aren't memoizing isCovered. You may think you are, but you're
> not. You may have bound memoized_isCovered in it, but you don't use it
> -- you just call isCovered (the still-unmemoized version) inside
> itself.
I tried memoizing isCovered, but I removed that code because it was not practical... the state space is simply too large (I also gave up on memoizing it in the C++ version). The problem is that you are indexing based on 3 vectors, and each vector can have 2^m possible states. So, for example, if m=10, you have over a billion possible states... that would take at least a Gig of memory to memoize.
> 5. You aren't completely memoizing optLayout. You may think you are,
> but you're not. When you call optLayout instead of memoized_optLayout,
> you're not memoizing, and I can see at least one significant place
> where you are still calling optLayout from within optLayout,
Couldn't find this inside optLayout (except where I define memoized_optLayout). Can you give me a line reference.
>and another outside of it.
See that call... not sure that it matters because it isn't part of the recursion. Just tried changing it, and it doesn't change the running time at all.
> I think the moral of this story is that you really ought to talk to
> some O'Caml folks before you go around spreading falsehoods, of all
> places, at slashdot, home of the gullible and insipid (not to mention
> loud) wannabe geek masses.
I don't think I spread any falsehoods. I mean, my experiment was real, and the results are real, and the code is there for people to inspect and try on their own. I also talked in my /. post about OCaml code that is isomorphic to C being fast, but functional code perhaps not being fast.
Somebody on /. just translated my C code into OCaml, using a fixed-size array for the memo table and everything. Of course, the code is comparable to C in terms of speed (9 seconds vs. 6 seconds on a 7x7 grid). But I am not interested in translating my C programs directly into OCaml.
> If you fix the major issues with your non-memoization,
Tried changing those calls... no effect on speed.
> and your inefficient memoization,
This is the tough one... can this be done efficiently in an elegant way (i.e., without using a fixed-size array)?
> your O'Caml code should run within at least an
> order of magnitude of your C++ code. The rest of the speed you could
> easily get by using an array to represent the garden, which is a much
> more appropriate data structure for this problem anyway.
>
> I'm no blogger, but it seems the appropriate thing for you to do would
> be to suck up your pride and post about how you were gravely mistaken
> about the nature of your O'Caml code :)
I have no problem doing this... However, the point of my post was that if you write *elegant* OCaml using all of the nice functional tricks (including memoization tricks from OCaml bulletin boards), you cannot necessarily expect reasonable performance.
Trust me... I am *dying* to use OCaml or Haskell for real-world programming. I have spent the past month or so exploring these languages and trying to apply them to real programming problems. Especially when shootout results showed that OCaml was sometimes faster than C, and when I discovered that OCaml was much faster than Hasell, I was really starting to think that OCaml was a possibility.
However, the ONLY reason why I would want to use OCaml is to take advantage of the expressiveness of pure functional programming. There is no reason for me to switch from imperative C/C++ to imperative OCaml... from loops and arrays in C to loops and arrays in OCaml. Functional evangelists make their case in terms of elegance, abstraction, reuse, and type safety.
So, I thought that perhaps functional OCaml might be comparable to C in terms of performance. Maybe a bit slower... If it was 4 times slower, I think that would be too slow... but on a real problem, I found it to be hundreds of times slower.
And OCaml was hundreds of times faster than Haskell. Poor Haskell... so beautiful, yet no one out there has written a proper implementation for you!
>
> 1. You use lists. Lists aren't designed to be fast (computationally)
> to use. They're designed to be fast (programmatically) to use. You'll
> be hard pressed to find a production, speed-sensitive Lisp or O'Caml
> program that uses lists.
Okay... but here's my point: Every single example that shows how elegant Haskell and OCaml are uses lists. The 4-line Quicksort example for Haskell uses lists. All of the code that demonstrates easy reuse of functions and functions taken as arguments uses lists (like how easy it is to implement quite complicated algorithms using only map and filter, for example).
So, proponents say "Everyone should use functional languages because they can express complicated problems in elegant ways and result in cleaner, more reusable code."
But what you're saying in #1 above is that in "production," speed-sensitive code, no one is using lists... this would mean that no one is using map, filter, or any other pieces of reusable primitive code. So, they are instead all using mutable data structures... I.e., they are programming with side-effects and loops (random access instead of recursion, even when ever element of an array/list needs to be accessed/processed).
That was my point exactly. If you write elegant OCaml code using all of the lovely (and I mean lovely, really) tricks that they present when they demonstrate why OCaml is cool, you end up with code that is too slow to use in the real world.
I would say that my C++ (or most would call it C) implementation is elegant enough... easy to understand... no messy optimization tricks. Sure, I'm not using objects and templates everywhere, but these structures are hardly needed to solve this simple problem.
> 2. Practically none of your functions are written tail-recursively.
Good point.
> 2.5. You use a list append (@) inside a loop (generateStates).
> List.append is O(m), where m is the length of its first argument. If
> you write an implementation, you'll see why. It probably doesn't make
> much of a difference here (generateStates is only called once) but it's
> something to watch out for.
Of course, as you point out, generateStates has almost no effect on the running time. However, I wonder how you might implement that in an elegant way in OCaml without @. In C, I just looped over all numbers between 0 and 2^stateLength and converted the bit representations for the numbers to cell on/off states.
> 3. For Pete's sake, man, you're using an association list for your
> memos! Surely you know that lookup in an association list is O(n) in
> the size of the list.
I simply Googled for "memoization Ocaml" and found that code:
http://www.emeraldtiger.net/modules.php?op
The author pointed out how "sweet" polymorphism is... one block of code that can be used to memoize any function. Sweet indeed, and it certainly sped up my OCaml code a lot (without memoization, it was so slow as to be intractable for anything larger than about 4x4).
So... maybe you can re-write higher-order memoization code using more efficient data structures? I would love to see that code, and I'm sure the OCaml community would benefit from having that in their toolbox.
I agree that the memoization code is probably the problem in the OCaml version. However, this code came directly from the OCaml community and was the *only* example of memoization in OCaml that I could find.
For Haskell, I used an infinite list of results that was filled in lazily as the results were needed. This also sped up the algorithm dramatically. However, I cannot get a Haskell compiler to compile itself on my platform, so I was testing all code in the Hugs interpreter, which made it too slow to be practical. Isomorphic compiled OCaml code was hundreds of times faster, as I recall. Of course, memoized Hugs beats unmemoized OCaml (and memoized OCaml beats unmemoized c++)---the exponential factor completely drowns out any effects that can be attributed to implementation details.
> Your program would likely be faster if you
> *didn't* memoize, since your lookups (once those lists get to be any
> significant size) are probably more expensive than calculating the
> value again.
Trust me.... memoization, though it might be slow, takes you from an O(2^(m*n)) algorithm to an O(2^(m)) algorithm. It makes a HUGE difference. The un-memoized OCaml algorithm is basically intractable (for a 10x10 grid, it is doing exhaustive search on a 2^100 state space... what is that about particles in the known universe?) So, memoization (even slow memoization) is necessary. And, I wrote the OCamle code without memoization first and tested it extensively for correctness. I added memoization later and noticed a huge speedup.
> It's worth noting that your C++ memo is just 3 constant-time
> indirections. No wonder it mops the floor with your O'Caml code.
Agreed.
> 4. You aren't memoizing isCovered. You may think you are, but you're
> not. You may have bound memoized_isCovered in it, but you don't use it
> -- you just call isCovered (the still-unmemoized version) inside
> itself.
I tried memoizing isCovered, but I removed that code because it was not practical... the state space is simply too large (I also gave up on memoizing it in the C++ version). The problem is that you are indexing based on 3 vectors, and each vector can have 2^m possible states. So, for example, if m=10, you have over a billion possible states... that would take at least a Gig of memory to memoize.
> 5. You aren't completely memoizing optLayout. You may think you are,
> but you're not. When you call optLayout instead of memoized_optLayout,
> you're not memoizing, and I can see at least one significant place
> where you are still calling optLayout from within optLayout,
Couldn't find this inside optLayout (except where I define memoized_optLayout). Can you give me a line reference.
>and another outside of it.
See that call... not sure that it matters because it isn't part of the recursion. Just tried changing it, and it doesn't change the running time at all.
> I think the moral of this story is that you really ought to talk to
> some O'Caml folks before you go around spreading falsehoods, of all
> places, at slashdot, home of the gullible and insipid (not to mention
> loud) wannabe geek masses.
I don't think I spread any falsehoods. I mean, my experiment was real, and the results are real, and the code is there for people to inspect and try on their own. I also talked in my
Somebody on
> If you fix the major issues with your non-memoization,
Tried changing those calls... no effect on speed.
> and your inefficient memoization,
This is the tough one... can this be done efficiently in an elegant way (i.e., without using a fixed-size array)?
> your O'Caml code should run within at least an
> order of magnitude of your C++ code. The rest of the speed you could
> easily get by using an array to represent the garden, which is a much
> more appropriate data structure for this problem anyway.
>
> I'm no blogger, but it seems the appropriate thing for you to do would
> be to suck up your pride and post about how you were gravely mistaken
> about the nature of your O'Caml code
I have no problem doing this... However, the point of my post was that if you write *elegant* OCaml using all of the nice functional tricks (including memoization tricks from OCaml bulletin boards), you cannot necessarily expect reasonable performance.
Trust me... I am *dying* to use OCaml or Haskell for real-world programming. I have spent the past month or so exploring these languages and trying to apply them to real programming problems. Especially when shootout results showed that OCaml was sometimes faster than C, and when I discovered that OCaml was much faster than Hasell, I was really starting to think that OCaml was a possibility.
However, the ONLY reason why I would want to use OCaml is to take advantage of the expressiveness of pure functional programming. There is no reason for me to switch from imperative C/C++ to imperative OCaml... from loops and arrays in C to loops and arrays in OCaml. Functional evangelists make their case in terms of elegance, abstraction, reuse, and type safety.
So, I thought that perhaps functional OCaml might be comparable to C in terms of performance. Maybe a bit slower... If it was 4 times slower, I think that would be too slow... but on a real problem, I found it to be hundreds of times slower.
And OCaml was hundreds of times faster than Haskell. Poor Haskell... so beautiful, yet no one out there has written a proper implementation for you!