Amazon's Move Off Oracle Caused Prime Day Outage in One of its Biggest Warehouses, Internal Report Says (cnbc.com)

Amazon is learning how hard it can be to move off of Oracle's database software. From a report: On Prime Day, while the e-retailer was dealing with a major website glitch that slowed sales, the company was also dealing with a technical problem in Ohio at one of its biggest warehouses, leading to thousands of delayed package deliveries, according to an internal report obtained by CNBC. The problem was in large part due to Amazon's migration from Oracle's database to its own technology, the documents show. The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.

  • Really? (Score:5, Insightful)

    by willaien ( 2494962 ) on Tuesday October 23, 2018 @12:13PM (#57524159)

    Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

    Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

    • I use business management products from Oracle with an underlying Oracle database. I feel like sometimes the IT department must not be shoveling enough coal into the boiler or something, because this antiquated, inflexible interface just stalls all the time and very frequently has to go down for some sort of synchronization. It's slick like Amazon's web site. I don't understand why Oracle even exists given my experience with it.

      • by AlanBDee ( 2261976 ) on Tuesday October 23, 2018 @12:36PM (#57524343)

        I don't understand why Oracle even exists given my experience with it.

        Because it's a damn good database. The question isn't about its capabilities; it's whether it's worth the cost. As for their other products, I agree with you: they're way too sluggish. But I believe Amazon was just using their database.

        Now, Amazon moving away from Oracle is a good thing; as servers get faster and the open source alternatives get better, Oracle's database is losing its foothold. I for one won't be sad to see that happen.

        • Re: (Score:2, Informative)

          by Anonymous Coward

          I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing, which makes Microsoft look like the good guy. I don't understand how they can make a good DBMS but fail so miserably on the middleware. Want a Tomcat server that barely works? Get it from Oracle! Otherwise it'll work solidly everywhere else.

          • I think most people don't understand that the actual database product is rock solid.

            You're right, we don't understand that, because we know better.

            • by Cederic ( 9623 )

              I've been working with Oracle databases for a couple of decades now.

              "rock solid" is an extremely good description of them.

              They're fucking expensive and some of the configuration is a royal pain in the arse but they work, they work well and they keep working.

              I wouldn't recommend anybody starting a business to actually use one, but that's completely and entirely due to cost and Oracle's business practices, and fuck all to do with the underlying technology.

          • Re: (Score:2, Interesting)

            by Anonymous Coward

            I think most people don't understand that the actual database product is rock solid. It's Oracle middleware that needs to die in a fire. That and their licensing, which makes Microsoft look like the good guy. I don't understand how they can make a good DBMS but fail so miserably on the middleware.

            The bulk of Oracle DB was made in the past, at a time when Oracle the company actually employed talented engineers, designers, and programmers.
            It really was built to be rock solid and with plenty of features to make heavy workloads a breeze.

            Sadly, that time has long since passed; that is not Oracle the company of today.

            A large portion of their middleware was either a third-party acquisition they purchased and had their offshore code monkeys try to integrate, or was actually made by said offshore code monkeys,

        • by amorsen ( 7485 )

          Note that the cost isn't just monetary. If you buy Oracle, you will forever have to fear their licensing antics. You never know when an audit might happen, and the licensing terms are so convoluted that you're likely in breach. Just to make it worse, the terms constantly change.

        • Because it's a damn good database. The question isn't about its capabilities

          Actually, it is. The Oracle vs. Google lawsuit was about Oracle's wanting to use Java patents to hammer Google into cross-licensing its map-reduce patents so that Oracle could scale to the levels demanded by customers like Amazon. Cringely had a leaker years back confirming this.

          Google won that one, and now Amazon has broken free of Oracle.

          Personally, I like that my Subscribe-and-Save stopped taking 3 minutes to update an order.

      • I feel like sometimes the IT department must not be shoveling enough coal into the boiler or something, because this antiquated, inflexible interface just stalls all the time

        Ok, so imagine that, but worse. That was Prime Day. Hours on hours of not stalling, but simply not working at all.

        What you are describing sounds like maybe the devs aren't as good as they could be at optimizing, or maybe the company is stingy on hardware. What happened to Amazon was a world-class system brought to a halt simply because

        • by Anonymous Coward

          "What happened to Amazon was a world-class system brought to a halt simply because of too many users and the system fell over. That is something that Oracle is just better at handling (when it's administered right and has some powerful hardware at work, which Amazon has in spades for anything they stand up)."

          You seem to have not read the articles about Prime day, such as:

          https://www.cnbc.com/2018/07/19/amazon-internal-documents-what-caused-prime-day-crash-company-scramble.html

          Sable:
          - Is not an RDBMS
          - Is

      • by hey! ( 33014 )

        It is really easy to screw up your Oracle database server. It's practically an operating system in itself, and there are multiple resource pools that, improperly managed, can starve various back end processes your DBA has barely even heard of. That said, properly managed it should handle heavy workloads for the iron you're running it on.

        This is why Oracle *doesn't* make sense for a lot of installations. You need DBAs who either have a great deal of arcane Oracle server management knowledge, or who have

    • Re:Really? (Score:5, Insightful)

      by Mr D from 63 ( 3395377 ) on Tuesday October 23, 2018 @12:31PM (#57524311)

      Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

      Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

      This, and the obvious risk of issues anytime you make such a large change. You fix them and move on. "thousands of delayed packages" sounds like a blip for Amazon. Bad weather can do that.

    • Re:Really? (Score:5, Funny)

      by GameboyRMH ( 1153867 ) <gameboyrmh&gmail,com> on Tuesday October 23, 2018 @01:38PM (#57524741) Journal

      Oracle is a silver bullet if your wallet is made from werewolf fur!

    • Re:Really? (Score:5, Informative)

      by lgw ( 121541 ) on Tuesday October 23, 2018 @01:47PM (#57524811) Journal

      Was it just a regular outage that could have happened to anyone, or something very specific to their own infrastructure?

      Just because a change was made at some point in the past, you don't get to just assume that everything would have been fine if Change X or Y hadn't been made. Oracle isn't a silver bullet.

      I have some contacts at Amazon and can shed some light on this. Normally, Amazon retail prioritizes "Prime Day prep" above all else. Every team must prove they can stand up to the spike in load, and fill out lots of paperwork demonstrating they did adequate diligence. Rumor is that Prime Day was actually started as a way to do this exercise twice a year (and thus get better at it), rather than only for Christmas shopping.

      However, this year is different. Moving off Oracle has been made the first priority of every retail team (well, every one that uses Oracle in any way, which is most). No doubt that shift in priorities is what's at play here: given the thousands of teams, it's no surprise that some team somewhere dropped the ball given the conflicting priorities.

      So it's less about "Oracle was a silver bullet" and more about "changing stuff you don't usually change".

  • by dj245 ( 732906 ) on Tuesday October 23, 2018 @12:16PM (#57524183) Homepage
    Oracle: Don't you dare change to a competing product. Bad things will happen to you.
    • by ilsaloving ( 1534307 ) on Tuesday October 23, 2018 @12:22PM (#57524227)

      Apparently we need a +1 Ominous moderation.

    • by xxxJonBoyxxx ( 565205 ) on Tuesday October 23, 2018 @01:34PM (#57524705)
      The article proves that the short-term pain of dumping Oracle IS worth the gain.

      >> thousands of delayed package deliveries

      Leading to what... maybe hundreds of thousands of dollars in losses, at a ridiculously inflated top end? Versus hundreds of millions in savings from not having to write Oracle checks? I think that's a trade-off any smart business would take.
    • Oracle: Don't you dare change to a competing product. Bad things will happen to you.

      Right, and what "competing product" will they change to?

      • by Anonymous Coward
        PostgreSQL. All the features of Oracle (and complexity, if you need it) at a fraction of the cost.

        I'm betting that Amazon is switching to that. I would hope they are not switching to MySQL/MariaDB/PerconaDB.
        • by hey! ( 33014 )

          I like Postgres too, but it's not even close to having the features of Oracle. The problem is that using those features ties you to Oracle, so it's something you don't want to do casually (although Oracle likes you to).

          More features is not necessarily better, particularly when the features are non-standard, but some of the things Oracle does are actually quite useful. For example it's possible to fork and merge database versions, and the various versions of the database will share common database pages.
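          A minimal sketch of the page-sharing idea described above, assuming a plain copy-on-write scheme at the page-table level: a forked version starts out pointing at its parent's pages and only gets its own page for each page it writes. The `Version` class and its methods are hypothetical and show the general technique, not Oracle's actual implementation; merging versions back together is left out.

```python
class Version:
    """Hypothetical database version: a page table mapping page ids to page contents."""

    def __init__(self, pages=None):
        # The page table (dict) is private to this version, but the page
        # objects it references may be shared with other versions.
        self.pages = dict(pages or {})

    def fork(self):
        # Forking copies only the page table, i.e. the references, not the pages.
        return Version(self.pages)

    def read(self, page_id):
        return self.pages[page_id]

    def write(self, page_id, data):
        # Copy-on-write: rebinding this version's entry gives it its own page
        # and leaves the page still seen by the parent untouched.
        self.pages[page_id] = data

    def shared_pages(self, other):
        # A page counts as shared when both versions reference the same object.
        return {pid for pid, page in self.pages.items()
                if pid in other.pages and other.pages[pid] is page}


base = Version({0: b"header", 1: b"row-a", 2: b"row-b"})
branch = base.fork()
branch.write(2, b"row-b-edited")
print(base.read(2), branch.read(2), sorted(branch.shared_pages(base)))
```

          After the fork, only the page that was written costs extra storage; the untouched pages stay shared between both versions.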

  • by Anonymous Coward

    So the only glitch was a short delay in a single warehouse?

    Sounds like a massive success story to me.

    • The failure is they don't know root cause, and they need better tools and capacity to manage savepoints with their new system.

      It sounds like a secondary failure is insufficient testing prior to rollout...

  • by ilsaloving ( 1534307 ) on Tuesday October 23, 2018 @12:25PM (#57524261)

    That phrase confused me.

    I can absolutely understand wanting to move off Oracle. But why would they re-invent the wheel and write their own database? At least, that's what it sounds like they're doing based on the way the article was phrased.

    Wouldn't it have been better to just switch to Postgres and use the Oracle compatibility layer if they needed things like PL/SQL support?

    Ilsa

    • by jeff4747 ( 256583 ) on Tuesday October 23, 2018 @12:31PM (#57524315)

      https://en.wikipedia.org/wiki/... [wikipedia.org]

      They're developing their own technology as part of implementing RDS. IIRC, RDS was originally a customized MySQL, and then they implemented Aurora.

    • by larkost ( 79011 )

      I presume that this is DynamoDB we are talking about, so a document store (the typical NoSQL type) rather than a relational database. At the scale Amazon operates at, Postgres is simply not going to compete without a lot of extra custom logic on top. And once you are doing that, most of the advantage of something like Postgres is lost, and document stores (with their inherent scalability) start to be better solutions.

    • by Hulfs ( 588819 ) on Tuesday October 23, 2018 @12:48PM (#57524415)

      Look up Amazon Aurora.

      They've basically created a new DBMS that runs on top of their cloud infrastructure and is optimized for their EBS (Elastic Block Store). They have Postgres and MySQL flavors of the database, both of which use the actual DB "engines"; Amazon has written their own storage backends and added a bunch of other optimizations to the codebase (they've made most messaging asynchronous where possible). Because they use the actual database engines, they claim 100% compatibility for both Postgres and MySQL. We use the MySQL flavor and haven't run into any compatibility issues with SQL queries or stored procs. Because of the performance optimizations inherent in how it was designed to run in their cloud, we were able to significantly reduce the amount of CPU/RAM needed to run our application and still retain similar throughput; in essence, we were able to use a smaller RDS instance size, thus reducing our costs.

      One of the really nice things about it is virtually instant (and faultless) replication due to the way they rely on EBS itself to replicate data, rather than through a replication system sending queries (or binary data) to another remote system.

      • I would mod you +1 informative if I could. Thank you for that! I've seen Aurora but haven't had time to really explore it. And I didn't know they had expanded that to Postgres too.

    • by Hadlock ( 143607 )

      They have RDS, which is just managed Postgres/MySQL/MariaDB.

      They also have Aurora, which is (I think) compatible with Postgres/MySQL/MariaDB, but designed from the ground up to run in the cloud.

      A lot of traditional software is designed to run on a traditional server, and has certain design constraints that follow you when you move to the cloud. Designing something to be both compatible and cloud-native has been an important step, and both Amazon and Google have created this type of product, if Micro

  • by Darlok ( 131116 ) on Tuesday October 23, 2018 @12:26PM (#57524275)

    Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today. Any major platform transition is going to have problems unless you're exceptionally lucky. There are just too many moving parts in Enterprise systems for humans to get everything right on the first try. Oracle won't tout all of the problems people have moving ONTO their software from a competitor, but that transition pain happens too.

    Every year that goes by, it seems like Oracle is in a more tenuous position, despite their increased revenue. They've already lost the SME space -- I don't know of a single company anywhere in our client base, or within my sphere of influence, that still uses Oracle software. Organizations are bumping up against the limits of NetSuite -- the costs to integrate 3rd-party or industry-specific components, compared with other ERPs, are turning out to be more significant than expected. So we have clients and vendors migrating ERPs over time.

    Oracle is becoming the Comcast of the software world. They treat everyone like crap, but were so deeply embedded that they were hard to dislodge. With every passing year, that is less true, and I think Oracle knows it. Unfortunately, they seem to be choosing to double-down on the "treat everyone like crap" strategy, rather than actually fixing the systemic problems that might eventually sink them...

    • I think Oracle sees the writing on the wall...

      Of course they do; that's why they're bullish on pineapples and fighter planes.

    • by ctilsie242 ( 4841247 ) on Tuesday October 23, 2018 @12:43PM (#57524393)

      The funny thing is that Oracle could get back into many people's good graces. If they offered ZFS under the GPL and allowed it to become part of the default Linux kernel, one of the biggest enterprise issues would be solved.

      Similarly if they opened up a lot of their Solaris IP, instead of letting it die a slow death. Zones and LDOMs would be quite useful in Linux, even if they duplicate existing hypervisor functionality.

      • Hell will freeze over before Oracle do any good; their corporate culture and legacy has been toxic.
        • Hell will freeze over before Oracle do any good; their corporate culture and legacy has been toxic.

          I worked '97 - '07 for someone who is now effectively a VP at Oracle, and he's still as bad as the rest of them. When the Director who replaced him retired, this VP showed up, which wasn't surprising, but then spent the next several hours attempting to persuade me and my somewhat drunken coworkers and managers to throw out all the MS SQL and 'invest' in Oracle. None of us wanted anything to do with Oracle; it's about as welcome as an STD.

          I fully believe that this guy would even push that crap at a funeral.

      • by Penguinisto ( 415985 ) on Tuesday October 23, 2018 @01:20PM (#57524619) Journal

        They'd do a far better job of returning to customers' good graces by not being such totalitarian get-every-last-dime asshats about their licensing terms.

        Ever wonder why Oracle was so slow to get any traction in/among virtual machines?

        • by primebase ( 9535 )
          I agree about their sales teams: they can be real vultures. That being said, as someone who's worked with Oracle for decades now, I can tell you these two things: 1) They've never won a sale because of their customers' love of the sales teams. 2) If the sales folks are that universally reviled and they are ~still~ pushing 40% market share, that should tell you something about the capabilities and robustness of the product. Oracle clearly isn't a solution for every problem, and they seem to have totally miss
          • by anegg ( 1390659 )
            In my experience, the Oracle sales teams sell to the senior executives. Although the database product isn't complete crap, it was adopted far beyond what its technical merits warranted when I worked with it (late 1980s). My Significant Other has worked with it extensively since then, and reports that Oracle and the Oracle layered products still hold an unholy fascination for senior executives. She has spent 3 years demonstrating the flexibility and cost effectiveness of other products at her workplace, b
      • The funny thing is that Oracle could get back into many people's good graces. If they offered ZFS under the GPL and allowed it to become part of the default Linux kernel, one of the biggest enterprise issues would be solved.

        It's too late. If they had done this before BTRFS became production-worthy, it would have taken the air out of BTRFS. Now it's got momentum.

    • Between Java and their Enterprise platforms, if Oracle spent as much time listening and responding to their customers as they spent threatening them, they might be in a far better position today.

      Maybe Oracle needs one of those "Codes of Conduct", that seem to be the rage these days . . . ?

      Listening to customers is for startups . . . not for established market leaders. Their market dominance leads them to believe that their customers must listen to them.

    • by sjames ( 1099 ) on Tuesday October 23, 2018 @01:12PM (#57524567) Homepage Journal

      Oracle has simply overplayed their hand. For years, they have used the intrinsic difficulty of migrating as a tool to keep customers on-board in spite of constant abuse.

      They finally tightened the thumb screws one turn too tight and their customers have decided that the intrinsic pain of migration is less than the pain of staying with Oracle.

    • by nasch ( 598556 )

      people have moving ONTO their software from a competitor

      Does that actually happen though? I mean who would migrate to Oracle from something else at this point?

  • The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, and how difficult it is to re-create that level of reliability. It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.

    Nothing in the article really supports those conclusions.

    Was it due to some actual inferiority in "their own technology" (postgresql?), or was it just a migration issue?

  • Comment removed (Score:5, Insightful)

    by account_deleted ( 4530225 ) on Tuesday October 23, 2018 @12:55PM (#57524457)
    Comment removed based on user account deletion
    • by Anonymous Coward

      Likely as well, the $90K that this incident cost them is a rounding error compared with the total budget of the project, the long-term savings the project will provide over the years, and the additional money coming in from being able to sell this as a service on their AWS platform.

      I am sure Amazon loses more money per year, maybe even per month, to products damaged in shipment than this little mishap cost them.

    • $90K is likely similar to what the Oracle license costs them per day. If you think I'm joking, that's roughly $30M/year, which wouldn't surprise me for a company the size of Amazon.
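      Checking the arithmetic on that estimate: $90K per day over a 365-day year comes to a little under $33M, so "$30M/year" is, if anything, rounded down. The $90K daily figure is the commenter's guess at Oracle license costs, not a published Amazon or Oracle number, and the names below are illustrative only.

```python
# Assumed figure from the comment above; not a published Amazon or Oracle number.
DAILY_COST_USD = 90_000
DAYS_PER_YEAR = 365


def annualize(daily_cost: int, days: int = DAYS_PER_YEAR) -> int:
    """Scale a per-day cost to a per-year cost."""
    return daily_cost * days


yearly = annualize(DAILY_COST_USD)
print(f"${yearly:,} per year (about ${yearly / 1_000_000:.2f}M)")
# prints "$32,850,000 per year (about $32.85M)"
```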

      • by Cederic ( 9623 )

        $30m/year could go just on Oracle Financials at their scale, let alone the database.

  • Oracle is a complete nightmare. I've ported several large databases off Oracle, and have spent too many years developing with Oracle. There were constant issues with Oracle. Reliable? Please. Every month we were running into open bugs and submitting issues, all while paying obscene money for the privilege of using their products.
  • It's certainly unsurpassed in the efficient manner in which it eats all available IT funding. What licensing scheme are they using to rip off their customers this year? By CPU cores? By clock speed? Both?

    Amazon could, obviously, have done a better job of testing before flipping the switch on a migration this big. It's not like the company is hurting for the money that could have been used to put together an appropriate environment to prevent a snafu like this.

    • by PincushionMan ( 1312913 ) on Tuesday October 23, 2018 @03:31PM (#57525517)
      Don't forget their new Java licensing scheme: per physical core on the server side, and also per named user on the client side, $10 each. Yes, even if several users share one workstation in shifts, they want to be paid three times or more. Combine that with the rapid deprecation of features (JavaFX, Java Web Start) and the Chrome-style rapid version numbering scheme, and you have a recipe for disaster if you choose Java for any projects today. In fact, if you've done any development in Java, now might be the time to investigate alternative cross-platform technologies, like .NET.

      I cannot believe I just recommended .NET over Java. What's the world coming to? So, for clarification, is there any possibility that MS could pull an Oracle with .NET?
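      A rough sketch of how the licensing model described above scales, assuming the comment's figures ($10 per physical server core and $10 per named user) and ignoring billing period, discounts, and the details of Oracle's actual Java SE terms. The function name and parameters are hypothetical. The point it illustrates is that named-user licensing counts people rather than machines, so one workstation used across three shifts needs three licenses.

```python
# Prices as quoted in the comment above; assumptions, not Oracle's price list.
PRICE_PER_CORE_USD = 10
PRICE_PER_NAMED_USER_USD = 10


def java_license_cost(server_cores: int, workstations: int, shifts: int) -> int:
    """Estimate cost under a hypothetical per-core plus per-named-user model.

    Named-user licensing counts people, not machines, so a workstation
    used by a different person on each shift needs one license per person.
    """
    named_users = workstations * shifts
    return (server_cores * PRICE_PER_CORE_USD
            + named_users * PRICE_PER_NAMED_USER_USD)


one_shift = java_license_cost(server_cores=16, workstations=100, shifts=1)
three_shifts = java_license_cost(server_cores=16, workstations=100, shifts=3)
print(one_shift, three_shifts)  # prints "1160 3160"
```

      Under these assumed prices the server-side cost stays fixed while the client-side cost triples with the number of shifts.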
      • by swilver ( 617741 )

        There's OpenJDK. We've been running enterprise stuff on it since 2014, you don't need Oracle.

      • The problem with developing in .NET is that the developers are way more expensive. We can get teams of Java developers for only $17 per hour per developer, and there's plenty available waiting for a job. If we start up a C#/.NET project we have to pay up to $60 per hour per dev because the pool of available developers is nearly nonexistent.
  • Anyone who expected otherwise has not done a major migration. But once the move off of Oracle is complete, Amazon may be in a much better place.
  • HA HA, you thought your homebrew infrastructure was up to snuff.
  • Amazon having trouble rolling out a platform migration does not mean Oracle is a reliable platform. On the contrary, my experience is that due to the high licensing costs, many businesses forgo implementing the replication and redundancy measures needed to make Oracle's db reliable. Amazon having trouble rolling out a platform migration only goes to show that scale makes such migrations difficult, and underscores how important planning is in IT.
  • by Tablizer ( 95088 ) on Tuesday October 23, 2018 @03:05PM (#57525333) Journal

    It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software

    Big databases usually require careful tuning to handle big loads. Could it be the new incarnation has yet to undergo such tuning? The new incarnation may also have a different trade-off profile, such that the porting process moved operations mostly as-is instead of rebalancing the trade-offs to fit the new host. Much of the Oracle DB tuning may come from direct production experience, something the new incarnation won't have by definition.

    For a car analogy, suppose you are used to hauling big loads up the mountain in a Ford pickup truck. You switch to a Chevy truck and find your productivity drops. At first you blame the Chevy.

    After weeks of experience you find the Chevy less powerful at directly going over boulders; however, it's more maneuverable than the Ford, so you just learn to swerve around boulders instead of trying to go over them. Once you get used to the Chevy, the haul time is roughly the same.

  • Larry Ellison taunts Amazon [cloudpro.co.uk] that they still use Oracle and can't do without them, thus ensuring that Amazon will stop at nothing to be rid of Oracle and him.

  • The outage underscores the challenge Amazon faces as it looks to move completely off Oracle's database by 2020, ...

    Not for long.

    and how difficult it is to re-create that level of reliability.

    Not for long.

    It also shows that Oracle's database is more efficient in some aspects than Amazon's rival software, ...

    Not for long.

    a point that Oracle will likely emphasize during this week's annual OpenWorld conference in San Francisco.

    No doubt, forever and ever.

  • Oh, look, a 'news' article paid for by Oracle.
