CowboyRobot writes: Lucian Carata and colleagues at the University of Cambridge describe the importance of knowing where data comes from and what processes may have altered it. In "A Primer on Provenance", they explain that the transformations applied to data are sometimes neither directly controlled by nor even known to developers, and that when no provenance is recorded, information about how a result was produced is lost, making its quality and reproducibility harder to assess. As computing becomes pervasive and the demand for dependability guarantees grows, these problems will only worsen; treating provenance as a first-class citizen in data processing is one possible solution.
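The first-class-provenance idea can be sketched in a few lines: wrap each transformation so that a lineage record (step name, input/output hashes, timestamp) travels alongside the result. This is a minimal illustration, not the authors' system; all names here are invented for the example.

```python
import hashlib
import json
from datetime import datetime, timezone

def with_provenance(func, data, record):
    """Apply a transformation and append a provenance entry describing it.

    `record` is a list collecting one entry per transformation step.
    """
    before = hashlib.sha256(json.dumps(data, sort_keys=True).encode()).hexdigest()
    result = func(data)
    after = hashlib.sha256(json.dumps(result, sort_keys=True).encode()).hexdigest()
    record.append({
        "step": func.__name__,
        "input_sha256": before,
        "output_sha256": after,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return result

# Example pipeline: each step's lineage is captured alongside the result.
provenance = []
data = [3, 1, 2]
data = with_provenance(sorted, data, provenance)
data = with_provenance(lambda xs: [x * 2 for x in xs], data, provenance)
```

Given such a record, anyone downstream can check which steps produced a result and whether the inputs they have on hand actually match the recorded hashes.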
CowboyRobot writes: The Internet relies heavily on two protocols. In the network layer, IP (Internet Protocol) provides an unreliable datagram service and ensures that any host can exchange packets with any other host. Since its creation in the 1970s, IP has seen the addition of several features, including multicast, IPsec (IP security), and QoS (quality of service). The latest revision, IPv6 (IP version 6), supports 16-byte addresses. The second major protocol is TCP (Transmission Control Protocol), which operates in the transport layer and provides a reliable bytestream service on top of IP. TCP has evolved continuously since the first experiments in research networks. MPTCP (Multipath TCP) is a major extension to TCP. By decoupling TCP from a single IP address, it at last enables TCP to support multihomed hosts. With the growing importance of wireless networks, multihoming is becoming the norm rather than the exception. Smartphones and data centers are the first use cases where MPTCP can provide benefits.
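On Linux (kernel 5.6 and later), an application can opt in to MPTCP simply by requesting it as the socket protocol; the kernel then manages subflows across the host's interfaces transparently. A minimal sketch, assuming a Linux host, with a fallback to plain TCP where MPTCP is unavailable:

```python
import socket

# IPPROTO_MPTCP is 262 on Linux; older Python builds may not expose the
# constant, so fall back to the raw value, and to plain TCP if the kernel
# refuses the protocol.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def open_stream_socket():
    """Try to create an MPTCP socket; transparently fall back to TCP."""
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP), "mptcp"
    except OSError:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM), "tcp"

sock, flavor = open_stream_socket()
sock.close()
```

The fallback matters in practice: the same binary keeps working on hosts without MPTCP support, which is exactly the incremental-deployment story that has let TCP extensions spread.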
CowboyRobot writes: Storage systems continue to lay the foundation for modern Internet services such as Web search, e-commerce, and social networking. Pressures caused by rapidly growing user bases and data sets have driven system designs away from conventional centralized databases and toward more scalable distributed solutions, including simple NoSQL key-value storage systems, as well as more elaborate NewSQL databases that support transactions at scale.
Eventual consistency is increasingly viewed as a spectrum of behaviors that can be quantified along various dimensions, rather than a binary property that a storage system either satisfies or fails to satisfy. Advances in characterizing and verifying these behaviors will enable service providers to offer an increasingly rich set of service levels of differentiated performance, ultimately improving the end user's experience.
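One dimension along which eventual consistency can be quantified is staleness: how likely a read is to return the latest value as a function of time since the write. A toy simulation (not any particular product's behavior) makes the "spectrum" concrete:

```python
import random

class EventuallyConsistentStore:
    """Toy model: a write reaches each replica after a random delay,
    so reads routed to arbitrary replicas may return stale values."""
    def __init__(self, replicas=3, max_lag=5):
        self.replicas = [{} for _ in range(replicas)]
        self.max_lag = max_lag
        self.pending = []  # (apply_at_tick, replica_index, key, value)
        self.tick = 0

    def write(self, key, value):
        for i in range(len(self.replicas)):
            delay = random.randint(0, self.max_lag)
            self.pending.append((self.tick + delay, i, key, value))

    def advance(self):
        self.tick += 1
        still = []
        for when, i, k, v in self.pending:
            if when <= self.tick:
                self.replicas[i][k] = v
            else:
                still.append((when, i, k, v))
        self.pending = still

    def read(self, key):
        # A client may hit any replica, which is where staleness shows up.
        return random.choice(self.replicas).get(key)

# Quantify staleness: fraction of reads seeing the new value at each tick.
random.seed(0)
store = EventuallyConsistentStore()
store.write("x", "new")
fresh_by_tick = []
for _ in range(6):
    store.advance()
    hits = sum(store.read("x") == "new" for _ in range(200))
    fresh_by_tick.append(hits / 200)
```

The resulting curve, rather than a yes/no answer, is the kind of measurement that lets a provider advertise differentiated service levels ("99% of reads fresh within N milliseconds").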
CowboyRobot writes: Writing for ACM's Queue magazine, Paul Vixie argues, "The edge of the Internet is an unruly place." By design, the Internet core is stupid, and the edge is smart. This design decision has enabled the Internet's wildcat growth, since without complexity the core can grow at the speed of demand. On the downside, the decision to put all smartness at the edge means we're at the mercy of scale when it comes to the quality of the Internet's aggregate traffic load. Not all device and software builders have the skills and budgets that something the size of the Internet deserves. Furthermore, the resiliency of the Internet means that a device or program that gets something importantly wrong about Internet communication stands a pretty good chance of working "well enough" in spite of this. Witness the endless stream of patches and vulnerability announcements from the vendors of literally every smartphone, laptop, or desktop operating system and application. Bad guys have the time, skills, and motivation to study edge devices for weaknesses, and they are finding as many weaknesses as they need to inject malicious code into our precious devices where they can then copy our data, modify our installed software, spy on us, and steal our identities.
CowboyRobot writes: Hard on the heels of a recent move to make its Watson supercomputer a service that developers can invoke via RESTful APIs, IBM is now making $100 million available to help developers build cognitive computing applications that can run on top of Watson. The funding is part of a larger $1 billion investment that IBM is making to create a formal Watson Group and is designed to encourage independent software vendors to become part of an ecosystem that IBM is trying to build around Watson.
CowboyRobot writes: BitPay, a virtual currency payment processor, has announced a beta trial of its new Bitcoin Payroll API. Targeted at employers and payroll service providers, the API lets employers offer a portion of employee pay in bitcoin, with employees opting in to receive a recurring payroll deduction in the currency.
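The underlying arithmetic of such a deduction is simple; a minimal sketch with illustrative numbers (a real integration would call the processor's API rather than compute this locally, and the function name and rate here are invented for the example):

```python
def bitcoin_payroll_deduction(gross_pay_usd, optin_percent, usd_per_btc):
    """Split a paycheck: the opted-in portion is converted to bitcoin.

    Returns (usd_remainder, btc_amount). Figures are illustrative only.
    """
    deduction_usd = gross_pay_usd * optin_percent / 100
    btc_amount = round(deduction_usd / usd_per_btc, 8)  # satoshi precision
    return gross_pay_usd - deduction_usd, btc_amount

usd_left, btc = bitcoin_payroll_deduction(4000.0, 10, 800.0)
# 10% of $4,000 is $400; at $800/BTC that is 0.5 BTC.
```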
CowboyRobot writes: How an API was designed and implemented is usually of little interest to consumers of the thousands of APIs available today. What matters most is that an API works, is easy to integrate, and solves a real problem. But for the growing number of companies looking to take advantage of the booming API economy and considering developing APIs, design is an important subject. Historically, API design has been a consumer-first exercise: a company develops a data-rich application and, at some point, decides to build an API through which that data can be accessed. In recent years, however, more and more companies have opted for an API-first approach, under which APIs are designed, implemented, and documented before the application that will consume them even exists. In the early days, “don’t be afraid to break the structure of your API... Find a few key customers that are willing to work with you and keep in constant communication with them. This will allow you to change the API given their comments.”
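In an API-first workflow, the contract exists before any application code, and requests can be validated against it from day one. A minimal sketch of the idea (the endpoint, parameters, and helper name are invented for illustration):

```python
# Contract-first sketch: the interface is defined as data before any
# application code exists; requests are checked against the contract.
API_SPEC = {
    "/users": {
        "GET": {"params": {"limit": int}},
        "POST": {"params": {"name": str, "email": str}},
    },
}

def validate_request(path, method, params):
    """Check a request against the spec; returns a list of problems."""
    endpoint = API_SPEC.get(path)
    if endpoint is None:
        return [f"unknown path {path}"]
    op = endpoint.get(method)
    if op is None:
        return [f"{method} not allowed on {path}"]
    problems = []
    for name, expected in op["params"].items():
        if name not in params:
            problems.append(f"missing parameter {name}")
        elif not isinstance(params[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    return problems
```

Because the spec is plain data, "breaking the structure of your API" early on is cheap: change the spec, and every mock, validator, and document generated from it changes with it.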
CowboyRobot writes: What if all the software layers in a virtual appliance were compiled within the same safe, high-level language framework? Cloud computing has been pioneering the business of renting computing resources in large data centers to multiple (and possibly competing) tenants. The basic enabling technology for the cloud is operating-system virtualization such as Xen or VMware, which allows customers to multiplex VMs (virtual machines) on a shared cluster of physical machines. Each VM presents as a self-contained computer, booting a standard operating-system kernel and running unmodified applications just as if it were executing on a physical machine. While operating-system virtualization is undeniably useful, it adds yet another layer to an already highly layered software stack, now including: support for old physical protocols (e.g., disk standards developed in the 1980s, such as IDE); irrelevant optimizations (e.g., disk elevator algorithms on SSDs); backward-compatible interfaces (e.g., POSIX); user-space processes and threads (in addition to VMs on a hypervisor); and managed-code runtimes (e.g., OCaml, .NET, or Java). Are we really doomed to adding new layers of indirection and abstraction every few years, leaving future generations of programmers to become virtual archaeologists as they dig through hundreds of layers of software emulation to debug even the simplest applications?
CowboyRobot writes: Andrew Koenig at Dr. Dobb's argues that by looking at a program's structure — as opposed to only looking at output — we can sometimes predict circumstances in which it is particularly likely to fail. "For example, any time a program decides to use one or two (or more) algorithms depending on an aspect of its input such as size, we should verify that it works properly as close as possible to the decision boundary on both sides. I've seen quite a few programs that impose arbitrary length limits on, say, the size of an input line or the length of a name. I've also seen far too many such programs that fail when they are presented with input that fits the limit exactly, or is one greater (or less) than the limit. If you know by inspecting the code what those limits are, it is much easier to test for cases near the limits."
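Koenig's advice translates directly into boundary-value tests: once code inspection reveals a limit, exercise the program exactly at the limit and one step to either side. A small self-contained sketch (the limit and function are invented for illustration):

```python
MAX_NAME = 32  # illustrative limit, analogous to the line/name limits in the article

def store_name(name):
    """Accept a name only if it fits the fixed limit."""
    if len(name) > MAX_NAME:
        raise ValueError("name too long")
    return name

# Knowing the limit from the code, test exactly at and around the boundary.
boundary_cases = {
    "under": "a" * (MAX_NAME - 1),
    "exact": "a" * MAX_NAME,       # the case many programs get wrong
    "over":  "a" * (MAX_NAME + 1),
}
results = {}
for label, value in boundary_cases.items():
    try:
        store_name(value)
        results[label] = "accepted"
    except ValueError:
        results[label] = "rejected"
```

A program with an off-by-one bug (say, `len(name) >= MAX_NAME`) passes the "under" and "over" cases and fails only on "exact", which is precisely why testing both sides of the decision boundary matters.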
CowboyRobot writes: If you shop carefully online, you can buy a general-purpose enterprise SSD, such as Intel’s DC S3700, for about $2.65/GB, or a read-oriented drive like the Intel DC S3500 for $1.30/GB. By comparison, a 4TB nearline SATA hard disk such as Western Digital’s RE or Seagate’s Constellation costs under $400, or $0.09/GB. Interestingly, consumer/laptop SSDs are well below the magic $1/GB level, with Crucial’s M500 selling for about $0.59/GB — about what hard drives cost in 2005. If we assume that SSD prices will fall at their historical 35% annual rate and hard drive prices at a more conservative 15% through 2020, the enterprise SSD will cost almost 13 cents a gigabyte, more than the hard drive costs today, while the 20TB drives the hard drive vendors are promising for 2020 will cost under 3 cents a GB. The price difference will have shrunk from 30:1 to around 5:1. If hard drive prices instead fall at a closer-to-historical 25%, they’ll still be a tenth the cost of SSDs at the end of the decade.
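The projections above are just compound annual decline; the arithmetic can be reproduced directly (taking roughly seven years from the article's prices to 2020):

```python
def project(price_per_gb, annual_decline, years):
    """Compound a constant annual price decline."""
    return price_per_gb * (1 - annual_decline) ** years

YEARS = 7  # roughly 2013 to 2020, as in the article

enterprise_ssd_2020 = project(2.65, 0.35, YEARS)   # ~ $0.13/GB
nearline_hdd_2020   = project(0.09, 0.15, YEARS)   # ~ $0.029/GB
hdd_2020_fast_drop  = project(0.09, 0.25, YEARS)   # ~ $0.012/GB

ratio_conservative = enterprise_ssd_2020 / nearline_hdd_2020   # ~ 4.5:1
ratio_historical   = hdd_2020_fast_drop / enterprise_ssd_2020  # ~ a tenth
```

Running the numbers confirms the article's figures: about 13 cents/GB for the enterprise SSD, under 3 cents/GB for the conservative hard-drive case (a roughly 5:1 gap), and about a tenth of SSD cost if hard drives decline at 25% per year.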
CowboyRobot writes: Google opened its newest data centers earlier this month in Taiwan and Singapore, setting up the Internet giant to capitalize on one of the Internet's fastest growing regions. "While we've been busy building, the growth in Asia's Internet has been amazing," wrote Joe Kava, the company's VP of data centers. "Between July and September of this year alone, more than 60 million people in Asia landed on the mobile Internet for the first time. That's almost two Canadas, or three Australias." Meanwhile, Mike Matchett, a Taneja Group senior analyst, said that it may be more than just a matter of cloud strategy; it could be Google's way of protecting users in far-off lands from the kind of snooping to which Americans have been subjected. "In light of the Snowden revelations, we would expect companies to require more and more of their data to stay local."
CowboyRobot writes: The adoption of REST as the predominant method to build public APIs has overshadowed any other API technology or approach in recent years. Although several alternatives (mainly SOAP) are still (very) prevalent in the enterprise, the early adopters of the API movement have taken a definitive stance against them and opted for REST as their approach and JSON as their preferred message format. But while REST is still the poster child of the API movement, there are a number of initiatives, technologies, and discussions that are starting to nibble at the crust of the REST de facto standard. These include: asynchronous APIs, orchestration/experience APIs, the distinction between SDKs and APIs, and binary protocols. Common to these is the need for an interface definition in a protocol-specific format, which is then processed by included tools to generate code for supported languages. All have support for most popular languages today.
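The shared pattern — a language-neutral interface definition processed into client code — can be sketched in miniature. This is a toy code generator, not any particular framework's tooling; the service, methods, and `_call` hook are invented for the example:

```python
# Sketch of the IDL-plus-codegen workflow common to these alternatives:
# an interface definition is turned into a client stub in a target language.
IDL = {
    "service": "Inventory",
    "methods": {
        "get_item": ["item_id"],
        "list_items": ["page", "page_size"],
    },
}

def generate_python_stub(idl):
    """Emit Python source for a client class from the definition."""
    lines = [f"class {idl['service']}Client:"]
    for method, params in idl["methods"].items():
        args = ", ".join(["self"] + params)
        lines.append(f"    def {method}({args}):")
        lines.append(f"        return self._call({method!r}, locals())")
    return "\n".join(lines)

stub_source = generate_python_stub(IDL)
```

Because the definition, not the handwritten code, is the source of truth, supporting a new language means adding one more generator — which is how these ecosystems cover "most popular languages" so quickly.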
CowboyRobot writes: One month after halting sales of the HP Chromebook 11 following complaints about charger failures, Google and HP, in conjunction with the US Consumer Product Safety Commission, have issued a recall of the micro-USB charger that comes with the device. The recall notice says that Google has received nine reports about these chargers overheating and melting. Though no fires have been started as a result of the malfunction, one person reported minor burns, and another reported minor damage to a pillow. The recall affects about 145,000 units, a figure that reflects distribution, rather than sales.