Big Data trends intensify storage business boom

Staff writer, DIGITIMES, Taipei [Tuesday 3 June 2014]

The cumulative amount of data produced worldwide is predicted to reach 40ZB by 2020, 50 times the 2010 total, according to preliminary estimates. In other words, the amount of stored data will double roughly every two years, and the era of an unprecedented digital universe is now upon us. The massive data analysis opportunities this creates will significantly increase demand for data storage.

Each day, 103,680 hours' worth of video are uploaded to YouTube; Amazon sells up to 440,640 items; Google's search engine serves five million queries; Alibaba Group handles 12 million online transactions; and Facebook processes 2.5 billion messages.

These figures are cited not to emphasize how well the relevant industries are doing, but to show that the era of the information explosion is upon us. This endless stream of data contains information on consumer behavior and preferences, and has profound analytical value. Even a small company, if it can mine gold from this ocean of data, may create innovations that rock the market and propel it to become the market leader. Big Data analysis has therefore been red hot in recent years, rivaling even cloud computing.

However, there is no denying that an organization's existing IT systems alone are not enough to profit from Big Data. The relevant systems must be optimized and expanded, a process that inevitably increases the burden on the IT infrastructure, and the organization must increase its investment in 3S equipment (servers, switches, and storage devices). Research firms estimate that, as enterprises step up their 3S spending, storage devices will see a compound annual growth rate (CAGR) of more than 40%.

Why must storage devices be deployed to take advantage of Big Data? The reason is simple: the explosively growing volume of information must be stored before it can be analyzed later, and stored data is bound to occupy a significant amount of space. Meanwhile, as smartphones, tablets, desktop computers, and notebooks become smaller and thinner, their built-in storage capacity shrinks. Huge storage requirements therefore cannot be met by personal devices and must be satisfied by the hard drives of storage devices in server rooms.

With flexible scalability, scale-out NAS becomes the new hot topic

Big Data applications typically mean mining data sources with commercial value, which usually refers to untagged, file-based unstructured information. In the past, most of these data fell outside the management of enterprise data centers. Today they do not, and the most obvious impact is a roughly 10-fold increase in data volume, adding to the burden on IT infrastructure. Enterprises must do everything possible to contain IT spending, such as adopting virtualization and cloud computing technologies to improve the efficiency of IT resources.

As enterprises move towards cloud applications, they must satisfy the massive copying and mirroring demands of virtualization, which places a heavy load on the storage system. Due to cost considerations, enterprises are reluctant to use expensive FC SAN equipment to handle large-scale Big Data management and cloud storage needs. Network attached storage (NAS) equipment that supports a horizontal scale-out architecture is therefore becoming increasingly popular, and is perhaps a worthwhile direction of development for the numerous NAS equipment suppliers in Taiwan.

Scale-out NAS is built on a cluster file system: multiple NAS controllers are combined into a cluster architecture, each with its own storage space behind it. A user who runs short of capacity or performance can therefore expand online by adding nodes. In other words, the more NAS controller nodes there are, the larger the total storage capacity and the better the access performance, and the better the hardware can help enterprises overcome Big Data management challenges.
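The scaling described above can be sketched with a toy model: capacity and aggregate bandwidth grow roughly linearly with the number of controller nodes. The per-node figures below are hypothetical illustrations, not any vendor's specifications.

```python
# Illustrative model of scale-out NAS growth: aggregate capacity and
# throughput rise roughly linearly as controller nodes join the cluster.
# The per-node numbers are assumptions chosen for the example.

def cluster_totals(nodes, tb_per_node=64, gbps_per_node=10):
    """Return (total capacity in TB, aggregate bandwidth in Gbps)."""
    return nodes * tb_per_node, nodes * gbps_per_node

for n in (2, 4, 8):
    cap, bw = cluster_totals(n)
    print(f"{n} nodes -> {cap} TB, {bw} Gbps aggregate")
```

In practice the scaling is sub-linear because of cluster coordination overhead, but the linear picture captures why adding nodes addresses capacity and performance at the same time.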

In contrast, traditional NAS equipment that takes the longitudinal scale-up route can also expand its capacity, though within tighter limits. Most enterprise-class scale-up NAS devices contain one or two controller modules connected to JBOD storage expansion enclosures, so capacity can be expanded simply by attaching more JBODs behind a controller. However, because of limits on the number of I/O ports and on I/O controller chip performance, the number of disks this architecture can support is not unlimited. Even a high-end dual-controller NAS system would struggle to support more than one thousand disks. If Big Data management needs exceed this limit, there is no alternative but to deploy a second, third, or even fourth set of equipment, significantly increasing management complexity.

Beyond relatively flexible expansion, scale-out NAS systems generally provide value-added features such as tiered storage and Information Lifecycle Management (ILM) to satisfy automatic data migration demands. This also means, however, that file paths may change frequently. To avoid access problems for users, global namespace functionality is added to the scale-out NAS cluster environment: a single namespace is created across multiple controller nodes, and front-end data input and output requests can be broken up and executed collaboratively by multiple controller nodes.
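The single-namespace idea can be illustrated with a minimal sketch: clients see one path tree, while a stable hash of each path decides which controller node serves it. Real scale-out NAS products use far more elaborate placement and migration schemes; the node names and hashing choice here are purely hypothetical.

```python
# Sketch of fanning one namespace out across nodes: the same path
# always hashes to the same controller node, so any node can answer
# "who owns this file?" without a central lookup table.
import hashlib

NODES = ["nas-node-1", "nas-node-2", "nas-node-3"]  # hypothetical names

def node_for(path: str) -> str:
    digest = hashlib.md5(path.encode()).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

# "/projects/report.doc" maps to one consistent node regardless of
# which node the client first contacts.
print(node_for("/projects/report.doc"))
```

A scheme like this is why data can migrate between tiers behind the scenes without the user-visible path ever changing.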

Scale-out NAS controllers connect to back-end devices in one of two ways. In the first, the NAS controller node connects to the disk equipment through a SAN network. This provides greater expansion flexibility, since the user can expand either the number of NAS controllers or the SAN disk device's capacity, and it also boosts IOPS performance. The second method is DAS connectivity: each controller node connects to its local disk device (including SATA or SAS hard drives attached to the host's internal interface) and externally to a JBOD disk enclosure through a cascaded SAS HBA connection. The second method trails the first in performance, but its cost is relatively lower.

Take Thecus as an example. Thecus has launched a creative daisy-chain expansion method oriented towards the DAS connection architecture described above. With the N16000PRO, a 16-bay model Thecus designed for daisy-chain connections, as the master, a simple volume expansion lets a 10GbE switch stack another eight N16000PROs, and each N16000PRO can in turn cascade to four more D16000 DAS units. This architecture can therefore expand to a maximum capacity of 2.5PB with minimal investment; it also improves data transfer rates and is well suited to Big Data scenarios.
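A back-of-the-envelope check makes the 2.5PB figure plausible. The bay counts come from the article; the 4TB drive size is an assumption (a typical large enterprise drive at the time), and the result is raw capacity before RAID parity and file-system overhead are subtracted.

```python
# Rough capacity check for the daisy-chain architecture described above.
# Drive size (4 TB) is an assumption; the result is raw, pre-RAID capacity.

BAYS_PER_N16000PRO = 16
BAYS_PER_D16000 = 16
DAS_PER_NAS = 4
NAS_UNITS = 1 + 8          # one master plus eight stacked N16000PROs

total_bays = NAS_UNITS * (BAYS_PER_N16000PRO + DAS_PER_NAS * BAYS_PER_D16000)
raw_tb = total_bays * 4    # assuming 4 TB drives

print(f"{total_bays} drive bays, ~{raw_tb / 1000:.1f} PB raw")
```

That works out to 720 bays and roughly 2.9PB raw, which lands in the same ballpark as the quoted 2.5PB maximum once redundancy and formatting overhead are taken into account.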

The “applization” of NAS equipment can satisfy diverse application demands

For large, medium, and small enterprises and SOHO studios alike, the value of today’s NAS equipment is more than just lateral storage expansion. Another obvious trend is the beginning of “applization”: users can select and install applications, such as electronic bulletin boards, e-commerce, content management, security monitoring, or utilities, from the NAS vendor’s app store. Apps enable the NAS to become more than a storage device and, to an extent, play the role of an application server.

In addition, storage vendors also offer Enterprise File Sync and Share (EFSS) programs, a form of private cloud. These programs give enterprise employees flexible disk space allocations and cross-device file synchronization, creating a “the data follow me” effect. This is yet another high-profile development path.

The EFSS products already on the market offer value on several fronts. First, they can operate synergistically with existing enterprise NAS devices without changing end-users’ data access habits, and thereby provide an additional backup mechanism (when a user stores files on the NAS, the device automatically synchronizes them to the EFSS device). Second, they can replace traditional FTP servers: with group sharing, a file no longer needs to be copied multiple times to reach every member of the group. Moreover, e-mail attachments can be converted into share links, saving network bandwidth and mail server capacity. Finally, these products enable mobile access: end-users can share links from their mobile devices, which recipients can then use to retrieve the actual files.
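The bandwidth argument for share links can be made concrete with a small hypothetical calculation: an attachment is copied once per recipient, while a link only costs transfer when someone actually downloads the file. All numbers below are made up for the example.

```python
# Why converting e-mail attachments to share links saves bandwidth:
# attachments transfer one copy per recipient up front; a link transfers
# data only for recipients who actually download the file.

def attachment_mb(file_mb, recipients):
    return file_mb * recipients           # one copy lands in every inbox

def share_link_mb(file_mb, downloads):
    return file_mb * downloads            # only actual downloads move data

sent = attachment_mb(25, 40)      # 25 MB file mailed to 40 people
fetched = share_link_mb(25, 5)    # but only 5 of them ever open it
print(f"attachments: {sent} MB, share link: {fetched} MB")
```

The same asymmetry applies to mail server storage: the attachment occupies space in every mailbox, while the linked file is stored once on the NAS or EFSS device.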

In short, for enterprises to respond properly to the different types of Big Data analysis and processing needs, such as structured data, unstructured data, and streaming data, the questions of how to balance performance and cost and how to establish the most appropriate storage architecture are critical, and this is where storage equipment has the greatest potential to show its benefits.
