
At the heart of a File Area Network lies a virtualization engine built on some form of namespace, which provides an abstraction layer between users and physical storage. Virtualization is no longer just an over-hyped buzzword; it is having a positive impact on how organizations manage their infrastructure. Put simply, the namespace provides location transparency across distributed and heterogeneous filesystems. Users can access their data logically, without having to remember the physical location of their files, much as they access web pages on the Internet today. Once the namespace is in place, administrators are free to move data behind the scenes in order to optimize, centrally manage, archive or simply migrate it, without causing any disruption to users.
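As a rough illustration of location transparency, a namespace can be thought of as a lookup table from logical paths to physical targets. The sketch below is a minimal model under that assumption, not any vendor's implementation; the class name, the logical paths and the filer names are all hypothetical.

```python
from typing import Dict, Optional


class Namespace:
    """Toy namespace: maps logical folders to physical storage targets."""

    def __init__(self) -> None:
        self._links: Dict[str, str] = {}

    def link(self, logical: str, physical: str) -> None:
        """Create a logical folder, or retarget an existing one."""
        self._links[logical.rstrip("/")] = physical.rstrip("/")

    def resolve(self, logical_path: str) -> str:
        """Translate a logical path into its current physical location."""
        best: Optional[str] = None
        for prefix in self._links:
            covers = logical_path == prefix or logical_path.startswith(prefix + "/")
            # Longest matching prefix wins, so nested links take precedence.
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no namespace link covers {logical_path}")
        return self._links[best] + logical_path[len(best):]


ns = Namespace()
ns.link("/corp/finance", "//filer01/fin_share")
before = ns.resolve("/corp/finance/q3/report.xls")

# The administrator moves the data and retargets the link.
# The logical path that users know does not change.
ns.link("/corp/finance", "//filer07/fin_archive")
after = ns.resolve("/corp/finance/q3/report.xls")
```

The same logical path resolves to a different physical location before and after the retarget, which is the property that lets administrators move data without users having to remap anything.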
From a business perspective, virtualization in the file space offers substantial savings through centralized management, better storage utilization, easier migration, information lifecycle management (ILM), remote office consolidation and optimization, and various other capabilities.
Because any File Area Networking solution needs proper planning and changes the way an organization organizes and manages its unstructured file data, it is imperative to consider carefully the various aspects of a file virtualization solution.
1. Ease of use and implementation
Any solution must be simple to implement, add no overhead and pose no risk to the production environment. Companies would think twice about implementing a solution if it required reconfiguring the network, changing the security model or changing the way users access their data today. It should also offer an intuitive, administrator-friendly GUI to simplify everyday storage management tasks.
2. Scalability
As organizations grow, the namespace must be able to scale with them, with no practical upper limit. If any restrictions in hardware or software limit this growth, the solution will not be viable in a large enterprise environment. From a performance standpoint, the namespace should also not become a bottleneck as the number of transactions grows. This is particularly true if a hardware device is inserted in the data path, i.e. between users and data.
3. Resiliency
As all access to data goes through the namespace, its resilience is of paramount importance. This can be achieved in various ways, depending on the solution. In an in-band solution, clustering the devices is critical, and careful consideration must be given to protecting the enterprise against a site disaster in which the whole cluster could be out of action. In an out-of-band solution, resilience is typically achieved through replication within the enterprise directory services (e.g. Active Directory), so clustering is not necessary; instead, administrators must ensure that directory replication is happening at the specified intervals.
4. Namespace Management
The solution must offer features to create, populate, maintain, audit and protect the namespace. If the namespace changes for any reason, it is crucial not only that the change is recorded, but also that the administrator can restore the last known good version quickly and easily. This is especially true of systems where the metadata resides on proprietary hardware.
As the namespace is a core enabling technology, the ability to manage it is key. For instance, the namespace should be kept synchronized with changes in the physical storage and, for better protection, the logical structure should be reproducible on another platform if this cannot be done through directory services.
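One way to picture the record-and-restore requirement is a namespace that snapshots itself before every change. The following is only a sketch under that assumption: the class, its method names and the example paths are hypothetical, and a real product would persist its history rather than keep it in memory.

```python
from typing import Dict, List, Optional, Tuple


class VersionedNamespace:
    """Toy namespace that records every change and can roll the last one back."""

    def __init__(self) -> None:
        self._links: Dict[str, str] = {}
        # Each entry: (description of the change, snapshot taken BEFORE it).
        self._history: List[Tuple[str, Dict[str, str]]] = []

    def change(self, description: str, logical: str, physical: Optional[str]) -> None:
        """Apply a change; passing physical=None removes the logical folder."""
        self._history.append((description, dict(self._links)))
        if physical is None:
            self._links.pop(logical, None)
        else:
            self._links[logical] = physical

    def audit_log(self) -> List[str]:
        """Descriptions of all changes that can still be rolled back, oldest first."""
        return [description for description, _ in self._history]

    def restore_previous(self) -> str:
        """Undo the most recent change and return its description."""
        description, snapshot = self._history.pop()
        self._links = snapshot
        return description

    def links(self) -> Dict[str, str]:
        return dict(self._links)


ns = VersionedNamespace()
ns.change("publish finance share", "/corp/finance", "//filer01/fin_share")
ns.change("retarget finance (mistyped)", "/corp/finance", "//filer10/fin_shar")

undone = ns.restore_previous()  # back to the last known good state
```

After the rollback, `/corp/finance` points at `//filer01/fin_share` again and only the first change remains in the audit log.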
5. No Vendor Tie-in
The solution must not tie users to specific platforms. Companies often complain about being tied to one storage vendor, and the complexity of moving away from that platform often means that the organization’s data is in effect held hostage. The virtualization solution must work with all industry-standard platforms and protocols and must allow freedom of movement to and away from any storage platform. More importantly, there should be no tie-in to the namespace provider, and provision must be made for accessing the storage without going through the namespace. For instance, if a major bug is discovered in the OS kernel, administrators should be able to bypass the namespace so that users can gain direct access to their storage very quickly. This can be architecturally challenging with solutions that dynamically spread files around according to user-defined criteria.
6. Non-disruptive Data Migration
Constant changes in technology, coupled with an insatiable appetite for data, force companies to migrate or consolidate data continually. This is often a big challenge, as it is almost always accompanied by downtime and disruption. There are clear benefits if administrators can move data with minimal disruption while guaranteeing its integrity.
The virtualization solution should move data in tight integration with the namespace, so that users are automatically redirected to the new target, avoiding any disruption or need to remap. Easy-to-use management tools for data migration projects are also a key requirement.
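The copy, verify and redirect sequence described above can be sketched as follows. This is a simplified illustration, assuming the namespace is a plain dictionary and that the data is not modified during the copy; the function names and paths are hypothetical, and a production tool would also have to handle open files, permissions and incremental resynchronization.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def tree_digest(root: Path) -> str:
    """SHA-256 over the relative path and content of every file under root."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest.update(path.relative_to(root).as_posix().encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()


def migrate(namespace: Dict[str, str], logical: str, new_root: Path) -> None:
    """Copy the data, verify it, and only then retarget the logical folder."""
    old_root = Path(namespace[logical])
    shutil.copytree(old_root, new_root)  # new_root must not exist yet
    if tree_digest(old_root) != tree_digest(new_root):
        shutil.rmtree(new_root)
        raise RuntimeError("verification failed; namespace left unchanged")
    namespace[logical] = str(new_root)  # users now resolve to the new target


# Demonstration on temporary directories standing in for two filers.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "filer_old"
    old.mkdir()
    (old / "plan.txt").write_text("budget")

    namespace = {"/corp/finance": str(old)}
    new = Path(tmp) / "filer_new"
    migrate(namespace, "/corp/finance", new)

    redirected = namespace["/corp/finance"] == str(new)
    content = (Path(namespace["/corp/finance"]) / "plan.txt").read_text()
```

The namespace entry is updated only after the checksum comparison succeeds, so a failed copy never leaves users pointing at incomplete data.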
7. Remote Data Consolidation
Maintaining multiple remote sites is not only costly and complex, but also reduces user productivity, as users often have to cross a slow WAN link to access their data. Using a combination of namespace and WAFS (wide area file services) technologies, it is possible to centralize the data, removing infrastructure servers from the branches while still offering LAN-like performance to users across a WAN link. Customers should consider a vendor that offers not only file virtualization technology in the data center, but also extensions to consolidate and optimize across geographical borders and slow WAN links.
8. Business Continuity
In today’s business environment, data must always be available. As enterprises have to be fully prepared to deal with planned and unplanned outages, it is critical to consider not only how the data is protected, but also how quickly users would be able to regain access to that data in the event of a disaster.
In the event of a disaster, the whole process of failing over to a disaster recovery (DR) location must be as simple and seamless as possible. This is not the time for companies to discover that the script they had prepared for the failover needs to be modified before it will work. The solution must therefore offer a quick, cost-effective and transparent means of switching users over to the DR site, either automatically or at the press of a button.
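To illustrate what a single-action failover could look like at the namespace level, the sketch below assumes each logical folder already has a replicated copy at the DR site and that switching sites is just a matter of which target the namespace hands out. The class names and paths are hypothetical, and replication itself is out of scope here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Link:
    primary: str  # target at the production site
    dr: str       # replicated target at the DR site


class FailoverNamespace:
    """Toy namespace that can switch every logical folder to the DR site at once."""

    def __init__(self, links: Dict[str, Link]) -> None:
        self._links = links
        self.active_site = "primary"

    def fail_over(self) -> None:
        self.active_site = "dr"

    def fail_back(self) -> None:
        self.active_site = "primary"

    def resolve(self, logical: str) -> str:
        link = self._links[logical]
        return link.dr if self.active_site == "dr" else link.primary


ns = FailoverNamespace({
    "/corp/finance": Link("//london-filer01/fin", "//dr-filer01/fin"),
    "/corp/eng": Link("//london-filer02/eng", "//dr-filer02/eng"),
})

normal = ns.resolve("/corp/finance")
ns.fail_over()  # the "press of a button"
during_disaster = ns.resolve("/corp/finance")
```

Because users only ever see the logical path, no client-side script has to change when the active site is switched.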
9. Data Classification
Organizations often need to profile their data in order to assess its importance to the business. This is often the first step towards creating an information lifecycle management (ILM) strategy, in which data is aligned to the right platform. It is also a critical step if the organization has deployed a chargeback model and is therefore required to produce department-specific reports on data usage, as well as for capacity planning.
It is important that the reporting capability is at a logical level, rather than physical, as a department’s data may be spread across several physical locations and the process of collating multiple reports can be quite laborious.
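The difference between physical and logical reporting can be shown with a small aggregation. The records below are invented purely for illustration, as are the department names, filer paths and sizes; the point is only that usage is summed per department regardless of where the data physically lives.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (department, physical location, gigabytes used) - illustrative data only.
usage_records = [
    ("finance", "//filer01/fin_share", 120),
    ("finance", "//filer07/fin_archive", 300),
    ("engineering", "//filer02/eng", 450),
    ("engineering", "//filer01/eng_scratch", 50),
]


def usage_by_department(records: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Sum usage per department, ignoring which physical device holds the data."""
    totals: Dict[str, int] = defaultdict(int)
    for department, _location, gigabytes in records:
        totals[department] += gigabytes
    return dict(totals)


report = usage_by_department(usage_records)
```

A per-device report would split finance across two filers; the logical report gives one chargeback figure per department.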
10. Storage Tiering
Implementing a tiered storage architecture enables organizations to move less frequently accessed data to less expensive storage while maintaining the same user access model. The virtualization solution should offer policies that automate this process based on criteria relevant to the organization.
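A tiering policy of this kind might, for example, demote files that have not been accessed for a given number of days. The sketch below assumes last-access time as the only criterion and uses hypothetical tier names and paths; real policies typically combine several criteria such as file type, owner or size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FileRecord:
    logical_path: str       # the path users see; it never changes
    last_accessed: datetime
    tier: str               # "tier1" (fast, expensive) or "tier2" (cheaper)


def apply_age_policy(files: List[FileRecord], now: datetime,
                     max_idle_days: int = 180) -> List[str]:
    """Demote tier1 files idle for longer than max_idle_days; return their paths."""
    cutoff = now - timedelta(days=max_idle_days)
    moved = []
    for record in files:
        if record.tier == "tier1" and record.last_accessed < cutoff:
            record.tier = "tier2"
            moved.append(record.logical_path)
    return moved


files = [
    FileRecord("/corp/eng/current_design.doc", datetime(2023, 12, 15), "tier1"),
    FileRecord("/corp/eng/old_specs.doc", datetime(2023, 1, 10), "tier1"),
]
moved = apply_age_policy(files, now=datetime(2024, 1, 1))
```

Only the storage tier recorded for the file changes; its logical path stays the same, which is what preserves the user access model.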
11. Storage Optimisation
Statistics suggest that open systems storage devices have an average utilization rate of 40%. The responsibility for identifying this imbalance often falls on administrators, as does the ensuing task of redressing the balance and driving efficiencies.
The virtualization solution should be able to automatically identify utilization levels and load-balance across multiple devices, without impact to users.
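As a simple illustration of identifying utilization levels and planning a rebalance, the sketch below flags devices above a high-water mark and proposes moving the excess to the least utilized device. The 80% threshold, the device names and the capacities are assumptions made for the example, and the greedy strategy shown is only one of many possible approaches.

```python
from typing import Dict, List, Tuple

Device = Dict[str, int]  # {"capacity_gb": ..., "used_gb": ...}


def utilization(device: Device) -> float:
    return device["used_gb"] / device["capacity_gb"]


def plan_rebalance(devices: Dict[str, Device],
                   high_water_pct: int = 80) -> List[Tuple[str, str, int]]:
    """Return (source, target, gigabytes) moves that relieve over-full devices."""
    # Work on a copy so the caller's figures are left untouched.
    state = {name: dict(device) for name, device in devices.items()}
    moves = []
    fullest_first = sorted(state, key=lambda n: utilization(state[n]), reverse=True)
    for name in fullest_first:
        source = state[name]
        excess = source["used_gb"] - source["capacity_gb"] * high_water_pct // 100
        if excess <= 0:
            continue
        # Pick the least utilized other device as the target.
        target_name = min((n for n in state if n != name),
                          key=lambda n: utilization(state[n]))
        target = state[target_name]
        headroom = target["capacity_gb"] * high_water_pct // 100 - target["used_gb"]
        amount = min(excess, headroom)
        if amount <= 0:
            continue
        source["used_gb"] -= amount
        target["used_gb"] += amount
        moves.append((name, target_name, amount))
    return moves


devices = {
    "filer01": {"capacity_gb": 1000, "used_gb": 950},
    "filer02": {"capacity_gb": 1000, "used_gb": 200},
    "filer03": {"capacity_gb": 2000, "used_gb": 800},
}
moves = plan_rebalance(devices)
```

In this example only `filer01` exceeds the threshold, so the plan is a single move of 150 GB to `filer02`. Carrying out such a move without impact to users relies on the namespace redirecting them to the new location.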