Monday, February 18, 2019

Create a big file on ESXi 6.x with zeros written to it (Testing Purpose)

For testing purposes, you often need to create large dummy files on a filesystem to check features of disks, arrays, or drivers. This post shows how to create such a file, either as an empty file or as one that is zeroed out (with zeros written to it), depending on your requirement.

The ESXi server has a built-in utility named "vmkfstools" that makes this task easier. It is the same utility that is used to create new VMDK files for VMs, and it is what we will use here for testing:


Create a 50GB empty file:

# vmkfstools -c 50G /vmfs/volumes/esx_ds/dummy-fill/fill1.vmdk -d thin
Create: 100% done.


Create a 50GB file with zeros written to it:

# vmkfstools -c 50G /vmfs/volumes/esx_ds/dummy-fill/fill1.vmdk -d eagerzeroedthick
Creating disk '/vmfs/volumes/esx_ds/dummy-fill/fill1.vmdk' and zeroing it out...
Create: 100% done.

[root@CRT3-D-ESX6U3:~] ls -l /vmfs/volumes/esx_ds/dummy-fill/fill1*
-rw-------    1 root     root     1073741824 Feb 15 09:38 /vmfs/volumes/esx_ds/dummy-fill/fill1-flat.vmdk
-rw-------    1 root     root           491 Feb 15 09:38 /vmfs/volumes/esx_ds/dummy-fill/fill1.vmdk

If you look at the listing above, two files were created:

"fill1-flat.vmdk" : This is the large virtual disk data file that is created by default when you create a virtual disk that is not an RDM. With thick disks, this file is approximately the same size as the size you specify when you create the VMDK file for a VM.

"fill1.vmdk" : It's a small text disk descriptor file, which describes the size and geometry of the VMDK file. This descriptor file also contains a pointer to the large data file as well as information on the virtual disk drive sectors, heads, cylinders and disk adapter type.

If you want to delete the above VMDK file:

# vmkfstools -U /vmfs/volumes/esx_ds/dummy-fill/fill1.vmdk

This will delete both of the autogenerated files.

Create a big file on RHEL/CentOS with zeros written to it (Testing Purpose)

For testing purposes, you often need to create large dummy files on a filesystem to check features of disks, arrays, or drivers. This post shows how to create such a file, either as an empty file or as one that is zeroed out (with zeros written to it), depending on your requirement.

To create an empty 50GB file using the dd utility:

# dd if=/dev/zero of=/dummy-fill/dummy_file.img bs=1 count=0 seek=50G
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00042393 s, 0.0 kB/s

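The above command creates a 50GB file named "dummy_file.img" inside the "/dummy-fill/" folder. The "seek=50G" option sets the file size to 53687091200 bytes, which is equivalent to 50GB, and because of "count=0" nothing is actually written, so the file is created instantly as an empty (sparse) file.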
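For comparison, here is a minimal Python sketch of the same idea: set the file length without writing any data. The function name and the demo path are made up for illustration, and whether the file is really stored sparsely depends on the filesystem.

```python
import os
import tempfile


def create_sparse_file(path, size_bytes):
    """Create (or overwrite) a file of size_bytes without writing any data."""
    with open(path, "wb") as f:
        # truncate() extends the file to the requested length; on filesystems
        # that support sparse files no data blocks are allocated for the hole.
        f.truncate(size_bytes)
    return os.stat(path)


if __name__ == "__main__":
    # 64 MiB keeps the demo small; use 50 * 1024**3 for a 50GB file.
    demo_path = os.path.join(tempfile.gettempdir(), "dummy_sparse.img")
    st = create_sparse_file(demo_path, 64 * 1024**2)
    print("apparent size:", st.st_size, "bytes")
    # st_blocks counts 512-byte units and is only available on POSIX systems.
    print("allocated    :", getattr(st, "st_blocks", 0) * 512, "bytes")
    os.remove(demo_path)
```

On a filesystem with sparse file support the allocated size should be far smaller than the apparent size.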


To create a 50GB file and write zeros to it, use the following command:

# dd if=/dev/zero of=/dummy-fill/dummy_file.img count=1024 bs=52428800
1024+0 records in
1024+0 records out
53687091200 bytes (50 GB) copied, 58.52256 s, 426 MB/s

"bs" stands for block size, and "count" is the number of such blocks written to create the dummy file. Multiplying bs by count gives the total size of the file in bytes:

52428800 * 1024 = 53687091200 Bytes  = 50GB

# ls -l /dummy-fill/
total 1048580
-rw-r--r-- 1 root root 53687091200 Feb 15 15:05 dummy_file.img

The dd command that writes zeros will take some time, depending on the size of the file it has to zero out and on the disk speed, RAM, and CPU of your server or VM.
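The "bs * count" arithmetic can also be checked with a short Python sketch that writes zero-filled blocks the way the dd command does. The function name and demo path are hypothetical, and the demo writes only 8 MiB so that it finishes quickly.

```python
import os
import tempfile


def write_zeros(path, block_size, count):
    """Write `count` blocks of `block_size` zero bytes, like dd if=/dev/zero."""
    block = bytes(block_size)  # a buffer holding block_size zero bytes
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data really reaches the disk
    return block_size * count


if __name__ == "__main__":
    # The numbers from the dd example: 50 MiB blocks, 1024 of them.
    print(52428800 * 1024, "bytes =", 52428800 * 1024 // 1024**3, "GB")
    demo_path = os.path.join(tempfile.gettempdir(), "dummy_zero.img")
    written = write_zeros(demo_path, 1024 * 1024, 8)
    print("wrote", written, "bytes, file size is", os.path.getsize(demo_path))
    os.remove(demo_path)
```

Unlike the sparse variant, every byte is physically written here, which is why the run time grows with the file size.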

Create a big file on Windows 2012R2 and 2016 with zeros written to it (Testing Purpose)

For testing purposes, you often need to create large dummy files on a filesystem to check features of disks, arrays, or drivers. This post shows how to create such a file, either as an empty file or as one that is zeroed out (with zeros written to it), depending on your requirement.


To create an empty 50GB file using PowerShell:

PS C:\> fsutil file createnew d:\dummy_file.img 53687091200

The above command creates a 50GB file named "dummy_file.img" on the "D:" drive. The size "53687091200" is given in bytes, which is equivalent to 50GB.


To create a 50GB file and write zeros to it, run the following commands:

Create a new file of a specified size:

PS C:\> fsutil file createnew d:\dummy_file.img 53687091200
File d:\dummy_file.img is created


Set the valid data length for a file:

PS C:\> fsutil file setvaliddata d:\dummy_file.img 53687091200
Valid data length is changed


Set the zero data for a file:

PS C:\Users\Administrator> fsutil file setzerodata offset=0 length=53687091200 d:\dummy_file.img
Zero data is changed

This last command will take some time depending upon the file size it has to zero out and the RAM/CPU present on your server or VM.

Wednesday, February 13, 2019

Linux Kernel Architecture

A kernel is the core of an Operating System. It provides basic services for all other components of the OS. It is the main layer between the OS and hardware, and it helps with process and memory management, file systems, device control, and networking.

The Linux kernel architecture is divided into two areas, user space and kernel space:


User Space is the memory area where all user-mode applications run, and this memory can be swapped out when needed. A user-space process normally runs in its own virtual memory space and, unless explicitly requested, cannot access the memory of other processes. Because of the protection afforded by this isolation, crashes in user mode are generally recoverable.

Kernel Space is strictly reserved for running the kernel, OS background processes, kernel extensions, and device drivers. In Linux, kernel space has full access to the hardware, although some extensions run in user space. Crashes in kernel mode are catastrophic; they will halt the entire machine.

These two modes are enforced by the CPU hardware. If code executing in user mode attempts something outside its privileges, for example executing a privileged CPU instruction or modifying memory that it has no access to, a trappable exception is raised. Instead of the entire system crashing, only that particular application crashes.
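The isolation described above can be observed with a small Python sketch. It starts a child process that reads from address 0, which is not mapped in a normal process, and shows that only the child dies while the parent keeps running. The helper name is made up for illustration, and the exact exit status depends on the operating system.

```python
import subprocess
import sys

# Child program: read from address 0, an invalid access for a user process.
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_faulting_child():
    """Run the faulting code in a separate process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    status = run_faulting_child()
    # On Linux a negative status means the child was killed by a signal.
    print("child exit status:", status)
    print("parent process is still running")
```

The CPU raises the fault, the kernel turns it into a signal (or an error) for the offending process, and nothing else on the system is affected.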

x86 CPU hardware actually provides four protection rings: 0, 1, 2, and 3. Only rings 0 (Kernel) and 3 (User) are typically used.

Ring 0 : Kernel space code runs here.
Ring 1 : Rarely used in practice; some hypervisors have run guest kernels here, and it was also intended for drivers.
Ring 2 : Intended for device drivers; rarely used in practice.
Ring 3 : User-space applications run here. It is the least privileged ring, with access to only a subset of the processor instructions.

Zoning and its types

Zoning is a Fibre Channel switch function that enables node ports within the fabric to be logically segmented into groups and communicate with each other within the group. Whenever a change takes place in the fabric, the fabric controller sends a Registered State Change Notification (RSCN) to the nodes. If zoning is not configured, the RSCN goes to all the nodes in the fabric. Involving nodes that are not impacted by the change results in increased fabric-management traffic.

Zoning also provides access control, along with other access control mechanisms, such as LUN masking. Zoning provides control by allowing only the members in the same zone to establish communication with each other.

Types of Zoning:

Port zoning: Uses the physical address of switch ports to define zones. In port zoning, access to a node is determined by the physical switch port to which the node is connected. The zone members are the port identifiers (switch domain ID and port number) to which the HBA and its targets (storage devices) are connected. If a node is moved to another switch port in the fabric, port zoning must be modified to allow the node, in its new port, to participate in its original zone. However, if an HBA or storage device port fails, an administrator just has to replace the failed device without changing the zoning configuration.

WWN zoning: Uses World Wide Names to define zones. The zone members are the unique WWN addresses of the HBA and its targets (storage devices). A major advantage of WWN zoning is its flexibility. WWN zoning allows nodes to be moved to another switch port in the fabric and maintain connectivity to their zone partners without having to modify the zone configuration. This is possible because the WWN is static to the node port.

Mixed zoning: Combines the qualities of both WWN zoning and port zoning. Using mixed zoning enables a specific node port to be tied to the WWN of another node.
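The difference between port zoning and WWN zoning when a node is moved can be sketched in a few lines of Python. This is only a toy model: the zone names, the (domain ID, port number) pairs and the helper function are made up for illustration, and the WWNs are the ones used in the Brocade example.

```python
HBA = "10:00:00:00:00:00:00:11"
ARRAY = "50:00:00:00:00:00:00:11"

# WWN zoning: zone members are WWNs.
wwn_zones = {"zone1": {HBA, ARRAY}}
# Port zoning: zone members are (switch domain ID, port number) pairs.
port_zones = {"zone1": {(1, 4), (1, 7)}}


def can_communicate(zones, a, b):
    """Two members may talk only if at least one zone contains both of them."""
    return any(a in members and b in members for members in zones.values())


def demo():
    """Check both zoning types before and after the HBA is moved."""
    attached_to = {HBA: (1, 4), ARRAY: (1, 7)}
    before = (
        can_communicate(wwn_zones, HBA, ARRAY),
        can_communicate(port_zones, attached_to[HBA], attached_to[ARRAY]),
    )
    attached_to[HBA] = (1, 9)  # the HBA is re-cabled to another switch port
    after = (
        can_communicate(wwn_zones, HBA, ARRAY),
        can_communicate(port_zones, attached_to[HBA], attached_to[ARRAY]),
    )
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before move (wwn, port):", before)
    print("after move  (wwn, port):", after)
```

After the move the WWN zone still matches, while the port zone no longer does until it is updated with the new port.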

Example of WWN zoning on Brocade and Cisco switches:

On Brocade Switch

Brocade:admin> zonecreate "brocade_zone1", "50:00:00:00:00:00:00:11; 10:00:00:00:00:00:00:11"
Brocade:admin> cfgadd "brocade_cfg", "brocade_zone1"
Brocade:admin> cfgenable brocade_cfg

On Cisco switch

switch# conf t
switch(config)# zoneset name cisco_cfg vsan 1
switch(config-zoneset)# zone name cisco_zone1
switch(config-zoneset-zone)# member pwwn 21:00:00:00:00:00:00:12
switch(config-zoneset-zone)# member pwwn 10:00:00:00:00:00:00:12
switch(config-zoneset-zone)# zone commit vsan 1
switch(config)# zoneset activate name cisco_cfg vsan 1

Fabric Login types in switched network

Fabric services have three login types as explained below:

FLOGI, also known as Fabric Login, is performed between an N_Port and an F_Port. To log on to the fabric, a node sends a FLOGI frame with the WWNN and WWPN parameters to the login service at the pre-defined FC address FFFFFE (Fabric Login Server). In turn, the switch accepts the login and returns an Accept (ACC) frame with the assigned FC address for the node. Immediately after the FLOGI, the N_Port registers itself with the local Name Server on the switch, indicating its WWNN, WWPN, port type, class of service, assigned FC address and so on. After the N_Port has logged in, it can query the name server database for information about all other logged in ports.

PLOGI, also known as Port Login, is performed between two N_Ports to establish a session. The initiator N_Port sends a PLOGI request frame to the target N_Port, which accepts it. The target N_Port returns an ACC to the initiator N_Port. Next, the N_Ports exchange service parameters relevant to the session.

PRLI, also known as Process Login, is also performed between two N_Ports. This login relates to the FC-4 ULPs, such as SCSI. If the ULP is SCSI, the N_Ports exchange SCSI-related service parameters.
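The order of the three logins can be modelled with a small Python sketch. The class and its strict ordering check are a simplification made up for illustration (no frames are exchanged); only the login names and participants come from the text above.

```python
# Login order and participants as described in the text.
LOGIN_STEPS = [
    ("FLOGI", "N_Port -> F_Port (Fabric Login Server at FFFFFE)"),
    ("PLOGI", "initiator N_Port -> target N_Port"),
    ("PRLI", "N_Port -> N_Port, FC-4 ULP parameters such as SCSI"),
]


class NPortSession:
    """Tracks which logins a node port has completed (hypothetical model)."""

    def __init__(self, wwpn):
        self.wwpn = wwpn
        self.completed = []

    def login(self, step):
        """Record a login, refusing any step that is attempted out of order."""
        if len(self.completed) == len(LOGIN_STEPS):
            raise ValueError("all logins are already completed")
        expected = LOGIN_STEPS[len(self.completed)][0]
        if step != expected:
            raise ValueError(f"{step} attempted before {expected}")
        self.completed.append(step)
        return list(self.completed)


if __name__ == "__main__":
    session = NPortSession("10:00:00:00:00:00:00:11")
    for name, participants in LOGIN_STEPS:
        session.login(name)
        print(f"{name}: {participants}")
```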

Fabric Services

Fibre Channel switches, regardless of the manufacturer, provide a common set of services as defined in the Fibre Channel standards. These services are available at certain pre-defined addresses. Some of these services are Fabric Login Server, Fabric Controller, Name Server, and Management Server.

The Fabric Login Server is located at the predefined address of FFFFFE and is used during the initial part of the node’s fabric login process.

The Name Server (also known as Distributed Name Server) is located at the predefined address FFFFFC and is responsible for name registration and management of node ports. Each switch exchanges its Name Server information with other switches in the fabric to maintain a synchronized, distributed name service.

The Fabric Controller is located at the predefined address FFFFFD. The Fabric Controller provides services to both node ports and other switches. The Fabric Controller is responsible for managing and distributing Registered State Change Notifications (RSCNs) to the node ports registered with the Fabric Controller. If there is a change in the fabric, RSCNs are sent out by a switch to the attached node ports. The Fabric Controller also generates Switch Registered State Change Notifications (SW-RSCNs) to every other domain (switch) in the fabric. These RSCNs keep the name server up-to-date on all switches in the fabric.

The Management Server is located at the predefined address FFFFFA. The Management Server is distributed to every switch within the fabric. The Management Server enables the FC SAN management software to retrieve information and administer the fabric.
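As a quick reference, the four well-known addresses listed above can be kept in a small Python lookup table. The helper names are made up for illustration; the addresses and service names are the ones from the text.

```python
# Well-known fabric service addresses (24-bit FC addresses).
WELL_KNOWN_ADDRESSES = {
    0xFFFFFE: "Fabric Login Server",
    0xFFFFFC: "Name Server",
    0xFFFFFD: "Fabric Controller",
    0xFFFFFA: "Management Server",
}


def service_at(fc_address):
    """Return the fabric service at fc_address, or None if it is not listed."""
    return WELL_KNOWN_ADDRESSES.get(fc_address)


def format_fc_address(fc_address):
    """Format a 24-bit FC address as six upper-case hex digits."""
    return f"{fc_address:06X}"


if __name__ == "__main__":
    for address in sorted(WELL_KNOWN_ADDRESSES, reverse=True):
        print(format_fc_address(address), service_at(address))
```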

Fibre Channel Layers

It is easier to understand a communication protocol by viewing it as a structure of independent layers. Fibre Channel Protocol defines the communication protocol in five layers: FC-0 through FC-4 (except FC-3 layer, which is not implemented).

FC-0 Layer is the lowest layer in the Fibre Channel Protocol stack. This layer defines the physical interface, media, and transmission of bits. The FC-0 specification includes cables, connectors, and optical and electrical parameters for a variety of data rates. The Fibre Channel transmission can use both electrical and optical media.

FC-1 Layer defines how data is encoded prior to transmission and decoded upon receipt. At the transmitter node, an 8-bit character is encoded into a 10-bit transmission character. This character is then transmitted to the receiver node. At the receiver node, the 10-bit character is passed to the FC-1 layer, which decodes the 10-bit character into the original 8-bit character. Fibre Channel links with speeds of 10 Gbps and above use a 64-bit to 66-bit encoding algorithm. This layer also defines the transmission words such as Fibre Channel frame delimiters, which identify the start and end of a frame, and primitive signals that indicate events at a transmitting port. In addition to these, the FC-1 layer performs link initialization and error recovery.

FC-2 Layer provides Fibre Channel addressing, structure, and organization of data (frames, sequences, and exchanges). It also defines fabric services, classes of service, flow control, and routing.

FC-4 Layer is the uppermost layer in the Fibre Channel Protocol stack. This layer defines the application interfaces and the way Upper Layer Protocols (ULPs) are mapped to the lower Fibre Channel layers. The Fibre Channel standard defines several protocols that can operate on the FC-4 layer. Some of the protocols include SCSI, High-Performance Parallel Interface (HIPPI) Framing Protocol, Enterprise Systems Connection (ESCON), Asynchronous Transfer Mode (ATM), and IP.
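The cost of the two encodings mentioned for the FC-1 layer can be worked out with a little arithmetic. The Python sketch below only compares the ratio of data bits to transmitted bits; the function name is made up for illustration.

```python
from fractions import Fraction


def encoding_efficiency(data_bits, line_bits):
    """Fraction of the transmitted bits that carry actual data."""
    return Fraction(data_bits, line_bits)


if __name__ == "__main__":
    for name, data_bits, line_bits in (("8b/10b", 8, 10), ("64b/66b", 64, 66)):
        eff = encoding_efficiency(data_bits, line_bits)
        print(f"{name}: {float(eff) * 100:.2f}% data, "
              f"{float(1 - eff) * 100:.2f}% encoding overhead")
```

With 8-bit to 10-bit encoding a fifth of the line rate is spent on encoding, while 64-bit to 66-bit encoding spends only about 3%.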



Types of Switched Fabric ports

Ports in a switched fabric can be one of the following types:

N_Port is an endpoint in the fabric. This port is also known as the node port. Typically, it is a host port (HBA) or a storage array port that is connected to a switch in a switched fabric.

E_Port is a port that forms the connection between two FC switches. This port is also known as the expansion port. The E_Port on an FC switch connects to the E_Port of another FC switch in the fabric through ISLs.

F_Port is a port on a switch that connects to an N_Port. It is also known as a fabric port.

G_Port is a generic port on a switch that can operate as an E_Port or an F_Port and determines its functionality automatically during initialization.


Identifying major and minor numbers for a block device in Linux

One of the basic features of the Linux kernel is that it abstracts the handling of devices. All hardware devices look like regular files; they can be opened, closed, read and written using the same, standard, system calls that are used to manipulate files. Every device in the system is represented by a file. For block (disk) and character devices, these device files are created by the mknod command and they describe the device using major and minor device numbers.

The kernel needs to be told how to access the device. Not only does the kernel need to be told what kind of device is being accessed but also any special information, such as the partition number if it's a hard disk or density if it's a floppy, for example. This is accomplished by the major number and minor number of that device.

Major Numbers: All devices controlled by the same device driver have a common major device number. The major number is actually the offset into the kernel's device driver table, which tells the kernel what kind of device it is (whether it is a hard disk or a serial terminal).

Minor Numbers: The minor number tells the kernel special characteristics of the device to be accessed. For example, the second hard disk has a different minor number than the first. The COM1 port has a different minor number than the COM2 port, each partition on the primary IDE disk has a different minor device number, and so forth. So, for example, /dev/hda2, the second partition of the primary IDE disk has a major number of 3 and a minor number of 2.

[root@RHEL610 ~]# ls -l /dev/sd*
brw-rw---- 1 root disk  8,  16 Feb  8 17:47 /dev/sdb
brw-rw---- 1 root disk  8,  32 Feb 11 14:32 /dev/sdc
brw-rw---- 1 root disk  8,  48 Feb 11 15:56 /dev/sdd

Note: In the above output "8" is the major number for the sdb/sdc/sdd devices. The minor number is "16" for sdb, "32" for sdc and "48" for sdd.

The major numbers for SCSI and IDE disks are fixed:

SCSI disks (/dev/sd*) major number is 8.
IDE disks (/dev/hd*) major number is 3.
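The same numbers can be read programmatically. The Python sketch below uses os.major() and os.minor() on a device file, and decodes an sd minor number into a disk name. The decoding assumes the conventional layout of 16 minor numbers per SCSI disk under major 8 (which matches the 16/32/48 values in the listing but is not stated in this post), and the function names are made up for illustration.

```python
import os
import stat


def device_numbers(path):
    """Return (type, major, minor) for a block or character device file."""
    st = os.stat(path)
    kind = "block" if stat.S_ISBLK(st.st_mode) else "char"
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)


def describe_sd_minor(minor):
    """Map a minor number under major 8 to (disk name, partition number).

    Assumes 16 minors per disk, so it only covers sda through sdp.
    """
    disk_index, partition = divmod(minor, 16)
    return "sd" + chr(ord("a") + disk_index), partition


if __name__ == "__main__":
    for minor in (16, 32, 48):
        print(minor, "->", describe_sd_minor(minor))
    if os.path.exists("/dev/null"):
        print("/dev/null ->", device_numbers("/dev/null"))
```

A partition number of 0 means the whole disk, so minor 17 would be the first partition of sdb.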

What are Udev, HAL, Dbus and Netlink?

Hot-plugging (which is the word used to describe the process of inserting devices into a running system) is achieved in a Linux distribution by a combination of three components: Udev, HAL, and Dbus.

Udev is a userspace daemon that supplies a dynamic device directory containing only the nodes for devices which are connected to the system. It creates or removes the device node files in the /dev directory as devices are plugged in or taken out. Dbus is like a system bus which is used for inter-process communication. When a device is attached to the system, HAL gets the information from the Udev service and creates an XML representation of that device. It then notifies the corresponding desktop application, such as Nautilus, through Dbus, and Nautilus opens the mounted device files.

Dbus is an IPC mechanism, which allows applications to register for system device events.

Udev is the device manager for the Linux 2.6 kernel that creates/removes device nodes in the /dev directory dynamically. It runs in userspace and the user can change device names using Udev rules.

Udev depends on the sysfs file system, which was introduced in the 2.5 kernel. It is sysfs which makes devices visible in user space. When a device is added or removed, the kernel produces events which notify Udev in userspace. Udev listens directly on a Netlink socket to learn about device state change events (kernel uevents).
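To get a feel for what Udev receives, here is a minimal Python sketch that parses a kernel uevent message and, on Linux, can listen for such messages on a Netlink socket. The sample message is made up for illustration, the function names are hypothetical, and the protocol number 15 (NETLINK_KOBJECT_UEVENT) and multicast group 1 are taken from the kernel headers rather than from this post.

```python
import socket

NETLINK_KOBJECT_UEVENT = 15  # protocol number from <linux/netlink.h>


def parse_uevent(data):
    """Split a raw kernel uevent into its header and KEY=VALUE properties."""
    fields = data.decode("utf-8", errors="replace").split("\0")
    header = fields[0]  # for example "add@/devices/..."
    props = dict(f.split("=", 1) for f in fields[1:] if "=" in f)
    return header, props


def listen(count=1):
    """Print `count` kernel uevents (Linux only; blocks until events arrive)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, 1))  # pid 0: kernel picks an id; group 1: kernel uevents
    try:
        for _ in range(count):
            print(parse_uevent(sock.recv(8192)))
    finally:
        sock.close()


if __name__ == "__main__":
    # Hypothetical sample message, fields separated by NUL bytes.
    sample = (b"add@/devices/virtual/block/loop0\0ACTION=add\0"
              b"DEVPATH=/devices/virtual/block/loop0\0SUBSYSTEM=block\0")
    header, props = parse_uevent(sample)
    print(header)
    print(props)
```

Calling listen() and then plugging in a USB stick should print the add events that Udev itself reacts to.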