Title of Invention

A HOST AND METHOD OF VALIDATING AN ACCESS REQUEST TO A HOST

Abstract A host may be coupled to a switched fabric and include a processor, a host memory coupled to the processor, and a host-fabric adapter coupled to the host memory and the processor and provided to interface with the switched fabric. The host-fabric adapter accesses a translation and protection table from the host memory for a data transaction. The translation and protection table entries include a region identifier field and a protection domain field used to validate an access request.
Full Text FORM 2
THE PATENTS ACT, 1970 (39 of 1970)
COMPLETE SPECIFICATION (See Section 10, rule 13)
A HOST AND METHOD OF VALIDATING AN ACCESS REQUEST TO A HOST
INTEL CORPORATION of 2200 MISSION COLLEGE BOULEVARD, P.O. BOX 58119 SANTA CLARA, CA 95052 U.S.A., AMERICAN Company
The following specification particularly describes the nature of the invention and the manner in which it is to be performed:

TRANSLATION AND PROTECTION TABLE AND METHOD OF USING THE SAME TO VALIDATE ACCESS REQUESTS
Technical Field
The present invention relates to a data network, and more particularly, relates to the arrangement and use of a region identifier field provided in translation entries of a translation and protection table (TPT).
Background
In disadvantageous network architectures, the operating system (OS) virtualizes network hardware into a set of logical communication endpoints and multiplexes access to the hardware among these endpoints (e.g., computers, servers and/or I/O devices). The operating system (OS) may also implement protocols that make communication between connected endpoints reliable (e.g., transmission control protocol, TCP).
Generally, the operating system (OS) receives a request to send a message (data) and a virtual address that specifies the location of the data associated with the message, copies the message into a message buffer and translates the virtual address. The OS then schedules a memory copy operation to copy data from the message buffer memory to a target device. A translation and protection table (TPT) may be used to translate the virtual addresses, received in the form of descriptors, into physical addresses and to define memory regions before a host network adapter can access them (e.g., for transfer to/from a remote device) during data transfer (movement) operations. There is a need for a more efficient technique of using and accessing the translation and protection table (TPT) to perform virtual-to-physical address translations while providing additional memory access protection during data transfer operations.
Accordingly, there is provided a host comprising:
a processor;
a host memory coupled to said processor; and
a host-fabric adapter coupled to said processor and provided to interface with a switched fabric including one or more fabric-attached I/O controllers, the host-fabric adapter including logic for accessing a selected translation and protection table from said host memory for a data transaction, said translation and protection table including at least one entry having a region identifier field used to validate an access request and a protection domain field also used to validate an access request.
The host wherein said region identifier field comprises a key portion and a handle portion, said handle portion being related to a location of said entry in said translation and protection table.
The host wherein said host memory comprises a first memory region and a second memory region and wherein each entry in said translation and protection table associated with said first memory region includes a first region identifier in said region identifier field and each entry in said translation and protection table associated with said second memory region includes a second region identifier, said first region identifier being different than said second region identifier.
A method of validating an access request to a host, said host being coupled to a switched fabric and including a processor, a host memory coupled to the processor and a host-fabric adapter coupled to the processor and provided to interface with the switched fabric, the method comprising:
accessing a selected translation and protection table from said host memory for a data transaction, said translation and protection table including at least one entry having a region identifier field and a protection domain field;
comparing a first protection domain with a second protection domain provided in said protection domain field of said entry to validate said access request; and
comparing a first region identifier with a second region identifier provided in said region identifier field of said entry to validate said access request.
BRIEF DESCRIPTION OF THE DRAWINGS
A more complete appreciation of example embodiments of the present invention, and many of the attendant advantages of the present invention, will become readily apparent as the same becomes better understood by reference to the following detailed description when considered in conjunction with the accompanying drawings in which like reference symbols indicate the same or similar components, wherein:
FIG. 1 illustrates an example data network according to an embodiment of the present invention;
FIG. 2 illustrates a block diagram of a host of an example data network according to an embodiment of the present invention;
FIG. 3 illustrates a block diagram of a host of an example data network according to another embodiment of the present invention;
FIG. 4 illustrates an example software driver stack of a host of an example data network according to an embodiment of the present invention;
FIG. 5 illustrates an example translation and protection table;
FIG. 6 illustrates an example translation and protection table;
FIG. 7 illustrates an example translation and protection table entry according to the present invention;
FIG. 8 illustrates one example embodiment of how the region identifier field may be created in accordance with the present invention;
FIGS. 9A and 9B illustrate examples of descriptors;
FIG. 10 illustrates an example send processing technique according to the present invention; and
FIG. 11 illustrates an example write processing technique according to the present invention.
DETAILED DESCRIPTION
The present invention is applicable for use with all types of data networks and clusters designed to link together computers, servers, peripherals, storage devices, and communication devices for communications. Examples of such data networks may include a local area network (LAN), a wide area network (WAN), a campus area network (CAN), a metropolitan area network (MAN), a global area network (GAN), a storage area network and a system area network (SAN), including newly developed data networks using Next Generation I/O (NGIO), Future I/O (FIO), Infiniband and ServerNet and those networks which may become available as computer technology develops in the future. LAN systems may include Ethernet, FDDI (Fiber Distributed Data Interface) Token Ring LAN, Asynchronous Transfer Mode (ATM) LAN, Fiber Channel, and Wireless LAN. However, for the sake of simplicity, discussions will concentrate mainly on exemplary use of a simple data network having several example hosts and I/O units including I/O controllers that are linked together by an interconnection fabric, although the scope of the present invention is not limited thereto.
Attention now is directed to the drawings and particularly to FIG. 1, in which an example data network having several interconnected endpoints (nodes) for data communications is illustrated. As shown in FIG. 1, the data network 100 may include, for example, an interconnection fabric (hereinafter referred to as "switched fabric") 102 of one or more switches A, B and C and corresponding physical links, and several endpoints (nodes) which may correspond to one or more I/O units 1 and 2, computers and servers such as, for example, host 110 and host 112. I/O unit 1 may include one or more controllers connected thereto, including I/O controller 1 (IOC1) and I/O controller 2 (IOC2). Likewise, I/O unit 2 may include an I/O controller 3 (IOC3) connected thereto. Each I/O controller 1, 2 and 3 (IOC1, IOC2 and IOC3) may operate to control one or more I/O devices. For example, I/O controller 1 (IOC1) of the I/O unit 1 may be connected to I/O device 122, while I/O controller 2 (IOC2) may be connected to I/O device 124. Similarly, I/O controller 3 (IOC3) of the I/O unit 2 may be connected to I/O devices 132 and 134. The I/O devices may be any of several types of I/O devices, such as storage devices (e.g., a hard disk drive, tape drive) or other I/O device.
The hosts and I/O units including attached I/O controllers and I/O devices may be organized into groups known as clusters, with each cluster including one or more hosts and typically one or more I/O units (each I/O unit including one or more I/O controllers). The hosts and I/O units may be interconnected via a switched fabric 102, which is a collection of switches A, B and C and corresponding physical links connected between the switches A, B and C.
In addition, each I/O unit includes one or more I/O controller-fabric (IOC-fabric) adapters for interfacing between the switched fabric 102 and the I/O controllers (e.g., IOC1, IOC2 and IOC3). For example, IOC-fabric adapter 120 may interface the I/O controllers 1 and 2 (IOC1 and IOC2) of the I/O unit 1 to the switched fabric 102, while IOC-fabric adapter 130 interfaces the I/O controller 3 (IOC3) of the I/O unit 2 to the switched fabric 102.
The specific number and arrangement of hosts, I/O units, I/O controllers, I/O devices, switches and links shown in FIG. 1 are provided simply as an example data network. A wide variety of implementations and arrangements of any number of hosts, I/O units, I/O controllers, I/O devices, switches and links in all types of data networks may be possible.
An example embodiment of a host (e.g., host 110 or host 112) may be shown in FIG. 2. Referring to FIG. 2, a host 110 may include a processor 202 coupled to a host bus 203. An I/O and memory controller 204 (or chipset) may be connected to the host bus 203. A main memory 206 may be connected to the I/O and memory controller 204. An I/O bridge 208 may operate to bridge or interface between the I/O and memory controller 204 and an I/O bus 205. Several I/O controllers may be attached to I/O bus 205, including I/O controllers 210 and 212. I/O controllers 210 and 212 (including any I/O devices connected thereto) may provide bus-based I/O resources.
One or more host-fabric adapters 220 may also be connected to the I/O bus 205. Alternatively, the host-fabric adapter 220 may be connected directly to the I/O and memory controller (or chipset) 204 to avoid the inherent limitations of the I/O bus 205 (see FIG. 3). In either situation, the host-fabric adapter 220 may be considered to be a type of network interface card (NIC, which usually includes hardware and firmware) for interfacing the host 110 to a switched fabric 102. The host-fabric adapter 220 may be utilized to provide fabric communication capabilities for the host 110. For example, the host-fabric adapter 220 converts data between a host format and a format that is compatible with the switched fabric 102. For data sent from the host 110, the host-fabric adapter 220 formats the data into one or more packets containing a sequence of one or more cells including header information and data information.
According to one example embodiment or implementation, the hosts or I/O units of the data network of the present invention may be compatible with an Infiniband architecture. Infiniband information/specifications are presently under development and will be published by the Infiniband Trade Association (formed August 27, 1999) having the Internet address of http://www.Infinibandta.org. The hosts or I/O units of the data network may also be compatible with the "Next Generation Input/Output (NGIO) Specification" as set forth by the NGIO Forum on March 26, 1999. The host-fabric adapter 220 may be a Host Channel Adapter (HCA), and the IOC-fabric adapters may be Target Channel Adapters (TCA). The host channel adapter (HCA) may be used to provide an interface between the host 110 or 112 and the switched fabric 102 via high speed serial links. Similarly, target channel adapters (TCA) may be used to provide an interface between the switched fabric 102 and the I/O controller of either an I/O unit 1 or 2, or another network, including, but not limited to, local area network (LAN), wide area network (WAN), Ethernet, ATM and fibre channel network, via high speed serial links. Both the host channel adapter (HCA) and the target channel adapter (TCA) may be implemented in the Infiniband architecture or in compliance with "Next Generation I/O Architecture: Host Channel Adapter Software Specification, Revision 1.0" as set forth by Intel Corp. on May 13, 1999. In addition, each host may contain one or more host-fabric adapters (e.g., HCAs). However, Infiniband and NGIO are merely example embodiments or implementations of the present invention, and the invention is not limited thereto. Rather, the present invention may be applicable to a wide variety of data networks, hosts and I/O controllers.
As described with reference to FIGs. 2-3, the I/O units and respective I/O controllers may be connected directly to the switched fabric 102 rather than as part of a host 110. For example, I/O unit 1 including I/O controllers 1 and 2 (IOC1 and IOC2) and I/O unit 2 including an I/O controller 3 (IOC3) may be directly (or independently) connected to the switched fabric 102. In other words, the I/O units (and their connected I/O controllers and I/O devices) are attached as separate and independent I/O resources to the switched fabric 102 as shown in FIGs. 1-3, as opposed to being part of a host 110. As a result, I/O units including I/O controllers (and I/O devices) connected to the switched fabric 102 may be flexibly assigned to one or more hosts (rather than having a predetermined or fixed host assignment based upon being physically connected to the host's local I/O bus). The I/O units, I/O controllers and I/O devices which are attached to the switched fabric 102 may be referred to as fabric-attached I/O resources (i.e., fabric-attached I/O units, fabric-attached I/O controllers and fabric-attached I/O devices) because these components are directly attached to the switched fabric 102 rather than being connected as part of a host.
In addition, the host 110 may detect and then directly address and exchange data with I/O units and I/O controllers (and attached I/O devices) which are directly attached to the switched fabric 102 (i.e., the fabric-attached I/O controllers), via the host-fabric adapter 220. A software driver stack for the host-fabric adapter 220 may be provided to allow host 110 to exchange data with remote I/O controllers and I/O devices via the switched fabric 102, while preferably being compatible with many currently available operating systems, such as Windows 2000. The host-fabric adapter 220 may include an internal cache 222.
FIG. 4 illustrates an example software driver stack of a host 110 having fabric-attached I/O resources according to an example embodiment of the present invention. As shown in FIG. 4, the host operating system (OS) 400 includes a kernel 410, an I/O manager 420, and a plurality of I/O controller drivers for interfacing to various I/O controllers, including I/O controller drivers 430 and 432. According to an example embodiment, the host operating system (OS) 400 may be Windows 2000, and the I/O manager 420 may be a Plug-n-Play manager.
In addition, a fabric adapter driver software module may be provided to access the switched fabric 102 and information about fabric configuration, fabric topology and connection information. Such a driver software module may include a fabric bus driver (upper driver) 440 and a fabric adapter device driver (lower driver) 442 utilized to establish communication with a target fabric-attached agent (e.g., I/O controller), and perform functions common to most drivers, including, for example, channel abstraction, send/receive I/O transaction messages, remote direct memory access (RDMA) transactions (e.g., read and write operations), queue management, memory registration, descriptor management, message flow control, and transient error handling and recovery. Such a software module may be provided on a tangible medium, such as a floppy disk or compact disk (CD) ROM, or via Internet downloads, which may be available for plug-in or download into the host operating system (OS), or by any other viable method.
The host 110 may communicate with I/O units and I/O controllers (and attached I/O devices) which are directly attached to the switched fabric 102 (i.e., the fabric-attached I/O controllers) using a Virtual Interface (VI) architecture. Under the "Virtual Interface (VI) Architecture Specification, Version 1.0," as set forth by Compaq Corp., Intel Corp., and Microsoft Corp., on December 16, 1997, the VI architecture comprises four basic components: virtual interface (VI) of pairs of work queues (send queue and receive queue), VI consumer which may be an application program, VI provider which may be hardware and software components responsible for instantiating VI, and completion queue (CQ). VI is the mechanism that allows VI consumers to directly access a VI provider.

Each VI represents a communication endpoint, and endpoint pairs may be logically connected to support bi-directional, point-to-point data transfer. Under the VI architecture, the host-fabric adapter 220 and VI kernel agent may constitute the VI provider to perform endpoint virtualization directly and subsume the tasks of multiplexing, de-multiplexing, and data transfer scheduling normally performed by the host operating system (OS) kernel 410 and device driver 442 as shown in FIG. 4.
The translation and protection table (TPT) 230 shown in FIG. 5 may be used to translate virtual addresses, received in a form of packet descriptors (e.g., a data structure that describes a request to move data), into physical addresses and to define memory regions of the host memory 206 that may be accessed by the host-fabric adapter 220 (validate access to host memory). In addition, the translation and protection table (TPT) 230 may also be used to validate access permission rights of the host-fabric adapter 220 and to perform address translation before accessing any other memory in the host 110. The translation and protection table (TPT) 230 may contain a plurality of TPT entries, for example, TPT(0), TPT(1) ... TPT(t-2) and TPT(t-1), in the system memory address space. Each TPT entry (also called translation entry) may represent a single page of the host memory 206, typically 4 KB of physically contiguous host memory 206. The TPT table 230 may be stored within the host memory 206 or it may be stored in a different memory area of the host 110.
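By way of illustration only, the page-granular translation described above may be sketched as follows. The function name translate, the parameters base_index and buffer_va, and the representation of the table as a list of physical page addresses are assumptions made for this sketch and are not taken from the specification.

```python
PAGE_SIZE = 4096  # 4 KB of host memory per TPT entry, as in the text


def translate(tpt, base_index, buffer_va, va):
    """Translate a virtual address into a physical address.

    tpt        -- list of physical page addresses; None marks an unused entry
    base_index -- index of the entry for the buffer's first page (assumed input)
    buffer_va  -- page-aligned virtual address where the buffer starts (assumed)
    va         -- virtual address to translate
    """
    if va < buffer_va:
        raise LookupError("address lies below the registered buffer")
    page, offset = divmod(va - buffer_va, PAGE_SIZE)
    index = base_index + page
    if index >= len(tpt) or tpt[index] is None:
        raise LookupError("no translation entry for this page")
    # Physical address = physical page address of the entry + offset in page.
    return tpt[index] + offset
```

In this sketch, an address in the second page of a buffer resolves through the entry that follows the buffer's first entry, and an address with no entry is refused rather than translated.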
Figure 6 illustrates another translation and protection table (TPT) 240 that may be used to translate virtual addresses into physical addresses. As discussed above, the translation and protection table 240 may validate access permission rights of the host-fabric adapter 220 and perform address translation before accessing any other memory in the host 110. Each translation and protection table 240 may contain a plurality of entries that are associated with virtual buffers. For the example shown in Figure 6, three virtual buffers may be associated with the translation and protection table 240, namely virtual buffer A (VBa), virtual buffer B (VBb) and virtual buffer C (VBc). Each translation entry in the example of Fig. 6 may correspond to one page of a virtual buffer, or 4 KB of data. For this example, virtual buffer A includes 8 KB of data, virtual buffer B includes 12 KB of data and virtual buffer C includes 12 KB of data. Accordingly, the translation and protection table 240 includes entries 244 and 246 for the addresses of page 1 and page 2 of virtual buffer A, respectively. The translation and protection table 240 also includes entries 248, 250 and 252 for the addresses of page 1, page 2 and page 3 of virtual buffer B, respectively. The translation and protection table 240 further includes entries 256, 258 and 260 for the addresses of page 1, page 2 and page 3 of virtual buffer C, respectively. The translation and protection table 240 may also include unused portions 242 that separate the pages of the different virtual buffers. That is, an unused portion 242 may separate the pages of virtual buffer A from the pages of virtual buffer B and a similar unused portion 242 may separate the pages of virtual buffer B from the pages of virtual buffer C. The unused portions 242 may also be provided at the beginning and end of the translation and protection table.
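The layout of Figure 6 may be sketched as below. This is a minimal sketch only: the function name build_tpt, the use of None for an unused portion, and the choice of one unused entry per gap are assumptions, since the text does not state how large an unused portion 242 is.

```python
PAGE_SIZE = 4096  # one translation entry per 4 KB page


def build_tpt(buffers, gap=1):
    """Lay out translation entries for several virtual buffers.

    buffers -- dict mapping a buffer name to its size in bytes
    gap     -- number of unused entries around each buffer (assumed value)
    Returns (table, first_index), where table holds (name, page_number)
    tuples or None for unused entries.
    """
    table = [None] * gap            # unused portion at the beginning
    first_index = {}
    for name, size in buffers.items():
        pages = -(-size // PAGE_SIZE)   # ceiling division
        first_index[name] = len(table)
        table.extend((name, page) for page in range(1, pages + 1))
        table.extend([None] * gap)  # unused portion after this buffer
    return table, first_index


table, first_index = build_tpt({"A": 8 * 1024, "B": 12 * 1024, "C": 12 * 1024})
```

With the sizes given in the text, buffer A occupies two entries and buffers B and C three entries each, with an unused entry before, between and after them.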
Figure 7 shows an example translation and protection table entry that includes a region identifier (also called region ID) field 330 in accordance with the present invention. Each TPT entry 300 may correspond to a single registered memory page and include a series of protection attributes (also referred to as access rights) 310, a translation cacheable flag 320, a region identifier field 330, a physical page address field 340, and a protection domain field 350. The protection domain field 350 may also be referred to as a protection tag field, especially with respect to an NGIO architecture.

The protection attributes 310 may include, for example, a Memory Write Enable flag that indicates whether the host-fabric adapter 220 can write to a page (e.g., "1" page is write-enabled, "0" page is not write-enabled); an RDMA Read Enable flag that indicates whether the page can be a source of an RDMA Read operation (e.g., "1" page can be source, "0" page cannot be source); and an RDMA Write Enable flag that indicates whether the page can be a target of an RDMA Write operation (e.g., "1" page can be target, "0" page cannot be target). The protection attributes 310 may control read and write access to a given memory region. These permissions are generally set for memory regions and virtual interfaces (VIs) when they are created, but may be modified later by changing the attributes of the memory region and/or of the VI. If the protection attributes between a VI and a memory region do not match (during an attempted access), the attribute offering the most protection will be honored. For instance, if a VI has RDMA Read Enabled, but the memory region does not, the result is that RDMA reads on that VI from that memory region will fail. RDMA Read and Write access attributes are enforced at the remote end of a connection that is referred to by the descriptor. The Memory Write Enable access attribute is enforced for all memory access to the associated page. An attempted message transfer operation that violates a memory region's permission settings may result in a memory protection error and no data is transferred.
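The rule that the attribute offering the most protection is honored may be illustrated by the following sketch, which treats the three flags as bits and intersects the VI and memory region settings. The names Access, effective_access and check_access, and the bit positions chosen, are assumptions for illustration and do not come from the specification.

```python
from enum import IntFlag


class Access(IntFlag):
    # Bit positions are arbitrary choices for this sketch.
    MEMORY_WRITE = 0b001
    RDMA_READ = 0b010
    RDMA_WRITE = 0b100


def effective_access(vi_access, region_access):
    """An operation is permitted only if both the VI and the region enable it."""
    return vi_access & region_access


def check_access(vi_access, region_access, needed):
    """Raise PermissionError (a memory protection error) if 'needed' is not granted."""
    granted = effective_access(vi_access, region_access)
    if (granted & needed) != needed:
        raise PermissionError("memory protection error: no data is transferred")


vi = Access.RDMA_READ | Access.RDMA_WRITE
region = Access.RDMA_WRITE
```

Here the VI has RDMA Read enabled but the memory region does not, so a call such as check_access(vi, region, Access.RDMA_READ) fails, which mirrors the example given in the text.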
The translation cacheable flag 320 may be utilized to specify whether the host-fabric adapter 220 may cache addresses across transaction boundaries. If the translation cacheable flag 320 is not set ("0" flag), the host-fabric adapter 220 may flush or discard a corresponding single TPT entry from the internal cache 222 and retranslate buffer and/or descriptor addresses each time a new transaction is processed. However, if the translation cacheable flag 320 is set ("1" flag), the host-fabric adapter 220 may choose to reserve or keep the corresponding TPT entry in the internal cache 222 for future re-use. This way only a designated TPT entry, as opposed to all TPT entries stored in the internal cache 222, may be flushed or discarded from the internal cache 222 at the end of an I/O transaction. Since the host-fabric adapter 220 is instructed to flush or discard individual TPT entries as opposed to all cached TPT entries stored in the internal cache 222 at the end of an I/O transaction, the number of times the host-fabric adapter 220 must flush cached address translations in the internal cache 222 may be drastically reduced. The software driver of the host operating system 400 (see FIG. 4) may be used to set the status of the translation cacheable flag 320 of each individual TPT entry stored in the internal cache 222.
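The selective flush controlled by the translation cacheable flag 320 may be sketched as follows. The representation of the internal cache 222 as a dictionary keyed by TPT index, and the function name end_of_transaction, are assumptions made only for this illustration.

```python
def end_of_transaction(cache, used_indices):
    """Discard only the non-cacheable entries used by the finished transaction.

    cache        -- dict mapping a TPT index to an entry dict that carries a
                    boolean "cacheable" flag (assumed representation)
    used_indices -- TPT indices touched by the transaction that just ended
    """
    for index in used_indices:
        entry = cache.get(index)
        if entry is not None and not entry["cacheable"]:
            del cache[index]   # flag "0": flush and retranslate next time
        # flag "1": the entry stays cached for future re-use


cache = {
    1: {"cacheable": True, "physical_page": 0x10000},
    2: {"cacheable": False, "physical_page": 0x7000},
    3: {"cacheable": False, "physical_page": 0x9000},
}
end_of_transaction(cache, [1, 2])
```

After the call, entry 2 has been discarded, entry 1 is retained because its flag is set, and entry 3 is untouched because the transaction did not use it.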
The physical page address field 340 may include the physical page frame address of the entry. The protection domain field 350 includes identifiers that are associated both with VIs and with host memory regions to define the access permission. Memory access may be allowed by the host-fabric adapter 220 if the protection domain field of the VI and of the memory region involved are identical. Attempted memory accesses that violate this rule may result in a memory protection error and no data is transferred. While the protection domain 350 may be used to deny or allow access to the translation and protection table, it is possible that different virtual buffers may be associated with a similar protection domain (or protection tag). In this situation, a wrong address may be accessed for the virtual buffer. Stated differently, in cases where the virtual address supplied with the memory handle is outside of the range of addresses in the associated memory region, the combination of that address and the memory handle can point to a translation entry of a different memory region that contains the same protection domain as the associated memory region. This error could allow the contents of the memory regions to be corrupted without detection.
The region identifier field 330 is provided to further deny or allow access to the translation and protection table. Memory access by the host-fabric adapter 220 may be allowed if the region identifier of the virtual interface and of the memory region involved are identical. The region identifier field 330 thereby provides further protection functionality. Each translation entry associated with a specific memory region contains the same region identifier.
Figure 7 shows a region identifier field 330 that would include a 12 bit region identifier. The use of 12 bits is merely an example embodiment. The present invention is not limited to this number of bits, as other lengths of bits for the region identifier field 330 may also be provided in accordance with the present invention. The region identifier field 330 is used to determine whether a protection violation may occur and thus whether access to memory is denied. This use of the region identifier field 330 provides unique advantages such as the ability to additionally deny or allow access to the buffers based on information other than the protection domain field 350. As will be discussed below in greater detail, the region identifier field 330, unlike the protection domain field 350, is mathematically related to the entries within the translation and protection table and therefore helps to further distinguish between virtual buffers. Accordingly, the region identifier field 330 may be used to deny or allow access to a memory region even if different virtual buffers are associated with a similar protection domain.
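The two comparisons may be sketched as follows, under the assumption that the protection domain of the VI and the region identifier of the request are already available as integers. The class TptEntry, the function validate and all numeric values are hypothetical and serve only to show how a region identifier can reject an entry that a protection domain check alone would accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TptEntry:
    protection_domain: int
    region_id: int        # 12 bits in the example embodiment
    physical_page: int


def validate(entry, vi_protection_domain, request_region_id):
    """Return the physical page address if both checks pass."""
    if entry.protection_domain != vi_protection_domain:
        raise PermissionError("protection domain mismatch")
    if entry.region_id != request_region_id:
        raise PermissionError("region identifier mismatch")
    return entry.physical_page


# Two memory regions that share protection domain 7 (hypothetical values).
entry_a = TptEntry(protection_domain=7, region_id=0x0A1, physical_page=0x10000)
entry_b = TptEntry(protection_domain=7, region_id=0x0B2, physical_page=0x20000)
```

A request for region 0x0A1 that is resolved to entry_a is accepted. The same request resolved by mistake to entry_b passes the protection domain comparison but fails the region identifier comparison, so the stray access is detected.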
Figure 8 shows one embodiment of how the region identifier field 330 may be obtained during memory registration or address translation by using a memory handle 400. The handle 400 may be 32 bits and include a 27 bit handle portion 370 and a 5 bit key portion 360. The 27 bit handle portion 370 is mathematically related to a specific translation entry and thus is related to a physical address of the data. The 5 bit key portion 360, on the other hand, may be assigned by the control software when a virtual memory buffer is registered. The 5 bit key portion 360 may be selected by any number of means, such as a sequential value. For example, the control software may retain a copy of the value of the last key portion used for each protection domain (i.e., for each memory buffer). When a memory registration operation is requested, the control software may look at the last key portion used for a given protection domain (i.e., for one memory buffer) and then advance that value to the next sequential value. The new value of the key portion may be saved and used as the 5 bits of the key portion 360. Other algorithms for advancing the value of the key portion 360 may also be used, such as random selection. The host-fabric adapter 220 may determine the region identifier field 330 by combining the 5 bit key portion 360 with the lower seven bits 372 of the 27 bit handle portion 370. However, the 5 bits of the key portion 360 and the lower seven bits 372 of the 27 bit handle portion 370 are merely an example embodiment. The present invention is also applicable to other lengths for both the key portion 360 and the handle portion 370 and the combination thereof.
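The derivation of a 12 bit region identifier from a 32 bit handle may be sketched as follows. The text does not state where the key portion sits within the handle or in which order the two parts are combined, so this sketch assumes that the key portion occupies the upper 5 bits of the handle and the upper 5 bits of the region identifier. The wrap-around of the sequential key is likewise an assumption, and all function names are hypothetical.

```python
KEY_BITS = 5          # key portion 360
HANDLE_BITS = 27      # handle portion 370
LOW_HANDLE_BITS = 7   # lower seven bits 372 of the handle portion


def split_handle(memory_handle):
    """Split a 32 bit handle into (key portion, handle portion).

    Assumed layout: key in the upper 5 bits, handle portion in the lower 27.
    """
    if not 0 <= memory_handle < (1 << (KEY_BITS + HANDLE_BITS)):
        raise ValueError("memory handle must fit in 32 bits")
    key = memory_handle >> HANDLE_BITS
    handle_portion = memory_handle & ((1 << HANDLE_BITS) - 1)
    return key, handle_portion


def region_id(memory_handle):
    """Combine the key with the lower seven handle bits into 12 bits."""
    key, handle_portion = split_handle(memory_handle)
    low_bits = handle_portion & ((1 << LOW_HANDLE_BITS) - 1)
    return (key << LOW_HANDLE_BITS) | low_bits   # assumed ordering


def next_key(last_key):
    """Advance to the next sequential key (wrap-around is an assumption)."""
    return (last_key + 1) % (1 << KEY_BITS)


handle = (0b10101 << HANDLE_BITS) | 0x1234567
print(hex(region_id(handle)))  # 0xae7
```

Because the lower seven bits come from the handle portion, which is tied to the location of the translation entry, two buffers registered with the same key still tend to receive different region identifiers.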
The 32 bit handle 400 may be supplied as part of the operation of requesting access to memory via the host channel adapter 220. For operations arriving from a remote host, the incoming message may contain the virtual address, the handle and the length of the request. For operations originating on the local host, the outgoing descriptor may contain the virtual address, the handle and the length of the buffers to be accessed. As discussed above, the 12 bit region identifier field 330 may be generated from a 32 bit handle 400, which includes the 5 bit key portion 360 and the 27 bit handle portion 370.
For purposes of completeness, data transfer operations between host 110 and I/O units and I/O controllers attached to the switched fabric 102 using TPT entries may be described as follows. Data transfer requests may be represented by descriptors. There are two general types of descriptors, send/receive and read/write (RDMA). Descriptors are data structures organized as a list of segments. Each descriptor may begin with a control segment followed by an optional address segment and an arbitrary number of data segments. Control segments may contain control and status information. Address segments, for read/write operations, may contain remote buffer information (i.e., memory associated with the VI targeted to receive the read/write request). Data segments for both send/receive and read/write operations may contain information about the local memory (i.e., memory associated with the VI issuing the send/receive or read/write request).
Figure 9A illustrates an example send/receive type descriptor 900 having a control segment 902 and a data segment 904. Data segment 904, in turn, has a segment length field 906, a memory handle field 908, and a virtual address field 910. The segment length field 906 specifies the length of the message to be sent or that is to be received. The memory handle field 908 may be used to verify that the sending/requesting process owns the registered memory region indicated by segment length 906 and virtual address 910. In one embodiment, the memory handle 908 may be 32 bits in length, corresponding to the 32 bit handle 400 shown in Fig. 8 that includes the 5 bit key portion 360 and the 27 bit handle portion 370. The 12 bit region identifier field 330 may be formed from this memory handle 908. For a send operation, the virtual address 910 identifies the starting memory location of the message (i.e., data) to be sent in the sending VI's local memory space. For a receive operation, the virtual address 910 identifies the starting memory location of where the received message (data) is to be stored in the requesting VI's local memory space.

Figure 9B illustrates an example read/write type descriptor 912 having a control segment 914, an address segment 916, and a data segment 918. The address segment 916 has a remote memory handle field 920 and a remote virtual address field 922. The data segment 918 has a segment length field 924, a local memory handle field 926, and a local virtual address field 928. Similar to that discussed above, the remote memory handle 920 and the local memory handle 926 may be 32 bits in length, corresponding to the 32 bit handle 400 shown in Fig. 8 that includes the 5 bit key portion 360 and the 27 bit handle portion 370. For a read operation, the remote virtual address 922 identifies the memory location, in the remote process' memory space, of the message (data) to be read. The local virtual address 928 identifies the starting memory location in the local process' memory space of where the received message is to be placed. The amount of memory to be used to store the message is specified by the segment length field 924. For a write operation, the remote virtual address 922 identifies the memory location in the remote process' memory space where the message (data) is to be written. The local virtual address 928 identifies the starting memory location in the local process' memory space of where the message being written is stored. The size of the message is specified by the segment length field 924.
The remote memory handle 920 is that memory handle associated with the memory identified by remote virtual address 922. The local memory handle 926 is that memory handle associated with the memory identified by local virtual address 928 and may be 32 bits in length including a 5 bit key portion 360 and a 27 bit handle portion 370. The 12 bit region identifier field 330 may be formed from this local memory handle 926.
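The relationship between the 32 bit memory handle and the 12 bit region identifier can be sketched as follows. The description states only that the region identifier is formed from the 5 bit key portion and the lower 7 bits of the 27 bit handle portion. The bit positions used here (key in the most significant bits of the handle, key in the upper bits of the region identifier) and the function names are assumptions made for illustration.

```python
from typing import Tuple

KEY_BITS = 5            # key portion 360
HANDLE_BITS = 27        # handle portion 370
REGION_HANDLE_BITS = 7  # lower handle bits reused in the region identifier


def split_memory_handle(memory_handle: int) -> Tuple[int, int]:
    """Return (key, handle_portion); assumes the key occupies the top 5 bits."""
    if not 0 <= memory_handle < (1 << (KEY_BITS + HANDLE_BITS)):
        raise ValueError("memory handle must fit in 32 bits")
    key = memory_handle >> HANDLE_BITS
    handle_portion = memory_handle & ((1 << HANDLE_BITS) - 1)
    return key, handle_portion


def region_identifier(memory_handle: int) -> int:
    """Form a 12 bit value from the key and the lower 7 bits of the handle portion."""
    key, handle_portion = split_memory_handle(memory_handle)
    low_bits = handle_portion & ((1 << REGION_HANDLE_BITS) - 1)
    # Assumed arrangement: key in the upper 5 bits, handle bits in the lower 7.
    return (key << REGION_HANDLE_BITS) | low_bits


handle = (0b10101 << HANDLE_BITS) | 0x12345
key, handle_portion = split_memory_handle(handle)
region_id = region_identifier(handle)
```

With this layout, two handles that share a handle portion but carry different keys yield different region identifiers.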
When a descriptor is processed by the host-fabric adapter 220, the virtual address and the associated memory handle may be used to generate a protection domain (or protection tag or protection index). As discussed above, the protection domain may be used to identify a TPT entry that corresponds to a single page of registered memory on which the posted descriptor is located. The 32 bit handle 400 may also be used to generate the region identifier field 330 as discussed above by using the 5 bit key portion 360 and the lower 7 bits of the 27 bit handle portion 370. If the generated region identifier field corresponds with the region identifier field of the TPT table 240 and the protection domains also match, then access to the addresses within the TPT table 240 is allowed. On the other hand, if the generated region identifier field does not correspond with the region identifier field of the TPT table 240, then access to the address is denied. From the identified TPT entry, the physical address associated with the virtual address may be obtained. In send and receive operations, the virtual address and the memory handle correspond to memory handle field 908 and virtual address field 910 of FIG. 9A. In read and write operations, the virtual address and memory handle correspond to the remote memory handle 920 and remote virtual address field 922 on the remote host-fabric adapter, and local memory handle field 926 and local virtual address field 928 on the local host-fabric adapter 220 of FIG. 9B.
An example send descriptor may be processed by the host-fabric adapter 220 in the manner as shown in FIG. 10. The order of the blocks in Fig. 10 is merely an example embodiment as the blocks may be performed in other orders in accordance with the present invention. In block 1000, the host-fabric adapter 220 retrieves the message's starting virtual address 910 (in the local, or sending process' memory space), and a memory handle 908 associated with the message's memory region. The virtual address 910 and the memory handle 908 may be used to generate a protection domain (block 1002). The memory handle 908 may also be used to generate the region identifier field 330 as discussed above. The protection domain and the region identifier field 330 may be used to identify and retrieve translation information stored in a TPT entry that corresponds to a single page of registered memory on which the posted descriptor is located (blocks 1004 and 1006). If the retrieved protection domain matches the protection domain associated with the local (sending) process ("yes" prong of block 1008), and if the retrieved region identifier field and the sending process' region identifier field match ("yes" prong of block 1010), then the host-fabric adapter 220 sends the message toward the destination (remote) by transmitting (block 1012) the same (the message or data) via the switched fabric 102 (see FIGs. 1-3). If the retrieved protection domain and the sending process' protection domain do not match ("no" prong of block 1008) or if the retrieved region identifier field and the sending process' region identifier field do not match ("no" prong of block 1010) then a memory protection fault may be generated (blocks 1113 and 1114) and no data is transferred via the switched fabric 102. Receive descriptors may be processed in an analogous fashion.
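The send flow of FIG. 10 may be sketched in software as shown below. This is a simplified illustration, not the adapter's actual logic. It assumes that the TPT can be modeled as a dictionary keyed by the 27 bit handle portion, that the key occupies the top 5 bits of the handle, and that the sending process' protection domain is supplied by the caller; address translation is omitted. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

HANDLE_MASK = (1 << 27) - 1  # 27 bit handle portion, assumed to be the low bits


@dataclass(frozen=True)
class TptEntry:
    protection_domain: int
    region_identifier: int


class MemoryProtectionFault(Exception):
    """Raised instead of transmitting when either check fails."""


def region_identifier(memory_handle: int) -> int:
    # 5 bit key (assumed top bits) followed by the lower 7 handle bits.
    key = (memory_handle >> 27) & 0x1F
    return (key << 7) | (memory_handle & 0x7F)


def process_send(memory_handle: int, sender_protection_domain: int,
                 tpt: Dict[int, TptEntry], message: bytes,
                 transmit: Callable[[bytes], None]) -> None:
    # Blocks 1000-1002: values taken from the descriptor and the sending process.
    region_id = region_identifier(memory_handle)
    # Blocks 1004-1006: retrieve the TPT entry (lookup key is an assumption).
    entry = tpt.get(memory_handle & HANDLE_MASK)
    # Block 1008: protection domain check.
    if entry is None or entry.protection_domain != sender_protection_domain:
        raise MemoryProtectionFault("protection domain mismatch")
    # Block 1010: region identifier check.
    if entry.region_identifier != region_id:
        raise MemoryProtectionFault("region identifier mismatch")
    # Block 1012: both checks passed, so the message is transmitted.
    transmit(message)


handle = (21 << 27) | 0x45
tpt = {0x45: TptEntry(protection_domain=7,
                      region_identifier=region_identifier(handle))}
sent = []
process_send(handle, 7, tpt, b"hello", sent.append)
```

A handle that reuses the same handle portion with a different key fails the region identifier check in this sketch, so nothing is transmitted for it.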
Similarly, an example read descriptor may be processed by the host-fabric adapter 220 in the manner as shown in FIG. 11. The order of the blocks in Fig. 11 is merely an example embodiment as the operations may be performed in other orders in accordance with the present invention. In block 1100, the host-fabric adapter 220 retrieves the message's destination virtual address 928 (in the local, or receiving process' memory space), a memory handle 926 associated with the message's destination memory region, and an indication of how long the incoming message is. The virtual address 928 and memory handle 926 may be used to generate a protection domain (block 1102). The memory handle 926 may be used to generate the region identifier field 330 as discussed above. The protection domain is used to identify and retrieve translation information stored in a TPT entry that corresponds to a single page of registered memory on which the posted descriptor is located (blocks 1104 and 1106). If the retrieved protection domain matches the protection domain associated with the local (receiving) process ("yes" prong of block 1108) and if the retrieved region identifier field and the sending process' region identifier field match ("yes" prong of block 1110), then the host-fabric adapter 220 copies (block 1112) the message into the local process' memory. If the retrieved protection domain and the receiving process' protection domain do not match ("no" prong of block 1108) or if the retrieved region identifier field and the sending process' region identifier field 330 do not match ("no" prong of block 1110), then a memory protection fault is generated (blocks 1113 or 1114) and no data is transferred via the switched fabric 102. Write descriptors may be processed in an analogous fashion.
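The copy step of the read flow in FIG. 11 may be illustrated with the sketch below, in which host memory is modeled as a byte array. The 4096 byte page size, the restriction to a message that fits within one page, and all names are assumptions made for this example; the specification does not state them.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class TptEntry:
    protection_domain: int
    region_identifier: int
    physical_page_address: int  # base address of the physical page


class MemoryProtectionFault(Exception):
    """Raised when a protection check fails; host memory is left untouched."""


def complete_read(entry: TptEntry, receiver_protection_domain: int,
                  region_id: int, local_virtual_address: int,
                  message: bytes, host_memory: bytearray) -> int:
    # Block 1108: protection domain check.
    if entry.protection_domain != receiver_protection_domain:
        raise MemoryProtectionFault("protection domain mismatch")
    # Block 1110: region identifier check.
    if entry.region_identifier != region_id:
        raise MemoryProtectionFault("region identifier mismatch")
    offset = local_virtual_address % PAGE_SIZE
    if offset + len(message) > PAGE_SIZE:
        raise ValueError("this sketch handles a single page only")
    # Block 1112: copy the message into the local process' memory.
    start = entry.physical_page_address + offset
    host_memory[start:start + len(message)] = message
    return start


memory = bytearray(2 * PAGE_SIZE)
entry = TptEntry(protection_domain=7, region_identifier=2757,
                 physical_page_address=PAGE_SIZE)
start = complete_read(entry, 7, 2757, 0x7010, b"data", memory)
```

Because both checks run before the copy, a failed check leaves the destination buffer unchanged.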
Figures 10 and 11 show one embodiment of using a protection domain and region identifier field to validate an access request. The order of operations shown in these figures is not limited by the disclosed order as these operations may be performed in other orders.
One further advantage of the present invention is that each time an address is translated and its protection is checked, only one access to the translation and protection table is needed.
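The single-access property can be illustrated with a table that counts how often it is read. In this sketch one fetched entry supplies the protection domain, the region identifier and the physical page address, so validation and translation share one lookup. The entry layout, the table index, the 4096 byte page size and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class TptEntry:
    protection_domain: int
    region_identifier: int
    physical_page_address: int


class CountingTpt:
    """Translation and protection table model that counts entry fetches."""

    def __init__(self, entries: Dict[int, TptEntry]) -> None:
        self._entries = dict(entries)
        self.accesses = 0

    def fetch(self, index: int) -> TptEntry:
        self.accesses += 1
        return self._entries[index]


def check_and_translate(tpt: CountingTpt, index: int, protection_domain: int,
                        region_id: int, virtual_address: int) -> Optional[int]:
    entry = tpt.fetch(index)  # the only table access for this request
    if (entry.protection_domain != protection_domain
            or entry.region_identifier != region_id):
        return None  # access denied
    return entry.physical_page_address + virtual_address % PAGE_SIZE


tpt = CountingTpt({3: TptEntry(7, 2757, 0x40000)})
physical = check_and_translate(tpt, 3, 7, 2757, 0x1234)
```

Here `tpt.accesses` is 1 after the call, since the protection fields and the translation come from the same entry.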
While there have been illustrated and described what are considered to be example embodiments of the present invention, it will be understood by those skilled in the art and as technology develops that various changes and modifications may be made, and equivalents may be substituted for elements thereof without departing from the true scope of the present invention. For example, the present invention is applicable to all types of redundant type networks, including, but not limited to, Infiniband, Next Generation Input/Output (NGIO), ATM, SAN (system area network, or storage area network), ServerNet, Future Input/Output (FIO), fiber channel, and Ethernet. In addition, the process shown in FIGs. 10 and 11 may be performed by a computer processor executing instructions organized into a program module or a custom designed state machine. Storage devices suitable for tangibly embodying computer program instructions include all forms of non-volatile memory including, but not limited to: semiconductor memory devices such as EPROM, EEPROM, and flash devices; magnetic disks (fixed, floppy, and removable); other magnetic media such as tape; and optical media such as CD-ROM disks. Many modifications may be made to adapt the teachings of the present invention to a particular situation without departing from the scope thereof. Therefore, it is intended that the present invention not be limited to the various example embodiments disclosed, but that the present invention includes all embodiments falling within the scope of the appended claims.

We Claim:
1. A host comprising:
a processor;
a host memory coupled to said processor; and
a host-fabric adapter coupled to said processor and provided to interface with a switched fabric including one or more fabric-attached I/O controllers, the host-fabric adapter including logic for accessing a selected translation and protection table from said host memory for a data transaction, said translation and protection table including at least one entry having a region identifier field used to validate an access request and a protection domain field also used to validate an access request.
2. The host as claimed in claim 1, wherein said region identifier field comprises a key portion and a handle portion, said handle portion being related to a location of said entry in said translation and protection table.
3. The host as claimed in claim 1, wherein said host memory comprises a first memory region and a second memory region and wherein each entry in said translation and protection table associated with said first memory region includes a first region identifier in said region identifier field and each entry in said translation and protection table associated with said second memory region includes a second region identifier, said first region identifier being different than said second region identifier.
4. The host as claimed in claim 1, wherein each of said translation and protection table entries represents translation of a single page of said host memory.
5. The host as claimed in claim 1, wherein said host-fabric adapter is provided to perform virtual to physical address translations and validate access to said host memory using entries in said translation and protection table.
6. The host as claimed in claim 1, wherein each of said translation and protection table entries further comprise: protection attributes that control read and write access to a given memory region of said host memory; a translation cacheable flag that specifies whether said host-fabric adapter may flush a corresponding translation and protection table entry stored in an internal cache; and a physical page address field that addresses a physical page frame of data entry.


7. A network, comprising:
a switched fabric;
I/O controllers attached to said switched fabric; and
a host comprising an operating system, a host memory, and a host-fabric adapter that accesses a translation and protection table from said host memory for a data transaction, said translation and protection table including at least one entry having a region identifier field used to validate an access request and a protection domain field also used to validate said access request.
8. The network as claimed in claim 7, wherein said region identifier field comprises a key portion and a handle portion, said handle portion being related to a location of said entry in said translation and protection table.
9. The network as claimed in claim 7, wherein said host memory comprises a first memory region and a second memory region and wherein each entry in said translation and protection table associated with said first memory region includes a first region identifier in said region identifier field and each entry in said translation and protection table associated with said second memory field includes a second region identifier in said region identifier field, said first region identifier being different than said second region identifier.
10. The network as claimed in claim 7, wherein each of said translation and protection table entries represents translation of a single page of said host memory.
11. The network as claimed in claim 7, wherein said host-fabric adapter is provided to perform virtual to physical address translations and validate access to said host memory using entries in said translation and protection table.
12. The network as claimed in claim 7, wherein each of said translation and protection table entries further comprise: protection attributes that control read and write access to a given memory region of said host memory; a translation cacheable flag that specifies whether said host-fabric adapter may flush a corresponding translation and protection table entry stored in an internal cache; and a physical page address field that addresses a physical page frame of data entry.

13. A method of validating an access request to a host, said host being coupled to a switched fabric and including a processor, a host memory coupled to the processor and a host-fabric adapter coupled to the processor and provided to interface with the switched fabric, the method comprising:
accessing a selected translation and protection table from said host memory for a data transaction, said translation and protection table including at least one entry having a region identifier field and a protection domain field;
comparing a first protection domain with a second protection domain provided in said protection domain field of said entry to validate said access request; and
comparing a first region identifier with a second region identifier provided in said region identifier field of said entry to validate said access request.
14. The method as claimed in claim 13, wherein said region identifier field comprises a key portion and a handle portion, said handle portion being related to a location of said entry in said translation and protection table.
15. The method as claimed in claim 13, wherein said host memory comprises a first memory region and a second memory region and wherein each entry in said translation and protection table associated with said first memory region includes said second region identifier in said region identifier field and each entry in said translation and protection table associated with said second memory region includes a third region identifier in said region identifier field.
16. The method as claimed in claim 13, wherein each of said translation and protection table entries represents translation of a single page of said host memory.
17. The method as claimed in claim 13, further comprising said host-fabric performing virtual to physical address translations after validating said access requests.

Patent Number 211115
Indian Patent Application Number IN/PCT/2002/01648/MUM
PG Journal Number 45/2007
Publication Date 09-Nov-2007
Grant Date 17-Oct-2007
Date of Filing 18-Nov-2002
Name of Patentee INTEL CORPORATION
Applicant Address 2200 MISSION COLLEGE BOULEVARD, P O BOX 58119 SANTA CLARA, CA 95052 USA
Inventors:
# Inventor's Name Inventor's Address
1 BERRY FRANK L 21269 NW PUMPKIN RIDGE ROAD, CORNELIUS, OR 97133, U.S.A.
PCT International Classification Number G06F 1/00
PCT International Application Number PCT/US01/16103
PCT International Filing date 2001-05-17
PCT Conventions:
# PCT Application Number Date of Convention Priority Country
1 09/583,950 2000-05-31 U.S.A.