Friday, October 23, 2020

I know it when I see it

A video of the entire presentation is available.

I know it when I see it: the true measure of quality. Any user will tell you whether or not they had a good experience.

In-flight connectivity is a little difficult to measure. Typically, the focus is on the Service Level Agreement (SLA), which relates to another term we hear about: Quality of Service (QoS). QoS really focuses on the radio technology that connects the airplane to the ground.

Portions of a radio channel can be set aside for different users or groups of users. These portions are called slices, or network slices.

Within a particular network slice, you can differentiate service between users; that's called traffic shaping.

Intertwined with all of that is cybersecurity.

I need an analogy for a service level agreement, or SLA. I thought it would be fun to think about finding a parking spot at a business. I'm sure you've all gone to a grocery store or hardware store expecting to find a spot, hopefully near the door so you don't have to carry your stuff too far. If you show up and there is no spot, that can be frustrating, and you might end up going to a different grocery store with more parking.

I'm going to call that the SLA: providing a parking spot.
In this example, there's one parking spot available. A little blue car arrives.
The spot is open, and the car pulls in. The service level agreement has been met. We provided a parking spot.

The question is how many parking spots do you need to provide? 

What happens if it's hard to find a parking spot? We'll call that contention. 

What if you can't find a parking spot at all? We'll call that congestion.

For example, you arrive and the one parking spot is occupied; there's somebody in it. 
What are you going to do? You're going to leave! This is an example of congestion; the SLA was not met.

The business owner is going to add capacity. For example, they add a second spot; now there are two spots. One of them is filled, but the other one is available.
You arrive and pull into that second spot. This is an example of contention: you found a parking spot. You had to pass up the occupied spot, and maybe yours is a little farther out, but you got your spot. The SLA was met.

Even with two spots, you might arrive to find both spots full.
What's going to happen? You're going to leave. Another example of congestion. The SLA was not met.

As demand grows, the business may respond more aggressively, providing a total of six spots instead of two. Now there are plenty of spots available when you show up, and only two are occupied.
This is an example of contention. The SLA was met. Maybe it's not the best parking spot, but there was one available.
The business is paying for six spots, and even after you park, three of them are sitting open. This is unused, or excess, capacity.
Let's take a look at a bigger example. Here we've got a parking lot with a whole bunch of parking spots. In this example, two of the spots are occupied and a car arrives. Obviously, it's not going to be a problem finding a parking spot.
What if the two cars that were in the lot were using more than their allocated spot? For example, what if they were overlapping into an adjacent spot, or even three or four spots? Their usage is above what the SLA guarantees, but it really doesn't matter. 
You can still pull in and find a spot. In light usage conditions, contention is not really a big deal. You don't even have to worry too much about how the users themselves are behaving. However, that's a lot of excess capacity! I don't think that's where anybody wants to run their network.
Here is a more pertinent example. Most of the parking spots are full, but not all of them.
It's still not too difficult to find a spot. This is an example of contention. The SLA is met. The network is more efficient because there are fewer open spots.
What if there's only one spot available? 
In this case, it might take a while to find the open spot. A spot was available, so you can argue that the SLA was met, but it took a while to find it; there was substantial contention. Some people might have left feeling the SLA was not met.
What if you arrive and there's no spot available? 
It might take a while to figure that out, which can lead to quite a bit of frustration. Not only did you fail to serve the customer (there was congestion, and their SLA wasn't met), but you probably also frustrated them, and they may not come back to the store because of this experience.
That example is a bit extreme, so consider this situation: four cars are leaving, four cars are waiting for those spots, and four more cars arrive. That makes eight cars without spots when they arrived.
The four cars pull out, and the waiting cars pull in.
The remaining four cars circulate until they find four more cars ready to leave.
Now all eight cars have parked, nobody left, there was no congestion, and we handled eight more cars than our capacity would suggest. However, there was some delay, and how much delay is acceptable gets back to the SLA: one that promises a parking spot may now also need to promise that the spot is available within a certain amount of time after arriving.

What kind of parking spot should we have? 

We have talked about how many spots we need. Does every spot serve everybody's needs? What if we have a different kind of user involved? 
In this case, I'm showing it as an ambulance, but it could be a wheelchair van or something like that, that needs access and proximity for loading a stretcher or a wheelchair. It’s important to have space available if an ambulance arrives and somebody needs it! You want to make sure that they can park.

What happens if an ambulance arrives at the parking lot and there is no space available to them?

This is an example of network slicing. We're dealing with two kinds of users: 
  • in black, the regular cars
  • in red, the ambulance or wheelchair van
We're going to treat each of them as a network slice.

In this case, there's no capacity, no parking spot available for the ambulance. Their slice is congested. They're going to leave.

That was congestion for the network slice serving the ambulance; however, all the other cars are happily parked, and their SLA is met.
It gets more complicated with network slices because each slice has its own SLA: in this case the ambulance's slice is congested, while the typical car's slice is only in contention.
What's the answer to this? You have to set aside some of your capacity to serve these other users. In this example, two of the parking spots, marked with little red icons, are set aside for ambulance or wheelchair users. All of the other spots are available to anybody arriving; there's no distinction for those spots. This is an example of network slicing by priority.
With a spot available, the ambulance pulls in. There's contention, but there wasn't anybody else to contend with for these particular spots. It appears all the network slices are being served.

Another user arrives and there's no available spot for them, even though there are two spots sitting vacant. They can't use them because they're reserved. 
They're going to leave. That's an example of congestion; their SLA was not met. The network slice set aside for the regular users wasn't adequate: we didn't have enough parking spots for that one user. The ambulance spots appear to be available, so that network slice is being satisfied.
It would be interesting if we could make use of those two spots while no ambulance or wheelchair user needs them. Here is a different-colored user, shown that way because they have to be able to give way and give up their spot. They can use it, but they'll lose it if a higher-priority user arrives. This is a way of meeting an SLA by using unused capacity from a higher-priority network slice.

It's not perfect. 
If an ambulance or wheelchair user arrives, then the yielding user is going to have to leave. In this case, their SLA is no longer met due to congestion.
The ambulance did arrive and got their spot. Their SLA was met.
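The rules for slicing by priority, including borrowing unused priority spots, can be sketched as a small allocator. This is a minimal sketch under stated assumptions: the class name `ParkingLot`, the user kinds `regular`, `yielding`, and `priority`, and the choice to evict the earliest borrower first are all hypothetical, chosen only to mirror the analogy.

```python
class ParkingLot:
    """Toy model of two network slices: general spots open to anyone, plus
    spots reserved for a priority slice (ambulance or wheelchair van).
    Hypothetical rules for illustration only."""

    def __init__(self, general: int, reserved: int) -> None:
        self.general_free = general
        self.reserved_free = reserved
        self.borrowers: list[str] = []  # yielding users parked in reserved spots

    def arrive(self, name: str, kind: str) -> str:
        """kind is 'regular', 'yielding', or 'priority'. Returns 'parked'
        (SLA met), 'left' (congestion), or 'parked, evicted <name>'."""
        if kind == "priority":
            if self.reserved_free > 0:
                self.reserved_free -= 1
                return "parked"
            if self.general_free > 0:  # priority users may use any spot
                self.general_free -= 1
                return "parked"
            if self.borrowers:  # reclaim a reserved spot from a yielding user
                return f"parked, evicted {self.borrowers.pop(0)}"
            return "left"
        if self.general_free > 0:
            self.general_free -= 1
            return "parked"
        if kind == "yielding" and self.reserved_free > 0:
            self.reserved_free -= 1  # borrow unused priority capacity
            self.borrowers.append(name)
            return "parked"
        return "left"


lot = ParkingLot(general=2, reserved=2)
for name, kind in [("A", "regular"), ("B", "regular"), ("C", "regular"),
                   ("D", "yielding"), ("E", "priority"), ("F", "priority")]:
    print(name, lot.arrive(name, kind))
```

In this run, regular car C leaves even though reserved spots sit idle, and yielding car D keeps its spot only until the second priority arrival, F, needs it.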

The ambulance or the wheelchair user could use any spot. What if you had a big RV or were towing a trailer? These just don't fit in a single spot; they need more spots. They don't fit in the network slice that's set aside for the other users.
If they show up and don't see any place for them to park, then they're going to leave. Their SLA is not met. I'll call them a heavy user. You might think of them as a video streamer or something like that, but in our example, they just don't fit in a single spot. They need two adjacent spots, or more, to fit in. 
One way to solve this problem would be to set aside a few spots for RV-only parking, pull-through parking, things like that, on the outer fringes of the parking lot. Reserving capacity for the heavy users is a different kind of network slice: it sets aside more capacity for a single user. In the ambulance case, it's the same capacity per user; the ambulance simply has priority over the other users.
If the RV shows up, they pull right in. There's still contention, since there's only one spot there. Thankfully, it was available and everybody's happy.

You arrive finding that a bunch of RVs showed up and hogged all the spaces. 
The heavy users are starting to squeeze your experience. You're having to work pretty hard to find a spot. This is an example of users taking more space than they should, and that doesn't seem fair. That leads us to what we're going to call traffic shaping.

A car shows up and one spot should be open, but a parked car has managed to take two spots. The other spot is the only one that's available, so that's a problem. This is a user taking more than their fair share; more than they deserve. 
You need something to fix that. In our case, we have a little tow truck that goes over there and repositions the car to put it back in its space. That's what we're going to call traffic shaping or putting users back inside their lanes. 
Now the other spot opens up and everyone’s SLA is met.

Inflight Connectivity (IFC) 

The following figures use nomenclature described by the Seamless Air Alliance.

A user device could be a laptop, cell phone or a tablet. It connects to the onboard wireless network, the Wi-Fi network, a wireless LAN. Onboard the aircraft there's some form of AAA (Authentication, Authorization, Accounting). 

The backhaul is the radio network that connects the airplane to the ground. Half of it is on the airplane, on the left side.

The backhaul connects to the Resident Hot Spot Provider (HSP), which provides internet access.

A cell phone would connect to the onboard Radio Access Node, for example an eNodeB. An onboard core function provides some form of AAA. A mobile network operator (MNO) sits on the ground providing internet access.

The interconnections between onboard components, such as access points, head-end servers, the portal, backhaul, and in-flight entertainment (IFE), preferably use Virtual LANs (VLANs). VLAN IDs segment the traffic, corresponding in part to the network slice.
The backhaul radio has its own way of slicing the traffic. It may carry system-essential signaling or other kinds of services, and it also mixes traffic with other airplanes. The radio channel management blends with the onboard VLAN ID segmentation.

A service level agreement may exist with regard to the backhaul radio and for some end-to-end services.

Quality of Service (QoS) is indicated in some manner.

VLAN IDs are discussed in AEEC Project Paper 688 within the Network Architecture and Security Subcommittee. These are also discussed in ARINC 791 Part 2, which is proposed to follow these concepts. 

VLANs are segmented by domains as described in ARINC 664. There are four domains; the three of interest are a public, passenger-facing domain; a private domain for in-flight connectivity and in-flight entertainment; and a secure domain for avionics (not relating to safety).

Differentiated Services Code Point (DSCP) is one method for marking in the IP header. Class of Service (CoS), carried in the VLAN tag, is another way of doing the marking, in the Ethernet frame header. Both are connectionless methods.
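To make the two markings concrete, here is a minimal sketch that reads DSCP from an IPv4 header and CoS from an IEEE 802.1Q VLAN tag. The bit positions come from the IP and 802.1Q formats: DSCP is the upper six bits of the second IPv4 header byte, and the Priority Code Point (PCP), which carries CoS, is the top three bits of the 16-bit Tag Control Information field. The function names are illustrative only.

```python
def dscp_from_ipv4(header: bytes) -> int:
    """DSCP is the upper 6 bits of the second IPv4 header byte
    (the old Type of Service byte); the lower 2 bits carry ECN."""
    return header[1] >> 2


def parse_vlan_tci(tci: int) -> tuple[int, int, int]:
    """Split a 16-bit 802.1Q Tag Control Information field into
    (PCP, DEI, VID). PCP is the 3-bit priority commonly called CoS."""
    pcp = (tci >> 13) & 0x7
    dei = (tci >> 12) & 0x1
    vid = tci & 0x0FFF
    return pcp, dei, vid


ipv4_header = bytes([0x45, 0xB8]) + bytes(18)  # version/IHL, then DSCP 46 with ECN 0
print(dscp_from_ipv4(ipv4_header))  # 46
print(parse_vlan_tci(0xA064))       # (5, 0, 100)
```

Because both markings sit in headers rather than payloads, a switch or router can read them without inspecting user data.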

QoS Class Identifier (QCI) is another indicator of per-hop behavior.

The order of preference for marking is VLAN Class of Service, then DSCP, and finally connected QoS (based on the physical or logical network access port).
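The order of preference can be expressed as a simple fallback chain. In this minimal sketch, the function name and the representation of connected QoS as a single class value configured for the access port are assumptions.

```python
from typing import Optional


def select_marking(vlan_cos: Optional[int], dscp: Optional[int],
                   port_class: int) -> tuple[str, int]:
    """Pick the QoS marking to honor, in the stated order of preference:
    VLAN CoS first, then DSCP, then the class tied to the access port."""
    if vlan_cos is not None:
        return ("vlan_cos", vlan_cos)
    if dscp is not None:
        return ("dscp", dscp)
    return ("port", port_class)  # hypothetical per-port default class


print(select_marking(5, 46, 0))       # ('vlan_cos', 5)
print(select_marking(None, 46, 0))    # ('dscp', 46)
print(select_marking(None, None, 2))  # ('port', 2)
```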

The marking is shown as a sequence of bits. This provides for four layers of Quality of Service: Expedited Forwarding, Assured Forwarding, and so on, down to Best Effort.
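The per-hop behaviors named here correspond to standard DSCP codepoints: Expedited Forwarding is 46 (RFC 3246), Assured Forwarding uses the bit pattern `cccdd0` with class `ccc` from 1 to 4 and drop precedence `dd` from 1 to 3 (RFC 2597), and 0 is the default, Best Effort. A minimal sketch that decodes a DSCP value, assuming the onboard scheme uses these standard codepoints (the four layers shown may use only a subset):

```python
def phb_name(dscp: int) -> str:
    """Name the standard per-hop behavior for a 6-bit DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field (0 to 63)")
    if dscp == 46:
        return "EF"  # Expedited Forwarding, RFC 3246
    if dscp == 0:
        return "BE"  # default (Best Effort), RFC 2474
    af_class, drop = dscp >> 3, (dscp >> 1) & 0x3
    if 1 <= af_class <= 4 and 1 <= drop <= 3 and (dscp & 0x1) == 0:
        return f"AF{af_class}{drop}"  # Assured Forwarding, RFC 2597
    if (dscp & 0x7) == 0:
        return f"CS{dscp >> 3}"  # Class Selector, RFC 2474
    return "unrecognized"


for value in (46, 34, 10, 0, 48):
    print(value, phb_name(value))
```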

The idea is to divide the users on board the aircraft by service levels. For example, one slice could cater to those that have not purchased any IFC plan. Another slice could be set aside for those purchasing a messaging product, perhaps Best Effort. Other slices could be used for those with browsing or streaming products, perhaps Assured Forwarding. 
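The slice-by-product idea can be captured as a lookup table. In this sketch, the VLAN IDs are hypothetical, the Best Effort and Assured Forwarding markings follow the "perhaps" suggestions above (the specific AF classes are assumptions), the marking for the no-plan slice is an assumption, and unknown plans fall back to the no-plan slice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slice:
    name: str
    vlan_id: int  # hypothetical VLAN ID for the slice
    phb: str      # per-hop behavior used to mark the slice's traffic


# Hypothetical assignments mirroring the examples in the text.
SLICES = {
    "none": Slice("no-plan", 101, "BE"),  # marking assumed; not given in the text
    "messaging": Slice("messaging", 102, "BE"),
    "browsing": Slice("browsing", 103, "AF21"),
    "streaming": Slice("streaming", 104, "AF31"),
}


def slice_for(plan: str) -> Slice:
    """Map a purchased IFC plan to its network slice; unknown plans get no-plan."""
    return SLICES.get(plan, SLICES["none"])


print(slice_for("streaming"))
```

Keeping the mapping in one table makes it easy to add a fifth slice in a given domain without touching the classification logic.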

Any domain can apply four (or more) different slices. 

The network slice may be defined in terms of bit rates or latencies. Quality of Service and Service Level Agreement are related to channel capacity and information rate. However, none of these directly indicates the user's experience.

Quality of Experience (QoE) is an elusive measure. 

Users don't all use the internet the same way. One person could be browsing, another doing email, another on Twitter, and another watching a title on YouTube. They've all purchased similar products, but they're trying to use them in very different ways.

How do we measure Quality of Experience?
The SLA sets the Quality of Service of the network slice, not the user's Quality of Experience. Suppose all the users are in the same network slice: is one user getting a good experience while another is getting a bad one?

Quality of Experience relies on user-based measurements. That involves some form of packet inspection, which gets us to traffic shaping. The most basic form of managing users is restricting their data rate.
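Restricting a user's data rate is commonly done with a token bucket. This minimal sketch polices one user's traffic; the 5 Mbps rate and the 12,000-bit burst (one 1,500-byte packet) are illustrative values, and a true shaper would queue a non-conforming packet rather than reject it.

```python
class TokenBucket:
    """Per-user rate limit: tokens (bits) accrue at rate_bps up to burst_bits,
    and a packet conforms only if enough tokens are available."""

    def __init__(self, rate_bps: float, burst_bits: float) -> None:
        self.rate = rate_bps
        self.capacity = burst_bits
        self.tokens = burst_bits
        self.last = 0.0  # time of the previous update, in seconds

    def allow(self, packet_bits: float, now: float) -> bool:
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False  # non-conforming: a policer drops, a shaper would delay


user = TokenBucket(rate_bps=5e6, burst_bits=12_000)
print([user.allow(12_000, t) for t in (0.0, 0.001, 0.003)])  # [True, False, True]
```

The burst size controls how much instantaneous excess a user may take before the limit bites, which is the knob that lets usage ebb and flow around the average rate.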

If the capacity is 10 Mbps and there is only one user, then that user could possibly use all 10 Mbps. If there were two users, you could simply limit each user to 5 Mbps. In reality, usage ebbs and flows, allowing each user a higher instantaneous data rate than their simple share, and allowing more users to contend for the channel than simply dividing the available data rate would suggest. This is the art of managing contention without leading to congestion.
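A simple binomial model shows why sharing works until it doesn't. Assume, hypothetically, that each user is independently active with probability `p_active` and draws `peak_mbps` while active; the sketch computes the chance that simultaneous demand exceeds the channel. Real traffic is burstier and correlated, so treat this as intuition, not a planning tool.

```python
from math import comb


def congestion_probability(users: int, p_active: float,
                           capacity_mbps: float, peak_mbps: float) -> float:
    """Probability that more users are active at once than the channel can
    carry at their peak rate, assuming independent on/off users."""
    max_active = int(capacity_mbps // peak_mbps)  # users that fit at peak rate
    return sum(comb(users, k) * p_active**k * (1 - p_active)**(users - k)
               for k in range(max_active + 1, users + 1))


# 10 Mbps channel, 5 Mbps per active user, each user active 10% of the time.
for n in (2, 5, 10, 20):
    print(n, round(congestion_probability(n, 0.1, 10, 5), 4))
```

Two users can never congest a channel sized for both at peak, while each additional user raises the risk; how much risk is acceptable is an SLA decision.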

Managing video streaming is a form of traffic shaping. Many cellular Mobile Network Operators limit users' streaming resolution (e.g., 480p) to avoid excessive cellular data usage.

Sandvine is a well-known provider of network optimization. Its portfolio of services, each targeted to a specific aspect, provides a robust solution. Each of these services or tools manages Quality of Experience for some or all users.
  • Fair Usage and Congestion Management
    • “Priority by application or application type, by source or destination, and by network type”
  • Video Streaming Management
    • Managing video bitrate
  • Heavy User Management
    • Penalty Box
  • TCP Optimization
    • “Goodput, defined as the payload without retransmissions” 
  • Traffic Steering/Optimization
    • “Diverting traffic for value added services (VAS) often requires steering at the application level, which has become increasingly difficult to do with the continual rise of encryption. Sandvine’s Active Network Intelligence Classification Engine uses machine learning to ensure accurate traffic classification in spite of encrypted DNS and TLS 1.3. This ActiveLogic function delivers the maximum achievable efficiency for traffic redirection, far beyond imprecise, port-based approaches, reducing the number of router ports, load balancing devices, and VAS elements needed.”
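A penalty box for heavy users can be sketched as a per-user usage counter over a fixed window. This is not Sandvine's implementation: the class, the byte threshold, the fixed window, the penalty duration, and the "low"/"normal" priority labels are all hypothetical.

```python
from collections import defaultdict


class PenaltyBox:
    """Hypothetical heavy-user management: a user whose usage within a
    window exceeds threshold_bytes is deprioritized for penalty_s seconds."""

    def __init__(self, threshold_bytes: int, window_s: float, penalty_s: float) -> None:
        self.threshold = threshold_bytes
        self.window = window_s
        self.penalty = penalty_s
        self.usage = defaultdict(int)           # bytes used in the current window
        self.window_start = defaultdict(float)  # start time of each user's window
        self.boxed_until = {}                   # user -> time the penalty ends

    def record(self, user: str, nbytes: int, now: float) -> None:
        """Account for traffic; start a fresh window when the old one expires."""
        if now - self.window_start[user] >= self.window:
            self.window_start[user] = now
            self.usage[user] = 0
        self.usage[user] += nbytes
        if self.usage[user] > self.threshold:
            self.boxed_until[user] = now + self.penalty

    def priority(self, user: str, now: float) -> str:
        return "low" if self.boxed_until.get(user, float("-inf")) > now else "normal"


box = PenaltyBox(threshold_bytes=1_000_000, window_s=60, penalty_s=300)
box.record("seat12A", 800_000, now=0)
box.record("seat12A", 300_000, now=10)
print(box.priority("seat12A", now=11), box.priority("seat12A", now=311))  # low normal
```

Deprioritizing rather than blocking keeps the heavy user connected while protecting everyone else in the slice, much like the tow truck putting a car back in its own space.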
Placement of network optimization features favors a central location at the border with internet access. However, some features may be best served onboard the aircraft.

Regardless of the technology or application, the User has a perspective: I know it when I see it. Passenger interaction is one measure of satisfaction. Did the session result in typical usage? How active was the session? Is this a repeat User? Is there any trend? Portal interaction can be monitored directly for difficulties or queries that suggest frustration or a problem. Direct User interaction by refund/adjustment, survey, message, or voice offers a definitive measure.

Cybersecurity is never far away from any network discussion. ARINC 848 has been released and provides a foundation for this discussion. Cybersecurity and Quality of Service tend to interfere with each other: cybersecurity methods can blind Quality of Service mechanisms. The methods for marking QoS described above are designed to work with cybersecurity features.
The architecture builds on the air-ground radio. Onboard the aircraft is a remote terminal offering connectivity for passenger services. The backhaul function involves a wireless connection and uses proprietary and confidential methods for managing each radio channel with respect to QoS and SLA. Each radio has unique attributes and features.
A Communication Service Provider (CSP) extends the backhaul connection to the internet. The CSP was referred to as the Resident HSP or Resident MNO earlier.

AAA functions exist within Backhaul to manage remote terminals and within the CSP to manage users. 

The Commercial off the Shelf (COTS) layer includes the Backhaul and the CSP. Best-practice public internet service is provided as part of the service. Service providers compete on features, and cybersecurity is a differentiator.
A subnetwork layer sits above the COTS layer, and this is where ARINC 848 lives.
The Media Independent Secure Offboard Network (MISON) is a Virtual Private Network (VPN), for example IPsec. The MISON connects an onboard LAN with an Enterprise LAN. An Enterprise could be an airline or a third-party service or content provider. The MISON doesn't stop at the teleport or at the communication service provider; it is end-to-end.
VPNs obscure the payloads; it's difficult to tell exactly what's going on within a VPN. Network slicing is the methodology for differentiating this traffic: VPNs carry QoS markings, such as CoS or DSCP, that remain evident outside the tunnel. This allows for differentiation between traffic types. A given enterprise interface may need multiple MISON tunnels to support differing QoS.
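One way to picture per-QoS tunnels is an encapsulation step that picks a tunnel by QoS class and copies the inner DSCP to the outer header, which is the default behavior RFC 4301 describes for IPsec tunnel mode (it also permits remapping). This is a hypothetical sketch: the tunnel names, the two-class split, and the `encrypt` callable are placeholders, and the byte reversal used in the example is a stand-in, not encryption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Packet:
    enterprise: str
    dscp: int
    payload: bytes


@dataclass(frozen=True)
class TunnelPacket:
    tunnel: str
    outer_dscp: int    # still visible to QoS mechanisms along the path
    ciphertext: bytes  # opaque to everything except the far tunnel endpoint


# Hypothetical: one MISON tunnel per (enterprise, QoS class).
TUNNELS = {
    ("airline", "EF"): "mison-airline-ef",
    ("airline", "BE"): "mison-airline-be",
}


def encapsulate(pkt: Packet, encrypt: Callable[[bytes], bytes]) -> TunnelPacket:
    qos_class = "EF" if pkt.dscp == 46 else "BE"  # simplified two-class split
    tunnel = TUNNELS[(pkt.enterprise, qos_class)]
    return TunnelPacket(tunnel, pkt.dscp, encrypt(pkt.payload))


def fake_encrypt(data: bytes) -> bytes:
    return data[::-1]  # stand-in only; NOT real encryption


print(encapsulate(Packet("airline", 46, b"flight plan"), fake_encrypt))
```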
An Application channel sits on top of all of this. This is where the information is exchanged. This is the third tier in defense in depth. The Application itself has to have a mechanism to ensure that the source of the data or information it's communicating with is authentic and that the information that's being provided has not been corrupted, at least to a level of assurance that's acceptable.
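At the application tier, authenticity and integrity checks can be as simple as a message authentication code. This minimal sketch uses HMAC-SHA256 from Python's standard library with a pre-shared key; that choice is an assumption for illustration, and a real application might instead use certificate-based digital signatures and would need proper key management.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag that the receiver can check."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """True only if the message came from a holder of the key and is unaltered."""
    # compare_digest avoids leaking timing information about mismatches.
    return hmac.compare_digest(sign(key, message), tag)


key = b"example-shared-key"  # hypothetical; real keys come from key management
tag = sign(key, b"position report")
print(verify(key, b"position report", tag), verify(key, b"position rep0rt", tag))  # True False
```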
The MISON VPNs and the firewalling at the COTS level represent specific cybersecurity features. The onboard segmentation by VLAN ID and by MISON are other aspects of network security. MISON channels are established by the onboard MISON client and are subject to certificate-based authentication. The MISON client drops any traffic not associated with an active MISON channel. Keep in mind that only non-Passenger traffic is managed using MISON; Passengers onboard are provided best-practice public internet. Methods promoted by the Seamless Air Alliance provide WPA3-Enterprise-level security for the user device attaching to the onboard Wi-Fi.
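The drop rule can be sketched as an allow-list of active channels. In this hypothetical sketch, a channel is identified by an (onboard LAN, enterprise) pair and a boolean stands in for the result of certificate-based authentication; an actual MISON client does far more.

```python
from dataclasses import dataclass, field


@dataclass
class MisonClient:
    """Hypothetical sketch: forward only traffic that maps to an active,
    authenticated channel; drop everything else."""
    active: set = field(default_factory=set)  # {(onboard_lan, enterprise), ...}

    def establish(self, lan: str, enterprise: str, cert_valid: bool) -> bool:
        # cert_valid stands in for certificate-based authentication.
        if cert_valid:
            self.active.add((lan, enterprise))
        return cert_valid

    def forward(self, lan: str, enterprise: str) -> bool:
        """Return True to forward; False means the traffic is dropped."""
        return (lan, enterprise) in self.active


client = MisonClient()
client.establish("ops-vlan", "airline", cert_valid=True)
client.establish("ops-vlan", "unknown-vendor", cert_valid=False)
print(client.forward("ops-vlan", "airline"), client.forward("ops-vlan", "unknown-vendor"))  # True False
```

Defaulting to drop means a misconfigured or unauthenticated path fails closed rather than leaking traffic off the aircraft.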

Stay tuned!

Peter Lemme

peter @
Follow me on twitter: @Satcom_Guru
Copyright 2020 All Rights Reserved

Peter Lemme has been a leader in avionics engineering for 38 years. He offers independent consulting services largely focused on avionics and L, Ku, and Ka band satellite communications to aircraft. Peter chaired the SAE-ITC AEEC Ku/Ka-band satcom subcommittee for more than ten years, developing ARINC 791 and 792 characteristics, and continues as a member. He contributes to the Network Infrastructure and Interfaces (NIS) subcommittee developing Project Paper 848, standard for Media Independent Secure Offboard Network.

Peter was Boeing avionics supervisor for 767 and 747-400 data link recording, data link reporting, and satellite communications. He was an FAA designated engineering representative (DER) for ACARS, satellite communications, DFDAU, DFDR, ACMS and printers. Peter was lead engineer for Thrust Management System (757, 767, 747-400), also supervisor for satellite communications for 777, and was manager of terminal-area projects (GLS, MLS, enhanced vision).

An instrument-rated private pilot, single engine land and sea, Peter has enjoyed perspectives from both operating and designing airplanes. Hundreds of hours of flight test analysis and thousands of hours in simulators have given him an appreciation for the many aspects that drive aviation; whether tandem complexity, policy, human, or technical; and the difficulties and challenges to achieving success. 
