Sunday, January 3, 2021

Mid-Value Select (MVS): Goldilocks in the House of MCAS

Angle of Attack (AoA) is measured on the 737 MAX using two AoA vanes, one installed on the left side of the airplane and the other on the right side. While each AoA vane is calibrated and precise, inferring AoA directly overlooks local aerodynamic factors that can cause the sensor reading to diverge from the desired value, notably with sideslip. A tolerance exists between what the AoA vane senses and the expected value. The question arises: how should AoA sensor tolerances or undesired behavior be managed to maximize availability of desired functionality while limiting exposure to undesired hazards?

Disclaimer

The following analysis is based on very limited information released by Boeing to the public via FAA report. The analysis and conclusions presented are a best-faith effort to understand the design and ramifications, but may be flawed and inaccurate.

I left Boeing in 1997. I have no relationship with anyone regarding the 737 MAX, MCAS, or the JT610 and ET302 tragedies, nor have I had any ever. 

MCAS Availability

The 737 MAX infamously includes the Maneuvering Characteristics Augmentation System (MCAS). Boeing professes that MCAS is triggered when AoA exceeds a high trip threshold and unwinds when AoA falls below a low reset threshold.  MCAS applies Airplane-Nose-Down (AND) stabilizer trim command at rapid rate when triggered and an equal Airplane-Nose-Up (ANU) stabilizer trim command when reset.

MCAS is triggered to increase the "pull" stick force gradient, to improve the handling characteristics in high alpha scenarios, to offset an undesired shift of aerodynamic lift forward. Originally, Boeing had provided for a single-thread MCAS solution, alternating between the left and right side avionics. In this context, the availability of the system, when needed, was limited by any single-point failure. 

A hazard is a combination of MCAS not being available and encountering an MCAS triggering event (high alpha). 

Effectively, there is some acceptable risk for the probability of MCAS failing on the same flight that MCAS is needed. Either the probability of that combination is low enough to be classed as extremely improbable (it should never happen), or the probability that the subsequent actions without MCAS triggering lead to catastrophe adds enough margin to stave it off.

Consequently, MCAS needs to be as available as a single thread, and not necessarily any more. 

MCAS Fail-Safe

MCAS malfunction is an entirely different issue. Without belaboring the many, many issues that led us here, the FAA has mandated that MCAS be fail-safe. The design allows for a second independent computer that must agree that MCAS should be triggered before the command may be issued to the stabilizer actuator. 

Since each computer operates with independence, in effect, both have to be above the trigger threshold to agree that a trigger event has occurred. Along the same logic, both must be below the reset threshold to agree that a reset has occurred.
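The agreement rule above can be sketched as a small state update. This is only an illustration: the function name and the trip and reset numbers are made up for the example (the actual thresholds are not given here), and each side's AoA is treated as a single number.

```python
def update_mcas_state(active, aoa_left, aoa_right, trip=12.0, reset=10.0):
    """Return the new MCAS state (True = triggered).

    trip and reset are hypothetical thresholds in degrees, chosen only
    for illustration. Both sides must agree before the state changes.
    """
    if not active and aoa_left > trip and aoa_right > trip:
        return True   # both sides above the trip threshold: trigger
    if active and aoa_left < reset and aoa_right < reset:
        return False  # both sides below the reset threshold: reset
    return active     # any disagreement: hold the current state


state = False
state = update_mcas_state(state, 13.0, 11.0)  # only one side above trip
print(state)
state = update_mcas_state(state, 13.0, 12.5)  # both sides above trip
print(state)
```

With one side above the trip threshold and the other below, the state holds; it changes only when both sides agree.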

Two challenges are immediately brought to the surface when implementing such a system:

1) sensor input differences

2) computational differences

When designing a fail-safe system, each of these two factors has to be accounted for.

Sensor input differences refer, in this case, to differences between the left and right AoA values. AoA sensors may differ due to failure, degradation, contamination, damage, A/D tolerances, sideslip and other local aero effects.

Computational differences relate to sequencing, framing/scheduling, rounding. A real-time operating system (RTOS), carefully managed time-triggered sensing network interfaces, along with synchronized cross-channel information transfer, are tuned to keep the parallel channels in lock-step.

A brick-wall architecture allows for each channel to operate without regard to the other channel(s); it votes with the output value. Brick wall is akin to arm-wrestling. Without belaboring this, one challenge with brick-wall is that differences can appear without appreciating what is driving the difference. In effect, the arm wrestling could wane if all the channels can understand what is causing the difference. The weakness is common-mode failures, fooling all the channels at the same time.

A triple-channel system is based on majority rules, two outweighs one, for fail-op. If one channel fails, then it continues as fail-safe, only operating if the surviving channels are in agreement. 

There is already a tragedy (seven perished) involving a triple-channel design, where two AoA vanes froze simultaneously, out-voting the remaining operable AoA vane.


For MCAS, there are actually four channels involved, two channels in each of two Flight Control Computers (FCC) (dual-dissimilar). For fail-safe, the left FCC and the right FCC have to agree for an MCAS command to be issued. 

MCAS Input Signal Management

Boeing chose to modify the brick-wall architecture by applying input-signal management (ISM). In this vernacular, Boeing is choosing a single AoA value that both the left and right FCC channels will utilize. Normally, each side is on their own to prevent common-mode failure - hence the term brick-wall.

I started my career at Boeing in 1981, working on the Pitch Augmentation Control System (PACS) for 757 and 767. PACS was a dual-channel computer system that had a small elevator actuator that could apply elevator to offset undesirable pitchup at high alpha. PACS was replaced by vortex generators and on the 767 a stick nudger was applied. The basic story is that wind-tunnel testing showed the problem, but in flight test, the handling qualities were deemed "acceptable" (as I recall) with the changes mentioned. PACS pioneered in-line monitoring - the idea that the two channels would vote progressively from input to output, rather than pure brick wall. My task was to simulate the inputs and create failures and see if PACS detected them. I also simulated failures on the output stages to see how the system would respond. We had Moog hydraulic actuators to drive, which, with 3000 psi Skydrol, had to be put into a special area with personal protection. Here I delved into synchros and resolvers, a technology that followed me into thrust management systems (TMS) a year later when PACS was canceled. General Electric made both PACS and TMS with the same computing platform, so it was an easy transition.

Analytically, the airplane is safer if MCAS were to miss a command opportunity due to error tolerances than to trigger falsely. The error tolerances are chosen in the context of the maneuvers, where a more significant excursion outside of the normal flight-envelope will assuredly trigger MCAS and that a slight excursion is by definition a safe recovery, even if caught up by worst-case error tolerances and a missed command opportunity.

Boeing disables MCAS if more than 5.5 degrees of AoA difference exists between left and right AoA values. As long as the difference is less than 5.5 degrees, the presumption is that both vanes are operating near-enough to the true AoA.

Split-Vane Monitor

Boeing has added a split-vane monitor to declare MCAS inoperative if left and right AoA differ by more than 5.5 degrees.

Each FCC receives left and right AOA sensor values from the left and right Air Data Inertial Reference Unit (ADIRU), respectively. The AOA values are transmitted from the ADIRUs to the FCCs via databuses. 

Certain AOA sensor failures are not related to degradation of the electrical circuit, and therefore are not detected by the ADIRU. These failures result in AOA values transmitted by the ADIRU as “valid” when, in fact, they are not correct. These outputs of the ADIRU are referred to as “valid erroneous” data. Examples of failures that result in erroneous data include a bent or broken AOA vane (e.g. due to a bird strike or ramp damage) or a mis-calibrated AOA sensor (e.g. JT610 scenario). 

An AOA split-vane monitor and middle-value select (MVS) have been added to prevent MCAS from using AOA inputs that differ from the other AOA input by more than 5.5 degrees. 

Boeing set the designed AOA input differential threshold of 5.5 degrees, based on electro-mechanical tolerances of the sensor and normal transient aerodynamic effects on the AOA sensors mounted on opposite sides of the fuselage during flight with flaps up. 

The AOA split-vane monitor threshold is large enough to allow for expected variations in AOA sensors but small enough to prevent MCAS activation due to erroneous AOA data.

The split vane monitor compares two valid AOA inputs and will use them only if the difference between the AOA values is less than or equal to 5.5 degrees. If the difference is greater than 5.5 degrees for a specified duration, the MCAS and Speed Trim functions will be disabled for the remainder of the flight. The split vane monitor becomes active after the flaps have been retracted during flight.
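The monitor behavior described above might be sketched as follows. The "specified duration" is not stated, so the persistence count here is a hypothetical number of computational frames. The class name, and the choice to restart the count whenever the difference falls back within tolerance, are also assumptions made for illustration.

```python
class SplitVaneMonitor:
    """Sketch of a split-vane monitor that latches for the rest of the flight."""

    def __init__(self, threshold_deg=5.5, persist_frames=10):
        self.threshold_deg = threshold_deg    # 5.5 degrees, per the description
        self.persist_frames = persist_frames  # hypothetical duration, in frames
        self._count = 0
        self.latched = False                  # True = MCAS and Speed Trim disabled

    def update(self, aoa_left, aoa_right, flaps_up):
        if self.latched:
            return True                       # stays disabled once tripped
        if flaps_up and abs(aoa_left - aoa_right) > self.threshold_deg:
            self._count += 1
        else:
            self._count = 0                   # assumed: count restarts on agreement
        if self._count >= self.persist_frames:
            self.latched = True
        return self.latched


monitor = SplitVaneMonitor(persist_frames=3)
for _ in range(3):
    disabled = monitor.update(aoa_left=0.0, aoa_right=6.0, flaps_up=True)
print(disabled)
```

A difference of exactly 5.5 degrees does not count toward the trip, matching the "less than or equal to 5.5 degrees" wording.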

Effect of Erroneous AOA Value on MCAS — Activation of the split vane monitor will completely disable the STS (which includes both MCAS and Speed Trim System), will trigger a Master Caution indication, an illumination of the Flight Control (FLT CONT) annunciator, and an illumination of the SPEED TRIM FAIL light on the overhead panel. The Master Caution indication and the FLT CONT annunciator can be reset by pressing the MASTER CAUTION PUSH TO RESET button. The SPEED TRIM FAIL light will remain illuminated for the remainder of the flight. In addition, an accompanying maintenance item is recorded for the loss of MCAS and Speed Trim. 

Mid-Value Select (MVS)

Boeing has described another concept, Mid-Value Select (MVS), for blending the two AoA inputs into a single value for consumption by both FCC channels.  In this discussion, Boeing reveals an unrelated issue of undetected, erroneous oscillatory AoA behavior. The MVS algorithm is applied, in-part, to suppress this behavior.

MVS logic has been added to the MCAS AOA signal processing to mitigate the potential hazard of undetected erroneous oscillatory AOA signal.  The MVS algorithm is effective at minimizing the effect of a low amplitude oscillatory input value. 

The MVS output is initialized at zero degrees. 

The MVS utilizes three numbers: the two current AOA values and the MVS output from the previous MVS determination. 

The algorithm determines the middle value of the three numbers by eliminating the highest and lowest values and using the remaining value (for example, for inputs 1, 2, and 4, the middle value is 2). 

The output of the MVS is used by the MCAS function within the FCC.
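The description above (output initialized at zero, middle of the two current AoA values and the previous output) can be sketched in a few lines. The class and method names are illustrative only and are not taken from any Boeing or FAA material.

```python
class MidValueSelect:
    """Sketch of mid-value select over two AoA inputs and the previous output."""

    def __init__(self):
        self.output = 0.0  # the MVS output is initialized at zero degrees

    def update(self, aoa_left, aoa_right):
        # Drop the highest and lowest of the three numbers, keep the middle one.
        self.output = sorted((aoa_left, aoa_right, self.output))[1]
        return self.output


mvs = MidValueSelect()
mvs.output = 2.0            # pretend the previous determination was 2
print(mvs.update(1.0, 4.0)) # inputs 1, 2 and 4: the middle value is 2
```

Because the previous output is one of the three candidates, the result always lies between the two current AoA values, or equals one of them.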

The objective is to trigger when both AoA vanes cross above the trigger threshold and to reset when they both cross below the reset threshold; in effect, to yield to the most conservative vane, while at the same time suppressing a single AoA vane exhibiting oscillatory behavior.

Boeing professes that if an AoA vane is detected failed (electrical failure), MCAS will revert to using the surviving AoA vane for the remainder of the flight. The expectation is that a second, independent failure of the surviving vane on that flight is an acceptable risk.

Effect of Detected Failed AOA Sensor on MCAS — if a failed AOA circuit is detected, the FCCs will receive only one valid AOA value. The FCCs will utilize the valid AOA value to control MCAS. The Split Vane Monitor and MVS are not utilized.

During execution of the descent phase Master Caution recall checklist procedure, the SPEED TRIM FAIL light will be illuminated so the pilots will be aware of the condition. MCAS and Speed Trim will continue to operate using the available valid AOA signal. This design preserves the availability of Speed Trim and MCAS operation after a single detected failed AOA sensor.

A second independent failure during the same flight is considered to be extremely improbable. If a second independent failure affects the remaining AOA sensor, any resulting activation of MCAS would be limited to a single MCAS command (up to 2.5 degrees as a function of Mach).

The issue with having two values that disagree is figuring out which is correct and which is wrong; it is even harder to reveal the desired value.

Selecting the maximum value of AoA would trigger MCAS more often, as this value is likely biased above the true AoA.

Selecting the minimum value of AoA would trigger MCAS the most conservatively, as desired, where both AoA values are in agreement that trigger is appropriate.

Selecting the minimum value would possibly reset MCAS too soon and that could contribute to an additional upset. Using the highest AoA value to drive the reset would be the most conservative.

As will be seen, MVS rises with the minimum of the two AoA values and falls with the maximum of the two AoA values, precisely the most conservative objective.
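A short, made-up ramp illustrates why this happens: the middle of (previous output, left, right) is the previous output clamped between the lower and the higher vane. The reference profile and the +2/-2 biases below are fabricated for the illustration, and the helper names are hypothetical.

```python
def mvs_step(prev, left, right):
    """Middle value of the previous MVS output and the two current AoA values."""
    return sorted((prev, left, right))[1]


def run_mvs(ref, left_bias, right_bias):
    out, prev = [], 0.0  # MVS output starts at zero
    for r in ref:
        prev = mvs_step(prev, r + left_bias, r + right_bias)
        out.append(prev)
    return out


# Fabricated reference AoA: climb to 10 degrees, then come back down.
ref = [0, 2, 4, 6, 8, 10, 8, 6, 4, 2, 0]
trace = run_mvs(ref, left_bias=+2, right_bias=-2)
print(trace)  # [0.0, 0, 2, 4, 6, 8, 8, 8, 6, 4, 2]
```

On the way up the output follows the lower vane (reference minus 2), it holds at 8 while the reference turns around, and on the way down it follows the higher vane (reference plus 2).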

Any procedure for selecting one AoA vane solely creates exposure to unrelenting oscillatory behavior.

The problem with reality is that it rarely acts in a pure sense. AoA sensing involves some uncertainty, noise if you will. Selecting max and min between two noisy signals that normally operate with nearly the same value can lead to a lot of dynamic factors.

Filtering AoA (a lag filter) can dampen the noise, but it delays the response. Complementing the AoA with pitch rate can provide some leading compensation to offset the lag.

If the noise is truly random, then simply averaging the two AoA values can reduce the overall noise level as well as arrive at a value that weighs each AoA input equally.  Averaging has the effect of lowering the amplitude of any oscillatory behaviour by 50%.
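The 50% figure can be checked with a contrived case: one vane oscillating around a steady value while the other vane is clean. The steady AoA, the amplitude and the sample rate are all hypothetical. The MVS trace is included for comparison, and the result shown for it holds only for this idealized case where the second vane is perfectly steady.

```python
import math


def mvs_step(prev, left, right):
    """Middle value of the previous MVS output and the two current AoA values."""
    return sorted((prev, left, right))[1]


TRUE_AOA = 5.0          # hypothetical steady AoA, degrees
AMPLITUDE = 1.0         # hypothetical oscillation amplitude on the left vane
SAMPLES_PER_CYCLE = 20  # hypothetical frames per oscillation cycle

avg_trace, mvs_trace = [], []
prev = 0.0
for k in range(3 * SAMPLES_PER_CYCLE):
    left = TRUE_AOA + AMPLITUDE * math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
    right = TRUE_AOA                       # clean vane
    avg_trace.append((left + right) / 2.0)
    prev = mvs_step(prev, left, right)
    mvs_trace.append(prev)

avg_swing = max(avg_trace) - TRUE_AOA      # peak deviation of the average
mvs_swing = max(mvs_trace) - min(mvs_trace)
print(round(avg_swing, 3), mvs_swing)
```

The average carries half of the oscillation, while the MVS output sits on the steady vane because that value is always the middle of the three.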

The MVS algorithm has to be demonstrated to fully appreciate its significance. I will state here that MVS seems to be a very clever solution that satisfies every objective.

MVS is not new

MVS has been applied to flight controls for decades. The earliest example I could find was in a 1980 report on Fault Tolerance Design and Redundancy Management Techniques.

MVS was also used in the F-16 for selecting AoA.

Evaluating MVS for MCAS

To evaluate MVS, the following scenarios were crafted around offsets between left and right AoA. In each scenario, both AVG and MVS methods are compared and discussed. In all cases, the same reference AoA is used to drive the left and right values, and the MVS process is applied assuming each time slice is a computational frame. The conditions are fabricated and the time constants are not reflective of any realized scenario. The "noise" is applied in varying manners, where the Jitter (J) annotation notes independent noise applied between left and right. The amplitudes portrayed are for discussion and do not relate to any specific condition.
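A setup along these lines can be sketched as below. It is not the code used to produce the plots in this post: the reference profile, the uniform jitter model, the seed and all function names are assumptions made for the sketch.

```python
import random


def mvs_step(prev, left, right):
    """Middle value of the previous MVS output and the two current AoA values."""
    return sorted((prev, left, right))[1]


def simulate(ref, left_bias, right_bias, jitter=0.5, seed=1):
    """Return rows of (ref, left, right, avg, mvs), one per computational frame."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    prev = 0.0                  # MVS output starts at zero
    rows = []
    for r in ref:
        # Independent jitter on each vane (assumed uniform, for illustration).
        left = r + left_bias + rng.uniform(-jitter, jitter)
        right = r + right_bias + rng.uniform(-jitter, jitter)
        avg = (left + right) / 2.0
        prev = mvs_step(prev, left, right)
        rows.append((r, left, right, avg, prev))
    return rows


def count_above_ref(rows, column):
    """Count frames where the selected value (column 3=AVG, 4=MVS) exceeds REF."""
    return sum(1 for row in rows if row[column] > row[0])


# Fabricated reference: ramp to 12 degrees, hold, then ramp back down.
ref = [i * 0.5 for i in range(25)] + [12.0] * 20 + [12.0 - i * 0.5 for i in range(25)]
rows = simulate(ref, left_bias=+2.0, right_bias=-2.0)
print("AVG frames above REF:", count_above_ref(rows, 3))
print("MVS frames above REF:", count_above_ref(rows, 4))
```

Changing left_bias and right_bias reproduces the other offsets discussed below; the counts depend on the jitter and seed chosen, so they are indicative only.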

In general, any situation where the selected value appears to trigger MCAS should be compared against the "reference" value (that is uncertain onboard, but clear in these examples). 

The most undesired behavior is when the selected value appears to trigger MCAS before the reference value would so indicate. 

Biasing the AoA sensor can prevent MCAS from triggering, which is where the Split-Vane tolerance is applied.

The reference value is shown in black. 

The selected value (AVG or MVS) is usually in red. 

The left AoA in blue and the right AoA in green.

AoA Left +2, AoA Right -2

A center-weighted scenario with four degrees bias between left and right.


Comparing AVG to MVS in this scenario reveals several attributes that will show up over and over. (AVG is in Blue)

  1. the AVG trace is as noisy as either input sensor. There are three occurrences where the AVG value exceeded the REF value at high alpha, each could have triggered MCAS falsely.
  2. the MVS trace is very stable and well behaved. In no case does it appear to trigger MCAS falsely. It appears to be biased about one degree lower than ref at high alpha. MVS appears to lag REF slightly on the recovery, possibly delaying reset briefly.


AoA Left +4, AoA Right -1

A positive-weighted scenario with five degrees bias between left and right.

  1. the AVG trace is as noisy as either input sensor. The AVG value exceeded the REF value at high alpha constantly, each occurrence could have triggered MCAS falsely.
  2. the MVS trace is very stable and well behaved. In no case does it appear to trigger MCAS falsely. It appears to be biased about one degree lower than ref at high alpha. MVS appears to lag REF slightly on the recovery, possibly delaying reset briefly.

AoA Left +1, AoA Right -4

A negative-weighted scenario with five degrees bias between left and right.

  1. the AVG trace is as noisy as either input sensor. The AVG value never exceeded the REF value at high alpha.
  2. the MVS trace is very stable and well behaved. In no case does it appear to trigger MCAS falsely. It appears to be biased about four degrees lower than ref at high alpha. MVS appears to lag REF slightly on the recovery, possibly delaying reset briefly.

Comparing AVG and MVS

For the three scenarios so far, AVG does not suppress any noise or cycling of an individual vane and in many cases falsely triggers MCAS.


AoA Left +4, AoA Right +3

A heavily positive-weighted scenario with one degree bias between left and right. There is nothing to alert MCAS when both AoA vanes are biased in the same direction. MCAS will be triggered "falsely" no matter what.

  1. the AVG trace is as noisy as either input sensor. 
  2. the MVS trace is reasonably stable and better-behaved.
  3. in many cases the AVG value exceeded the MVS value, with the potential that each could have been sufficient to cause a false MCAS trigger where MVS would not have.

AoA Left -4, AoA Right -3

A heavily negative-weighted scenario with one degree bias between left and right. There is nothing to alert MCAS when both AoA vanes are biased in the same direction. MCAS will never be triggered "falsely", no matter what.


  1. the AVG trace is as noisy as either input sensor. 
  2. the MVS trace is reasonably stable and better-behaved.
  3. in many cases the AVG value exceeded the MVS value, with the potential that each could have been sufficient to trigger MCAS where MVS would not have (in this case, beneficially) - but with significant negative-bias it is inconsequential.

Summary

Boeing has developed MVS to resolve concerns about AoA sensor alignment and characteristics. MVS simplifies fail-safe implementation while effectively eliminating spurious false triggers of MCAS Airplane Nose Down (AND) stabilizer trim commands.



Peter Lemme

peter @ satcom.guru
Follow me on twitter: @Satcom_Guru
Copyright 2021 satcom.guru All Rights Reserved

Peter Lemme has been a leader in avionics engineering for 39 years. He offers independent consulting services largely focused on avionics and L, Ku, and Ka band satellite communications to aircraft. Peter chaired the SAE-ITC AEEC Ku/Ka-band satcom subcommittee for more than ten years, developing ARINC 791 and 792 characteristics, and continues as a member. He contributes to the Network Infrastructure and Interfaces (NIS) subcommittee developing ARINC 848, standard for Media Independent Secure Offboard Network and ARINC 688, standard for Cabin LAN.

Peter was Boeing avionics supervisor for 767 and 747-400 data link recording, data link reporting, and satellite communications. He was an FAA designated engineering representative (DER) for ACARS, satellite communications, DFDAU, DFDR, ACMS and printers. Peter was lead engineer for Thrust Management System (757, 767, 747-400), also supervisor for satellite communications for 777, and was manager of terminal-area projects (GLS, MLS, enhanced vision).

An instrument-rated private pilot, single engine land and sea, Peter has enjoyed perspectives from both operating and designing airplanes. Hundreds of hours of flight test analysis and thousands of hours in simulators have given him an appreciation for the many aspects that drive aviation; whether tandem complexity, policy, human, or technical; and the difficulties and challenges to achieving success. 

~~~~~~~~~~~~~~~~~~~~~~

Responding to a Comment

A commenter has made statements to the effect that MCAS is unsafe because of MVS and that this post does not portray the situation correctly. 
This person submitted comments to the FAA proposal to Return to Service previously, raising concerns with the use of MVS along the same lines. I will offer some analysis in response, interspersed.



The comments are shown indented and italicized. My analysis or observations are left indented and normally formatted.
Initially, both AOAs track each other normally and MVS output selects either LH or RH signal, both adequate.

Technically, MVS selects from the past value of MVS, LH and RH. One objective is to NOT just select one AoA source, in the case it is suffering an oscillatory symptom.

Suddenly, one AOA suffers a divergence (LH, toward higher values on the figure), but by less than the 5.5 deg differential threshold.

The commenter suggests that a divergence of up to 5.5 degrees is something unexpected or adverse. Boeing provided clear guidance that as long as both AoA sources remain within 5.5 degrees, the MVS value was adequate. Why did Boeing pick 5.5 degrees, instead of say, 10 degrees as they did for AoA Disagree? I presume Boeing analyzed this carefully and chose 5.5 degrees precisely for the concern of how much they were willing to trade false trigger for missed trigger of MCAS.

Boeing was certainly aware of the fact that in some cases MCAS could trigger when uncalled for, and that MCAS may not trigger at the point expected when proposing MVS. Boeing made it clear that the design emphasis was to avoid false trigger. MVS will not false trigger unless both AoA sources are biased positively. The tradeoff is MVS does not trigger unless both AoA sources are above the trigger threshold. That is, in fact, the point of a fail-safe design. 

Postulate furthermore in this example that at the iteration prior to the LH AOA divergence, the MVS selected the LH AOA (the affected AOA sensor) for its output; the MVS will keep using the same previous output because it remains the mid value, as represented by the red dashed line.

Upon a rapid change as described, MVS would latch the value as portrayed.

Postulate finally that the crew later makes a maneuver which gets both AOAs to vary, enough for the affected (LH in the example) AOA value to intersect the MVS output value; at that point, the MVS will select the affected LH AOA output and track on this wrong value from that point.

The "wrong" value is a difficult determination without knowing what the "correct" value is. In this case, the presumption is that the LH value is reading about 5 degrees above the correct, RH value.  The figure INCORRECTLY then shows the MVS value following the LH value going up again. MVS would not move up in that case. It would latch until either LH got below it or RH got above it.
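The latching behavior can be traced by hand with a few fabricated frames: the vanes agree at 5 degrees, the LH vane then reads 5 degrees high, and the reference is moved up, down and up again. The numbers and helper names are invented for this sketch and do not come from the commenter's figure.

```python
def mvs_step(prev, left, right):
    """Middle value of the previous MVS output and the two current AoA values."""
    return sorted((prev, left, right))[1]


def trace_mvs(frames, prev=0.0):
    out = []
    for left, right in frames:
        prev = mvs_step(prev, left, right)
        out.append(prev)
    return out


frames = [          # (LH, RH) in degrees, all values fabricated
    (5, 5),         # both vanes agree
    (10, 5),        # LH jumps 5 degrees high (inside the 5.5 degree tolerance)
    (13, 8),        # maneuver up: MVS rises only as far as RH
    (7, 2),         # maneuver down: MVS falls only as far as LH
    (9, 4),         # LH rises again, but MVS holds (latches)
    (13, 8),        # MVS moves up only once RH exceeds it
]
print(trace_mvs(frames))  # [5, 5, 8, 7, 7, 8]
```

In the fifth frame LH climbs from 7 to 9 while the MVS output stays at 7; it does not follow LH upward.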

With this described scenario, the MCAS would use an elevated AOA value and would therefore trigger prematurely (as it did for JT610 & ET302, but only once with the new MCAS software).

THE COMMENT IS NOT CORRECT. MCAS WOULD NOT TRIGGER AS DESCRIBED. MCAS ONLY TRIGGERS WHEN BOTH LH AND RH VALUES ARE ABOVE THE THRESHOLD.

In continuing the provided commentary, to discuss the inverse scenario, take note that the figure below was posted publicly, separately, and is provided here to support the discussion. As this was done more recently, take note that the MVS behavior is now shown correctly. Unlike before, where the "upset" put the LH AoA above the RH AoA, now the "upset" puts the LH AoA below the RH AoA. As before, the presumption is that the RH AoA is correct.

A divergence in the opposite direction, i.e. toward lower values, can also happen (in Figure 1 the RH trace would be affected by a drop, and if prior to the divergence the MVS output also selected the RH signal, and the crew maneuvered later to get the AOA to increase to the point of intersecting the MVS output trace, the same type of problem would occur again); in this other scenario MCAS, would not trigger until a higher true AOA value is reached, meaning this would deprive at least in part the aircraft from the MCAS protection.

Once again, the comment implies that MCAS should have triggered solely on an AoA value that is presumed correct without any way of making such a determination. Inherently, a dual system Fail-Safe mandate says that BOTH the LH and RH values have to be above the threshold to trigger. That is what MVS provides. In this case, the comment professes MCAS should have been triggered in this scenario. THAT IS EXACTLY THE OPPOSITE OF THE DESIRED BEHAVIOR.

Such scenarios get more likely as the amplitude of the AOA divergence lowers, although with correspondingly-reduced potential effect. Still, this is considered a very problematic algorithm side effect, which a third AOA source could resolve to allow implementation of a better 3-way vote.

Yes, a third AoA sensor or data source that is fully independent of the LH and RH values would be helpful. As described in my discussion above, even then there are challenges.  

In response to this post, I noted these additional comments:

I now have reviewed his Satcom_guru blog post & it doesn't cover latent failures which my finding is

The term latent failure is not applied with any precision. The scenarios portrayed show that the LH and RH values are different by just under 5.5 degrees. The proposition is that they were both fully accurate and then somehow the LH vane diverged the full allowed tolerance as a fixed bias. As stated repeatedly, as long as the two vanes do not differ by more than 5.5 degrees then there is no concern. A point missed is that Fail-Safe requires both LH and RH values to be above the trip threshold, not just one.

I provided five scenarios in the above analysis. The (+4-1) and the (+1-4) scenarios are the same ones that the commenter highlighted. I also included the (+4+3) and (-4-3) scenarios as even worse cases.

To clarify, the undesired fault response here is: MCAS activation gets "delayed" AOA-wise by magnitude of negative upset. "Latent" also means crew isn't advised of degradation & falsely thinks MCAS will intervene in a too aggressive collision avoidance maneuver, risking a stall

The decision to go Fail-Safe mandates that both LH and RH values must be above the MCAS trip threshold to trigger MCAS. There is no fault as long as the difference does not exceed 5.5 degrees, the threshold chosen precisely for this reason.  There is no crew alert because there is no fault. 

I certainly agree that there should be methods to reveal an AoA vane that is degrading, that is heading towards a failure that could lead to loss of function. But in this instance, as long as the two AoA sources are within an acceptable tolerance, there is nothing to alert.

I cringe whenever someone contrives a scenario and implies that the airplane will stall. The regulators do not portray MCAS as part of stall identification, nor does Boeing. The point of MCAS is applied to stick force per g, to enhance handling qualities. The 737 MAX is not an unstable airplane. The pilot does not have to push to stop the airplane from stalling.

Lastly, Mr. Lemme focuses solely on MVS functionality of avoiding false triggering of MCAS with fault-elevated AOA, which MVS does well. He forgets MVS must also provide MCAS with higher integrity AOA input signal, which my identified failure condition defeats & is undetected. 

MVS does not provide a higher integrity AoA signal, in that it is only based on two sources of data, LH and RH. The only option to selecting one vane as being incorrect from another is to have an independent measure: a third AoA source of some form, or built-in-test features that apply to each vane independently. The perceived shortcoming is not a requirement and the behaviour outlined is understood to be safe.

The FAA responded to the submitted comments with the same conclusion as me, repeated below:

Comment summary: Another commenter expressed a concern with the MVS algorithm, specifically that if there is a fixed offset between the two AOA sensor values that is less than the 5.5-degree threshold that will cause deactivation of MCAS, the MCAS function would be utilizing AOA sensor inputs that are offset by up to 5.5 degrees.

FAA response: The new FCC software compares the two AOA sensor inputs relative to each other and will disable STS (including MCAS) for the remainder of the flight if the difference between the two exceeds a threshold of 5.5 degrees. The new MCAS also uses an MVS algorithm to address the potential for a sinusoidal AOA input from a single AOA sensor. To demonstrate compliance with 14 CFR part 25 standards, the new MCAS was analyzed and tested with various failure scenarios, including a sinusoidal AOA sensor input. The results established that MVS is effective, that it will not result in divergence or limit cycle oscillation, and that the design is compliant and safe. The FAA also tested the new MCAS with the scenario of AOA sensors offset by up to 5.5 degrees during certification and found the design to be compliant and safe.
