I'm not sure I quite understand your setup. In the link that you provided, the 3xP7 is described as 3 P7s wired in parallel and using a KD SKU 1845. That KD board is already a direct drive PWM controller. So I don't understand the function of the FET, which you claim allows you to run in direct drive and prevents "clipping" at 2.8A. What clipping and what is the cause? Is the PWM controller not providing 100% duty cycle at full power?
At 6A draw, yes, the P7s are underdriven (nominally 2A per P7). But the small decrease in Vf at 500mA per core (versus the Vf at 700mA per core at full power) isn't really going to help all that much. You are still pumping approximately 3 * Vf * 2A watts of power into the three LEDs, or probably about 3 * 6.6W ~ 20W, and probably about 80% of that will go to waste heat, or about 16W.
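To make that back-of-the-envelope estimate easy to redo with other numbers, here is a rough sketch in Python. The Vf of about 3.3V is only what the 6.6W-per-P7 figure implies at 2A, and the 80% heat fraction is my guess, not a measurement. The function name and parameters are made up for illustration.

```python
# Rough estimate of input power and waste heat for three P7s in parallel.
# Assumptions (not measured): Vf of about 3.3 V at 2 A per P7, and roughly
# 80% of the electrical input ending up as heat.

def estimate_power(num_leds, vf_volts, amps_per_led, heat_fraction=0.8):
    """Return (electrical_power_w, waste_heat_w) for LEDs wired in parallel."""
    electrical = num_leds * vf_volts * amps_per_led
    return electrical, electrical * heat_fraction

power, heat = estimate_power(num_leds=3, vf_volts=3.3, amps_per_led=2.0)
print(f"input power: {power:.1f} W, waste heat: {heat:.1f} W")
```

Plugging in a lower Vf for the underdriven case barely moves the result, which is the point: the current dominates the heat load.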
What is important is how hot the LEDs get (or, as a surrogate, the heat sink), not necessarily the outside of the flashlight. If you have poor thermal conductivity between the heat sink and the flashlight, the light can feel relatively cool, which is misleading. With an estimated 16W of waste heat generated by the LEDs, I would expect the light to run hot during extended runs. I haven't done any thermocouple measurements in my direct drive 3D Mag with 1xP7, but in a SureFire KT1 TurboHead with a 2S2P MC-E Turbo Tower driven at 613mA per core, I measured 144F (62C) after 15 min and hit 162F (72C) steady state for the heat sink. I would characterize these temps as burning hot. The TurboHead surface temperature was 133F (56C).
A single 2S2P MC-E driven at 613mA per core pulls about 8.2W, of which about 6.5W is waste heat. It is hard for me to envision how 16W of waste heat from 3xP7 can run cooler than 6.5W from 1xMC-E.
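For a side-by-side number, the comparison can be sketched like this. The input powers are the figures quoted in this post; the 80% heat fraction is again an assumption applied equally to both setups.

```python
# Compare estimated waste heat: 3xP7 at 2 A each vs. one 2S2P MC-E at 613 mA per core.
HEAT_FRACTION = 0.8      # assumed share of input power lost as heat

mce_input_w = 8.2        # quoted above for the MC-E
p7_input_w = 3 * 6.6     # quoted above: about 6.6 W per P7, three P7s

mce_heat_w = mce_input_w * HEAT_FRACTION
p7_heat_w = p7_input_w * HEAT_FRACTION
ratio = p7_heat_w / mce_heat_w

print(f"MC-E: {mce_heat_w:.1f} W, 3xP7: {p7_heat_w:.1f} W, ratio: {ratio:.1f}x")
```

Since the same heat fraction is applied to both, the ratio is just the ratio of input powers, so the 3xP7 has well over twice the heat to get rid of regardless of what the true efficiency is.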
Edited: Big egg on my face! I forgot my own test procedure when I measured the temperature for the SureFire TurboHeads. The THs were not screwed into any flashlight body, which would have otherwise added to the thermal heat sinking mass. I powered the LED Turbo Towers using a bench power supply and thus needed access to the driver board inputs. No doubt, if the THs had been attached to a metal flashlight body, the various measured temperatures (Tower, TH surface) would have been lower.