mental health of child and adolescent: multiple testing problem에 대한 solution (2/4)

familywise error rate (FWER, FWE): probability of making at least one Type I error, given the total number of statistical tests

false discovery rate (FDR): probability of having at least one false positive results, given the set of reported positive results

이전의 포스트에서 FWER에 대한 이야기를 하였는데 이제는 FDR에 대한 이야기를 하여 보겠습니다.

FWE-corrected voxel-level test들이 신경 영상연구에서 가장 일반적으로 쓰이는 방법이기는 하지만, 이 것이 너무 보수적이어서 막상 실제 연구를 해보면 insensitive하여서 (correction에서 survival하지 못하였다고 표현합니다) no result가 나오는 경우가 많습니다.

따라서 이 경우에는 좀 더 관대(lenient)한 방법으로 대안이 되는 것이 false discovery rate (FDR)입니다. false discovery proportion (FDP)는 전체 positive 결과가 나온 voxel들 중에서 false positive로 감지된 voxel의 비율을 말합니다. FDP는 직접 측정할 수 없지만, FDR 기법에서는 average FDP가 95% of the time에서 옳게끔 통제되도록 guarantee를 하여 줍니다.

FDR은 다시 말하면 rate of false positives (FDP)를 less than the chosen threshold value로 control합니다.

multiple testing problem을 해결하기 위하여 FWER과 FDR 중에서 어느 것을 control할 것인가가 문제가 될 때, FDR을 사용하는 것은 FWER을 사용하는 것에 비하여 다음 두 가지의 장점이 있다고 합니다.

1. FDR uses less-stringent correction than FWER especially when there are many activated voxels.

2. Whereas FWER controls the proportion of false (positive) tests, FDR controls the proportion of false (positive) claims. --> 실제 연구와 관련해서 생각을 해 보시면 이 부분이 납득이 갑니다. 실제 연구에서는 여러 test들을 수행하며, 결과로 얻어지는 각 data set들에서 유의한 결과가 나올 가능성들도 낮습니다(low probability of significance). 이러한 상황에서 연구자들이 total number of statistical tests - 전체 뇌에서 일일이 개별적인 voxel들 - 에 신경을 쓰는 것보다는, reported positive results들 - activation clusters in a table - 에 신경을 쓰는 것을 당연히 더 선호할 것입니다.

(*claim은 유의한 결과가 나온 부위들을 지칭합니다)

아래의 그림을 보시면 FDR은 no correction과 FWER사이의 compromise라는 것을 아실 수 있습니다. 가로로 첫 줄은 statistic image without any thresholding, 둘째 줄은 no correction, 셋째 줄은 FWER, 넷째 줄은 FDR을 예시하는 그림입니다.

그러나 FDR의 greater sensitivity는 (FWER에 비하여) greater false positive risk라는 댓가를 치르고 얻은 것입니다. 그리고 이 것은 FDP라는 개념을 적용해서 map을 얻은 만큼, FDR-significant한 voxel들의 map에서 그러나, 개별 single voxel을 찝어서 그것이 significant하다고 결론을 내릴 수가 없습니다. 단지 평균적으로 5% 미만의 voxel들만이 false positive하다고 말할 수 있을 뿐입니다. 그래서 이것을 lack of spatial precision이 있다고 말합니다.

따라서, SPM의 아버지인 Friston 같은 경우에는 voxel-level FDR은 전혀 사용되어서는 안된다는 말까지 남긴 바가 있습니다.
그래서 많이 사용하는 것이 대신 cluster-level FDR입니다.

mental health of child and adolescent

2013년 11월 22일 금요일

multiple testing problem에 대한 solution (2/4) - FDR

댓글 없음:

댓글 쓰기