Questions

1. {Summary} Please briefly summarize the main claims/contributions of the paper in your own words. (Please do not include your evaluation of the paper here).
The authors tackle a challenging and practically important problem: out-of-distribution (OOD) detection in deployed ML models. They propose a scoring function called the "generative energy score" and show experimentally that this function is indeed able to detect OOD samples. They also give provable guarantees on the ability of their method to detect OOD samples.

2. {Novelty} How novel are the concepts, problems addressed, or methods introduced in the paper?
Good: The paper makes non-trivial advances over the current state-of-the-art.

3. {Soundness} Is the paper technically sound?
Good: The paper appears to be technically sound, but I have not carefully checked the details.

4. {Impact} How do you rate the likely impact of the paper on the AI research community?
Good: The paper is likely to have high impact within a subfield of AI OR moderate impact across more than one subfield of AI.

5. {Clarity} Is the paper well-organized and clearly written?
Excellent: The paper is well-organized and clearly written.

6. {Evaluation} If applicable, are the main claims well supported by experiments?
Good: The experimental evaluation is adequate, and the results convincingly support the main claims.

7. {Resources} If applicable, how would you rate the new resources (code, data sets) the paper contributes? (It might help to consult the paper’s reproducibility checklist)
Not applicable: For instance, the primary contributions of the paper are theoretical.

8. {Reproducibility} Are the results (e.g., theorems, experimental results) in the paper easily reproducible? (It may help to consult the paper’s reproducibility checklist.)
Good: Key resources (e.g., proofs, code, data) are available and key details (e.g., proofs, experimental setup) are sufficiently well-described for competent researchers to confidently reproduce the main results.

9. {Ethical Considerations} Does the paper adequately address the applicable ethical considerations, e.g., responsible data collection and use (e.g., informed consent, privacy), possible societal harm (e.g., exacerbating injustice or discrimination due to algorithmic bias), etc.?
Not Applicable: The paper does not have any ethical considerations to address.

10. {Reasons to Accept} Please list the key strengths of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
A1: The paper addresses a challenging and practically important problem.
A2: The theoretical results are compelling.
A3: The experiments are convincing, and the proposed methods are useful for the setting that was tested: Gaussian mixture data + deep neural network classifiers.

11. {Reasons to Reject} Please list the key weaknesses of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
None.

12. {Questions for the Authors} Please provide questions that you would like the authors to answer during the author feedback period. Please number them.
None.

13. {Detailed Feedback for the Authors} Please provide other detailed, constructive, feedback to the authors.
I am unfortunately not an expert and so cannot judge the details of the proofs, but I see immediate application of the proposed methods to the systems/engineering work that I do.
14. (OVERALL EVALUATION) Please provide your overall evaluation of the paper, carefully weighing the reasons to accept and the reasons to reject the paper. Ideally, we should have:
- No more than 25% of the submitted papers in (Accept + Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 20% of the submitted papers in (Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 10% of the submitted papers in (Very Strong Accept + Award Quality) categories;
- No more than 1% of the submitted papers in the Award Quality category.
Strong Accept: Technically strong paper, with novel ideas, excellent impact on at least one area of AI or high to excellent impact on multiple areas of AI, with excellent evaluation, resources, and reproducibility, and no unaddressed ethical considerations.

20. I acknowledge that I have read the author's rebuttal and made whatever changes to my review where necessary.
Agreement accepted

Reviewer #3

Questions

1. {Summary} Please briefly summarize the main claims/contributions of the paper in your own words. (Please do not include your evaluation of the paper here).
This paper develops a Gaussian-mixture-model-based framework that offers a theoretical understanding of OOD detection. The introduced framework also motivates the development of an algorithm that bears strong similarity to Liu et al. 2020. In addition, it provides some theoretical analyses of the proposed method in several respects.

2. {Novelty} How novel are the concepts, problems addressed, or methods introduced in the paper?
Good: The paper makes non-trivial advances over the current state-of-the-art.

3. {Soundness} Is the paper technically sound?
Good: The paper appears to be technically sound, but I have not carefully checked the details.

4. {Impact} How do you rate the likely impact of the paper on the AI research community?
Good: The paper is likely to have high impact within a subfield of AI OR moderate impact across more than one subfield of AI.

5. {Clarity} Is the paper well-organized and clearly written?
Good: The paper is well organized but the presentation could be improved.

6. {Evaluation} If applicable, are the main claims well supported by experiments?
Good: The experimental evaluation is adequate, and the results convincingly support the main claims.

7. {Resources} If applicable, how would you rate the new resources (code, data sets) the paper contributes? (It might help to consult the paper’s reproducibility checklist)
Good: The shared resources are likely to be very useful to other AI researchers.

8. {Reproducibility} Are the results (e.g., theorems, experimental results) in the paper easily reproducible? (It may help to consult the paper’s reproducibility checklist.)
Good: Key resources (e.g., proofs, code, data) are available and key details (e.g., proofs, experimental setup) are sufficiently well-described for competent researchers to confidently reproduce the main results.

9. {Ethical Considerations} Does the paper adequately address the applicable ethical considerations, e.g., responsible data collection and use (e.g., informed consent, privacy), possible societal harm (e.g., exacerbating injustice or discrimination due to algorithmic bias), etc.?
Good: The paper adequately addresses most, but not all, of the applicable ethical considerations.

10. {Reasons to Accept} Please list the key strengths of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. Overall, the paper is well-written and easy to follow.
2. It provides an insightful framework that yields the optimal form of the OOD scoring function, at least under Gaussian mixture models.
3. Several of the provided remarks and analyses are insightful, e.g., Remark 2.

11. {Reasons to Reject} Please list the key weaknesses of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. Distinction from the Energy Score (Liu et al. 2020): This reviewer cannot see the difference from Eq. (4); it looks identical to the proposed score function. (A sketch of both score forms, under stated assumptions, follows this review.)
2. The performance benefit of the proposed method is marginal or inferior (relative to prior methods) when the number of classes is small. For instance, on CIFAR-10, Liu et al. 2020 is the best and the proposed method ranks third.
3. There are no proofs of the Lemmas. This may be because they look like simple statements; if so, why formalize them as Lemmas?

12. {Questions for the Authors} Please provide questions that you would like the authors to answer during the author feedback period. Please number them.
1. In the proposed generative energy score, the summation is over j while only i-relevant functions appear. Typo?
2. What about optimality when the priors are not equal?

13. {Detailed Feedback for the Authors} Please provide other detailed, constructive, feedback to the authors.
Please see above.
========= after rebuttal =========
I have read the authors' response to the comments that I raised. Most of the comments are addressed properly, and I believe the paper is worth being published.

14. (OVERALL EVALUATION) Please provide your overall evaluation of the paper, carefully weighing the reasons to accept and the reasons to reject the paper. Ideally, we should have:
- No more than 25% of the submitted papers in (Accept + Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 20% of the submitted papers in (Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 10% of the submitted papers in (Very Strong Accept + Award Quality) categories;
- No more than 1% of the submitted papers in the Award Quality category.
Accept: Technically solid paper, with high impact on at least one sub-area of AI or moderate to high impact on more than one area of AI, with good to excellent evaluation, resources, reproducibility, and no unaddressed ethical considerations.

20. I acknowledge that I have read the author's rebuttal and made whatever changes to my review where necessary.
Agreement accepted
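For reference on the distinction raised in Reviewer #3's weakness 1 and question 1, the following is a sketch under stated assumptions, not necessarily the paper's exact Eq. (4). The energy score of Liu et al. 2020 is computed from the classifier logits f_k(x) with temperature T:

E(x; f) = -T \log \sum_{k=1}^{K} \exp\big(f_k(x)/T\big).

Assuming a Gaussian mixture model in feature space z = h(x) with class means \mu_k, a shared covariance \Sigma, and equal priors (notation introduced here for illustration only), a generative energy score would instead be the log of the (unnormalized) mixture density:

S_{\mathrm{gen}}(z) = \log \sum_{k=1}^{K} \exp\!\Big(-\tfrac{1}{2}\,(z-\mu_k)^{\top}\Sigma^{-1}(z-\mu_k)\Big).

Both scores share the log-sum-exp form; the difference, if this reading is correct, is that the former exponentiates discriminative logits while the latter exponentiates fitted Gaussian log-densities.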
Reviewer #4

Questions

1. {Summary} Please briefly summarize the main claims/contributions of the paper in your own words. (Please do not include your evaluation of the paper here).
This paper proposes an OOD detection algorithm: Generative Energy. The authors claim that the proposed algorithm characterizes and unifies the theoretical understanding of OOD detection. CIFAR-10 and CIFAR-100 are used as the in-distribution datasets, and several other datasets are used as OOD data in the experiments.

2. {Novelty} How novel are the concepts, problems addressed, or methods introduced in the paper?
Fair: The paper contributes some new ideas.

3. {Soundness} Is the paper technically sound?
Good: The paper appears to be technically sound, but I have not carefully checked the details.

4. {Impact} How do you rate the likely impact of the paper on the AI research community?
Fair: The paper is likely to have moderate impact within a subfield of AI.

5. {Clarity} Is the paper well-organized and clearly written?
Good: The paper is well organized but the presentation could be improved.

6. {Evaluation} If applicable, are the main claims well supported by experiments?
Fair: The experimental evaluation is weak: important baselines are missing, or the results do not adequately support the main claims.

7. {Resources} If applicable, how would you rate the new resources (code, data sets) the paper contributes? (It might help to consult the paper’s reproducibility checklist)
Not applicable: For instance, the primary contributions of the paper are theoretical.

8. {Reproducibility} Are the results (e.g., theorems, experimental results) in the paper easily reproducible? (It may help to consult the paper’s reproducibility checklist.)
Fair: Key resources (e.g., proofs, code, data) are unavailable but key details (e.g., proof sketches, experimental setup) are sufficiently well-described for an expert to confidently reproduce the main results.

9. {Ethical Considerations} Does the paper adequately address the applicable ethical considerations, e.g., responsible data collection and use (e.g., informed consent, privacy), possible societal harm (e.g., exacerbating injustice or discrimination due to algorithmic bias), etc.?
Not Applicable: The paper does not have any ethical considerations to address.

10. {Reasons to Accept} Please list the key strengths of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. A novel algorithm is proposed.
2. Sufficient experiments are conducted (2 in-distribution and 6 OOD datasets).
3. The authors provide theoretical support for the proposed method.

11. {Reasons to Reject} Please list the key weaknesses of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. The proposed method does not outperform Mahalanobis (MMD). (The Mahalanobis score is recalled for reference after this review.)
2. The conclusion that "Provable Guarantees for Generative Energy" holds a general advantage is not sufficiently supported.
3. A critical discussion of limitations is missing.

12. {Questions for the Authors} Please provide questions that you would like the authors to answer during the author feedback period. Please number them.
Could the authors provide more experiments to support the advantage over Mahalanobis (MMD)? Currently, only theoretical analyses are provided. Has the method shown an advantage in any deployed systems?

13. {Detailed Feedback for the Authors} Please provide other detailed, constructive, feedback to the authors.
As mentioned above, providing experiments to support the advantage over Mahalanobis (MMD) would strengthen the paper.

14. (OVERALL EVALUATION) Please provide your overall evaluation of the paper, carefully weighing the reasons to accept and the reasons to reject the paper. Ideally, we should have:
- No more than 25% of the submitted papers in (Accept + Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 20% of the submitted papers in (Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 10% of the submitted papers in (Very Strong Accept + Award Quality) categories;
- No more than 1% of the submitted papers in the Award Quality category.
Borderline reject: Technically solid paper where reasons to reject, e.g., lack of novelty, outweigh reasons to accept, e.g., good evaluation. Please use sparingly.
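For reference on the Mahalanobis (MMD) baseline discussed above, and as a sketch only (notation assumed here, not taken from the paper): the Mahalanobis confidence score of Lee et al. 2018, with class means \mu_k and a shared covariance \Sigma estimated from training features h(x), is typically written as

S_{\mathrm{Maha}}(x) = \max_{k}\; -\,\big(h(x)-\mu_k\big)^{\top}\Sigma^{-1}\big(h(x)-\mu_k\big).

Since \max_k a_k \le \log\sum_{k} e^{a_k} \le \max_k a_k + \log K, a generative log-sum-exp score over negative Mahalanobis distances can be read as a smoothed variant of this maximum-Mahalanobis score, which frames the empirical comparison the reviewers ask for.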
Reviewer #5

Questions

1. {Summary} Please briefly summarize the main claims/contributions of the paper in your own words. (Please do not include your evaluation of the paper here).
The authors proposed an analytical framework for out-of-distribution detection and provided theoretical interpretations of several representative OOD scoring functions. The paper proposed a generative energy score for OOD detection by modeling the data under a Gaussian mixture, which has provable theoretical guarantees. The proposed method achieved competitive results on OOD detection benchmarks.

2. {Novelty} How novel are the concepts, problems addressed, or methods introduced in the paper?
Good: The paper makes non-trivial advances over the current state-of-the-art.

3. {Soundness} Is the paper technically sound?
Good: The paper appears to be technically sound, but I have not carefully checked the details.

4. {Impact} How do you rate the likely impact of the paper on the AI research community?
Good: The paper is likely to have high impact within a subfield of AI OR moderate impact across more than one subfield of AI.

5. {Clarity} Is the paper well-organized and clearly written?
Excellent: The paper is well-organized and clearly written.

6. {Evaluation} If applicable, are the main claims well supported by experiments?
Not applicable: The paper does not present an experimental evaluation (the main focus of the paper is theoretical).

7. {Resources} If applicable, how would you rate the new resources (code, data sets) the paper contributes? (It might help to consult the paper’s reproducibility checklist)
Not applicable: For instance, the primary contributions of the paper are theoretical.

8. {Reproducibility} Are the results (e.g., theorems, experimental results) in the paper easily reproducible? (It may help to consult the paper’s reproducibility checklist.)
Good: Key resources (e.g., proofs, code, data) are available and key details (e.g., proofs, experimental setup) are sufficiently well-described for competent researchers to confidently reproduce the main results.

9. {Ethical Considerations} Does the paper adequately address the applicable ethical considerations, e.g., responsible data collection and use (e.g., informed consent, privacy), possible societal harm (e.g., exacerbating injustice or discrimination due to algorithmic bias), etc.?
Not Applicable: The paper does not have any ethical considerations to address.

10. {Reasons to Accept} Please list the key strengths of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. The proposed method introduces a unified theoretical explanation of OOD scoring functions for OOD detection, which seems novel and interesting.
2. The proposed OOD scoring function, the generative energy score, can be proved to be mathematically optimal for capturing OOD uncertainty.
3. The changes in OOD detection performance w.r.t. the data representation are predictable due to the provable guarantees of the proposed OOD scoring function.
4. The main idea of the proposed method is clear and easy to understand.
5. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods.

11. {Reasons to Reject} Please list the key weaknesses of the paper (explain and summarize your rationale for your evaluations with respect to questions 1-9 above).
1. Though the proposed method has provable guarantees for OOD detection, its performance is not higher than that of previous OOD detection methods.
2. In the section on the simulation studies, it is not clear how the performance of previous suboptimal methods changes under different settings.
Are there some settings where the proposed method outperforms other methods?
3. The experimental evaluation is only performed on easy OOD tasks (i.e., using CIFAR-10/100 as in-distribution data). What will happen in more difficult tasks, such as using Tiny ImageNet as in-distribution data or using some classes of CIFAR as ID and others as OOD?
4. There is no code to reproduce the experiments.

12. {Questions for the Authors} Please provide questions that you would like the authors to answer during the author feedback period. Please number them.
See the weaknesses part.

13. {Detailed Feedback for the Authors} Please provide other detailed, constructive, feedback to the authors.
If possible, adding an illustration of the maximum Mahalanobis distance and the energy score to Figure 3.1 would make the proposed method clearer.
Comments After Author Response: I would like to thank the authors for their feedback. After also reading the comments from the other reviewers, I am happy to retain my current rating of the paper.

14. (OVERALL EVALUATION) Please provide your overall evaluation of the paper, carefully weighing the reasons to accept and the reasons to reject the paper. Ideally, we should have:
- No more than 25% of the submitted papers in (Accept + Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 20% of the submitted papers in (Strong Accept + Very Strong Accept + Award Quality) categories;
- No more than 10% of the submitted papers in (Very Strong Accept + Award Quality) categories;
- No more than 1% of the submitted papers in the Award Quality category.
Accept: Technically solid paper, with high impact on at least one sub-area of AI or moderate to high impact on more than one area of AI, with good to excellent evaluation, resources, reproducibility, and no unaddressed ethical considerations.

20. I acknowledge that I have read the author's rebuttal and made whatever changes to my review where necessary.
Agreement accepted