Offensive Comment Filter

Individual users can proactively filter out and quarantine targeted replies, comments, and messages that are automatically flagged as hateful or abusive

How does this mitigate hate?

Platforms can reduce user exposure to online hate and harassment by enabling users to proactively filter out targeted replies, comments, and messages that are automatically flagged as harmful. However, platforms shouldn’t just hide this content; they should give users the ability to review this content and to address it.


When to use it?

This functionality is especially needed when a user is facing, or anticipates facing, online hate and harassment and wants to reduce their exposure to harmful content, but still needs to be able to monitor for threats or risks.

How does it work?

Users should be able to turn on filters that automatically flag targeted replies, comments, and messages as potentially hateful or abusive and quarantine this harmful content in a dashboard (for more, please see Harmful Content Dashboard).

Users should also be able to manually add abusive content to the dashboard that the automated filter missed, and to manually release content from the dashboard that was mistakenly flagged as abusive or that the user does not perceive as abusive.

Allow “Filter all comments” as an option. Creators can decide which comments will appear next to their content. Once enabled, comments will not be displayed unless the creator approves them using a Content/Comment Dashboard. This is similar to the functionality on many blogs that quarantines comments for approval before publishing.
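The flow described above — automatic flagging into a quarantine dashboard, manual quarantine and release, and an optional filter-all mode — can be sketched in code. This is a minimal illustration, not a real implementation: the class and method names (CommentFilter, receive, release, quarantine) are hypothetical, and the keyword check stands in for whatever automated classifier a platform would actually use.

```python
from dataclasses import dataclass

@dataclass
class Comment:
    author: str
    text: str

class CommentFilter:
    """Hypothetical sketch of a per-user filter-and-quarantine flow."""

    def __init__(self, blocked_keywords=None, filter_all=False):
        self.blocked_keywords = set(blocked_keywords or [])
        self.filter_all = filter_all   # the "Filter all comments" option
        self.visible = []              # comments shown next to the content
        self.quarantined = []          # the quarantine dashboard

    def _is_flagged(self, comment):
        # Stand-in for an automated hate/abuse classifier.
        text = comment.text.lower()
        return any(word in text for word in self.blocked_keywords)

    def receive(self, comment):
        # Flagged comments (or all comments, in filter-all mode) are
        # quarantined rather than displayed.
        if self.filter_all or self._is_flagged(comment):
            self.quarantined.append(comment)
        else:
            self.visible.append(comment)

    def release(self, comment):
        # The user reviews the dashboard and approves a false positive.
        self.quarantined.remove(comment)
        self.visible.append(comment)

    def quarantine(self, comment):
        # The user manually flags abusive content the filter missed.
        self.visible.remove(comment)
        self.quarantined.append(comment)
```

The key design point the sketch captures is that flagged content is held for the user's review rather than silently deleted, so the user can still monitor for threats and correct the filter's mistakes in either direction.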


A harmful content filter focused on shielding individual users from hate and harassment in targeted replies, comments, and messages can provide an alternative to overzealous proactive content moderation across the platform, which can severely undermine free expression for all users.


The automated filtering of harmful content is an imperfect science—with false positives, rapidly evolving and coded forms of abuse and hate, and challenges analyzing symbols and images.

Some detection algorithms have also been shown to exhibit racist or sexist biases.

Platforms should work more closely with one another, with companies that build third-party tools, and with civil society to create and maintain a shared taxonomy of abusive tactics, terms, symbols, etc., and to create publicly available data sets and heuristics for independent review.


TikTok offers granular comment filters that allow users to preemptively filter all comments, filter spam or offensive comments, or filter by keywords. TikTok also allows users to quarantine comments in a dashboard for review. (screenshot taken 12/17/2021)

Instagram allows users to: choose who is allowed to comment; block comments from specific accounts; hide offensive comments via a one-click button; and filter comments by specific keywords or phrases. (screenshot taken May 2022)



Madison, Quinn. “Tuning out Toxic Comments, with the Help of AI.” Google Design, February 11, 2020.

Systrom, Kevin. “Protecting Users with Bullying Filters on Instagram.” Instagram Blog, May 1, 2018.

PEN America. “Blocking, Muting, & Restricting.” Online Harassment Field Manual. Accessed September 28, 2021.

Vilk, Viktorya, Elodie Vialle, and Matt Bailey. “No Excuse for Abuse: What Social Media Companies Can Do Now to Combat Online Harassment and Empower Users.” Edited by Summer Lopez and Suzanne Nossel. PEN America, March 31, 2021.


Written in collaboration with PEN America