Data about LGBTQ+ people has become more plentiful as census and survey programs in the U.S. and abroad have added questions seeking to understand those communities.
But between spotty collection and reporting practices, data suppression, and privacy or ethics concerns, reporting data-driven stories about LGBTQ+ communities can still be difficult. These are all common data problems in any sphere of data reporting, of course, but they’re exacerbated by tiny population sizes and general distrust of those seeking identity-based information in this political climate.
At the same time, this reporting is needed more than ever. Discriminatory laws seeking to erase the existence of LGBTQ+ people are sweeping the nation, but much of the news coverage stops at political back-and-forth with reductive “culture war” framings. Data can be a powerful tool to show the scope of the personal and civic disruption caused by these laws, or their less obvious social consequences across the U.S.
We are two data journalists who have been grappling with data on LGBTQ+ people for years. Kae is the data & graphics reporter at Chalkbeat, an independent nonprofit newsroom covering public education. Jasmine is the data visuals reporter at The 19th, an independent nonprofit newsroom covering gender, politics and policy. We’ve found solutions for working with tricky data, and have written up the technical tips we wish we had when starting out.
Sometimes the data really is too bad — but many times, it just needs careful consideration.
This article is about technical and reporting solutions — if you’re interested in concepts and ethics, you should check out Part 1 too.
Tip 1: Widen your set of data sources
Many of our go-to data sources, whether governmental or from think tanks, don’t collect data on LGBTQ+ populations. (Or if they do, it’s a recent addition.) Cast a wider net than you may be used to when searching for reliable data: look for community-led surveys or questionnaires from advocacy groups. Examine the methodology closely to make sure the results are clearly explained and the methods can support them.
Also consider pursuing municipal and other smaller government data collections to create “data anecdotes” across multiple cities or states. Large-scale federal data often lags behind local collections, which respond more quickly to change and to the needs of individual communities.
Tip 2: Be skeptical of new data collections and data over time
In 2022, the Williams Institute, one of the best academic resources on trans population data in the U.S., released updated estimates that doubled its earlier estimate of the number of trans youth. Major publications ran headlines about a “sharp rise.”
But the researchers clarified that the new numbers were not evidence of a rise. The old and new estimates weren’t comparable over time: the new estimates drew on data that didn’t exist before, while the previous estimates came from models based on different sources.
We see similar methodological issues in any new data collection and in work based on it. Institutions adopt new methods and reporting at different times, so cross-state or cross-city data isn’t actually comparable. They use different definitions that inconsistently include or exclude specific LGBTQ+ subgroups. And LGBTQ+ people may not trust that a collection is safe or private, so they may be slow to provide this information at all.
The data changes rapidly, and small groups may see precipitous changes — like the increase of 217%, from 12 to 38 individuals, that Kae saw when they obtained enrollment data on nonbinary students in public schools.
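When the base is this small, a percentage on its own overstates the story. A minimal Python sketch of the arithmetic, using the 12 and 38 counts from the text (the helper name `percent_change` is ours, not from any library), shows why it helps to report the raw change next to the percentage:

```python
def percent_change(old: int, new: int) -> float:
    """Percent change from old to new; undefined for a zero baseline."""
    if old == 0:
        raise ValueError("percent change is undefined from a zero baseline")
    return (new - old) / old * 100


old, new = 12, 38  # nonbinary student counts from the example above
print(f"{percent_change(old, new):.0f}% increase ({new - old} more students)")

# With a base of 12, each single student shifts the figure by about 8 points.
print(f"one more student: {percent_change(old, old + 1):.1f}%")
```

The zero-baseline guard matters in practice too: a group that goes from 0 to a handful of people has no meaningful percent change at all.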
Tip 3: Pay attention to methodology — maybe more than you normally would
Methodology isn’t always included in press releases. Ask a press officer questions about weighting and margin of error; most will happily connect you with an expert or convey answers from the technical team.
Surveys need to be weighted to apply to the population writ large. But the American Community Survey, a common source for weights, doesn’t ask about sexual orientation or gender identity, so there’s no standard benchmark for weighting LGBTQ+ respondents, and the results may not be representative of the broader LGBTQ+ population. Ask the people who created the data how generalizable the information is, and what caveats you need to include in your story.
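To see why small LGBTQ+ subgroups deserve extra caution, here is a sketch of the textbook 95% margin of error for a proportion, plus Kish's effective sample size, which shows how uneven weights shrink the usable sample. It assumes simple random sampling and the normal approximation, and the sample sizes and weights are hypothetical. Real surveys often use design-specific methods, so treat this as a sanity check and ask the data's creators for their official margins.

```python
import math


def margin_of_error(p: float, n: float, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion, in percentage points.

    Assumes a simple random sample and the normal approximation, which
    gets shaky for very small n or proportions near 0 or 1.
    """
    return z * math.sqrt(p * (1 - p) / n) * 100


def kish_effective_n(weights: list[float]) -> float:
    """Kish's effective sample size: unequal weights shrink the usable n."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical: the same 50% finding from a full sample vs. smaller subgroups.
for n in (1000, 100, 30):
    print(f"n={n:>4}: +/- {margin_of_error(0.5, n):.1f} points")

# Hypothetical weights for a 30-person subgroup; unequal weights cut the effective n.
weights = [0.5] * 15 + [2.0] * 15
n_eff = kish_effective_n(weights)
print(f"effective n = {n_eff:.1f}, +/- {margin_of_error(0.5, n_eff):.1f} points")
```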
Tip 4: Examine question text and think critically about groupings that researchers are using
There is no standardized language for demographic questions about LGBTQ+ identification. To understand which groups a dataset actually describes, read the exact wording of the survey questions.
Many surveys still conflate sexuality and gender, which isn’t necessarily a problem when talking about the LGBTQ+ population overall but loses nuance around transgender experiences. In small samples, it’s possible that no transgender people were included at all.
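A rough way to gauge that risk: if each respondent independently has probability p of being transgender, the chance that a sample of n people includes none is (1 - p)^n. The sketch below assumes a 1% share purely for illustration; it is not a population estimate, and real samples are rarely this simple.

```python
import math


def prob_none_included(p: float, n: int) -> float:
    """Chance that none of n independent respondents belong to a group with share p."""
    return (1 - p) ** n


def min_sample_for_one(p: float, confidence: float = 0.95) -> int:
    """Smallest n giving at least `confidence` probability of one or more group members."""
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))


share = 0.01  # hypothetical share, for illustration only
for n in (50, 100, 500):
    print(f"n={n}: {prob_none_included(share, n):.0%} chance of zero transgender respondents")
print(f"n needed for a 95% chance of at least one: {min_sample_for_one(share)}")
```

Under these assumptions, a 100-person sample has about a 37% chance of including no one from the group.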
Additionally, researchers may use groupings or labels to categorize behavior rather than identity. Sometimes, this may be accurate. But the researchers’ categorizations also may make incorrect assumptions about people’s self-identification, emotions, or experiences. (More on that in Part 1 of this series.)
The language of social science questionnaires is precise in a way that can sometimes be alienating to general audiences. Don’t be afraid of using your best editorial judgment to relabel with language that reflects coverage best practices from organizations like NLGJA: The Association of LGBTQ Journalists, the Trans Journalists Association, or the Associated Press — as long as you do the research to understand the data limitations.
For its 2023 poll, The 19th relabeled the gender category “Not listed/gender-nonconforming” as the analogous “nonbinary” in all stories and charts. This follows recommendations from the Trans Journalists Association style guide and reflects language commonly used in LGBTQ+ communities.
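If you relabel categories in your own analysis, it can help to do it in one explicit mapping rather than by hand in each chart, so the change is documented and reproducible. A minimal Python sketch: the nonbinary mapping mirrors The 19th's example, while the other answer options and the `display_label` helper are hypothetical.

```python
# Display labels for charts; keys are the survey's raw answer text.
# Only the nonbinary relabel comes from the example above; the rest are hypothetical.
DISPLAY_LABELS = {
    "Man": "Man",
    "Woman": "Woman",
    "Not listed/gender-nonconforming": "nonbinary",
}


def display_label(raw: str) -> str:
    """Map a raw survey answer to a chart label, failing loudly on anything unmapped."""
    try:
        return DISPLAY_LABELS[raw]
    except KeyError:
        raise ValueError(f"no display label for survey answer {raw!r}") from None


print(display_label("Not listed/gender-nonconforming"))
```

Raising an error on unexpected answers means a new or renamed survey option can't slip into a chart under the wrong label.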
Tip 5: Don’t ignore other demographic factors, even if your data does
Comparing LGBTQ+ and non-LGBTQ+ groups can obscure underlying trends. LGBTQ+ people tend to be younger and more liberal, and those factors might be driving behavior rather than LGBTQ+ identity alone, as Jasmine learned while working on a story about the lack of research on queer religious people.
Similarly, in schools, some experiences of LGBTQ+ students may be identity-specific. But others may also relate to family wealth, the resources and investments available to LGBTQ+ students in wealthier districts, urban vs. rural trends, and similar factors that should be considered in any demographic-specific analysis.
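One standard check is to compare groups within bands of a likely confounder, such as age, instead of only overall. The counts below are invented to illustrate the pattern, not drawn from any survey: the LGBTQ+ group skews younger, so an overall gap appears even though the two groups answer identically within each age band.

```python
# Invented counts for illustration: (respondents, answered "yes")
COUNTS = {
    ("LGBTQ+", "under 35"): (80, 48),
    ("LGBTQ+", "35 and over"): (20, 6),
    ("non-LGBTQ+", "under 35"): (300, 180),
    ("non-LGBTQ+", "35 and over"): (700, 210),
}
AGE_BANDS = ("under 35", "35 and over")


def overall_rate(group: str) -> float:
    """Share answering "yes" across all ages, ignoring age entirely."""
    n = sum(COUNTS[(group, age)][0] for age in AGE_BANDS)
    yes = sum(COUNTS[(group, age)][1] for age in AGE_BANDS)
    return yes / n


def rate_within(group: str, age: str) -> float:
    """Share answering "yes" within a single age band."""
    n, yes = COUNTS[(group, age)]
    return yes / n


gap = overall_rate("LGBTQ+") - overall_rate("non-LGBTQ+")
print(f"overall gap: {gap:+.0%}")
for age in AGE_BANDS:
    band_gap = rate_within("LGBTQ+", age) - rate_within("non-LGBTQ+", age)
    print(f"{age}: gap {band_gap:+.0%}")
```

With real data, the within-band gaps rarely vanish this neatly, and age is only one candidate; the same approach applies to political ideology, family income, or urban vs. rural location.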
Tip 6: Enrich data context with outside experts
Of course, on any data story, you often need to interview data experts as much as you need to interview the data itself. But this matters even more when you’re working with newer, non-standardized, smaller-scale datasets.
The expert lens on the data may make or break the story, or catch a glaring error in the way you’ve interpreted trends.
With LGBTQ+ data, a survey is often the first of its kind, so there may be no earlier numbers to check it against. Experts who have studied these issues qualitatively can help validate your story premise.
Tip 7: Tell stories — and make visuals — about the limitations of data
All of these problems are the same ones faced on any other beat with shoddy or limited data (which is nearly all of them). There are creative technical ways to turn your analysis toward examining, or even visualizing, the problems with bad data.
Consider taking a meta approach that emphasizes what’s missing from a dataset, instead of analyzing or visualizing data that you know has extensive problems. Maybe it’s small multiples instead of lines on the same axis, so audiences don’t draw comparisons the data can’t support; maybe it’s a map or a bar chart showing all the agencies that didn’t report data, or all the people excluded by the data’s definitions.
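As a sketch of the data prep behind a chart of non-reporters, the Python below counts, for each state, how many agencies left a field blank and draws a simple text bar. The records and names are hypothetical, and it assumes missing reports are stored as `None`. It deliberately treats a reported zero as real data rather than as missing, since conflating the two is a common way this kind of chart goes wrong.

```python
from collections import Counter

# Hypothetical records: (state, agency, value); None means the agency didn't report.
records = [
    ("State A", "District 1", 14),
    ("State A", "District 2", None),
    ("State A", "District 3", None),
    ("State B", "District 4", 3),
    ("State B", "District 5", 0),
    ("State B", "District 6", None),
]


def missing_by_state(rows):
    """Return {state: (non-reporting agencies, total agencies)}; zeros count as reported."""
    missing = Counter(state for state, _, value in rows if value is None)
    totals = Counter(state for state, _, _ in rows)
    return {state: (missing[state], totals[state]) for state in totals}


for state, (miss, total) in missing_by_state(records).items():
    print(f"{state}: {'#' * miss} {miss} of {total} agencies did not report")
```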
Getting creative and pulling solutions over from other beats can open the door to new, important, and newsworthy data stories. It’s worth the time to consider this approach: Problems with data are problems for people, and those stories deserve to be told.
This article was originally published on Source and republished on IJNet with permission.