curl/scripts/badwords.txt
Daniel Stenberg 6870803187
badwords: only check comments and strings in source code
- when scanning source code, this now only checks comments and
  double-quoted strings, so bad words that are part of code are no longer
  reported
- this allows the full scan to be done in a single invocation
- detects source code or markdown by file name extension
- moved the whitelist words config into the single `badwords.txt` file,
  no more having them separately (see top of file for syntax)
- all whitelisted words are checked case insensitively now
- removed support for whitelisting words on a specific line number. We
  did not use it and it is too fragile

Excluding the actual code from the scan made the script take an
additional 0.5 seconds on my machine.

Scanning 1525 files now takes a little under 1.7 seconds for me.

Closes #20909
2026-03-13 08:54:35 +01:00


# Copyright (C) Daniel Stenberg, <daniel@haxx.se>, et al.
#
# SPDX-License-Identifier: curl
#
# whitelisted uses of bad words (case insensitive) can be done in two ways,
# globally and per-file.
#
# ---[word]
# ---:[file]:[word]
#
back-end:backend
e-mail:email
run-time:runtime
set-up:setup
tool chain:toolchain
tool-chain:toolchain
wild-card:wildcard
wild card:wildcard
thread safe:thread-safe
thread unsafe:thread-unsafe
multi thread:multi-thread
it's:it is
aren't:are not
can't:cannot
could've:could have
couldn't:could not
didn't:did not
doesn't:does not
don't:do not
haven't:have not
i'd:I would
i'll:I will
i'm:I am
i've:I have
isn't:is not
it'd:it would
it'll:it will
might've:might have
needn't:need not
should've:should have
shouldn't:should not
that's:that is
there's:there is
they'd:They would
they'll:They will
they're:They are
they've:They have
this'll:this will
wasn't:was not
we'd:we would
we'll:we will
we're:we are
we've:we have
weren't:were not
won't:will not
would've:would have
wouldn't:would not
you'd:you would
you'll:you will
you're:you are
you've:you have
a html:an html
a http:an http
a ftp:an ftp
a IPv4:an IPv4
a IPv6:an IPv6
url= URL
internet=Internet
isation:ization
So=Rewrite it somehow?
And=Rewrite it somehow?
But=Rewrite it somehow?
sub-directory:subdirectory
web page:webpage
host name:hostname
host names:hostnames
file name:filename
file names:filenames
fist:first
user name:username
user names:usernames
pass phrase:passphrase
will:rewrite to present tense
7 bit:7-bit
8 bit:8-bit
16 bit:16-bit
24 bit:24-bit
32 bit:32-bit
56 bit:56-bit
63 bit:63-bit
64 bit:64-bit
128 bit:128-bit
8-bits:8 bits
16-bits:16 bits
32-bits:32 bits
64-bits:64 bits
very:rephrase using an alternative word
just:rephrase using an alternative word
simply:rephrase using an alternative word
Curl=curl
cURL=curl
Libcurl=libcurl
LibCurl=libcurl
manpages:man pages
manpage:man page
favour:favor
basically:rephrase?
However,:rephrase?
---WWW::Curl
---NET::Curl
---Curl Corporation
---:include/curl/:will
---:lib/:will
---:src/:will
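
The syntax described at the top of the file can be illustrated with a small parser. This is a minimal sketch in Python, not the actual `badwords` script: the names `BadWords` and `parse_badwords` are hypothetical, and the reading that `:` marks a case-insensitive rule while `=` marks a case-sensitive one is an assumption that the file header does not state.

```python
from dataclasses import dataclass, field


@dataclass
class BadWords:
    # (bad word, suggested replacement, case_sensitive)
    rules: list = field(default_factory=list)
    # whitelisted phrases, stored lowercased (the header says case insensitive)
    global_allow: set = field(default_factory=set)
    # (file or directory part, whitelisted word lowercased)
    per_file_allow: list = field(default_factory=list)


def parse_badwords(text):
    """Parse badwords.txt-style content into rules and whitelist entries."""
    cfg = BadWords()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("---:"):
            # per-file whitelist: ---:[file]:[word]
            path, _, word = line[4:].partition(":")
            cfg.per_file_allow.append((path, word.lower()))
        elif line.startswith("---"):
            # global whitelist: ---[word], the word may itself contain ':'
            cfg.global_allow.add(line[3:].lower())
        else:
            # rule line: split on whichever of ':' or '=' comes first
            positions = [p for p in (line.find(":"), line.find("=")) if p > 0]
            if not positions:
                continue  # not a rule line, ignore it
            pos = min(positions)
            # assumption: '=' means the bad word is matched case sensitively
            cfg.rules.append((line[:pos], line[pos + 1:], line[pos] == "="))
    return cfg


if __name__ == "__main__":
    sample = "file name:filename\nCurl=curl\n---WWW::Curl\n---:lib/:will\n"
    cfg = parse_badwords(sample)
    print(cfg.rules)
    print(cfg.global_allow)
    print(cfg.per_file_allow)
```

How the real script matches the `[file]` part against a path (prefix, substring, or something else) is not stated here, so the sketch only stores that part and leaves the matching open.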