Is there a way to tell sloccount that some files are in a language it does not already support (some DSL, Scala, Go, Rust, ...), identified not by file extension but by content (e.g. files containing specific keywords, or a specific style of comments; I could provide the tool with a complete list of tokens)?
Is there a better (simple) tool for this specific task?
Thanks in advance.
What you want is a tool that knows something about a wide variety of languages, can use the file extension as a hint, and uses the file content as a sanity check, or for classification when the extension isn't present.
Semantic Designs' (my company) File Inventory tool scans a large set of files and classifies them this way. File extensions hint at content. Where available, language-accurate lexical scanners are used to confirm that the content is what it claims to be, providing confidence factors.
FileInventory doesn't compute source code metrics by itself (it does compute file size and line counts for files that appear to contain text). But it does generate project files for the classified files to drive our Source Code Search Engine (SCSE), a tool for searching large code bases in multiple languages. A side effect of SCSE scanning the code base to index it for fast access is the computation of basic metrics: lines, SLOC, comments, and Halstead and McCabe metrics (example output).
So the combination of these two seems to do what you want, at scale. These tools are not what I would call simple in terms of how they are internally implemented (doing anything that knows details about programming languages is actually pretty complicated), but they are very simple to configure and run.
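If you only need a rough, home-grown version of the content-based classification the question describes, it can be sketched in a few lines. This is a minimal illustration, not any tool's actual algorithm: the keyword sets below are my own guesses at distinctive tokens per language, and a real classifier would use proper lexers rather than bag-of-keywords counting.

```python
import re

# Illustrative signature tokens per language (assumed, not exhaustive).
SIGNATURES = {
    "scala": {"def", "val", "object", "extends", "trait"},
    "go":    {"func", "package", "defer", "chan", "interface"},
    "rust":  {"fn", "let", "impl", "pub", "crate"},
}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*")

def classify(text: str) -> str:
    """Pick the language whose signature tokens occur most often."""
    tokens = TOKEN_RE.findall(text)
    counts = {lang: sum(1 for t in tokens if t in sig)
              for lang, sig in SIGNATURES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"

def sloc(text: str, comment_prefix: str = "//") -> int:
    """Count non-blank lines that are not pure line comments."""
    return sum(1 for line in text.splitlines()
               if line.strip() and not line.strip().startswith(comment_prefix))
```

For example, `classify("package main\nfunc main() { defer f() }")` returns `"go"`, and `sloc` gives a crude physical-SLOC count once a file has been classified. The weakness of this approach is exactly why content confirmation via real lexical scanners matters: keyword overlap between languages (e.g. `let`, `import`) quickly produces misclassifications on short files.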