I sometimes find it fascinating to describe a problem that comes up in software development outside its full context, just to remind myself how weird and deep it sounds in isolation. Maybe it makes me feel a little better about how long the problem takes me to solve.
In this case, in the full context we just call it the ‘slicer’, but that one word hides a lot of complexity. Here is what the slicer does: given a document in XML format A and a set of selected regions of formatted text, each described by a start point and an offset (both counted in characters as the user sees the document), it extracts those regions from the version of the document in XML format C, rebuilding any necessary start/end tags and preserving internal formatting. A separate process translates format A to B to C, and these translation layers can add extra characters that are not part of the document as the user sees it.
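To make the core idea concrete, here is a minimal sketch in Python of the easy half of the problem: cutting one region out of an XML document by visible character position and rebuilding the enclosing tags. This is not the real slicer. The function name `slice_xml`, the sample markup, and the use of `xml.etree.ElementTree` are all assumptions made for illustration, and the sketch treats the offset as a simple length. It also assumes every character in the XML's text nodes is visible to the user, which is exactly the assumption the A to B to C translation breaks.

```python
import xml.etree.ElementTree as ET


def slice_xml(xml_text, start, length):
    """Return XML for the visible characters in [start, start + length).

    Hypothetical helper: assumes text nodes contain only user-visible
    characters, so positions can be counted straight off the tree.
    """
    end = start + length
    pos = 0  # number of visible characters walked past so far

    def take(text):
        # Return the part of this text node inside the selection,
        # and advance the running character position.
        nonlocal pos
        text = text or ""
        lo = max(start - pos, 0)
        hi = min(end - pos, len(text))
        pos += len(text)
        return text[lo:hi] if lo < hi else ""

    def copy(elem):
        # Rebuild elem (tag and attributes) around whatever selected
        # text it contains; return None if it contains none.
        # Note: elements with no selected text, such as empty tags,
        # are dropped in this sketch.
        out = ET.Element(elem.tag, elem.attrib)
        out.text = take(elem.text)
        kept = bool(out.text)
        for child in elem:
            sub = copy(child)
            tail = take(child.tail)  # text following the child's end tag
            if sub is not None:
                sub.tail = tail
                out.append(sub)
                kept = True
            elif tail:
                # The child was dropped, so its tail text attaches to the
                # previous kept sibling, or to the parent's own text.
                if len(out):
                    out[-1].tail = (out[-1].tail or "") + tail
                else:
                    out.text = (out.text or "") + tail
                kept = True
        return out if kept else None

    result = copy(ET.fromstring(xml_text))
    return "" if result is None else ET.tostring(result, encoding="unicode")


doc = "<p>Hello <b>bold <i>and italic</i></b> world</p>"
print(slice_xml(doc, 8, 7))  # prints <p><b>ld <i>and </i></b></p>
```

The user sees "Hello bold and italic world", and the selection covers "ld and ", which starts partway through the bold run and ends partway through the italic one. The sketch reopens `<p>`, `<b>` and `<i>` around the fragment so the formatting survives. The hard part that it skips is mapping user-visible positions onto a document whose text contains extra characters introduced by translation.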