Tally specific prefixes for given numeric sequences

Hey guys, I’m trying to figure out how to count how many times certain prefixes show up alongside some numbers. Here’s what I’m dealing with:

ID           Type
9876543210   XYZ
9876543210   XYZ
9876543210   ABC
1234567890   XYZ
1112223333   XYZ
4445556666   XYZ
4445556666   XYZ
4445556666   XYZ
4445556666   XYZ
1112223333   PQR

I want to count how many times XYZ, PQR, ABC, and DEF show up for each number. The result should look something like this:

ID           XYZ PQR ABC DEF
9876543210   2   0   1   0

What’s the best way to do this? I’m not sure where to start. Any help would be great!

Hey Nate_45Guitar! That’s an interesting problem you’ve got there. :thinking: I’m curious, what kind of data are you working with? It looks like some sort of ID system, but I’d love to know more about the context.

As for your question, have you tried using a dictionary to keep track of the counts? You could use the ID as the key and another dictionary as the value to store the prefix counts. Something like:

results = {}
for id, prefix in data:
    if id not in results:
        results[id] = {'XYZ': 0, 'PQR': 0, 'ABC': 0, 'DEF': 0}
    results[id][prefix] += 1
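
For example (just a sketch, since I’m guessing at your input format), if data is a list of (ID, prefix) tuples built from your sample rows, you’d end up with counts per ID like this:

# Assumed input format: a list of (ID, prefix) tuples taken from your sample
data = [
    ('9876543210', 'XYZ'),
    ('9876543210', 'XYZ'),
    ('9876543210', 'ABC'),
    ('1234567890', 'XYZ'),
]

# After the loop above runs:
# results['9876543210'] -> {'XYZ': 2, 'PQR': 0, 'ABC': 1, 'DEF': 0}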

This is just a rough idea, of course. What programming language are you using? And have you tried any solutions yet? I’d be really interested to hear what you’ve attempted so far!

Also, I’m wondering why you’re looking for DEF in your output when it doesn’t appear in your input data. Is that intentional, or are there more prefixes in your actual dataset?

Let us know how it goes! And if you need more help, don’t hesitate to ask. This community is great at brainstorming solutions! :blush:

yo nate, ever thought bout using pandas? it’s pretty neat for this kinda stuff. you can group by ID and Type, then count. something like:

df.groupby('ID')['Type'].value_counts().unstack(fill_value=0)
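
if it helps, here’s a rougher end-to-end sketch (i’m just assuming you load your two columns into a DataFrame yourself; the reindex bit is only there to force a DEF column to show up even though DEF isn’t in your sample):

import pandas as pd

# assuming the two columns from your post are loaded into a DataFrame
df = pd.DataFrame({
    'ID': ['9876543210', '9876543210', '9876543210', '1234567890'],
    'Type': ['XYZ', 'XYZ', 'ABC', 'XYZ'],
})

counts = (
    df.groupby('ID')['Type']
      .value_counts()
      .unstack(fill_value=0)
      .reindex(columns=['XYZ', 'PQR', 'ABC', 'DEF'], fill_value=0)  # add missing columns as 0
)
print(counts)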

give it a shot and lemme know if ya need more help!

If you’re comfortable with Python, a straightforward approach would be using collections.defaultdict. It’s efficient for this kind of task. Here’s a quick implementation:

from collections import defaultdict

result = defaultdict(lambda: {'XYZ': 0, 'PQR': 0, 'ABC': 0, 'DEF': 0})

for id, type in data:
    result[id][type] += 1

This creates a nested dictionary structure where each ID automatically initializes with zero counts for all prefixes. As you iterate through your data, it updates the counts accordingly. You can then easily convert this to your desired output format.
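
For example, you could print it in roughly the layout from your first post (a minimal sketch, assuming result has been filled by the loop above):

prefixes = ['XYZ', 'PQR', 'ABC', 'DEF']

# One header line, then one line of counts per ID
print(f"{'ID':<12} " + ' '.join(prefixes))
for id, counts in result.items():
    print(f"{id:<12} " + '   '.join(str(counts[p]) for p in prefixes))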

Have you considered any performance implications if your dataset is particularly large? Depending on the scale, you might need to look into more optimized solutions.
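
For what it’s worth, if the rows come from a big file you could also stream them instead of loading everything into memory first, e.g. (hypothetical file name and header names, just a sketch):

import csv
from collections import defaultdict

result = defaultdict(lambda: {'XYZ': 0, 'PQR': 0, 'ABC': 0, 'DEF': 0})

# Hypothetical CSV with 'ID' and 'Type' header columns, processed one row at a time
# (assumes Type is always one of the four codes above)
with open('data.csv', newline='') as f:
    for row in csv.DictReader(f):
        result[row['ID']][row['Type']] += 1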