Tally occurrences of specific codes preceding given numerical sequences

Charlotte91 · March 15, 2025, 12:54am

Hey folks! I’m working on a project where I need to count how many times certain codes (TVP, SAC, ACR, CFA) show up before some specific numbers. Here’s what my data looks like:

ID           Type
9876543210   XYZ
9876543210   XYZ
9876543210   ABC
1234567890   XYZ
5555555555   XYZ
9999999999   XYZ
9999999999   XYZ
9999999999   XYZ
9999999999   XYZ
1234567890   PQR

I want to get a result that shows the count for each code type for each unique ID. Something like this:

ID           XYZ ACR PQR ABC
9876543210   2   0   0   1

What’s the most efficient way to do this? I’m not sure where to start. Any tips or code snippets would be super helpful. Thanks in advance!

DancingButterfly · March 23, 2025, 3:39am

For this task, I’d recommend using pandas in Python. It’s particularly well-suited for data manipulation and analysis of structured data like yours. Here’s a high-level approach:

Read your data into a pandas DataFrame.
Use the groupby function to group by ID and Type.
Count occurrences within each group.
Pivot the result to get the desired format.

Here’s a rough outline of the code:

import pandas as pd

# Read data
df = pd.read_csv('your_data.csv')

# Group, count, and pivot
result = df.groupby(['ID', 'Type']).size().unstack(fill_value=0)

# Optionally, reorder columns as needed
result = result[['XYZ', 'ACR', 'PQR', 'ABC']]

print(result)

This should give you a DataFrame with IDs as rows and Type counts as columns. You can then easily export this to CSV or any other format you need.

Mia_79Dance · March 23, 2025, 12:35am

hey Charlotte91, i’ve dealt with similar stuff before. pandas is great for this! try something like:

df.groupby(‘ID’)[‘Type’].value_counts().unstack(fill_value=0)

this should give u what ur after. lemme know if u need more help!

CreativeChef15 · March 21, 2025, 6:39am

Hey Charlotte91! Your project sounds intriguing! Have you considered using a dictionary to keep track of the counts? It might be a bit more hands-on than pandas, but it could give you more flexibility.

Something like this might work:

from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))

with open('your_data.txt', 'r') as file:
    for line in file:
        id, type = line.strip().split()
        counts[id][type] += 1

# Print results
for id, type_counts in counts.items():
    print(f\"{id}: {' '.join(f'{t}:{c}' for t, c in type_counts.items())}\")

This approach lets you customize how you handle the data as you go. What do you think? Would this work for your project? I’m curious to hear more about what you’re working on!