a89aa9c4desupport fetching data from Prospectus
Blade He
2025-01-14 16:21:48 -0600
e230a5bf15a little change
Blade He
2025-01-09 12:19:24 -0600
91c86bb983update AUS Prospectus relevant configuration
Blade He
2025-01-08 17:40:57 -0600
0a867dcf07complete configuration for AUS Prospectus
Blade He
2025-01-07 16:25:13 -0600
201a809ffacomment out remove_abundant_data function
Blade He
2025-01-06 15:27:43 -0600
c335992cedupdate requirements.txt
Blade He
2025-01-06 13:56:09 -0600
9348e32caasupport more performance fee keywords
Blade He
2025-01-06 13:14:20 -0600
65e752e25aimplement merge_output_data function; whether to output in this format depends on confirmation with the data/developer teams
Blade He
2024-12-18 09:19:55 -0600
309bb714f6fix issue with parsing data via the Vision function.
Blade He
2024-12-11 16:49:04 -0600
d673a99e21switch back to extracting data from the image stream directly, instead of first getting text from the image stream and then extracting data from that text. Reason: the quality of the text obtained from the image stream is not good enough.
Blade He
2024-12-10 16:17:47 -0600
f71e2968ccsimplify code
Blade He
2024-12-09 22:24:40 -0600
75ea5e70de1. support fetching data from garbled-text pages with the ChatGPT4o Vision function. 2. add multilingual share features configuration
Blade He
2024-12-09 17:47:42 -0600
d96f77fe00Split share class names when multiple share classes appear on the same line
Blade He
2024-12-06 16:31:42 -0600
d79b05885doptimize prompts for TOR
Blade He
2024-12-06 14:50:34 -0600
a25991e2bb1. Set TOR reported name priority 2. Optimize investment mapping logic
Blade He
2024-12-06 09:54:43 -0600
95c386911cClean fund name after getting response from ChatGPT
Blade He
2024-12-04 22:08:09 -0600
70362b554fFix issue in "The last fund name of previous PDF page" logic: if the current page fund name starts with "The last fund name of previous PDF page" and more content follows it, remove that prefix.
Blade He
2024-12-04 16:57:52 -0600
36fbaa946eAdd a statement when passing the last fund name of the previous PDF page to the current page: page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
Blade He
2024-12-03 11:50:31 -0600
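A minimal sketch of how commits 36fbaa946e and 70362b554f might fit together. Only the f-string is taken from the log; the function names `add_previous_fund_name` and `clean_fund_name` are hypothetical, and the cleanup rule is one reading of the fix, not the project's actual code.

```python
# Marker text taken from the commit message.
MARKER = "The last fund name of previous PDF page"


def add_previous_fund_name(page_text, previous_page_fund_name):
    """Prepend the previous page's last fund name (hypothetical helper)."""
    if not previous_page_fund_name:
        return page_text
    return f"\n{MARKER}: {previous_page_fund_name}\n{page_text}"


def clean_fund_name(fund_name):
    """Drop the marker from an extracted fund name when more content follows it."""
    stripped = fund_name.strip()
    if stripped.startswith(MARKER):
        rest = stripped[len(MARKER):].lstrip(": \n")
        if rest:  # only remove the marker if something remains
            return rest
    return stripped
```

The marker is kept when nothing follows it, so the function never returns an empty fund name.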
a11a99fdc31. Optimize instructions: do not fetch data that comes with an "up to" statement. 2. Add exception handler in function.
Blade He
2024-12-03 11:27:28 -0600
bc32860f87remove_abundant_data
Blade He
2024-12-02 17:16:56 -0600
c146497052optimize share feature judgment logic: accumulation with capitalisation and institutional income with distribution
Blade He
2024-12-02 13:11:49 -0600
352886ade2update instructions for TER, OGC, Performance Fees
Blade He
2024-12-02 11:45:19 -0600
276ff93a1dOptimize drilldown algorithm. Case 1: share class names with currency. Reason: the currency in the document is not next to the share name. Solution: if no relevant text can be found in the PDF page contents and the last word of the share class name is a currency, remove the currency from the share class name and try again. This raised recall from 95% to 96%. Case 2: relevant text cannot be found in the current PDF page text. Reason: the text may be on the previous page. Solution: get the previous page and search it for the relevant value. This raised recall from 96% to 98%.
Blade He
2024-11-26 16:35:07 -0600
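The two fallbacks described in commit 276ff93a1d could look roughly like the sketch below. The currency set and the names `find_name` and `drilldown` are assumptions for illustration; the log does not show the real implementation in pdf_util.py.

```python
# Hypothetical currency list; the project's real configuration is not in the log.
CURRENCIES = {"USD", "EUR", "GBP", "CHF", "JPY"}


def find_name(share_class_name, page_text):
    """Return the name variant found in page_text, or None."""
    if share_class_name in page_text:
        return share_class_name
    words = share_class_name.split()
    # Fallback 1: drop a trailing currency word and try again.
    if len(words) > 1 and words[-1].upper() in CURRENCIES:
        without_currency = " ".join(words[:-1])
        if without_currency in page_text:
            return without_currency
    return None


def drilldown(share_class_name, pages, page_index):
    """Search the current page, then the previous page (fallback 2)."""
    for index in (page_index, page_index - 1):
        if index < 0:
            continue
        hit = find_name(share_class_name, pages[index])
        if hit is not None:
            return index, hit
    return None
```

For example, with `pages = ["Fund X\nClass A\nUSD 1.20%", "Totals continued"]`, a search for "Class A USD" starting on the second page falls back to the first page and matches "Class A".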
a09778d9d1Create EMEA AR API code file. Optimize annotation list for drilldown.
Blade He
2024-11-26 11:24:29 -0600
fb356fce761. optimize drilldown algorithm 2. support calculating drilldown recall metrics
Blade He
2024-11-25 15:11:03 -0600
78fb283130update python libraries
Blade He
2024-11-25 11:11:02 -0600
fc80093557optimize investment mapping
Blade He
2024-11-22 14:54:52 -0600
f1c0290588Optimize investment mapping algorithm.
Blade He
2024-11-21 16:36:58 -0600
5b9f9416de1. Update for mapping multilingual share class names. 2. Optimize getting currency logic
Blade He
2024-11-21 11:37:58 -0600
843bbbd13fdynamic loading instructions for multilingual.
Blade He
2024-11-20 17:00:22 -0600
067d89e0f9Add datapoint_reportedname.json for dynamic loading reported names based on document language.
Blade He
2024-11-19 16:49:15 -0600
8223ca9a5ca little change
Blade He
2024-11-18 16:13:24 -0600
a42c0b5c2boptimize retrieve fund instructions
Blade He
2024-11-13 10:25:08 -0600
7a41b036341. optimize instructions for fund name 2. optimize drilldown logic
Blade He
2024-11-12 17:01:10 -0600
c2d2e54670"total match" logic for single-word values needs to consider the "\n" character scenario
Blade He
2024-11-12 11:40:19 -0600
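One possible reading of commit c2d2e54670 is that a single-word value should match only as a whole token, where the token boundary may be a newline rather than a space. The sketch below illustrates that reading with a regular expression; `total_match` is a hypothetical name and the actual rule in the project may differ.

```python
import re


def total_match(value, page_text):
    """True if value appears as a whole token delimited by whitespace
    (spaces, tabs or newlines) or by the start/end of the text."""
    pattern = r"(?<!\S)" + re.escape(value) + r"(?!\S)"
    return re.search(pattern, page_text) is not None
```

With this rule, "1.25" matches in "TER\n1.25\nOGC" but not inside "11.25%".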
5b67bd332boptimize drilldown algorithm
Blade He
2024-11-12 11:20:38 -0600
c6c3e99d3eintegrate pdf drilldown logic to pdf_util.py
Blade He
2024-11-11 16:34:25 -0600
c34e2e960eoptimize drilldown algorithm
Blade He
2024-11-08 15:00:34 -0600
81f855f725support drilldown data to PDF
Blade He
2024-11-08 11:22:35 -0600
0349033eafupdate for more statistics methods
Blade He
2024-11-06 16:39:42 -0600
81a424b00dSupport replacing share class names in the database with more readable ones. Example (document 532422720): M&G European Credit Investment Fund A CHFH Acc -> M&G European Credit Investment Fund A CHF H Accumulation
Blade He
2024-11-05 11:14:56 -0600
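The example in commit 81a424b00d can be reproduced with a small rule-based sketch. This is an assumption about the kind of rewriting involved, not the project's method: the currency list, the abbreviation table (only "Acc" comes from the log) and the function name `readable_name` are all hypothetical.

```python
import re

# Hypothetical tables; only "CHFH" and "Acc" appear in the commit example.
CURRENCIES = ("CHF", "EUR", "GBP", "USD")
ABBREVIATIONS = {"Acc": "Accumulation"}
FUSED = re.compile(r"(%s)H" % "|".join(CURRENCIES))


def readable_name(share_class_name):
    """Split fused currency+hedge tokens and expand known abbreviations."""
    tokens = []
    for token in share_class_name.split():
        fused = FUSED.fullmatch(token)
        if fused:
            tokens.extend([fused.group(1), "H"])  # "CHFH" -> "CHF", "H"
        else:
            tokens.append(ABBREVIATIONS.get(token, token))
    return " ".join(tokens)
```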
2645d528b1support outputting data point reported name
Blade He
2024-10-29 16:47:45 -0500
9d453c9faea little updates
Blade He
2024-10-28 15:15:55 -0500
53dadf61f4optimize keywords/instructions for special-case documents.
Blade He
2024-10-23 16:56:43 -0500
171f3b6d1foptimize for OGC data extraction.
Blade He
2024-10-23 16:07:54 -0500
03365227b9optimize instructions
Blade He
2024-10-21 11:04:53 -0500
3f2bb38208Resolve issue where the first records have only a share class name and no fund name (the fund name is in the previous page text).
Blade He
2024-10-16 16:55:32 -0500
f166e73362optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, extract the data with ChatGPT Vision
Blade He
2024-10-15 15:57:54 -0500
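The fallback in commit f166e73362 can be pictured as a simple routing decision. In this sketch the check for a "numeric cost value" is approximated by a decimal-number pattern, which is an assumption; `choose_extraction_mode` is a hypothetical name and no ChatGPT call is shown.

```python
import re

# Assumed approximation of a cost value: a decimal number such as 1.25 or 0,75.
DECIMAL = re.compile(r"\d+[.,]\d+")


def choose_extraction_mode(page_text):
    """Return "text" when the page text contains a decimal number, else "vision"."""
    return "text" if DECIMAL.search(page_text) else "vision"
```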
8b651f374coptimize instructions
Blade He
2024-10-14 09:12:05 -0500
df66489c5fsupport this scenario: fund and share have the same name.
Blade He
2024-10-11 13:14:04 -0500
92a26cd262optimize configuration
Blade He
2024-10-11 12:16:34 -0500
17284c74f0optimize for investment mapping: share feature logic
Blade He
2024-10-09 14:07:07 -0500
04a2409c58optimize investment mapping algorithm
Blade He
2024-10-08 23:53:55 -0500
aa2c2332aeoptimize for more cases
Blade He
2024-10-08 17:16:01 -0500
8bd6008425refactor code
Blade He
2024-10-07 10:34:13 -0500
b18c48efebA little change
Blade He
2024-10-03 16:31:16 -0500
f0dd7f9e89Consider cases with multiple share short names.
Blade He
2024-10-02 17:25:25 -0500
edb90c718eOptimize mapping algorithm: consider share class names with multiple short names, e.g. "CPR Invest Global Disruptive Opportunities Class I sw EUR - Acc", where the short names are "I" and "sw". The purpose is to support getting all short names from a share class name.
Blade He
2024-10-02 15:08:26 -0500
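For the example in commit edb90c718e, one way to collect several short names is to take the tokens after "Class" until a currency or a "-" separator appears. This is only an illustrative heuristic; the currency set and the function name `short_names` are assumptions, and the project's real rule is not shown in the log.

```python
# Hypothetical currency list for illustration.
CURRENCIES = {"USD", "EUR", "GBP", "CHF"}


def short_names(share_class_name):
    """Collect tokens after "Class" up to the first currency or "-" token."""
    tokens = share_class_name.split()
    if "Class" not in tokens:
        return []
    names = []
    for token in tokens[tokens.index("Class") + 1:]:
        if token == "-" or token.upper() in CURRENCIES:
            break
        names.append(token)
    return names
```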
3bb13947afOptimize mapping algorithm: 1. For multiple currencies in a fund/share name, if USD exists, remove it. 2. Fix the issue with splitting words that have no space. 3. If there is no currency in the share class name, try to get the currency from the document mapping entry with the same fund name and the same short share class name.
Blade He
2024-10-02 13:25:08 -0500
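The currency rule in commit 3bb13947af (drop USD when several currencies appear in a name) might be sketched as follows. The currency set and `pick_currency` are hypothetical, and the lookup in the document mapping for names without a currency is left out.

```python
# Hypothetical currency list for illustration.
CURRENCIES = {"USD", "EUR", "GBP", "CHF"}


def pick_currency(name):
    """Return the currency in a fund/share name, ignoring USD if others exist."""
    found = []
    for token in name.split():
        code = token.upper()
        if code in CURRENCIES and code not in found:
            found.append(code)
    if len(found) > 1 and "USD" in found:
        found.remove("USD")
    return found[0] if found else None
```

A result of `None` is the case where the commit falls back to the document mapping.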
f06355e0c8optimize mapping algorithm: check whether exist "-" to connect share names
Blade He
2024-10-02 11:38:11 -0500
035f028155optimize mapping algorithm
Blade He
2024-10-01 16:46:59 -0500
3adbd7631aoptimize mapping algorithm
Blade He
2024-10-01 15:31:15 -0500
d92053a16eoptimize mapping metrics algorithm
Blade He
2024-10-01 12:19:45 -0500
0f14bf4a7a1. get document/provider mapping data 2. optimize metrics algorithm 3. Expand max token length after switching ChatGPT4o to the 2024-08-06 version.
Blade He
2024-09-23 17:21:02 -0500
8496c7b5edoptimize instructions; optimize metrics algorithm
Blade He
2024-09-20 16:46:44 -0500
91530d6089add more description for Performance Fees calculation rules
Blade He
2024-09-20 11:58:48 -0500
40bcce4404instructions: explicitly state not to collect data whose value is -, *, **, N/A, N/A%, N/A %, NONE
Blade He
2024-09-20 10:26:18 -0500
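Commit 40bcce4404 puts this rule into the prompt instructions. The sketch below shows a different, complementary technique: a post-extraction filter that applies the same placeholder list to returned records. The record shape (a dict with a "value" key) and the function names are assumptions.

```python
# Placeholder values listed in the commit message.
PLACEHOLDERS = {"-", "*", "**", "N/A", "N/A%", "N/A %", "NONE"}


def is_placeholder(value):
    """True if value is one of the placeholders (case-insensitive)."""
    return value.strip().upper() in PLACEHOLDERS


def drop_placeholders(records):
    """Keep only records whose "value" is not a placeholder."""
    return [r for r in records if not is_placeholder(str(r.get("value", "")))]
```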
c4985ac75foptimize data extraction and metrics calculation algorithms
Blade He
2024-09-19 22:45:08 -0500
48dc8690c3support extracting data from PDF page images
Blade He
2024-09-19 16:29:26 -0500
67371e534eonly calculate metrics for the intersection of the document lists
Blade He
2024-09-19 11:54:51 -0500
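Restricting metrics to the intersection of document lists, as in commit 67371e534e, can be illustrated like this. The data shape (a dict from document id to a set of extracted items) and the name `intersection_metrics` are assumptions; the project's metric definitions are not shown in the log.

```python
def intersection_metrics(truth, predicted):
    """Compute precision and recall only over documents present in both inputs."""
    documents = truth.keys() & predicted.keys()
    tp = fp = fn = 0
    for doc_id in documents:
        expected, actual = set(truth[doc_id]), set(predicted[doc_id])
        tp += len(expected & actual)
        fp += len(actual - expected)
        fn += len(expected - actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return sorted(documents), precision, recall
```

Documents that appear in only one of the two inputs are ignored, so they affect neither precision nor recall.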
27b3540c63optimize metrics calculation algorithm
Blade He
2024-09-19 11:44:17 -0500
98e86a6cfdimplement calculation of data extraction metrics.
Blade He
2024-09-18 17:10:54 -0500
50e6c3c19da little change
Blade He
2024-09-16 16:43:03 -0500
932870f406support splitting text for the case where outputs exceed 4K tokens.
Blade He
2024-09-16 12:03:13 -0500
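A rough sketch of the splitting idea in commit 932870f406: break the page text into line-based chunks so that each request stays under a budget. The whitespace word count used here is only a stand-in for a real tokenizer, and `split_text` and its parameters are hypothetical.

```python
def split_text(text, max_tokens=4000):
    """Split text into chunks of whole lines, each within max_tokens
    (estimated by whitespace-separated words, an assumption)."""
    chunks, current, count = [], [], 0
    for line in text.splitlines():
        size = len(line.split())
        if current and count + size > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += size
    if current:
        chunks.append("\n".join(current))
    return chunks
```

A single line longer than the budget is kept as its own chunk rather than being cut mid-line.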
0f6dbd27eboptimize instructions for performance fees.
Blade He
2024-09-13 16:10:44 -0500
e17414173aupdate to get more precise results
Blade He
2024-09-12 16:00:49 -0500
d56ac9482eAdjust for output example format
Blade He
2024-09-11 09:24:36 -0500
0887608719support auto-mapping fund/share by raw names.
Blade He
2024-09-09 17:34:53 -0500
878383a72csupport extracting the continuous page(s) so as not to miss next-page data that has no table header.
Blade He
2024-09-06 16:29:35 -0500
1caf552065support extracting data with ChatGPT4o. The instructions are generated dynamically.
Blade He
2024-09-05 17:22:26 -0500
7c83f9152atry to improve page filter precision
Blade He
2024-09-04 17:01:12 -0500
7198450e53support calculating page filter metrics.
Blade He
2024-09-03 17:07:53 -0500
f81e2862f3update prompts to extract TOR, OGC, TER, Performance fees data.
Blade He
2024-08-30 16:37:00 -0500
63da030fe1update general prompts
Blade He
2024-08-29 17:05:58 -0500
134b365b68Try to generate general prompts for LUX English AR
- Support output of fund name, share name, TER, performance fees, OGC
- Only output data points and values that can be found in the page text.
- Output fund-level data and share-level data separately.
- List some special cases to cover as many cases as possible.
Blade He
2024-08-28 16:44:19 -0500
32676728f6optimize prompts
Blade He
2024-08-28 10:21:26 -0500
15720d8bfd1. Text-and-image all-in-one chat function with ChatGPT4o 2. many experiments on extracting data in two ways: from page text or from page image.
Blade He
2024-08-26 17:17:39 -0500
843f588015support chat with image by ChatGPT4o
Blade He
2024-08-26 11:19:07 -0500
6519dc23d4support filtering pages by data point keywords
Blade He
2024-08-23 16:38:11 -0500
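Page filtering by data point keywords (commit 6519dc23d4) might work along these lines. The keyword table in the usage note and the name `filter_pages` are hypothetical; the project's real keyword configuration is not shown in the log.

```python
def filter_pages(pages, keywords_by_datapoint):
    """Map each data point to the indexes of pages containing any of its keywords
    (case-insensitive substring match)."""
    result = {}
    for datapoint, keywords in keywords_by_datapoint.items():
        lowered = [keyword.lower() for keyword in keywords]
        result[datapoint] = [
            index
            for index, text in enumerate(pages)
            if any(keyword in text.lower() for keyword in lowered)
        ]
    return result
```

For instance, `filter_pages(["Total Expense Ratio 1.2%", "Notes"], {"ter": ["total expense ratio"]})` selects only the first page for "ter".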
993664cf78a lot of functions to prepare data.
Blade He
2024-08-22 10:37:56 -0500