Blade He
a89aa9c4de
support fetching data from Prospectus
2025-01-14 16:21:48 -06:00
Blade He
0a867dcf07
complete configuration for AUS Prospectus
2025-01-07 16:25:13 -06:00
Blade He
201a809ffa
comment out remove_abundant_data function
2025-01-06 15:27:43 -06:00
Blade He
309bb714f6
fix issue with parsing data via the Vision function.
2024-12-11 16:49:04 -06:00
Blade He
d673a99e21
switch back to extracting data directly from the image stream, instead of first getting text from the image stream and then extracting data from that text.
Reason: the quality of the text obtained from the image stream is not good enough.
2024-12-10 16:17:47 -06:00
Blade He
f71e2968cc
simplify code
2024-12-09 22:24:40 -06:00
Blade He
75ea5e70de
1. support fetching data from garbled-text pages via the ChatGPT4o Vision function.
2. multilingual share features configuration
2024-12-09 17:47:42 -06:00
Blade He
d96f77fe00
Split share class names when multiple share classes appear on the same line
2024-12-06 16:31:42 -06:00
Blade He
a25991e2bb
1. Set TOR reported name priority
2. Optimize investment mapping logic
2024-12-06 09:54:43 -06:00
Blade He
95c386911c
Clean fund name after getting response from ChatGPT
2024-12-04 22:08:09 -06:00
Blade He
70362b554f
Fix issue with "The last fund name of previous PDF page" logic:
If the current page fund name starts with "The last fund name of previous PDF page" and has more content below it, remove "The last fund name of previous PDF page".
2024-12-04 16:57:52 -06:00
Blade He
36fbaa946e
Add a statement when carrying over the last fund name of the previous PDF page:
The last fund name of previous PDF page:
page_text = f"\nThe last fund name of previous PDF page: {previous_page_fund_name}\n{page_text}"
2024-12-03 11:50:31 -06:00
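The carry-over described in the commit above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `prepend_previous_fund_name` and the choice to skip the prefix when no previous fund name exists are assumptions; only the f-string itself comes from the commit message.

```python
from typing import Optional


def prepend_previous_fund_name(
    page_text: str, previous_page_fund_name: Optional[str]
) -> str:
    """Prefix page_text with the last fund name seen on the previous page.

    Hypothetical helper: only the f-string below is taken from the commit
    message; the guard for a missing name is an assumption.
    """
    if not previous_page_fund_name:
        # Assumption: nothing to carry over, so leave the page text untouched.
        return page_text
    return (
        f"\nThe last fund name of previous PDF page: "
        f"{previous_page_fund_name}\n{page_text}"
    )
```

With the prefix in place, a page whose first rows list only share class names still carries the fund name they belong to.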
Blade He
a11a99fdc3
1. Optimize instructions: do not fetch data qualified by an "up to" statement.
2. Add exception handler in function.
2024-12-03 11:27:28 -06:00
Blade He
bc32860f87
remove_abundant_data
2024-12-02 17:16:56 -06:00
Blade He
843bbbd13f
dynamically load instructions for multilingual support.
2024-11-20 17:00:22 -06:00
Blade He
2645d528b1
support outputting the data point reported name
2024-10-29 16:47:45 -05:00
Blade He
9d453c9fae
minor updates
2024-10-28 15:15:55 -05:00
Blade He
3f2bb38208
Resolve issue where the first records have only a share class name but no fund name (the fund name is in the previous page text).
2024-10-16 16:55:32 -05:00
Blade He
f166e73362
optimize data extraction algorithm: if no numeric cost value can be found in the PDF page text, extract data via ChatGPT Vision
2024-10-15 15:57:54 -05:00
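The fallback described in the commit above might look roughly like the sketch below. Everything here is hypothetical: `extract_page_data`, the two extractor callables, and the regular expression are illustrative stand-ins, and a generic "any number" pattern is only a loose approximation of "cost numeric value".

```python
import re
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: any integer or decimal number counts as a candidate cost value.
NUMERIC_VALUE = re.compile(r"\d+(?:[.,]\d+)?")


def extract_page_data(
    page_text: str,
    extract_from_text: Callable[[str], T],
    extract_from_image: Callable[[], T],
) -> T:
    """Use text extraction when the page text contains numbers, else Vision.

    Both extractors are hypothetical callables supplied by the caller.
    """
    if NUMERIC_VALUE.search(page_text):
        return extract_from_text(page_text)
    # No numeric value in the text layer: fall back to the page image.
    return extract_from_image()
```

Passing the extractors in as callables keeps the decision rule testable without calling any model.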
Blade He
df66489c5f
support this scenario: fund and share have the same name.
2024-10-11 13:14:04 -05:00
Blade He
17284c74f0
optimize for investment mapping: share feature logic
2024-10-09 14:07:07 -05:00
Blade He
04a2409c58
optimize investment mapping algorithm
2024-10-08 23:53:55 -05:00
Blade He
aa2c2332ae
optimize for more cases
2024-10-08 17:16:01 -05:00
Blade He
d92053a16e
optimize mapping metrics algorithm
2024-10-01 12:19:45 -05:00
Blade He
18174bf1cf
optimize mapping: choose the proper candidate mapping list.
2024-10-01 11:35:29 -05:00
Blade He
60a26377e5
optimize investment mapping algorithm
2024-09-30 16:32:56 -05:00
Blade He
3aa596ea33
optimize mapping logic
2024-09-27 16:39:56 -05:00
Blade He
39cd53dc33
support calculating mapping metrics based on the document investment mapping in the database
2024-09-27 13:20:50 -05:00
Blade He
598e2ab820
investment mapping: optimize for currency logic
2024-09-25 17:28:22 -05:00
Blade He
dd6701f18c
1. optimize investment mapping algorithm
2. implement investment mapping metrics
2024-09-25 15:15:38 -05:00
Blade He
0f14bf4a7a
1. get document/provider mapping data
2. optimize metrics algorithm
3. Expand max token length after switching ChatGPT4o to the 2024-08-06 version.
2024-09-23 17:21:02 -05:00
Blade He
8496c7b5ed
optimize instructions
optimize metrics algorithm
2024-09-20 16:46:44 -05:00
Blade He
91530d6089
add more description for Performance Fees calculation rules
2024-09-20 11:58:48 -05:00
Blade He
c4985ac75f
optimize data extraction and metrics calculation algorithms
2024-09-19 22:45:08 -05:00
Blade He
48dc8690c3
support extracting data from PDF page images
2024-09-19 16:29:26 -05:00
Blade He
67371e534e
only calculate metrics for the intersection of the document lists
2024-09-19 11:54:51 -05:00
Blade He
27b3540c63
optimize metrics calculation algorithm
2024-09-19 11:44:17 -05:00
Blade He
98e86a6cfd
implement calculation of data extraction metrics.
2024-09-18 17:10:54 -05:00
Blade He
50e6c3c19d
minor change
2024-09-16 16:43:03 -05:00
Blade He
932870f406
support splitting text when the output exceeds 4K tokens.
2024-09-16 12:03:13 -05:00
Blade He
e17414173a
update to get more precise results
2024-09-12 16:00:49 -05:00
Blade He
0887608719
support auto-mapping fund/share by raw names.
2024-09-09 17:34:53 -05:00
Blade He
878383a72c
support extracting consecutive page(s) so that next-page data without a table header is not missed.
2024-09-06 16:29:35 -05:00
Blade He
1caf552065
support extracting data via ChatGPT4o.
The instructions are generated dynamically.
2024-09-05 17:22:26 -05:00
Blade He
7c83f9152a
try to improve page filter precision
2024-09-04 17:01:12 -05:00
Blade He
7198450e53
support calculating page filter metrics.
2024-09-03 17:07:53 -05:00
Blade He
32676728f6
optimize prompts
2024-08-28 10:21:26 -05:00
Blade He
6519dc23d4
support filtering pages by data point keywords
2024-08-23 16:38:11 -05:00